Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
 


On Monday, December 7, 2015 10:37 AM, DB Tsai  wrote:
 

 Only beginning and ending part of data. The rest in the partition can
be compared without shuffle.


Would you help write a little pseudo-code for it? It seems there is no API for
that kind of shuffle, apart from repartition.
Thanks a lot in advance!



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



  

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread DB Tsai
Only beginning and ending part of data. The rest in the partition can
be compared without shuffle.

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
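The boundary-exchange idea above can be sketched without Spark: each partition forms its local adjacent pairs with no shuffle, and only the head of the next partition has to move so the pair that straddles the boundary can be formed. This is a plain-Java sketch standing in for the RDD machinery; the class and method names are made up for illustration and are not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdjacentCompare {

    // Simulated boundary exchange: within each "partition" the adjacent pairs
    // are formed locally; only the head of the next partition is needed to
    // form the pair that straddles the boundary. Empty partitions are not
    // handled here, for brevity.
    static List<int[]> adjacentPairs(List<List<Integer>> partitions) {
        List<int[]> pairs = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<Integer> part = partitions.get(p);
            for (int i = 0; i + 1 < part.size(); i++) {
                pairs.add(new int[]{part.get(i), part.get(i + 1)}); // local, no shuffle
            }
            if (p + 1 < partitions.size()) {
                // Boundary pair: last element here with the head of the next
                // partition -- the only piece of data that has to move.
                pairs.add(new int[]{part.get(part.size() - 1),
                                    partitions.get(p + 1).get(0)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4, 5), Arrays.asList(6));
        for (int[] pr : adjacentPairs(parts)) {
            System.out.println(pr[0] + "," + pr[1]);  // the 5 adjacent pairs
        }
    }
}
```

In Spark terms this is what mapPartitionsWithIndex plus a small collect of the per-partition head elements would do: only one element per partition is shuffled, and the full parallelism of the remaining comparisons is kept.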





Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu

 


On Saturday, December 5, 2015 3:00 PM, DB Tsai  wrote:
 

 This is tricky. You need to shuffle the ending and beginning elements
using mapPartitionWithIndex.

Does this mean that I need to shuffle all the elements from the different
partitions into one partition, and then compare each pair of adjacent elements
there? If it is like that, it seems good.

One more issue: will it lose parallelism, since there would then be only one
partition?

Thanks very much in advance!








  

Re: the way to compare any two adjacent elements in one rdd

2015-12-05 Thread Zhiliang Zhu

For this, mapPartitionsWithIndex would also work properly as a filter.
Here is the code, copied from Stack Overflow, which is used to remove the first
line of a CSV file:

JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);

Function2<Integer, Iterator<String>, Iterator<String>> removeHeader =
        new Function2<Integer, Iterator<String>, Iterator<String>>() {
    @Override
    public Iterator<String> call(Integer index, Iterator<String> iterator)
            throws Exception {
        if (index == 0 && iterator.hasNext()) {
            // For my usage -- comparing any two adjacent elements, or doing a
            // filter -- the index parameter is not needed; it is fine to view
            // each iterator as one logical partition.
            iterator.next();  // drop the first line of the first partition
        }
        return iterator;
    }
};

JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);
 



   

  

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi DB Tsai,

Thanks very much for your kind reply!

Sorry, one more issue: as tested, it seems that filter can only return a
JavaRDD of the same element type, not a JavaRDD of some other type, is that
right? Then it is not very convenient to do a general filter on an RDD.
mapPartitions could work to some extent, but if some partition is left with no
elements after filtering via mapPartitions, there will be a problem.

Best wishes,
Zhiliang
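On the concern above: filter does indeed return a JavaRDD of the same element type (map or mapPartitions is the way to change the type), but a partition that ends up empty after filtering is harmless -- an empty iterator simply contributes no elements to the result. A plain-Java analogy of per-partition filtering, with one partition filtered down to nothing (the class and method names are illustrative, not Spark API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerPartitionFilter {

    // Apply a predicate partition by partition, like mapPartitions with a
    // filtering iterator; a partition that keeps nothing is not a problem.
    static List<Integer> filterPartitions(List<List<Integer>> partitions) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : partitions) {
            for (Integer x : part) {
                if (x % 2 == 0) {   // keep only the even elements
                    out.add(x);
                }
            }
            // A partition that kept nothing simply adds nothing here.
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 3, 5),   // this partition becomes empty
                Arrays.asList(2, 4),
                Arrays.asList(6, 7));
        System.out.println(filterPartitions(parts));  // prints [2, 4, 6]
    }
}
```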
 


On Saturday, December 5, 2015 3:00 PM, DB Tsai  wrote:
 

 This is tricky. You need to shuffle the ending and beginning elements
using mapPartitionWithIndex.




  

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread DB Tsai
This is tricky. You need to shuffle the ending and beginning elements
using mapPartitionWithIndex.

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D





the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi All,

I would like to compare any two adjacent elements in one given RDD, just as in
this single-machine code:

int a[N] = {...};
for (int i = 0; i < N - 1; ++i) {
    compareFun(a[i], a[i+1]);
}
...

mapPartitions may work for some situations; however, it cannot compare
elements that fall in different partitions. foreach also does not seem to work.

Thanks,
Zhiliang
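One way to express this on an RDD, besides the partition-boundary exchange suggested in the replies, is to index every element (zipWithIndex in Spark) and emit each element under two keys, so that key k collects both a[k] and a[k+1]; a keyed join then yields the adjacent pairs. This costs a shuffle of the whole data set but needs no partition bookkeeping. The pairing logic is sketched here in plain Java, with HashMaps standing in for the keyed shuffle (the names are illustrative, not Spark API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexJoin {

    // Each element a[k] is emitted twice: under key k as the left half of a
    // pair, and under key k-1 as the right half. Keys that collect both
    // halves yield the adjacent pairs -- the same result a zipWithIndex
    // followed by a keyed join would produce on an RDD.
    static List<int[]> adjacentByJoin(List<Integer> data) {
        Map<Integer, Integer> left = new HashMap<>();
        Map<Integer, Integer> right = new HashMap<>();
        for (int k = 0; k < data.size(); k++) {
            left.put(k, data.get(k));            // a[k] as left of pair k
            if (k > 0) {
                right.put(k - 1, data.get(k));   // a[k] as right of pair k-1
            }
        }
        List<int[]> pairs = new ArrayList<>();
        for (int k = 0; k + 1 < data.size(); k++) {
            pairs.add(new int[]{left.get(k), right.get(k)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] pr : adjacentByJoin(Arrays.asList(1, 2, 3, 4))) {
            System.out.println(pr[0] + "," + pr[1]);  // prints 1,2 2,3 3,4
        }
    }
}
```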