Hi Archit,

Thanks for the reply. How can I do the custom compilation to reduce it to 4 bytes? I want to make it 4 bytes in any case; can you please guide me?
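On getting the key down to 4 bytes: `zipWithIndex` returns an `RDD[(T, Long)]`, so the index it produces is always a Long, and changing that is what would need a custom build. Assuming the indices never exceed `Int.MaxValue` (1 million keys fit easily), the index could instead be narrowed afterwards, e.g. `rdd.zipWithIndex().map { case (v, i) => (i.toInt, v) }` in Scala. Because `Long.toInt` silently wraps on overflow, a bounds check is worth adding. The Python sketch below only illustrates that check and the 8-byte versus 4-byte encoding; `to_int32_key` and `encode_key` are hypothetical helpers, not Spark APIs.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a 4-byte signed int (JVM Int) can hold


def to_int32_key(index: int) -> int:
    """Hypothetical helper: narrow a 64-bit zipWithIndex-style index to 32 bits.

    Raises instead of silently wrapping, which is what Scala's Long.toInt would do.
    """
    if not 0 <= index <= INT32_MAX:
        raise OverflowError(f"index {index} does not fit in a 4-byte signed int")
    return index


def encode_key(index: int, four_bytes: bool) -> bytes:
    """Hypothetical helper: big-endian encoding as a JVM long (8 bytes) or int (4 bytes)."""
    if four_bytes:
        return struct.pack(">i", to_int32_key(index))
    return struct.pack(">q", index)


if __name__ == "__main__":
    num_keys = 1_000_000  # from the thread: about 1M distinct keys
    last_index = num_keys - 1
    print(len(encode_key(last_index, four_bytes=False)))  # 8
    print(len(encode_key(last_index, four_bytes=True)))   # 4
```

Whether this actually saves memory inside Spark is an assumption to verify: boxed keys on the JVM carry object overhead either way, so the gain is clearest where keys are serialized or stored in primitive form.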
I am applying flatMapValues in each step after zipWithIndex, so the data should stay on the same node, right? Why is it shuffling? Also, I am currently running with very few records and it is still shuffling.

Regards,
Jeetendra

On 17 April 2015 at 15:58, Archit Thakur <archit279tha...@gmail.com> wrote:

> I don't think you can change it to 4 bytes without some custom compilation.
> To make the same key go to the same node, you'll have to repartition the data,
> which is shuffling anyway. Unless your raw data is such that the same key
> is already on the same node, you'll have to shuffle at least once to get the
> same key onto the same node.
>
> On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have an RDD which has 1 million keys, and each key is repeated for
>> around 7,000 values, so in total there will be around 1M*7K records in the RDD.
>>
>> Each key is created with zipWithIndex, so the keys run from 0 to M-1.
>> The problem with zipWithIndex is that it uses a Long for the key, which is 8 bytes.
>> Can I reduce it to 4 bytes?
>>
>> Now, how can I make sure that records with the same key go to the same node so
>> that I can avoid shuffling? Also, how will the default partitioner work here?
>>
>> Regards,
>> Jeetendra
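On how the default partitioner treats these keys: as far as I know, when a key-based operation needs a partitioner and no parent RDD has one, Spark falls back to a `HashPartitioner`, which puts each key in partition `nonNegativeMod(key.hashCode, numPartitions)`. Records with the same key therefore always land in the same partition. `zipWithIndex` and `flatMapValues` do not move records between partitions on their own (`flatMapValues` keeps the parent's partitioner; `zipWithIndex` does launch an extra job to count elements per partition when the RDD has more than one partition, but that job is not a shuffle), so a shuffle would come from a later operation such as `groupByKey`, `reduceByKey` or `join`. The Python sketch below assumes Long keys, emulates `java.lang.Long.hashCode`, and shows that small non-negative keys simply map to `key % numPartitions`; `java_long_hash_code` and `hash_partition` are illustrative names, not Spark APIs.

```python
def java_long_hash_code(value: int) -> int:
    """Emulate java.lang.Long.hashCode(): (int)(value ^ (value >>> 32))."""
    bits = value & 0xFFFFFFFFFFFFFFFF            # two's-complement 64-bit pattern
    folded = (bits ^ (bits >> 32)) & 0xFFFFFFFF  # xor high and low words, keep 32 bits
    # Reinterpret as a signed 32-bit int, like the Java cast to int.
    return folded - (1 << 32) if folded >= (1 << 31) else folded


def hash_partition(key: int, num_partitions: int) -> int:
    """Sketch of HashPartitioner's placement rule for a Long key.

    Spark uses a non-negative modulo; Python's % already returns a
    non-negative result for a positive divisor.
    """
    return java_long_hash_code(key) % num_partitions


if __name__ == "__main__":
    num_partitions = 4  # hypothetical partition count
    for key in [0, 1, 5, 999_999]:
        print(key, "->", hash_partition(key, num_partitions))
```

In line with Archit's point, the usual pattern is to pay for one shuffle with `partitionBy(new HashPartitioner(n))`, persist the result, and let later key-based operations that use the same partitioner reuse that layout instead of shuffling again.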