Hi Archit, thanks for the reply.
How can I do the custom compilation to reduce it to 4 bytes? I want to make
it 4 bytes in any case. Can you please guide me?

I am applying flatMapValues in each step after zipWithIndex, so the data
should stay on the same node, right? Why is it shuffling?
Also, I am currently running with very few records, and it is still shuffling.

regards
jeetendra
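
A minimal sketch of the two points discussed in this thread, written as a
pure-Python model rather than Spark code. It assumes that a hash partitioner
places a key in partition (hash of key) mod (number of partitions), and that
the hash of a small non-negative integer key is the key itself, as is the
case for a non-negative JVM Long or Int below 2**31. The function names and
the toy data are hypothetical and only for illustration.

```python
# Sketch only: a pure-Python model of hash partitioning, not Spark code.
INT_MAX = 2**31 - 1  # largest value of a signed 4-byte integer


def fits_in_4_bytes(num_keys):
    """True if the indices 0 .. num_keys-1 all fit in a signed 32-bit integer."""
    return num_keys - 1 <= INT_MAX


def partition_for(key, num_partitions):
    """Model of a hash partitioner for small non-negative integer keys.

    Assumption: the key's hash equals the key itself.
    """
    return key % num_partitions


def shuffle_by_key(records, num_partitions):
    """Move each (key, value) record to the partition chosen for its key.

    This regrouping step is what the thread calls the shuffle.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical toy data: 6 keys with 3 values each, arriving interleaved.
records = [(k, f"v{i}") for i in range(3) for k in range(6)]
partitions = shuffle_by_key(records, num_partitions=4)
```

For the 1M keys mentioned in this thread, fits_in_4_bytes(1_000_000) is true,
so an index in the range 0 to M-1 would not overflow a 4-byte integer. After
shuffle_by_key, all records that share a key sit in one partition, which is
the state a per-key step needs before it can run without further data
movement.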



On 17 April 2015 at 15:58, Archit Thakur <archit279tha...@gmail.com> wrote:

> I don't think you can change it to 4 bytes without any custom compilation.
> To make the same key go to the same node, you'll have to repartition the
> data, which is shuffling anyway. Unless your raw data is such that the same
> key is already on the same node, you'll have to shuffle at least once to
> bring each key onto a single node.
>
> On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> Hi All
>>
>> I have an RDD which has 1 million keys, and each key is repeated for
>> around 7000 values, so in total there will be around 1M*7K records in
>> the RDD.
>>
>> Each key is created from zipWithIndex, so the keys run from 0 to M-1.
>> The problem with zipWithIndex is that it uses a Long for the key, which
>> is 8 bytes. Can I reduce it to 4 bytes?
>>
>> Now, how can I make sure that records with the same key go to the same
>> node, so that I can avoid shuffling? Also, how will the default
>> partitioner work here?
>>
>> Regards
>> jeetendra
>>
>>
>
