Yes, you can. Use the partitionBy method and pass a partitioner (for example, a HashPartitioner) to it.
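
To illustrate what hash partitioning does with your keys, here is a minimal sketch in plain Python. It is a model of the idea, not Spark code: the names hash_partition and partition_by, and the tiny data set, are made up for illustration. In Spark itself the call would be along the lines of rdd.partitionBy(new HashPartitioner(n)) in Scala, which triggers one shuffle to lay the data out this way.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Pick a partition as the non-negative modulo of the key's hash.

    This mirrors the rule a hash partitioner applies; the function name
    is hypothetical and not part of any Spark API.
    """
    # Python's % with a positive modulus never returns a negative number.
    return hash(key) % num_partitions


def partition_by(records, num_partitions):
    """Group (key, value) records into partitions by hashing the key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


# Hypothetical tiny data set: 6 integer keys (as zipWithIndex would
# produce, starting at 0), each repeated for 3 values.
records = [(k, f"v{k}-{i}") for i in range(3) for k in range(6)]

parts = partition_by(records, num_partitions=4)

for index in sorted(parts):
    keys = sorted({k for k, _ in parts[index]})
    print(f"partition {index}: keys {keys}, {len(parts[index])} records")
```

Every record with a given key lands in the same partition, so later per-key operations that preserve the partitioning do not need to move data again.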
On Apr 17, 2015 8:18 PM, "Jeetendra Gangele" <gangele...@gmail.com> wrote:

> OK, is there a way I can use hash partitioning to improve
> performance?
>
>
> On 17 April 2015 at 19:33, Archit Thakur <archit279tha...@gmail.com>
> wrote:
>
>> By custom installation, I meant changing the code and building it
>> yourself. I have not done a complete impact analysis; I just had a look
>> at the code.
>>
>> When you say the same key goes to the same node, that requires shuffling
>> unless the raw data you are reading is already laid out that way.
>> On Apr 17, 2015 6:30 PM, "Jeetendra Gangele" <gangele...@gmail.com>
>> wrote:
>>
>>> Hi Archit, thanks for the reply.
>>> How can I do the custom compilation to reduce it to 4 bytes? I want to
>>> make it 4 bytes in any case; can you please guide me?
>>>
>>> I am applying flatMapValues in each step after zipWithIndex, so the data
>>> should stay on the same node, right? Why is it shuffling?
>>> Also, I am currently running with very few records, and it is still
>>> shuffling.
>>>
>>> regards
>>> jeetendra
>>>
>>>
>>>
>>> On 17 April 2015 at 15:58, Archit Thakur <archit279tha...@gmail.com>
>>> wrote:
>>>
>>>> I don't think you can change it to 4 bytes without a custom
>>>> compilation.
>>>> To make the same key go to the same node, you'll have to repartition the
>>>> data, which is a shuffle anyway. Unless your raw data is already laid out
>>>> so that the same key is on the same node, you'll have to shuffle at least
>>>> once to get there.
>>>>
>>>> On Thu, Apr 16, 2015 at 10:16 PM, Jeetendra Gangele <
>>>> gangele...@gmail.com> wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> I have an RDD with 1 million keys, and each key is repeated for
>>>>> around 7000 values, so in total there will be around 1M*7K records in
>>>>> the RDD.
>>>>>
>>>>> Each key is created by zipWithIndex, so the keys run from 0 to M-1.
>>>>> The problem with zipWithIndex is that it uses a Long for the key, which
>>>>> is 8 bytes. Can I reduce it to 4 bytes?
>>>>>
>>>>> Now, how can I make sure that records with the same key go to the same
>>>>> node, so that I can avoid shuffling? Also, how will the default
>>>>> partitioner work here?
>>>>>
>>>>> Regards
>>>>> jeetendra
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>
>
>
>
