Thank you for this suggestion! :) On Thu, Oct 31, 2013 at 7:53 PM, dachuan <[email protected]> wrote:
> I guess it could be solved by extending the existing RDD and overriding
> the getPreferredLocations() definition.
>
> But I am not sure; I will wait for the answer.
>
> On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie <[email protected]> wrote:
>
>> Hi,
>>
>> My iterative program written in Spark shows quite variable running times
>> across iterations, although the computation load is supposed to be
>> roughly the same. My program logic adds a batch of tuples and deletes
>> roughly the same number of tuples in each iteration.
>>
>> I suspect part of the reason is that the partitions are not allocated
>> evenly between the machines. Is there an easy way to fix the output
>> location for each partition? (Say, each time I create a new RDD with 32
>> partitions when running on 4 machines, I would like to pin the first 8
>> partitions to the first machine, the second 8 partitions to the second
>> machine, etc.) I just want to verify whether my assumption is correct. :)
>>
>> Thank you!
>>
>> Best Regards,
>> Wenlei
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
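For what it's worth, here is a minimal sketch of the idea suggested above: wrap the parent RDD in a subclass that overrides getPreferredLocations() to map each partition index to a fixed host. The class name `PinnedLocationRDD` and the `hosts` list are my own illustrative assumptions, not part of any Spark API.

```scala
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper RDD that pins each partition to a fixed host.
// `hosts` is an assumed list of worker hostnames; with 32 partitions on
// 4 machines, partitions 0-7 prefer hosts(0), 8-15 prefer hosts(1), etc.
class PinnedLocationRDD[T: scala.reflect.ClassTag](
    parent: RDD[T],
    hosts: Seq[String])
  extends RDD[T](parent) {

  // Reuse the parent's partitioning unchanged.
  override def getPartitions: Array[Partition] = parent.partitions

  // Delegate the actual computation to the parent partition.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    parent.iterator(split, context)

  // Map partition index -> host in contiguous blocks.
  override def getPreferredLocations(split: Partition): Seq[String] = {
    val perHost = math.max(1, parent.partitions.length / hosts.length)
    Seq(hosts(math.min(split.index / perHost, hosts.length - 1)))
  }
}
```

Note that preferred locations are only locality hints to the scheduler, not hard placement guarantees: if the preferred executor is busy, Spark may still run the task elsewhere after the locality wait expires.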
