Re: Is there any way to set the output location for each partition for the RDD?

Wenlei Xie Mon, 04 Nov 2013 00:47:29 -0800

Is there an official answer from the developers? Is a partition guaranteed to
be computed on its preferred location?
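
For context, the suggestion quoted below (extending an RDD and overriding
getPreferredLocations()) could look roughly like the following Scala sketch.
The names PartitionPinning, hostFor, and PinnedRDD are hypothetical, the host
strings are assumed to match the host names the cluster reports for its
executors, and the signatures assume a Spark release that uses ClassTag, so
older releases may need adjusting. As far as I understand, preferred locations
are a scheduling hint rather than a guarantee, so a task may still run
elsewhere if the preferred host stays busy.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: maps a partition index to a host in contiguous blocks,
// e.g. 32 partitions over 4 hosts puts partitions 0-7 on the first host.
object PartitionPinning {
  def hostFor(index: Int, numPartitions: Int, hosts: Seq[String]): String = {
    require(hosts.nonEmpty, "need at least one host")
    // Ceiling division, so an uneven split still maps to a valid host.
    val perHost = math.max(1, (numPartitions + hosts.length - 1) / hosts.length)
    hosts(math.min(index / perHost, hosts.length - 1))
  }
}

// Hypothetical wrapper RDD: same partitions and data as its parent, but with
// an overridden locality hint for the scheduler.
class PinnedRDD[T: ClassTag](prev: RDD[T], hosts: Seq[String])
    extends RDD[T](prev) {

  // Reuse the parent's partitions unchanged.
  override protected def getPartitions: Array[Partition] =
    firstParent[T].partitions

  // Pass the parent's data through without modification.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    firstParent[T].iterator(split, context)

  // Report one preferred host per partition (a hint, not a guarantee).
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    Seq(PartitionPinning.hostFor(split.index, partitions.length, hosts))
}
```

Usage would be something like new PinnedRDD(rdd, Seq("host1", "host2",
"host3", "host4")), where the host names are placeholders.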


Best,
Wenlei


On Thu, Oct 31, 2013 at 7:53 PM, dachuan <[email protected]> wrote:

> I guess it could be solved by extending an existing RDD and overriding its
> getPreferredLocations() method.
>
> But I am not sure, so I will wait for an answer.
>
>
> On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie <[email protected]> wrote:
>
>> Hi,
>>
>> My iterative program written in Spark shows widely varying running times
>> across iterations, although the computation load is supposed to be roughly
>> the same. In each iteration, the program adds a batch of tuples and deletes
>> roughly the same number of tuples.
>>
>> I suspect part of the reason is that the partitions are not allocated
>> evenly across the machines. Is there any easy way to fix the output
>> location for each partition? (Say, each time I create a new RDD with 32
>> partitions when running on 4 machines, I would like to pin the first 8
>> partitions to the first machine, the second 8 partitions to the second
>> machine, etc.) I just want to verify whether my assumption is correct. :)
>>
>> Thank you!
>>
>> Best Regards,
>> Wenlei
>>
>
>
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
>



-- 
Wenlei Xie (谢文磊)

Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: [email protected]
