I guess it could be solved by extending an existing RDD and overriding its getPreferredLocations() definition.
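A minimal sketch of that idea follows. This is only an assumption about how it might look, not a tested solution: `PinnedLocationRDD` and the `hosts` list are hypothetical names, and note that preferred locations are scheduling *hints* that Spark's scheduler is free to ignore under load.

```scala
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper RDD that pins each partition to a fixed host.
// `hosts` is an assumed list of worker hostnames, e.g. Seq("host1", "host2", ...).
class PinnedLocationRDD[T: ClassManifest](prev: RDD[T], hosts: Seq[String])
    extends RDD[T](prev) {

  // Reuse the parent's partitioning and computation unchanged.
  override def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    firstParent[T].iterator(split, context)

  // Map partition index ranges onto hosts: with 32 partitions and 4 hosts,
  // partitions 0-7 prefer hosts(0), 8-15 prefer hosts(1), and so on.
  override def getPreferredLocations(split: Partition): Seq[String] = {
    val perHost = math.max(1, partitions.length / hosts.size)
    Seq(hosts(math.min(split.index / perHost, hosts.size - 1)))
  }
}
```

The block sizes here mirror the 32-partitions-on-4-machines example from the question below; whether the executors actually run there also depends on locality wait settings and executor availability.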
But I am not sure; I will wait for the answer.

On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie <[email protected]> wrote:

> Hi,
>
> My iterative program written in Spark shows quite variable running times
> across iterations, although the computation load is supposed to be
> roughly the same. My program logic adds a batch of tuples and deletes
> roughly the same number of tuples in each iteration.
>
> I suspect part of the reason is that the partitions are not allocated
> evenly between the machines. Is there any easy way to fix the output
> location for each partition? (Say, each time I create a new RDD with 32
> partitions when running on 4 machines, I would like to pin the first 8
> partitions to the first machine, the second 8 partitions to the second
> machine, etc.) I just want to verify whether my assumption is correct. :)
>
> Thank you!
>
> Best Regards,
> Wenlei

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A. 43210
