Hi, my iterative Spark program shows quite variable running times across iterations, although the computational load should be roughly the same. In each iteration the program adds a batch of tuples and deletes roughly the same number of tuples.
I suspect part of the reason is that the partitions are not distributed evenly across the machines. Is there an easy way to pin the location of each partition? (Say, each time I create a new RDD with 32 partitions while running on 4 machines, I would like to pin the first 8 partitions to the first machine, the second 8 partitions to the second machine, and so on.) I just want to verify whether my assumption is correct. :) Thank you! Best Regards, WEnlei
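For anyone reading later: Spark does expose location *hints* (preferred locations), though not hard placement guarantees. One way to express the "first 8 partitions on the first machine" idea is `SparkContext.makeRDD`, which accepts `(element, preferredLocations)` pairs and creates one partition per element. A minimal sketch, assuming four hypothetical worker hostnames `node1`..`node4`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PinnedPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pinned-partitions"))

    // Hypothetical worker hostnames -- replace with your cluster's hosts.
    val hosts = Seq("node1", "node2", "node3", "node4")

    // 32 elements -> 32 partitions; partition i is hinted to host i / 8,
    // so partitions 0-7 prefer node1, 8-15 prefer node2, etc.
    val data = (0 until 32).map(i => (i, Seq(hosts(i / 8))))

    val pinned = sc.makeRDD(data)
    println(pinned.getNumPartitions) // 32

    sc.stop()
  }
}
```

Note that these are only scheduling preferences: the scheduler tries to honor them (subject to locality-wait settings such as `spark.locality.wait`), but may run a task elsewhere if the preferred executor is busy. If the imbalance comes from skewed key distribution rather than placement, a custom `Partitioner` on a keyed RDD may be the better fix.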
