Thank you for this suggestion! :) On Thu, Oct 31, 2013 at 7:53 PM, dachuan <[email protected]> wrote:
> I guess it could be solved by extending the existing RDD and overriding
> the getPreferredLocations() definition.
>
> But I am not sure; I will wait for the answer.
>
> On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie <[email protected]> wrote:
>
>> Hi,
>>
>> My iterative program written in Spark shows quite variable running times
>> across iterations, although the computation load is supposed to be
>> roughly the same. My program logic adds a batch of tuples and deletes
>> roughly the same number of tuples in each iteration.
>>
>> I suspect part of the reason is that the partitions are not allocated
>> evenly between the machines. Is there an easy way to fix the output
>> location for each partition? (Say, each time I create a new RDD with 32
>> partitions when running on 4 machines, I would like to pin the first 8
>> partitions to the first machine, the second 8 partitions to the second
>> machine, etc.) I just want to verify whether my assumption is correct. :)
>>
>> Thank you!
>>
>> Best Regards,
>> Wenlei
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
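For what it's worth, here is a minimal sketch of the idea suggested above: wrap the parent RDD in a subclass that overrides getPreferredLocations() to map each partition index to a fixed host. The class name `PinnedLocationRDD` and the `hosts` list are my own illustrative assumptions, not part of any Spark API.

```scala
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper RDD that pins each partition to a fixed host.
// `hosts` is an assumed list of worker hostnames; with 32 partitions on
// 4 machines, partitions 0-7 prefer hosts(0), 8-15 prefer hosts(1), etc.
class PinnedLocationRDD[T: scala.reflect.ClassTag](
    parent: RDD[T],
    hosts: Seq[String])
  extends RDD[T](parent) {

  // Reuse the parent's partitioning unchanged.
  override def getPartitions: Array[Partition] = parent.partitions

  // Delegate the actual computation to the parent partition.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    parent.iterator(split, context)

  // Map partition index -> host in contiguous blocks.
  override def getPreferredLocations(split: Partition): Seq[String] = {
    val perHost = math.max(1, parent.partitions.length / hosts.length)
    Seq(hosts(math.min(split.index / perHost, hosts.length - 1)))
  }
}
```

Note that preferred locations are only locality hints to the scheduler, not hard placement guarantees: if the preferred executor is busy, Spark may still run the task elsewhere after the locality wait expires.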
