Any official answer from the developers? Is the partition guaranteed to be generated on the preferred location?
Best,
Wenlei

On Thu, Oct 31, 2013 at 7:53 PM, dachuan <[email protected]> wrote:

> I guess it could be solved by extending an existing RDD and overriding the
> getPreferredLocations() definition.
>
> But I am not sure, I will wait for the answer.
>
>
> On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie <[email protected]> wrote:
>
>> Hi,
>>
>> My iterative program written in Spark shows quite different running times
>> for each iteration, although the computation load is supposed to
>> be roughly the same. In each iteration, my program adds a batch of tuples
>> and deletes roughly the same number of tuples.
>>
>> I suspect part of the reason is that the partitions are not allocated
>> evenly between the machines. Is there any easy way to fix the output
>> location for each partition? (Say, each time I create a new RDD with 32
>> partitions when running on 4 machines, I would like to fix the first 8
>> partitions to the first machine, the second 8 partitions to the second
>> machine, etc.) I just want to verify whether my assumption is correct. :)
>>
>> Thank you!
>>
>> Best Regards,
>> Wenlei
>>
>
>
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
>

--
Wenlei Xie (谢文磊)

Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: [email protected]
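
For reference, the layout described in the quoted message (32 partitions on 4 machines, 8 consecutive partitions per machine) can be written as a small mapping from partition index to host. This is only a sketch in plain Python, not Spark code: the function name preferred_host and the host names are made up for illustration. In Spark, a mapping like this is what an overridden getPreferredLocations() would presumably return for each partition, as dachuan suggests. It only computes the desired layout; whether the scheduler actually places the partition there is the open question in this thread.

```python
from collections import Counter


def preferred_host(partition_index, num_partitions, hosts):
    """Return the host for a partition under a contiguous block layout.

    Hypothetical helper: consecutive partitions are grouped into blocks,
    one block per host, in the order the hosts are listed.
    """
    if not hosts:
        raise ValueError("hosts must not be empty")
    if not 0 <= partition_index < num_partitions:
        raise ValueError("partition_index out of range")
    # Ceiling division, so every partition still maps to a valid host
    # when num_partitions is not a multiple of len(hosts).
    block_size = -(-num_partitions // len(hosts))
    return hosts[partition_index // block_size]


# Host names are assumptions for illustration only.
hosts = ["machine-1", "machine-2", "machine-3", "machine-4"]
layout = [preferred_host(i, 32, hosts) for i in range(32)]

# Count how many partitions each host is asked to hold.
per_host = Counter(layout)
```

With 32 partitions and 4 hosts, per_host should show 8 partitions for each machine, which is the even allocation the original question asks about. Comparing such a target layout with where tasks really ran would be one way to check the uneven-allocation hypothesis.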
