I have a similar experience.
Using 32 machines, I can see that the number of tasks (partitions) assigned to
executors (machines) is not even. Moreover, the distribution changes every
stage (iteration).
I wonder why Spark needs to move partitions around anyway; shouldn't the
scheduler reduce network traffic by keeping partitions where they are?
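One part of the imbalance is unavoidable arithmetic: unless the partition count divides evenly by the executor count, some executors must get more tasks than others. A toy sketch below (plain Python, not the Spark scheduler; all numbers hypothetical) shows this for 100 partitions over 32 executors. The real scheduler is also locality-aware rather than round-robin, which is one reason the assignment can shift from stage to stage.

```python
# Toy round-robin assignment of P partitions to E executors, to show
# that per-executor task counts cannot always be even. This is an
# illustration, not how Spark's locality-aware scheduler actually works.
from collections import Counter

def assign_round_robin(num_partitions, num_executors):
    counts = Counter()
    for p in range(num_partitions):
        counts[p % num_executors] += 1
    return counts

counts = assign_round_robin(100, 32)  # hypothetical cluster sizes
print(min(counts.values()), max(counts.values()))  # some executors get 3 tasks, others 4
```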
Hi all,
I wonder if the optimizations mentioned in the GraphX paper (
https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf ) are
currently implemented. In particular, I am looking for mrTriplets
optimizations and memory-based shuffle.
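For readers unfamiliar with the mrTriplets operator the paper optimizes: it maps a user function over each edge triplet (source attribute, edge, destination attribute) and reduces the emitted messages at each vertex. A minimal single-machine sketch of that pattern is below; the function names and signatures are illustrative only, not the GraphX API, and the real operator runs distributed over RDDs with the mirroring/active-set optimizations the paper describes.

```python
# Conceptual sketch of the map-reduce-over-triplets pattern (mrTriplets)
# in plain Python. Illustrative names, not the GraphX API.
from collections import defaultdict

def mr_triplets(edges, vertex_attrs, map_fn, reduce_fn):
    """edges: list of (src, dst) pairs.
    map_fn(src, src_attr, dst, dst_attr) emits (vertex_id, msg) pairs
    for each triplet; reduce_fn combines the messages at each vertex."""
    msgs = defaultdict(list)
    for src, dst in edges:
        for vid, msg in map_fn(src, vertex_attrs[src], dst, vertex_attrs[dst]):
            msgs[vid].append(msg)
    return {vid: reduce_fn(ms) for vid, ms in msgs.items()}

# Example: a PageRank-style step sending rank/out_degree along each edge.
edges = [(1, 2), (1, 3), (2, 3)]
ranks = {1: 1.0, 2: 1.0, 3: 1.0}
out_deg = {1: 2, 2: 1, 3: 0}
contrib = mr_triplets(
    edges, ranks,
    map_fn=lambda s, sa, d, da: [(d, sa / out_deg[s])],
    reduce_fn=sum,
)
print(contrib)  # {2: 0.5, 3: 1.5}
```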
--
Thanks,
-Khaled