I have a similar experience.
Using 32 machines, I can see that the number of tasks (partitions) assigned to
executors (machines) is not even. Moreover, the distribution changes every
stage (iteration).
I wonder why Spark needs to move partitions around anyway; shouldn't the
scheduler reduce network traffic by keeping partitions where they are?
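One part of the imbalance is unavoidable arithmetic: unless the partition count divides evenly by the executor count, some executors must get more tasks than others. A toy sketch below (plain Python, not the Spark scheduler; all numbers hypothetical) shows this for 100 partitions over 32 executors. The real scheduler is also locality-aware rather than round-robin, which is one reason the assignment can shift from stage to stage.

```python
# Toy round-robin assignment of P partitions to E executors, to show
# that per-executor task counts cannot always be even. This is an
# illustration, not how Spark's locality-aware scheduler actually works.
from collections import Counter

def assign_round_robin(num_partitions, num_executors):
    counts = Counter()
    for p in range(num_partitions):
        counts[p % num_executors] += 1
    return counts

counts = assign_round_robin(100, 32)  # hypothetical cluster sizes
print(min(counts.values()), max(counts.values()))  # some executors get 3 tasks, others 4
```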
Hi all,
I wonder if the optimizations mentioned in the GraphX paper (
https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf ) are
currently implemented. In particular, I am looking for mrTriplets
optimizations and memory-based shuffle.
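For readers unfamiliar with the mrTriplets operator the paper optimizes: it maps a user function over each edge triplet (source attribute, edge, destination attribute) and reduces the emitted messages at each vertex. A minimal single-machine sketch of that pattern is below; the function names and signatures are illustrative only, not the GraphX API, and the real operator runs distributed over RDDs with the mirroring/active-set optimizations the paper describes.

```python
# Conceptual sketch of the map-reduce-over-triplets pattern (mrTriplets)
# in plain Python. Illustrative names, not the GraphX API.
from collections import defaultdict

def mr_triplets(edges, vertex_attrs, map_fn, reduce_fn):
    """edges: list of (src, dst) pairs.
    map_fn(src, src_attr, dst, dst_attr) emits (vertex_id, msg) pairs
    for each triplet; reduce_fn combines the messages at each vertex."""
    msgs = defaultdict(list)
    for src, dst in edges:
        for vid, msg in map_fn(src, vertex_attrs[src], dst, vertex_attrs[dst]):
            msgs[vid].append(msg)
    return {vid: reduce_fn(ms) for vid, ms in msgs.items()}

# Example: a PageRank-style step sending rank/out_degree along each edge.
edges = [(1, 2), (1, 3), (2, 3)]
ranks = {1: 1.0, 2: 1.0, 3: 1.0}
out_deg = {1: 2, 2: 1, 3: 0}
contrib = mr_triplets(
    edges, ranks,
    map_fn=lambda s, sa, d, da: [(d, sa / out_deg[s])],
    reduce_fn=sum,
)
print(contrib)  # {2: 0.5, 3: 1.5}
```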
--
Thanks,
-Khaled