Sorry for the spam. To complete my previous post:
The map action sometimes creates 4 tasks which are all executed by the same executor. I believe that a task distribution like: executor_0: 1 task; executor_1: 1 task; executor_2: 2 tasks would give better performance. Can we force this kind of schedule in Spark?

Thank you.

2013/12/2 Hao REN <[email protected]>

> Hi,
>
> When running some tests on EC2 with Spark, I noticed that the tasks are
> not fairly distributed to the executors.
>
> For example, a map action produces 4 tasks, but they all go to the same
> executor:
>
> Executors (3)
>
> - *Memory:* 0.0 B Used (19.0 GB Total)
> - *Disk:* 0.0 B Used
>
> Executor ID | Address                             | RDD blocks | Memory used    | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks
> 0           | ip-10-10-141-143.ec2.internal:52816 | 0          | 0.0 B / 6.3 GB | 0.0 B     | 4            | 0            | 0              | 4
> 1           | ip-10-40-38-190.ec2.internal:60314  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0
> 2           | ip-10-62-35-223.ec2.internal:40500  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0
>
> --
> REN Hao
>
> Data Engineer @ ClaraVista
>
> Paris, France
>
> Tel: +33 06 14 54 57 24
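For context on why this happens: Spark's delay scheduling briefly holds tasks for an executor where their data is (or is assumed to be) local, and with a small job all four tasks can land on the first executor before that wait expires. Two configuration knobs are commonly suggested for spreading tasks out. This is only a sketch; the property names are taken from the Spark 0.8.x configuration documentation (in that era they are Java system properties, e.g. passed via SPARK_JAVA_OPTS), so please verify them against your Spark version:

```
# Assumed property names from the Spark 0.8.x configuration docs -- verify for your version.

# How long the scheduler waits for a data-local slot before handing
# the task to any free executor. Setting it to 0 makes it spread
# tasks immediately instead of piling them on one executor.
spark.locality.wait=0

# Default number of partitions for shuffles; more partitions than
# cores-per-executor means one executor cannot hold the whole job.
# The value 12 here is arbitrary, purely illustrative.
spark.default.parallelism=12
```

Note that neither setting forces an exact placement like "1/1/2 across three executors"; Spark does not expose a way to pin individual tasks to specific executors.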
