Hi Hao,

Where tasks go is influenced by where the data they operate on resides. If the data is on one executor, it may make more sense to do all the computation on that node rather than ship data across the network. How was the data distributed across your cluster?
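
To illustrate the idea, here is a toy Python model of locality-preferring task placement. It is only a sketch and not Spark's actual scheduler: the function `assign_tasks`, its parameters, and the host names are hypothetical, chosen to mirror the three executors in your report.

```python
def assign_tasks(partition_hosts, executors, prefer_locality=True):
    """Toy model: pick one executor per partition (one task per partition).

    partition_hosts: entry i names the host holding partition i's data,
                     or None if the partition has no preferred location.
    executors:       list of executor host names.
    """
    load = {e: 0 for e in executors}  # tasks assigned so far, per executor
    assignment = []
    for host in partition_hosts:
        if prefer_locality and host in load:
            # Run the task where its data already lives.
            chosen = host
        else:
            # Otherwise pick the least-loaded executor (ties: list order).
            chosen = min(executors, key=lambda e: load[e])
        load[chosen] += 1
        assignment.append(chosen)
    return assignment


executors = ["executor_0", "executor_1", "executor_2"]
all_on_one = ["executor_0"] * 4  # assumption: all 4 partitions sit on one host

local = assign_tasks(all_on_one, executors)
spread = assign_tasks(all_on_one, executors, prefer_locality=False)
# local  -> every task lands on executor_0
# spread -> executor_0, executor_1, executor_2, executor_0
```

In this model, four partitions stored on a single host produce four tasks on that host, which matches what you are seeing. Ignoring locality spreads the tasks out, at the cost of moving the data over the network.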
Andrew

On Mon, Dec 2, 2013 at 7:52 AM, Hao REN <[email protected]> wrote:

> Sorry for spam.
>
> To complete my previous post:
>
> The map action sometimes creates 4 tasks which are all executed by the
> same executor.
>
> I believe that a task dispatch like:
> executor_0 : 1 task;
> executor_1 : 1 task;
> executor_2 : 2 tasks;
> would give better performance.
>
> Can we force this kind of schedule in Spark?
>
> Thank you.
>
> 2013/12/2 Hao REN <[email protected]>
>
>> Hi,
>>
>> When running some tests on EC2 with Spark, I noticed that the tasks are
>> not fairly distributed across executors.
>>
>> For example, a map action produces 4 tasks, but they all go to the
>>
>> Executors (3)
>>
>> - *Memory:* 0.0 B Used (19.0 GB Total)
>> - *Disk:* 0.0 B Used
>>
>> Executor ID  Address                              RDD blocks  Memory used     Disk used  Active tasks  Failed tasks  Complete tasks  Total tasks
>> 0            ip-10-10-141-143.ec2.internal:52816  0           0.0 B / 6.3 GB  0.0 B      4             0             0               4
>> 1            ip-10-40-38-190.ec2.internal:60314   0           0.0 B / 6.3 GB  0.0 B      0             0             0               0
>> 2            ip-10-62-35-223.ec2.internal:40500   0           0.0 B / 6.3 GB  0.0 B      0             0             0               0
>
> --
> REN Hao
>
> Data Engineer @ ClaraVista
>
> Paris, France
>
> Tel: +33 06 14 54 57 24
