Sorry for the spam.

To complete my previous post:

The map action sometimes creates 4 tasks, all of which are executed by the same
executor.

I believe that a task distribution like:
executor_0: 1 task;
executor_1: 1 task;
executor_2: 2 tasks;
would give better performance.

Can we force this kind of scheduling in Spark?
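
One knob that might help here (an assumption on my side, not tested on this cluster): Spark's delay scheduling waits for a data-local executor before handing a task to another one, and spark.locality.wait controls how long it waits. Lowering it should let the scheduler give tasks to idle executors instead of queueing them all on one. A minimal sketch of the configuration:

```
# spark-defaults.conf / SparkConf sketch (assumption: delay scheduling is
# what keeps all 4 tasks on one executor; lowering the locality wait lets
# the scheduler fall back to idle executors sooner)
spark.locality.wait  0
```

Another option may be to repartition the input so there are at least as many partitions as executors, e.g. rdd.repartition(3) before the map, so each executor has a partition to work on.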

Thank you.



2013/12/2 Hao REN <[email protected]>

> Hi,
>
> When running some tests on EC2 with Spark, I noticed that tasks are
> not fairly distributed to executors.
>
> For example, a map action produces 4 tasks, but they all go to the same
> executor:
>
> Executors (3)
>
>    - Memory: 0.0 B Used (19.0 GB Total)
>    - Disk: 0.0 B Used
>
> Executor ID | Address                             | RDD blocks | Memory used    | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks
> 0           | ip-10-10-141-143.ec2.internal:52816 | 0          | 0.0 B / 6.3 GB | 0.0 B     | 4            | 0            | 0              | 4
> 1           | ip-10-40-38-190.ec2.internal:60314  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0
> 2           | ip-10-62-35-223.ec2.internal:40500  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0


-- 
REN Hao

Data Engineer @ ClaraVista

Paris, France

Tel:  +33 06 14 54 57 24
