Re: How to balance task load

Andrew Ash Thu, 05 Dec 2013 01:56:06 -0800

Hi Hao,

Where tasks go is influenced by where the data they operate on resides.  If
the data is on one executor, it may make more sense to do all the
computation on that node rather than ship data across the network.  How was
the data distributed across your cluster?
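
To illustrate the trade-off, here is a toy model in plain Python. It is not
Spark's scheduler: the function names, the executor names, and the assumption
that all 4 partitions sit on executor_0 are hypothetical. It only compares
running each task where its data lives against spreading tasks evenly and
counting how many would then need to fetch data over the network.

```python
from collections import Counter


def assign_by_locality(preferred_hosts):
    """Run each task on the host that already holds its input partition."""
    return list(preferred_hosts)


def assign_round_robin(preferred_hosts, hosts):
    """Ignore data placement and hand tasks out to hosts in turn."""
    return [hosts[i % len(hosts)] for i in range(len(preferred_hosts))]


def remote_reads(assigned, preferred_hosts):
    """Count tasks placed on a host that does not hold their partition."""
    return sum(1 for a, p in zip(assigned, preferred_hosts) if a != p)


hosts = ["executor_0", "executor_1", "executor_2"]
# Assumption for this sketch: all 4 input partitions live on executor_0.
preferred = ["executor_0"] * 4

local = assign_by_locality(preferred)
spread = assign_round_robin(preferred, hosts)

print(Counter(local), remote_reads(local, preferred))
print(Counter(spread), remote_reads(spread, preferred))
```

In this model the even spread balances the task counts, but two of the four
tasks have to pull their partition from another node. Whether that is a net
win depends on how large the partitions are relative to the compute time.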


Andrew


On Mon, Dec 2, 2013 at 7:52 AM, Hao REN <[email protected]> wrote:

> Sorry for spam.
>
> To complete my previous post:
>
> The map action sometimes creates 4 tasks which are all executed by the
> same executor.
>
> I believe that a task dispatch like:
> executor_0 : 1 task;
> executor_1 : 1 task;
> executor_2 : 2 tasks;
> would give better performance.
>
> Can we force this kind of scheduling in Spark?
>
> Thank you.
>
>
>
> 2013/12/2 Hao REN <[email protected]>
>
>> Hi,
>>
>> When running some tests on EC2 with Spark, I noticed that the tasks are
>> not fairly distributed across executors.
>>
>> For example, a map action produces 4 tasks, but they all go to the
>>
>>
>> Executors (3)
>>
>>    - *Memory:* 0.0 B Used (19.0 GB Total)
>>    - *Disk:* 0.0 B Used
>>
>> Executor ID | Address                             | RDD blocks | Memory used    | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks
>> 0           | ip-10-10-141-143.ec2.internal:52816 | 0          | 0.0 B / 6.3 GB | 0.0 B     | 4            | 0            | 0              | 4
>> 1           | ip-10-40-38-190.ec2.internal:60314  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0
>> 2           | ip-10-62-35-223.ec2.internal:40500  | 0          | 0.0 B / 6.3 GB | 0.0 B     | 0            | 0            | 0              | 0
>>
>
>
> --
> REN Hao
>
> Data Engineer @ ClaraVista
>
> Paris, France
>
> Tel:  +33 06 14 54 57 24
>
