>
> In yarn mode, only two executors are assigned to process the tasks; since
> one executor can process only one task, they need 6 min in total.
>

This is not true. An executor can run several tasks concurrently, by
default one per core it has been given. Set --executor-cores and
--num-executors to increase task parallelism. For a fair comparison, the
Spark application should have the same resources (CPU/memory) in local
and yarn mode.
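
As a rough illustration of why the core count matters, here is a small
back-of-the-envelope sketch in Python. It is not Spark code and it ignores
scheduling overhead, shuffle and data skew. The helper names and the task
count of 4 are my own assumptions, chosen only so the numbers match the
3 min / 6 min example quoted above.

```python
import math


def task_slots(num_executors, executor_cores, task_cpus=1):
    """Tasks that can run at the same time across all executors.

    Each executor offers executor_cores // task_cpus slots
    (task_cpus mirrors spark.task.cpus, which defaults to 1).
    """
    return num_executors * (executor_cores // task_cpus)


def stage_minutes(num_tasks, minutes_per_task, slots):
    """Idealised stage duration: tasks run in 'waves' of size `slots`."""
    waves = math.ceil(num_tasks / slots)
    return waves * minutes_per_task


# Assumed workload: 4 tasks in the stage, 3 minutes each (hypothetical).
one_core = task_slots(num_executors=2, executor_cores=1)   # 2 slots
two_cores = task_slots(num_executors=2, executor_cores=2)  # 4 slots

print(stage_minutes(4, 3, one_core))   # 6
print(stage_minutes(4, 3, two_cores))  # 3
```

With two single-core executors the four tasks need two waves; giving each
executor two cores (for example --num-executors 2 --executor-cores 2) lets
them all run in one wave, which is what local mode was already doing.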

2018-04-10 10:05 GMT+08:00 Junfeng Chen <darou...@gmail.com>:

> I found the potential reason.
>
> In local mode, all tasks in one stage run concurrently, while tasks in
> yarn mode run in sequence.
>
> For example, in one stage, each task costs 3 mins. In local mode, they
> will run together and cost 3 min in total. In yarn mode, only two executors
> are assigned to process the tasks; since one executor can process only one
> task, they need 6 min in total.
>
>
> Regards,
> Junfeng Chen
>
> On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Probably network / shuffling cost? Or broadcast variables? Can you
>> provide more details what you do and some timings?
>>
>> > On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
>> >
>> > I have written a Spark Streaming application that reads Kafka data,
>> converts the JSON data to Parquet, and saves it to HDFS.
>> > What puzzles me is that the app's processing time in yarn mode is
>> 20% to 50% longer than in local mode. My cluster has three nodes with
>> three node managers, and all three hosts have the same hardware: 40 cores
>> and 256GB memory.
>> >
>> > Why? How to solve it?
>> >
>> > Regards,
>> > Junfeng Chen
>>
>
>
