Yes, I have increased the number of executors and the executor cores, and it
runs normally now. HDP Spark 2 uses only 2 executors with 1 core each by
default.
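
For anyone following along, here is a rough sketch of the arithmetic behind
the slowdown. It assumes each executor core runs one task at a time, that all
tasks in a stage take the same time, and that the stage has 4 tasks (the task
count is my assumption, picked to match the 3 min vs. 6 min example quoted
below). The function name stage_minutes is made up for illustration; it is not
a Spark API.

```python
import math

def stage_minutes(num_tasks, num_executors, executor_cores, task_minutes):
    """Estimate stage wall time when equal-length tasks run in waves."""
    # Each executor core is one task slot (assumes one CPU per task).
    slots = num_executors * executor_cores
    # Tasks that do not fit in the available slots wait for the next wave.
    waves = math.ceil(num_tasks / slots)
    return waves * task_minutes

# 2 executors x 1 core (the defaults mentioned above): two waves of 3 min each.
print(stage_minutes(4, 2, 1, 3))
# 2 executors x 2 cores: all 4 tasks fit in a single wave.
print(stage_minutes(4, 2, 2, 3))
```

Raising either --num-executors or --executor-cores increases the slot count,
which is why the stage time dropped after the change.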


Regards,
Junfeng Chen

On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao <[email protected]>
wrote:

>> In yarn mode, only two executors are assigned to process the tasks; since
>> one executor can process only one task, they need 6 min in total.
>>
>
> This is not true. You should set --executor-cores/--num-executors to
> increase the task parallelism of the executors. To be fair, the Spark
> application should have the same resources (CPU/memory) when comparing
> local and yarn mode.
>
> 2018-04-10 10:05 GMT+08:00 Junfeng Chen <[email protected]>:
>
>> I found the potential reason.
>>
>> In local mode, all tasks in one stage run concurrently, while tasks in
>> yarn mode run in sequence.
>>
>> For example, suppose each task in a stage takes 3 min. In local mode,
>> they run together and take 3 min in total. In yarn mode, only two
>> executors are assigned to process the tasks; since one executor can
>> process only one task, they need 6 min in total.
>>
>>
>> Regards,
>> Junfeng Chen
>>
>> On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <[email protected]> wrote:
>>
>>> Probably network / shuffling cost? Or broadcast variables? Can you
>>> provide more details about what you do and some timings?
>>>
>>> > On 9. Apr 2018, at 07:07, Junfeng Chen <[email protected]> wrote:
>>> >
>>> > I have written a Spark Streaming application that reads Kafka data,
>>> converts the JSON data to Parquet, and saves it to HDFS.
>>> > What puzzles me is that the app's processing time in yarn mode is
>>> 20% to 50% longer than in local mode. My cluster has three nodes with
>>> three node managers, and all three hosts have the same hardware: 40 cores
>>> and 256 GB of memory.
>>> >
>>> > Why? How can I solve it?
>>> >
>>> > Regards,
>>> > Junfeng Chen
>>>
>>
>>
>
