But I still have one question. I find the task number in stage is 3. So
where is this 3 from? How to increase the parallelism?


Regard,
Junfeng Chen

On Tue, Apr 10, 2018 at 11:31 AM, Junfeng Chen <darou...@gmail.com> wrote:

> Yeah, I have increase the executor number and executor cores, and it runs
> normally now.  The hdp spark 2 have only 2 executor and 1 executor cores by
> default.
>
>
> Regard,
> Junfeng Chen
>
> On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> In yarn mode, only two executor are assigned to process the task, since
>>> one executor can process one task only, they need 6 min in total.
>>>
>>
>> This is not true. You should set --executor-cores/--num-executors to
>> increase the task parallelism for executor. To be fair, Spark application
>> should have same resources (cpu/memory) when comparing between local and
>> yarn mode.
>>
>> 2018-04-10 10:05 GMT+08:00 Junfeng Chen <darou...@gmail.com>:
>>
>>> I found the potential reason.
>>>
>>> In local mode, all tasks in one stage runs concurrently, while tasks in
>>> yarn mode runs in sequence.
>>>
>>> For example, in one stage, each task costs 3 mins. If in local mode,
>>> they will run together, and cost 3 min in total. In yarn mode, only two
>>> executor are assigned to process the task, since one executor can process
>>> one task only, they need 6 min in total.
>>>
>>>
>>> Regard,
>>> Junfeng Chen
>>>
>>> On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> Probably network / shuffling cost? Or broadcast variables? Can you
>>>> provide more details what you do and some timings?
>>>>
>>>> > On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
>>>> >
>>>> > I have wrote an spark streaming application reading kafka data and
>>>> convert the json data to parquet and save to hdfs.
>>>> > What make me puzzled is, the processing time of app in yarn mode cost
>>>> 20% to 50% more time than in local mode. My cluster have three nodes with
>>>> three node managers, and all three hosts have same hardware, 40cores and
>>>> 256GB memory. .
>>>> >
>>>> > Why? How to solve it?
>>>> >
>>>> > Regard,
>>>> > Junfeng Chen
>>>>
>>>
>>>
>>
>

Reply via email to