Re: spark application running in yarn client mode is slower than in local mode.

2018-04-10 Thread Junfeng Chen

But I still have one question. I see that the number of tasks in the stage
is 3. Where does this 3 come from, and how can I increase the parallelism?


Regards,
Junfeng Chen

On Tue, Apr 10, 2018 at 11:31 AM, Junfeng Chen  wrote:

> Yeah, I have increased the number of executors and executor cores, and it
> runs normally now. HDP Spark 2 has only 2 executors with 1 core each by
> default.
>
>
> Regards,
> Junfeng Chen

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen

Yeah, I have increased the number of executors and executor cores, and it
runs normally now. HDP Spark 2 has only 2 executors with 1 core each by
default.


Regards,
Junfeng Chen

On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao wrote:

>> In YARN mode, only two executors are assigned to process the tasks;
>> since one executor can process only one task, they need 6 min in total.
>>
>
> This is not true. You should set --executor-cores/--num-executors to
> increase the task parallelism of the executors. To make a fair comparison,
> the Spark application should have the same resources (CPU/memory) in local
> and YARN mode.

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Saisai Shao

>
> In YARN mode, only two executors are assigned to process the tasks;
> since one executor can process only one task, they need 6 min in total.
>

This is not true. You should set --executor-cores/--num-executors to
increase the task parallelism of the executors. To make a fair comparison,
the Spark application should have the same resources (CPU/memory) in local
and YARN mode.
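As a rough illustration of the advice above, here is a minimal sketch that does not call Spark at all; it only models the arithmetic. The class and method names are hypothetical. It shows which Spark properties the spark-submit flags set (--num-executors sets spark.executor.instances, --executor-cores sets spark.executor.cores, --executor-memory sets spark.executor.memory) and how many tasks can run at once, assuming each executor runs spark.executor.cores / spark.task.cpus tasks concurrently with spark.task.cpus at its default of 1. The 4 x 10 sizing and the "16g" memory value in main() are assumptions chosen only to give YARN roughly as many task slots as local[*] on one 40-core host.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating executor sizing on YARN (not a Spark API).
final class ExecutorSlots {

    // Tasks that can run at the same time: each executor offers
    // executorCores / taskCpus task slots (spark.task.cpus defaults to 1).
    static int concurrentTasks(int numExecutors, int executorCores, int taskCpus) {
        if (numExecutors <= 0 || executorCores <= 0 || taskCpus <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return numExecutors * (executorCores / taskCpus);
    }

    // Spark properties that the corresponding spark-submit flags set.
    static Map<String, String> toSparkConf(int numExecutors, int executorCores, String executorMemory) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.executor.instances", Integer.toString(numExecutors)); // --num-executors
        conf.put("spark.executor.cores", Integer.toString(executorCores));    // --executor-cores
        conf.put("spark.executor.memory", executorMemory);                    // --executor-memory
        return conf;
    }

    public static void main(String[] args) {
        // Defaults reported in the thread: 2 executors with 1 core each.
        System.out.println("YARN defaults: " + concurrentTasks(2, 1, 1) + " concurrent tasks");

        // Example sizing (an assumption) giving YARN as many slots as local[*] on 40 cores.
        Map<String, String> conf = toSparkConf(4, 10, "16g");
        System.out.println(conf + " -> " + concurrentTasks(4, 10, 1) + " concurrent tasks");
    }
}
```

With the defaults only 2 tasks run at a time, whereas local[*] on a 40-core host can run up to 40, which is why the comparison is only fair once the executor count and cores are set explicitly.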

2018-04-10 10:05 GMT+08:00 Junfeng Chen:

> I found the potential reason.
>
> In local mode, all tasks in a stage run concurrently, while in YARN mode
> the tasks run in sequence.
>
> For example, suppose each task in a stage takes 3 min. In local mode they
> run together, so the stage takes 3 min in total. In YARN mode, only two
> executors are assigned to process the tasks; since one executor can
> process only one task, they need 6 min in total.
>
>
> Regards,
> Junfeng Chen

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen

I found the potential reason.

In local mode, all tasks in a stage run concurrently, while in YARN mode
the tasks run in sequence.

For example, suppose each task in a stage takes 3 min. In local mode they
run together, so the stage takes 3 min in total. In YARN mode, only two
executors are assigned to process the tasks; since one executor can
process only one task, they need 6 min in total.
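The arithmetic in this example can be written down as a small sketch, assuming every task takes the same time and each task uses one core (spark.task.cpus = 1); the helper names are hypothetical. A stage with T tasks and S concurrent task slots needs ceil(T / S) rounds, so its duration is roughly that many times the per-task cost. The "one task per executor" limit only holds when each executor has a single core, as with the HDP defaults; the 3-task figure used below is the task count per stage that Junfeng reports in the thread.

```java
// Hypothetical helper: estimates stage wall-clock time from task count and task slots.
// Assumes uniform task duration and one core per task (spark.task.cpus = 1).
final class StageWaves {

    // Rounds needed when `tasks` tasks share `slots` concurrent slots (ceiling division).
    static int waves(int tasks, int slots) {
        if (tasks < 0 || slots <= 0) {
            throw new IllegalArgumentException("tasks must be >= 0 and slots > 0");
        }
        return (tasks + slots - 1) / slots;
    }

    // Estimated stage duration when every task takes `minutesPerTask` minutes.
    static int stageMinutes(int tasks, int slots, int minutesPerTask) {
        return waves(tasks, slots) * minutesPerTask;
    }

    public static void main(String[] args) {
        int tasks = 3;          // tasks per stage reported in the thread
        int minutesPerTask = 3; // per-task cost from the example above
        System.out.println("local[*], 40 cores:          " + stageMinutes(tasks, 40, minutesPerTask) + " min");
        System.out.println("YARN, 2 executors x 1 core:  " + stageMinutes(tasks, 2 * 1, minutesPerTask) + " min");
        System.out.println("YARN, 3 executors x 4 cores: " + stageMinutes(tasks, 3 * 4, minutesPerTask) + " min");
    }
}
```

With only 3 tasks per stage, any configuration with 3 or more slots gives the same estimate; beyond that point the number of partitions, not the executor sizing, limits parallelism.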

Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke  wrote:

> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details about what you are doing, and some timings?

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen

Hi Jorn,

I checked the log output of my application: ResultStage 3 (the Parquet
write) takes a very long time, nearly 300 s, while the total processing
time of this batch is 6 min.


Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke  wrote:

> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details about what you are doing, and some timings?

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen

Hi,

My Kafka topic has three partitions. By time cost I mean that each
streaming batch takes longer in YARN client mode. For example, YARN mode
takes 300 seconds to process some data, while local mode takes only 200
seconds to process a similar amount of data.


Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:20 PM, Gopala Krishna Manchukonda <
gopala_krishna_manchuko...@apple.com> wrote:

> Hi Junfeng,
>
> Is your Kafka topic partitioned?
>
> Are you referring to the duration or to the CPU time spent by the job as
> being 20% to 50% higher than when running locally?
>
> Thanks & Regards
> Gopal

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen

I read JSON string values from Kafka, then transform them into a DataFrame:

Dataset<Row> df = spark.read().json(stringjavaRDD);


Then I add some new data to each row:

JavaRDD<Row> rowJavaRDD = df.javaRDD().map(...);
StructType type = df.schema().add(...);
Dataset<Row> newdf = spark.createDataFrame(rowJavaRDD, type);


...

Finally, I write the dataset to Parquet files:

newdf.write().mode(SaveMode.Append).partitionBy("stream", "appname", "year", "month", "day", "hour").parquet(savePath);


How can I determine whether it is caused by shuffling or broadcasting?


Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke  wrote:

> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details about what you are doing, and some timings?

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Gopala Krishna Manchukonda

Hi Junfeng,

Is your Kafka topic partitioned?

Are you referring to the duration or to the CPU time spent by the job as
being 20% to 50% higher than when running locally?

Thanks & Regards
Gopal


> On 09-Apr-2018, at 11:42 AM, Jörn Franke  wrote:
> 
> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details about what you are doing, and some timings?


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Jörn Franke

Probably network / shuffling cost? Or broadcast variables? Can you provide
more details about what you are doing, and some timings?

> On 9. Apr 2018, at 07:07, Junfeng Chen  wrote:
> 
> I have written a Spark Streaming application that reads data from Kafka,
> converts the JSON data to Parquet, and saves it to HDFS.
> What puzzles me is that the processing time of the app in YARN mode is 20%
> to 50% longer than in local mode. My cluster has three nodes with three
> NodeManagers, and all three hosts have the same hardware: 40 cores and
> 256 GB of memory.
>
> Why? How can I solve it?
>
> Regards,
> Junfeng Chen


spark application running in yarn client mode is slower than in local mode.

2018-04-08 Thread Junfeng Chen

I have written a Spark Streaming application that reads data from Kafka,
converts the JSON data to Parquet, and saves it to HDFS.
What puzzles me is that the processing time of the app in YARN mode is 20%
to 50% longer than in local mode. My cluster has three nodes with three
NodeManagers, and all three hosts have the same hardware: 40 cores and
256 GB of memory.

Why? How can I solve it?

Regards,
Junfeng Chen
