Hi,

My Kafka topic has three partitions. By time cost I mean that each
streaming loop takes longer in yarn-client mode. For example, yarn mode
takes 300 seconds to process some data, while local mode takes just 200
seconds to process a similar amount of data.
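
For reference, a minimal sketch of how those two timings translate into a
relative overhead (the function name is only illustrative, not part of my
application):

```python
def relative_overhead(yarn_seconds: float, local_seconds: float) -> float:
    """Return the extra time of the yarn run as a fraction of the local run."""
    if local_seconds <= 0:
        raise ValueError("local_seconds must be positive")
    return (yarn_seconds - local_seconds) / local_seconds


# Timings from the example above: 300 s in yarn mode, 200 s in local mode.
overhead = relative_overhead(300, 200)
print(f"yarn mode is {overhead:.0%} slower than local mode")
```

That puts this example at the upper end of the 20% to 50% range I reported
earlier.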


Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:20 PM, Gopala Krishna Manchukonda <
gopala_krishna_manchuko...@apple.com> wrote:

> Hi Junfeng,
>
> Is your Kafka topic partitioned?
>
> Are you referring to the duration or the CPU time spent by the job as
> being 20% - 50% higher than when running in local mode?
>
> Thanks & Regards
> Gopal
>
>
> > On 09-Apr-2018, at 11:42 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> >
> > Probably network / shuffling cost? Or broadcast variables? Can you
> > provide more details on what you do, and some timings?
> >
> >> On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
> >>
> >> I have written a Spark Streaming application that reads Kafka data,
> >> converts the JSON data to Parquet, and saves it to HDFS.
> >> What puzzles me is that the app's processing time in yarn mode is
> >> 20% to 50% longer than in local mode. My cluster has three nodes with
> >> three node managers, and all three hosts have the same hardware:
> >> 40 cores and 256 GB of memory.
> >>
> >> Why is that, and how can I solve it?
> >>
> >> Regards,
> >> Junfeng Chen
> >
> >
>
>
