Hi,

My Kafka topic has three partitions. By time cost I mean that each streaming loop takes longer in yarn client mode. For example, yarn mode takes 300 seconds to process some data, while local mode takes only 200 seconds to process a similar amount of data.

Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:20 PM, Gopala Krishna Manchukonda <[email protected]> wrote:

> Hi Junfeng,
>
> Is your kafka topic partitioned?
>
> Are you referring to the duration or the CPU time spent by the job as
> being 20% - 50% higher than running in local?
>
> Thanks & Regards
> Gopal
>
> > On 09-Apr-2018, at 11:42 AM, Jörn Franke <[email protected]> wrote:
> >
> > Probably network / shuffling cost? Or broadcast variables? Can you
> > provide more details what you do and some timings?
> >
> >> On 9. Apr 2018, at 07:07, Junfeng Chen <[email protected]> wrote:
> >>
> >> I have wrote an spark streaming application reading kafka data and
> >> convert the json data to parquet and save to hdfs.
> >> What make me puzzled is, the processing time of app in yarn mode cost
> >> 20% to 50% more time than in local mode. My cluster have three nodes with
> >> three node managers, and all three hosts have same hardware, 40cores and
> >> 256GB memory.
> >>
> >> Why? How to solve it?
> >>
> >> Regard,
> >> Junfeng Chen
