Hi Junfeng,

Is your Kafka topic partitioned?
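
The reason I ask: with the direct Kafka stream, each Kafka partition becomes one Spark partition, so the partition count caps how many read tasks can run at once, however many cores the cluster has. Below is a rough sketch of that bound, not a measurement of your job. The function name and the executor layout (3 executors with 40 cores each) are my assumptions based on the hardware you described, and it assumes no repartition() before processing.

```python
def effective_parallelism(kafka_partitions: int, executors: int, cores_per_executor: int) -> int:
    """Upper bound on read tasks that can run concurrently in one batch.

    Assumes the direct Kafka stream (one Spark task per Kafka partition)
    and no repartition() before the processing stage.
    """
    total_cores = executors * cores_per_executor
    return min(kafka_partitions, total_cores)


# Hypothetical layout: 3 executors x 40 cores, one executor per node.
for partitions in (1, 3, 120):
    bound = effective_parallelism(partitions, executors=3, cores_per_executor=40)
    print(f"{partitions} Kafka partition(s) -> at most {bound} concurrent task(s)")
```

If the topic has only one or a few partitions, YARN mode adds scheduling and network overhead without adding any parallelism, which could account for part of the gap.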
Are you referring to the duration or the CPU time spent by the job when you say it is 20% to 50% higher than in local mode?

Thanks & Regards
Gopal

> On 09-Apr-2018, at 11:42 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details on what you do and some timings?
>
>> On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
>>
>> I have written a Spark Streaming application that reads Kafka data, converts
>> the JSON data to Parquet, and saves it to HDFS.
>> What puzzles me is that the app's processing time in YARN mode is 20% to
>> 50% longer than in local mode. My cluster has three nodes with three
>> node managers, and all three hosts have the same hardware: 40 cores and
>> 256 GB of memory.
>>
>> Why? How can I solve it?
>>
>> Regards,
>> Junfeng Chen
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org