Hi Jörn,

I checked the log info of my application: ResultStage 3 (Parquet writing) takes a very long time, nearly 300 s, while the total processing time of this loop is 6 min.
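A quick breakdown of the reported figures shows how much of each batch the write stage consumes. Both numbers are approximate ("nearly 300 s" and "6 min" as read from the log), and the variable names are only illustrative:

```python
# Approximate figures from the log: the Parquet-writing stage
# (ResultStage 3) versus the total processing time of one loop.
write_stage_s = 300
batch_total_s = 6 * 60

# Fraction of the loop spent in the write stage.
share = write_stage_s / batch_total_s
print(f"Parquet write share of the loop: {share:.0%}")

# Time left for everything else (reading from Kafka, JSON parsing, etc.).
print(f"Time outside the write stage: {batch_total_s - write_stage_s} s")
```

This prints a share of about 83%, with only about 60 s spent outside the write stage. That suggests the YARN-vs-local gap is mostly in the HDFS write rather than in reading or converting the data.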
Regards,
Junfeng Chen

On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details on what you do and some timings?
>
> > On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
> >
> > I have written a Spark Streaming application that reads Kafka data,
> > converts the JSON data to Parquet, and saves it to HDFS.
> > What puzzles me is that the processing time of the app in YARN mode is
> > 20% to 50% longer than in local mode. My cluster has three nodes with
> > three NodeManagers, and all three hosts have the same hardware: 40 cores
> > and 256 GB of memory.
> >
> > Why? How can I solve it?
> >
> > Regards,
> > Junfeng Chen