> In YARN mode, only two executors are assigned to process the tasks; since
> one executor can process only one task, they need 6 min in total.

This is not true. An executor is not limited to one task at a time: it runs
as many tasks concurrently as it has cores. You should set --executor-cores
and --num-executors to increase task parallelism. To be fair, the Spark
application should have the same resources (CPU/memory) when comparing local
and YARN mode.

2018-04-10 10:05 GMT+08:00 Junfeng Chen <[email protected]>:

> I found the potential reason.
>
> In local mode, all tasks in one stage run concurrently, while tasks in
> YARN mode run in sequence.
>
> For example, in one stage, each task takes 3 min. In local mode, they
> will run together and take 3 min in total. In YARN mode, only two executors
> are assigned to process the tasks; since one executor can process only one
> task, they need 6 min in total.
>
>
> Regards,
> Junfeng Chen
>
> On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <[email protected]> wrote:
>
>> Probably network / shuffling cost? Or broadcast variables? Can you
>> provide more details on what you do and some timings?
>>
>> > On 9. Apr 2018, at 07:07, Junfeng Chen <[email protected]> wrote:
>> >
>> > I have written a Spark Streaming application that reads Kafka data,
>> > converts the JSON data to Parquet, and saves it to HDFS.
>> > What puzzles me is that the processing time of the app in YARN mode is
>> > 20% to 50% longer than in local mode. My cluster has three nodes with
>> > three node managers, and all three hosts have the same hardware: 40 cores
>> > and 256 GB memory.
>> >
>> > Why? How to solve it?
>> >
>> > Regards,
>> > Junfeng Chen
>> >
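The timing argument in this thread can be checked with a little arithmetic. The sketch below is only a rough model, not Spark code: it assumes every task takes the same time, that tasks run in full "waves" across the available task slots, and that a stage has 4 tasks (the thread does not give the task count, so that number is hypothetical). The function name `stage_minutes` and its parameters are made up for illustration.

```python
import math


def stage_minutes(num_tasks, task_minutes, num_executors, executor_cores):
    """Rough stage duration, assuming equal-length tasks that run in waves.

    Assumption: each executor runs one task per core at a time, so the
    number of concurrent task slots is num_executors * executor_cores.
    """
    slots = num_executors * executor_cores
    if slots < 1:
        raise ValueError("need at least one task slot")
    # Tasks beyond the available slots wait for the next wave.
    waves = math.ceil(num_tasks / slots)
    return waves * task_minutes


# Hypothetical stage: 4 tasks of 3 min each.
# Two executors with one core each: 2 slots, so 2 waves.
one_core = stage_minutes(num_tasks=4, task_minutes=3,
                         num_executors=2, executor_cores=1)

# Same two executors with two cores each: 4 slots, so 1 wave.
two_cores = stage_minutes(num_tasks=4, task_minutes=3,
                          num_executors=2, executor_cores=2)

print(one_core, two_cores)
```

Under these assumptions the 6 min figure corresponds to two single-core executors, and giving each executor a second core (or adding executors) brings the stage back to 3 min, which is why the resources should match before comparing local and YARN mode.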
