Yes, I have increased the number of executors and the executor cores, and it runs normally now. HDP Spark 2 uses only 2 executors with 1 core each by default.
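
As a rough sketch of why this helped: if every task in a stage takes about the same time and each core runs one task at a time (Spark's default of one CPU per task), the stage needs roughly ceil(tasks / (executors * cores)) rounds. The function name and the task count of 4 below are my own assumptions for illustration. Only the 3 min per task and the 2 executors with 1 core come from this thread.

```python
import math


def estimated_stage_minutes(num_tasks, task_minutes, num_executors, executor_cores):
    """Rough stage duration, assuming equal-length tasks and one task per core."""
    slots = num_executors * executor_cores   # tasks that can run at the same time
    waves = math.ceil(num_tasks / slots)     # rounds needed to finish every task
    return waves * task_minutes


# Hypothetical stage with 4 tasks of 3 minutes each (the task count is assumed).
print(estimated_stage_minutes(4, 3, num_executors=2, executor_cores=1))  # 6
print(estimated_stage_minutes(4, 3, num_executors=2, executor_cores=2))  # 3
```

This ignores scheduling overhead, shuffle and data skew, so it only gives a lower bound. The real values are set with --num-executors and --executor-cores on spark-submit.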

Regards,
Junfeng Chen

On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao wrote:

>> In yarn mode, only two executors are assigned to process the tasks; since
>> one executor can process only one task at a time, they need 6 min in total.
>
> This is not true. You should set --executor-cores/--num-executors to
> increase the task parallelism of the executors. To be fair, the Spark
> application should have the same resources (CPU/memory) when comparing
> local and yarn mode.
>
> 2018-04-10 10:05 GMT+08:00 Junfeng Chen:
>
>> I found the potential reason.
>>
>> In local mode, all tasks in one stage run concurrently, while tasks in
>> yarn mode run in sequence.
>>
>> For example, in one stage, each task takes 3 min. In local mode, they
>> run together and take 3 min in total. In yarn mode, only two executors
>> are assigned to process the tasks; since one executor can process only
>> one task at a time, they need 6 min in total.
>>
>> Regards,
>> Junfeng Chen
>>
>> On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke wrote:
>>
>>> Probably network / shuffling cost? Or broadcast variables? Can you
>>> provide more details on what you do and some timings?
>>>
>>> > On 9. Apr 2018, at 07:07, Junfeng Chen wrote:
>>> >
>>> > I have written a Spark Streaming application that reads Kafka data,
>>> > converts the JSON data to Parquet and saves it to HDFS.
>>> > What puzzles me is that the processing time of the app in yarn mode
>>> > is 20% to 50% longer than in local mode. My cluster has three nodes
>>> > with three node managers, and all three hosts have the same hardware:
>>> > 40 cores and 256 GB memory.
>>> >
>>> > Why? How can I solve it?
>>> >
>>> > Regards,
>>> > Junfeng Chen
