I think I have found the likely reason.
In local mode, all tasks in one stage run concurrently, while in YARN mode
they run in batches, limited by the number of executors.
For example, suppose a stage has, say, four tasks and each task takes 3 min.
In local mode they all run together, so the stage takes 3 min in total. In
YARN mode only two executors are assigned to the stage, and since each
executor can process only one task at a time, the stage needs 6 min in total.
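
To make the arithmetic concrete, here is a rough sketch of the "waves" estimate. The task count of 4 is my assumption (the numbers above only fit if the stage has more tasks than the two YARN slots), and stage_minutes is a made-up helper for illustration, not a Spark API. It also assumes every task takes the same time and ignores scheduling and shuffle overhead.

```python
import math

def stage_minutes(num_tasks, task_slots, minutes_per_task):
    """Estimate stage wall-clock time when equal-length tasks run in waves."""
    # Each wave runs at most `task_slots` tasks in parallel.
    waves = math.ceil(num_tasks / task_slots)
    return waves * minutes_per_task

# Hypothetical stage: 4 tasks, 3 minutes each.
local_minutes = stage_minutes(num_tasks=4, task_slots=4, minutes_per_task=3)
# YARN with 2 executors x 1 core each = 2 task slots.
yarn_minutes = stage_minutes(num_tasks=4, task_slots=2, minutes_per_task=3)

print(local_minutes, yarn_minutes)
```

If this is really the cause, the number of slots on YARN is roughly executors times cores per executor, so raising --num-executors and/or --executor-cores on spark-submit should close the gap.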
On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Probably network / shuffling cost? Or broadcast variables? Can you provide
> more details on what you do and some timings?
> > On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
> > I have written a Spark Streaming application that reads Kafka data,
> converts the JSON data to Parquet, and saves it to HDFS.
> > What puzzles me is that the processing time of the app in YARN mode is
> 20% to 50% longer than in local mode. My cluster has three nodes with
> three node managers, and all three hosts have the same hardware: 40 cores
> and 256 GB memory.
> > Why? How to solve it?
> > Regards,
> > Junfeng Chen