Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-08 Thread Sandy Ryza
> usedCapacity=0.982 absoluteUsedCapacity=0.982 used= vCores:26> cluster= > 2015-02-04 18:18:28,646 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-171-0-122.ec

Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Michael Albert
From: Sandy Ryza To: Imran Rashid Cc: Michael Albert; "user@spark.apache.org" Sent: Wednesday, February 4, 2015 12:54 PM Subject: Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job? Also, do you see any lines in the YARN NodeManager logs where it says

Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Sandy Ryza
Also, do you see any lines in the YARN NodeManager logs where it says that it's killing a container? -Sandy On Wed, Feb 4, 2015 at 8:56 AM, Imran Rashid wrote: > Hi Michael, > > judging from the logs, it seems that those tasks are just working a really > long time. If you have long running tas
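When YARN kills a container (for example, for running beyond its memory limits), the NodeManager log records it, which is what Sandy is suggesting to look for. A minimal sketch of scanning for such lines in plain Python; the sample log line below is illustrative only, and exact wording and log locations vary by Hadoop version:

```python
import re

# Illustrative NodeManager log lines (assumptions for demonstration,
# not taken from the poster's cluster).
sample_log = """\
2015-02-04 18:18:28,646 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=12345,containerID=container_1423072000000_0001_01_000002] is running beyond physical memory limits. Killing container.
2015-02-04 18:18:29,101 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_1423072000000_0001_01_000002
"""

def container_kills(lines):
    """Return log lines that indicate YARN killed a container."""
    return [l for l in lines if re.search(r"Killing container", l)]

kills = container_kills(sample_log.splitlines())
for line in kills:
    print(line)
```

In practice the same pattern match would be run over the NodeManager log files on each worker node rather than an in-memory string.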

Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Imran Rashid
Hi Michael, judging from the logs, it seems that those tasks are just working a really long time. If you have long running tasks, then you wouldn't expect the driver to output anything while those tasks are working. What is unusual is that there is no activity during all that time the tasks are
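Imran's point is that long-running tasks produce no driver output while they work, so a stall and normal progress look the same from the driver. One low-effort way to tell them apart is to emit periodic progress from inside the task itself. A minimal sketch in plain Python (the wrapper and its names are illustrative, not from the thread; the same idea can be applied inside a Spark mapPartitions function):

```python
import sys

def process_with_progress(records, work, log_every=100000):
    """Wrap a record iterator so a long-running task writes periodic
    progress to stderr instead of staying silent."""
    for i, rec in enumerate(records, 1):
        yield work(rec)
        if i % log_every == 0:
            sys.stderr.write("processed %d records\n" % i)

# Toy usage: square ten numbers, logging progress every 3 records.
results = list(process_with_progress(range(10), lambda x: x * x, log_every=3))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

On YARN, stderr from each task lands in that executor's container logs, so the progress lines show up where the thread is already looking.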

advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-03 Thread Michael Albert
Greetings! First, my sincere thanks to all who have given me advice. Following previous discussion, I've rearranged my code to try to keep the partitions to more manageable sizes. Thanks to all who commented. At the moment, the input set I'm trying to work with is about 90GB (avro parquet format).
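One common way to pick a "manageable" partition count is simple arithmetic: divide the total input size by a target per-partition size. A sketch, assuming the ~90 GB figure above and a hypothetical 128 MB target (the target is an assumption for illustration, not a value from the thread):

```python
# Rough partition-count sizing. Both figures are assumptions:
# ~90 GB of input, aiming for ~128 MB per partition.
input_gb = 90
target_partition_mb = 128

num_partitions = (input_gb * 1024) // target_partition_mb
print(num_partitions)  # 720
```

A count in this ballpark could then be passed to a repartition call; smaller partitions keep individual tasks short, which also makes stalls easier to distinguish from slow-but-working tasks.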