Not quite sure, but it could be GC pauses if you are holding too many objects in memory. Have a look at the tuning guide <http://spark.apache.org/docs/1.2.0/tuning.html> if you haven't already been through it.
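A minimal sketch of one way to check, assuming Spark 1.2.x: pass the verbose GC-logging flags that the tuning guide suggests to every executor JVM, so long collections show up in the executor stdout logs. The app name below is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver setup: forward the GC-logging flags from the
// tuning guide to each executor via spark.executor.extraJavaOptions.
val conf = new SparkConf()
  .setAppName("gc-pause-check") // placeholder name
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
val sc = new SparkContext(conf)
```

Per-task GC time is also shown on the stage detail page of the web UI, which is a quicker first check.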
Thanks
Best Regards

On Sat, Jan 31, 2015 at 7:22 AM, Corey Nolet <cjno...@gmail.com> wrote:

> We have a series of Spark jobs which run in succession over various cached
> datasets, do small groups and transforms, and then call
> saveAsSequenceFile() on them.
>
> Each call to saveAsSequenceFile appears to have done its work: the task
> says it completed in "xxx.xxxxx seconds", but then it pauses, and the
> pauses are quite significant, sometimes up to 2 minutes. We are trying to
> figure out what's going on during this pause: whether the executors are
> really still writing to the sequence files, or whether a race condition on
> an executor is causing timeouts.
>
> Any ideas? Has anyone else seen this happen?
>
> We also tried running all the saveAsSequenceFile calls in separate futures
> to see if the waiting would drop to 1-2 minutes total, but the waiting
> still takes the sum of the original times (several minutes). Our job runs,
> in its entirety, in 35 minutes, and we estimate that we spend at least 16
> minutes in this paused state. What I haven't been able to do is figure out
> how to trace through all the executors. Is there a way to do this? The
> event logs in YARN don't seem to help much with this.
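For reference, the futures approach described above would look roughly like the sketch below: all saves kicked off concurrently, then a single block until every one finishes. The RDD contents, output paths, and app name are placeholders. If the total wait is unchanged under this pattern, the saves are presumably serialized on some shared resource rather than on the driver.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // sequence-file implicits in 1.2.x

val sc = new SparkContext(new SparkConf().setAppName("parallel-saves")) // placeholder setup

// Placeholder pair RDDs standing in for the real cached datasets.
val outputs = Map(
  "/out/a" -> sc.parallelize(Seq(("key", 1))),
  "/out/b" -> sc.parallelize(Seq(("key", 2)))
)

// Start every save concurrently, then block until all of them complete.
val saves = outputs.map { case (path, rdd) =>
  Future { rdd.saveAsSequenceFile(path) }
}
Await.result(Future.sequence(saves), Duration.Inf)
```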