Thanks for the thoughts, Matei! I poked at this some more. I ran top on
each of the workers during the job (I'm testing with the KMeans example)
and confirmed that the run dies while memory usage (of the java process)
is still around 30%. I do notice it creeping up, from around 20% after
the first iteration to 30% by the time it dies, so it definitely stays
under 50%. Also, memory sits around 30% when running KMeans in Scala,
and I never get the error there.

I can't find anything suspect in any of the worker logs (I'm looking at
stdout and stderr in spark.local.dir). The only error is the one
reported to the driver.
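
For anyone following along, a rough sketch of the kind of scan I mean,
assuming spark.local.dir is /tmp/spark (substitute whatever it's set to
on your workers):

    import os

    LOCAL_DIR = '/tmp/spark'  # assumed spark.local.dir; adjust as needed

    # Walk the work directories and flag any line that looks like an error.
    for root, _dirs, files in os.walk(LOCAL_DIR):
        for fname in files:
            if fname in ('stdout', 'stderr'):
                path = os.path.join(root, fname)
                with open(path) as f:
                    for lineno, line in enumerate(f, 1):
                        if 'ERROR' in line or 'Exception' in line:
                            print('%s:%d: %s'
                                  % (path, lineno, line.rstrip()))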

Still haven't tried reproducing on EC2; I'll let you know if I can...

-- Jeremy


