Jeremy, do you happen to have a small test case that reproduces it? Is it with the kmeans example that comes with PySpark?
Matei

On Jan 22, 2014, at 3:03 PM, Jeremy Freeman <[email protected]> wrote:

> Thanks for the thoughts Matei! I poked at this some more. I ran top on each
> of the workers during the job (I'm testing with the example kmeans), and
> confirmed that the run dies when memory usage (of the java process) is still
> around 30%. I do notice it going up, from around 20% after the first
> iteration to 30% by the time it dies, so it definitely stays under 50%. Also,
> memory is around 30% when running KMeans in Scala, and I never get the
> error.
>
> I can't find anything suspect in any of the worker logs (I'm looking at
> stdout and stderr in spark.local.dir). The only error is the one reported
> to the driver.
>
> Still haven't tried reproducing on EC2, will let you know if I can...
>
> -- Jeremy
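
For anyone trying to reproduce this, below is a minimal sketch of the kind of iterative PySpark k-means job being discussed. It is not the exact script Jeremy ran; the data size, dimensionality, k, and iteration count are placeholders to scale up until memory pressure (or the stall) appears, and it assumes pyspark and numpy are available on the workers.

```python
# Minimal iterative k-means sketch in PySpark (illustrative only).
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="kmeans-repro")  # assumes master is set via config

def closest_center(point, centers):
    # Index of the nearest center by squared Euclidean distance.
    return int(np.argmin([np.sum((point - c) ** 2) for c in centers]))

# Synthetic data; scale num_points up to approach the problematic job size.
num_points, dim, k, iterations = 1000000, 10, 5, 20
points = sc.parallelize(range(num_points), 100).map(
    lambda i: np.random.RandomState(i).rand(dim)
).cache()

centers = points.takeSample(False, k, seed=42)

for it in range(iterations):
    # Assign each point to its closest center, then recompute the means.
    assigned = points.map(lambda p: (closest_center(p, centers), (p, 1)))
    sums = assigned.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    centers = [s / n for (_, (s, n)) in sorted(sums.collect())]
    print("finished iteration", it)

sc.stop()
```

While this runs, watching the java worker processes in top and tailing stdout/stderr under spark.local.dir is the same check described in the quoted message above.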
