Try turning on the Kryo serializer as described at http://spark.apache.org/docs/latest/tuning.html. Also, are there any exceptions in the driver program’s log before this happens?
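For concreteness, a minimal sketch of what enabling Kryo could look like when constructing the context (the app name and the commented-out registrator class are illustrative, not from the tuning guide):

import org.apache.spark.{SparkConf, SparkContext}

// Switch the serializer to Kryo, as described in the tuning guide.
val conf = new SparkConf()
  .setAppName("kmeans-large-k")  // illustrative app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optionally register frequently-serialized classes through a custom registrator:
  // .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")  // hypothetical class

val sc = new SparkContext(conf)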
Matei

On Apr 28, 2014, at 9:19 AM, Buttler, David <buttl...@llnl.gov> wrote:

> Hi,
> I am trying to run the K-means code in mllib, and it works very nicely with
> small K (less than 1000). However, when I try for a larger K (I am looking
> for 2000-4000 clusters), it seems like the code gets part way through
> (perhaps just the initialization step) and freezes. The compute nodes stop
> doing any CPU / network / IO and nothing happens for hours. I had done
> something similar back in the days of Spark 0.6, and I didn’t have any
> trouble going up to 4000 clusters with similar data.
>
> This happens with both a standalone cluster, and in local multi-core mode
> (with the node given 200GB of heap), but eventually completes in local
> single-core mode.
>
> Data statistics:
> Rows: 166248
> Columns: 108
>
> This is a test run before trying it out on much larger data
>
> Any ideas on what might be the cause of this?
>
> Thanks,
> Dave
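For reference, a minimal sketch of the kind of mllib K-means run being described, with the reported data shape (~166k rows, 108 columns) and a large k; the input path, parsing, and parameter values are illustrative, and the exact mllib API differs slightly across Spark versions:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical driver: cluster ~166,248 rows of 108 dense features with a large k.
def run(sc: SparkContext): Unit = {
  val data = sc.textFile("hdfs:///path/to/features.txt")  // illustrative input path
    .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
    .cache()

  val k = 2000            // the regime where the freeze is reported
  val maxIterations = 20
  val model = KMeans.train(data, k, maxIterations)
  println("k=" + k + " cost=" + model.computeCost(data))
}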