David,

Just curious: what kind of use cases demand such a large number of clusters (k)?

Chester

Sent from my iPhone

On Apr 28, 2014, at 9:19 AM, "Buttler, David" <buttl...@llnl.gov> wrote:

> Hi,
> I am trying to run the K-means code in mllib, and it works very nicely with
> small K (less than 1000). However, when I try for a larger K (I am looking
> for 2000-4000 clusters), it seems like the code gets part way through
> (perhaps just the initialization step) and freezes. The compute nodes stop
> doing any CPU / network / IO and nothing happens for hours. I had done
> something similar back in the days of Spark 0.6, and I didn’t have any
> trouble going up to 4000 clusters with similar data.
>
> This happens with both a standalone cluster, and in local multi-core mode
> (with the node given 200GB of heap), but eventually completes in local
> single-core mode.
>
> Data statistics:
> Rows: 166248
> Columns: 108
>
> This is a test run before trying it out on much larger data
>
> Any ideas on what might be the cause of this?
>
> Thanks,
> Dave