Re: K-means with large K

Chester Chen Mon, 28 Apr 2014 09:32:25 -0700

David, 
  Just curious: what kind of use cases demand such a large number of clusters (k)?


Chester

Sent from my iPhone

On Apr 28, 2014, at 9:19 AM, "Buttler, David" <buttl...@llnl.gov> wrote:

> Hi,
> I am trying to run the K-means code in MLlib, and it works very nicely with 
> small K (less than 1000).  However, when I try for a larger K (I am looking 
> for 2000-4000 clusters), it seems like the code gets part way through 
> (perhaps just the initialization step) and freezes.  The compute nodes stop 
> doing any CPU / network / IO and nothing happens for hours.  I had done 
> something similar back in the days of Spark 0.6, and I didn’t have any 
> trouble going up to 4000 clusters with similar data.
>  
> This happens both on a standalone cluster and in local multi-core mode 
> (with the node given 200GB of heap), but the job eventually completes in 
> local single-core mode.
>  
> Data statistics:
> Rows: 166248
> Columns: 108
>  
> This is a test run before trying it out on much larger data.
>  
> Any ideas on what might be the cause of this?
>  
> Thanks,
> Dave
