Pat,

You may be interested in the code at https://github.com/tdunning/knn
This includes some high-speed clustering code that could help you with your issues. To wit,

- there aren't as many knobs to tweak on the algorithm (you still have data scaling tricks to do)
- the speed should be 10-100x that of the current Mahout implementations
- it will go into Mahout before too long

The big downsides right now are

- no history yet
- not compatible with the Mahout clustering APIs yet
- it doesn't have the final pass of in-memory clustering, so it really just gives you an indifferent-quality clustering with a huge number of weighted clusters. With the final pass, it will give you a high-quality clustering with your specified number of clusters.

On Sun, May 6, 2012 at 1:49 PM, Pat Ferrel <[email protected]> wrote:

> What would cause kmeans to not return k clusters? As I tweak parameters I
> get different numbers of clusters but it's usually less than the k I pass
> in. Since I am not using canopies at present I would expect k to always be
> honored but the quality of the clusters would depend on the convergence
> amount and number of iterations allowed. No?
>
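P.S. To make the "final pass" idea concrete, here is a minimal sketch of what clustering a set of weighted clusters down to k might look like. It is not the code in the repository and not a Mahout API: the function name `weighted_kmeans`, its parameters, and the choice to seed the centers from the first k points are all assumptions made for illustration. It is plain Lloyd-style k-means in which each input point (a centroid from the fast first pass) pulls on its cluster center in proportion to its weight.

```python
def weighted_kmeans(points, weights, k, iterations=20):
    """Lloyd-style k-means over weighted points (hypothetical helper).

    points  -- list of equal-length coordinate tuples (e.g. sketch centroids)
    weights -- one non-negative weight per point
    Returns (centers, totals): k centers and the total weight assigned to each.
    """
    dims = len(points[0])
    # Assumption for this sketch: seed the centers with the first k points.
    centers = [list(p) for p in points[:k]]
    totals = [0.0] * k
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Index of the nearest center by squared Euclidean distance.
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            # An empty cluster keeps its previous center instead of vanishing.
            if totals[j] > 0:
                centers[j] = [s / totals[j] for s in sums[j]]
    return centers, totals


# Four weighted centroids standing in for the output of the fast first pass.
sketch = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
sketch_weights = [3, 1, 1, 3]
centers, totals = weighted_kmeans(sketch, sketch_weights, k=2)
print(centers)  # [[0.0, 0.5], [10.0, 11.5]]
print(totals)   # [4.0, 4.0]
```

Because the first pass leaves far fewer points than the original data, a pass like this can run entirely in memory, and the heavier centroids dominate the final centers, as the example output shows.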
