Mattie,

Would this help?
https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/BallKmeans.java
and
https://github.com/tdunning/knn/blob/master/docs/scaling-k-means/scaling-k-means.pdf

On Wed, Aug 15, 2012 at 10:45 AM, Whitmore, Mattie <[email protected]> wrote:

> Hi!
>
> I have been using RandomSeedGenerator, and was hoping it had a patch like
> that described in Mahout-279 since I want only 10 vectors out of a set of
> more than 100,000,000. I have been using canopy clustering for better
> results, but still need to do a few passes of kmeans to determine my T, and
> the random seed does take a long time.
>
> The comments say that you are working on a kmeans++, I searched around but
> couldn't confirm any more information about it. Is a scalable kmeans++ in
> the works? (I know research on the subject is quite new)
>
> Thanks!
>
> Mattie Whitmore
> Mathematician/IR&D Software Engineer
> HARRIS Corporation - Advanced Information Solutions
> 301.837.5278
> [email protected]
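This is a minimal in-memory sketch of k-means++ (D²) seeding in Java, the step the question above is about. It is not Mahout's API and not the code in `BallKmeans.java`. The class `KMeansPlusPlusSketch` and its methods are hypothetical, and it assumes points are dense `double[]` arrays that fit in memory. As written, it makes one full pass over the data per chosen seed, so it would not scale to 100,000,000 vectors. Avoiding that cost is what a scalable k-means++ variant has to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of k-means++ (D^2) seeding; not part of Mahout. */
public class KMeansPlusPlusSketch {

    /** Squared Euclidean distance between two dense vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Picks k seeds: the first uniformly at random, each later one with
     * probability proportional to its squared distance to the nearest seed so far.
     */
    public static List<double[]> seed(double[][] points, int k, Random rand) {
        int n = points.length;
        if (k <= 0 || k > n) {
            throw new IllegalArgumentException("need 0 < k <= points.length");
        }
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rand.nextInt(n)]);

        // minDist[i] = squared distance from point i to its nearest chosen center
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = squaredDistance(points[i], centers.get(0));
        }

        while (centers.size() < k) {
            double total = 0.0;
            for (double d : minDist) total += d;

            int chosen = n - 1;
            if (total == 0.0) {
                // Every point coincides with an existing center: fall back to a uniform pick.
                chosen = rand.nextInt(n);
            } else {
                // Sample index i with probability minDist[i] / total; points with
                // zero weight (already chosen) can never satisfy cumulative > r.
                double r = rand.nextDouble() * total;
                double cumulative = 0.0;
                for (int i = 0; i < n; i++) {
                    cumulative += minDist[i];
                    if (cumulative > r) {
                        chosen = i;
                        break;
                    }
                }
            }

            double[] c = points[chosen];
            centers.add(c);
            // One more full pass over the data to refresh nearest-center distances.
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points[i], c));
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {0.1, 0}, {10, 10}, {10.1, 10}, {-10, 5} };
        for (double[] s : seed(pts, 3, new Random(42))) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```

Weighting by squared distance makes far-apart points likely seeds. That is why k-means++ tends to need fewer Lloyd iterations than purely random seeds. Each seed still needs its own pass over all the data, though.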
