Hi!

I have been using RandomSeedGenerator and was hoping it had a patch like the one described in MAHOUT-279, since I want only 10 vectors out of a set of more than 100,000,000. I have been using canopy clustering for better results, but I still need to do a few passes of k-means to determine my T, and random seed generation takes a long time.
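For reference, one standard way to draw exactly k vectors uniformly at random in a single pass, while holding only k of them in memory, is reservoir sampling (Algorithm R). This is a minimal Java sketch of that technique. It is not necessarily what MAHOUT-279 proposes, and `ReservoirSampler` and `sample` are hypothetical names, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Reservoir sampling (Algorithm R): keeps k items drawn uniformly at random
 * from a stream of unknown length, in one pass, with O(k) memory.
 * Hypothetical helper for illustration only; not part of Mahout.
 */
final class ReservoirSampler {

  private ReservoirSampler() {}

  static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
    if (k < 0) {
      throw new IllegalArgumentException("k must be non-negative");
    }
    List<T> reservoir = new ArrayList<>(k);
    long seen = 0;
    while (stream.hasNext()) {
      T item = stream.next();
      seen++;
      if (reservoir.size() < k) {
        // The first k items fill the reservoir directly.
        reservoir.add(item);
      } else {
        // Keep the seen-th item with probability k / seen: draw an index in
        // [0, seen) and overwrite that slot only if the index is below k.
        long j = (long) (rng.nextDouble() * seen);
        if (j < k) {
          reservoir.set((int) j, item);
        }
      }
    }
    return reservoir;
  }
}
```

Calling `sample(vectors, 10, new Random(seed))` keeps only 10 vectors in memory no matter how large the input is; the cost is still one full read of the data.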
The comments there mention that you are working on k-means++, but I searched around and couldn't find any more information about it. Is a scalable k-means++ in the works? (I know research on the subject is quite new.)

Thanks!

Mattie Whitmore
Mathematician/IR&D Software Engineer
HARRIS Corporation - Advanced Information Solutions
301.837.5278
