Hi!

I have been using RandomSeedGenerator and was hoping it had a patch like the one
described in MAHOUT-279, since I want to select only 10 vectors out of a set of
more than 100,000,000. I have been using canopy clustering for better results,
but I still need to run a few passes of k-means to determine my T values, and the
random seed generation takes a long time.

The comments on that issue say that you are working on k-means++. I searched
around but couldn't find any more information about it. Is a scalable k-means++
in the works? (I know research on the subject is quite new.)

Thanks!



Mattie Whitmore
Mathematician/IR&D Software Engineer
HARRIS Corporation - Advanced Information Solutions
301.837.5278
[email protected]