Hi, the algorithm itself uses memory proportional to the number of centers. By default, however, it sets "k.means.caching.enabled" to true, which caches the vectors to be clustered on the heap, and with that setting you would need about 1 TB of RAM across the cluster. I would suggest setting this to false (you will need to recompile the KMeansBSP class in the ml package; the line you have to change is 347).
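To put rough numbers on the difference, here is a minimal back-of-envelope sketch in Java. The class name, the dense-double representation of a vector, the even split of the input across tasks, and the example figures (1000 centers, 100 dimensions, 64 tasks) are all my assumptions for illustration; none of it is taken from the KMeansBSP code, and it ignores JVM object overhead.

```java
// Back-of-envelope heap estimate for k-means.
// Hypothetical helper, not part of Hama; every figure is an illustrative assumption.
final class KMeansMemoryEstimate {
    static final long BYTES_PER_DOUBLE = 8L;

    // Heap needed to hold k centers, assuming dense vectors of doubles.
    static long centerBytes(long k, long dimensions) {
        return k * dimensions * BYTES_PER_DOUBLE;
    }

    // Heap needed per task when the input vectors are cached, assuming the
    // data set is split evenly and cached at roughly its raw size.
    static long cachedInputBytesPerTask(long datasetBytes, int tasks) {
        return datasetBytes / tasks;
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40; // 1 TiB of input vectors
        // prints "centers only: 800000 bytes"
        System.out.println("centers only: " + centerBytes(1000, 100) + " bytes");
        // prints "cached input per task: 17179869184 bytes" (16 GiB)
        System.out.println("cached input per task: "
                + cachedInputBytesPerTask(oneTb, 64) + " bytes");
    }
}
```

Under these assumptions the centers take well under a megabyte, while the cached input needs about 16 GiB of heap on each of the 64 tasks, which is why turning the cache off matters for a data set of this size.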
Good luck, and let us know if you have problems.

2012/8/27 HuYuesheng <[email protected]>

> Hi,
>
> I want to know, if I want to test a 1TB K-means dataset, does it mean I
> need at least 1TB RAM (all of the cluster)?
> Thank you!
>
> Best Regards!
>
> Yuesheng Hu
> China
