KMeans with ASFEmail archive data set

Joshi, Shrinivas Mon, 18 Jun 2012 09:33:52 -0700

Hi,

I have been looking at KMeans clustering of the ASFEmail archive data set using 
the script that is part of the examples directory. This is with a Mahout 0.6, 
Hadoop 1.0.3 and JDK 7u4 stack. I have noticed that the algorithm sometimes 
converges in 1 iteration (the randomSeed iteration plus one clustering 
iteration) and sometimes takes 5 iterations. This is probably due to how the 
initial centroids get picked. Is this expected behavior? Is there any way to 
make the initial centroid selection uniformly random?
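
To illustrate the effect in isolation, here is a toy sketch in plain Python. It 
is not Mahout code: the function names, the synthetic data and the seeds are 
all made up for illustration. It runs a basic k-means from different randomly 
chosen initial centroids and reports how many iterations each run needs before 
the cluster assignments stop changing.

```python
import random


def make_blobs(seed=42, per_blob=50):
    """Generate three synthetic 2-D Gaussian blobs (hypothetical test data)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


def kmeans_iterations(points, k, seed, max_iter=100):
    """Run basic (Lloyd's) k-means from seeded random initial centroids.

    Returns (iterations until assignments stop changing, final centroids).
    """
    rng = random.Random(seed)
    # Pick k distinct input points uniformly at random as initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assign every point to its nearest centroid (squared Euclidean).
        new_assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            # Nothing moved in this pass, so the previous pass converged.
            return iteration - 1, centroids
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return max_iter, centroids


if __name__ == "__main__":
    data = make_blobs()
    for s in range(5):
        iters, _ = kmeans_iterations(data, k=3, seed=s)
        print(f"seed={s}: converged after {iters} iteration(s)")
```

With a fixed seed the run is repeatable; changing only the seed can change the 
iteration count, which is the kind of variation I am seeing.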


Thanks,
-Shrinivas
