Hi, I have been running KMeans clustering on the ASFEmail archive data set using the script in the examples directory. My stack is Mahout 0.6, Hadoop 1.0.3 and JDK 7u4. I have noticed that the algorithm sometimes converges in 1 iteration (the randomSeed iteration plus one clustering iteration) and sometimes takes 5 iterations. I suspect this comes from how the initial centroids are picked. Is this expected behavior? Is there any way to make the initial centroid selection uniformly random?
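To make concrete what I mean by "uniformly random": reservoir sampling (Algorithm R) picks k points in a single pass so that every k-subset of the input is equally likely. Below is a minimal Java sketch, assuming the vectors fit in an in-memory list of `double[]`. The class and method names are hypothetical, and this is not taken from Mahout's own seeding code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSeeds {
    // Hypothetical helper: picks k items uniformly at random from a sequence
    // of unknown length in one pass (reservoir sampling, Algorithm R).
    // Assumes fewer than Integer.MAX_VALUE items. If k exceeds the number
    // of items, all items are returned.
    public static <T> List<T> sample(Iterable<T> items, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // j is uniform in [0, seen), so this item replaces a
                // random slot with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            points.add(new double[] {i, i * 2.0});
        }
        // Fixed seed for a reproducible run; use new Random() to vary seeds.
        List<double[]> centroids = sample(points, 5, new Random(42));
        for (double[] c : centroids) {
            System.out.println(c[0] + ", " + c[1]);
        }
    }
}
```

Fixing the `Random` seed like this would also make two runs start from the same centroids, which helps tell whether the 1-vs-5 iteration difference really comes from the initial selection.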
Thanks, -Shrinivas
