RE: KMeans with ASFEmail archive data set

Paritosh Ranjan Mon, 18 Jun 2012 11:15:45 -0700

Canopy Clustering can be used to find the initial centroids. This might give
some stability in the result (the number of iterations taken to converge, and
also the clusters found).
However, it's not guaranteed that the centroids found by Canopy Clustering
will be the same each time.
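For readers unfamiliar with the canopy technique suggested above, here is a minimal single-pass sketch in Java. It assumes dense double[] vectors, Euclidean distance, and hand-picked thresholds t1 > t2 > 0. The class and method names are hypothetical, and this is not Mahout's canopy implementation. The mean of each canopy's members is returned as an initial k-means centroid. Because the first remaining point seeds each canopy, reordering the input can change the result, which fits the caveat that the centroids are not guaranteed to be the same each time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Mahout.
class CanopySeeds {

    // Euclidean distance between two dense vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Single-pass canopy clustering: t1 is the loose threshold, t2 the tight one.
    // Returns the mean of each canopy's members, for use as initial k-means centroids.
    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        if (!(t1 > t2 && t2 > 0)) throw new IllegalArgumentException("need t1 > t2 > 0");
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The first remaining point becomes the canopy center, so input order matters.
            double[] center = remaining.get(0);
            double[] sum = new double[center.length];
            int members = 0;
            List<double[]> next = new ArrayList<>();
            for (double[] p : remaining) {
                double d = distance(center, p);
                if (d < t1) { // within the loose threshold: counts toward this canopy's mean
                    for (int i = 0; i < sum.length; i++) sum[i] += p[i];
                    members++;
                }
                if (d >= t2) { // outside the tight threshold: stays available for later canopies
                    next.add(p);
                }
            }
            // members >= 1 because the center is at distance 0 < t1.
            for (int i = 0; i < sum.length; i++) sum[i] /= members;
            centroids.add(sum);
            remaining = next; // the center (distance 0 < t2) is always removed, so this terminates
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
                new double[]{0.0, 0.0}, new double[]{0.5, 0.0},
                new double[]{10.0, 10.0}, new double[]{10.5, 10.0});
        for (double[] seed : canopyCentroids(pts, 3.0, 1.0)) {
            System.out.println(Arrays.toString(seed));
        }
    }
}
```

With the four points in main and thresholds 3.0 and 1.0, the sketch yields two seeds, one at the mean of each nearby pair. In practice t1 and t2 have to be tuned to the data, and they indirectly determine how many initial centroids you get.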
________________________________________
From: Joshi, Shrinivas [[email protected]]
Sent: Monday, June 18, 2012 6:33 PM
To: [email protected]
Subject: KMeans with ASFEmail archive data set


Hi,

I have been looking at KMeans clustering of the ASFEmail archive data set using the
script that is part of the examples directory. This is with a Mahout 0.6, Hadoop
1.0.3 and JDK 7u4 stack. I have noticed that sometimes the algorithm converges
in 1 iteration (randomSeed iteration + a clustering iteration) and sometimes it 
takes 5 iterations. This is probably due to how the initial centroids get 
picked. Is this expected behavior? Is there any way to make the initial 
centroid selection uniformly random?

Thanks,
-Shrinivas
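On the question of uniformly random initial centroids: one standard way to draw k points uniformly at random in a single pass over the data is reservoir sampling (Algorithm R). The Java sketch below is only an illustration, not Mahout's own seed-generation code; the class and method names are hypothetical, and it assumes the number of input points fits in an int. Fixing the Random seed makes the draw repeatable, which removes run-to-run variation in the starting centroids. It does not by itself make k-means converge in fewer iterations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class, not part of Mahout.
class ReservoirSeeds {

    // Picks k items uniformly at random without replacement in one pass
    // (reservoir sampling, Algorithm R). A fixed seed makes the result repeatable.
    static <T> List<T> sample(Iterable<T> points, int k, long seed) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        Random rng = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T p : points) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(p); // fill the reservoir with the first k items
            } else {
                // Keep the seen-th item with probability k / seen by drawing j in [0, seen).
                int j = rng.nextInt(seen);
                if (j < k) reservoir.set(j, p);
            }
        }
        return reservoir; // fewer than k items if the input was smaller than k
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ids.add(i);
        System.out.println(sample(ids, 5, 42L));
        // Same seed, same sample.
        System.out.println(sample(ids, 5, 42L).equals(sample(ids, 5, 42L)));
    }
}
```

Each of the n input points ends up in the sample with probability k/n. Reusing the same seed across runs gives identical starting centroids; changing it gives a different, equally uniform draw.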
