Hi,
I am trying to understand the k-means implementation in Mahout.
A few questions came to mind:

 1.  In ClusterIteration.IterateMR(), no combiner class is declared. 
Looking at CIMapper and CIReducer, I could not find where the new centroids 
are computed at the end of each iteration.
    *   I expected that at some point the sum of the points assigned to a 
cluster (Cluster.S1) would be divided by the number of points (Cluster.S0). The 
computeCentroid() method in the AbstractCluster class does that, but I could 
not find where, or whether, it is called.
 2.  When generating the initial cluster centroids, i.e. in 
RandomSeedGenerator.buildRandom(), why is the observe() method called for 
each cluster? I noticed that observe() records the sum of the points 
assigned to that cluster. Doesn't that mean the point chosen as the 
cluster center is counted twice?
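
To make the expectation in question 1 concrete, here is a minimal Java sketch of what I assumed happens. The class RunningCentroid and its fields are hypothetical stand-ins, not Mahout's actual code. Only the names observe() and computeCentroid() are borrowed from the methods mentioned above, and the point weights are assumed to be 1.

```java
import java.util.Arrays;

// Hypothetical stand-in for a cluster's running statistics; NOT Mahout's class.
public class RunningCentroid {
    private double s0;          // number of points observed so far
    private final double[] s1;  // component-wise sum of the observed points

    public RunningCentroid(int dimensions) {
        this.s1 = new double[dimensions];
    }

    // Record one point: increment the count and add it to the running sum.
    public void observe(double[] point) {
        s0 += 1.0;
        for (int i = 0; i < s1.length; i++) {
            s1[i] += point[i];
        }
    }

    // Centroid = sum of points (s1) divided by number of points (s0).
    public double[] computeCentroid() {
        if (s0 == 0.0) {
            throw new IllegalStateException("no points observed");
        }
        double[] centroid = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            centroid[i] = s1[i] / s0;
        }
        return centroid;
    }

    public static void main(String[] args) {
        RunningCentroid cluster = new RunningCentroid(2);
        cluster.observe(new double[] {1.0, 2.0});
        cluster.observe(new double[] {3.0, 6.0});
        System.out.println(Arrays.toString(cluster.computeCentroid())); // prints [2.0, 4.0]
    }
}
```

In this sketch, if the seed point were observed once during seeding and then again when points are assigned in the first iteration, both s0 and s1 would include it twice. That is the situation question 2 asks about.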

Can someone please help me answer these questions?

Regards,
Aniruddha
