It is possible to run the M/R jobs inside Eclipse or another IDE with
small datasets. I learned a lot from single-stepping through some of
the more complex code.

On Wed, Aug 15, 2012 at 10:08 AM, Aniruddha Basak <[email protected]> wrote:
> Hi,
> I am trying to understand the k-means implementation in Mahout.
> A few questions come to mind:
>
>  1.  In ClusterIterator.iterateMR(), no combiner class is declared.
> Looking at CIMapper and CIReducer, I could not find where the new
> centroids are computed at the end of each iteration.
>     *   I expected that at some point the "SUM" (as in Cluster.S1) of the
> points assigned to a cluster would be divided by the number of points
> (Cluster.S0). The computeCentroid() method in the AbstractCluster class
> does that, but I could not tell whether it is ever called.
>  2.  When generating the initial cluster centroids in
> RandomSeedGenerator.buildRandom(), why is the observe() method called for
> each cluster? I noticed that observe() records the sum of the points
> assigned to that cluster. Doesn't that mean the point chosen as the
> cluster center is counted twice?
>
> Can someone please help me answer these questions?
>
> Regards,
> Aniruddha
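Both questions come down to how the running statistics S0 (point count) and S1 (componentwise sum) turn into a centroid, S1 / S0. The sketch below is plain Java with no Mahout dependency, small enough to single-step in an IDE. The names ClusterStats and CentroidSketch are hypothetical and only mirror the S0/S1 fields mentioned above; this is not Mahout's AbstractCluster. It shows what happens if a seed point is observed during initialization and then observed again during assignment with no reset in between. Whether Mahout's actual code path resets those statistics is not established here; stepping through it in a debugger would settle that.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified running statistics for one cluster.
 * s0 = number of observed points, s1 = componentwise sum of those points.
 * NOT Mahout's AbstractCluster; it only mirrors the S0/S1 idea.
 */
class ClusterStats {
    double s0 = 0.0;      // count of observed points
    final double[] s1;    // running sum of observed points

    ClusterStats(int dim) {
        s1 = new double[dim];
    }

    /** Record one point: bump the count and add the point to the sum. */
    void observe(double[] x) {
        s0 += 1.0;
        for (int i = 0; i < x.length; i++) {
            s1[i] += x[i];
        }
    }

    /** Centroid = S1 / S0; returns null if nothing has been observed. */
    double[] centroid() {
        if (s0 == 0.0) {
            return null;
        }
        double[] c = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            c[i] = s1[i] / s0;
        }
        return c;
    }
}

public class CentroidSketch {
    public static void main(String[] args) {
        double[][] points = { {0, 0}, {2, 0}, {4, 6} };

        // Statistics built only from the assignment pass.
        ClusterStats fresh = new ClusterStats(2);
        for (double[] p : points) {
            fresh.observe(p);
        }

        // Seed with points[0] and observe it once (the initialization step
        // asked about), then observe every point again without resetting.
        ClusterStats seeded = new ClusterStats(2);
        seeded.observe(points[0]);
        for (double[] p : points) {
            seeded.observe(p);
        }

        System.out.println(Arrays.toString(fresh.centroid()));   // [2.0, 2.0]
        System.out.println(Arrays.toString(seeded.centroid()));  // [1.5, 1.5]
    }
}
```

In this sketch, carrying the seed's observation into the first pass counts that point twice (S0 = 4 instead of 3), which pulls the centroid toward the seed. Resetting S0 and S1 after computing each centroid avoids this.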



-- 
Lance Norskog
[email protected]
