It is possible to run the M/R jobs inside Eclipse or another IDE with small datasets. I learned a lot from single-stepping through some of the more complex code.
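
As a reference point while single-stepping, here is a minimal plain-Java sketch of the S0/S1 bookkeeping that the question below refers to. It is not Mahout's code: the class name RunningCluster and its fields are hypothetical stand-ins, and the only thing assumed is what the question itself states, namely that observe() accumulates a sum of points and that the centroid is that sum divided by the point count.

```java
import java.util.Arrays;

// Hypothetical stand-in for a cluster's running statistics; NOT Mahout's class.
class RunningCluster {
    private long s0;            // number of points observed so far
    private final double[] s1;  // component-wise sum of the observed points

    RunningCluster(int dimensions) {
        this.s1 = new double[dimensions];
    }

    // Record one point: bump the count and add it into the running sum.
    void observe(double[] point) {
        if (point.length != s1.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        s0++;
        for (int i = 0; i < s1.length; i++) {
            s1[i] += point[i];
        }
    }

    // Centroid = sum of points / number of points (S1 / S0).
    double[] computeCentroid() {
        if (s0 == 0) {
            throw new IllegalStateException("no points observed");
        }
        double[] centroid = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            centroid[i] = s1[i] / s0;
        }
        return centroid;
    }

    public static void main(String[] args) {
        RunningCluster cluster = new RunningCluster(2);
        cluster.observe(new double[] {1.0, 2.0});
        cluster.observe(new double[] {3.0, 6.0});
        // Prints [2.0, 4.0]
        System.out.println(Arrays.toString(cluster.computeCentroid()));
    }
}
```

In a debugger, a breakpoint on the real observe() and computeCentroid() methods should show whether and when the equivalent steps run in each iteration.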

On Wed, Aug 15, 2012 at 10:08 AM, Aniruddha Basak wrote:
> Hi,
> I am trying to understand the Kmeans implementation in Mahout.
> Few questions appear in my mind:
>
> 1. In the ClusterIteration.IterateMR(), no combiner class has been
> declared. Looking at CIMapper and CIReducer, I could not find out where the
> new centroids are computed at the end of each iteration?
> * I expected at some point the "SUM" (as in Cluster.S1) of the points
> assigned to a cluster will be divided by the number of points (Cluster.S0).
> The computeCentroid() method in AbstractCluster class does that but I could
> not find whether it was called or not.
> 2. While generating the cluster centroids as initial guess i.e
> RandomSeedGenerator.buildRandom(), why the observer() method was called for
> each cluster? I noticed this observe() method records the sum of points
> assigned to that cluster. Then, is not that point (which was chosen as
> clusterCenter) counted twice ?
>
> Can someone please help me answering these questions.
>
> Regards,
> Aniruddha

--
Lance Norskog
