For most of the tasks one can force the number of reducers with mapred.reduce.tasks=<N>, where <N> is the desired number of reducers.
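As a rough sketch of how the property is usually passed on the command line: Hadoop jobs that parse their arguments through ToolRunner / GenericOptionsParser accept generic options of the form -D<property>=<value>, placed before the job's own options. The snippet below only assembles such a command; the helper name with_reducers, the jar name, the driver class and the --input/--output options are hypothetical placeholders, not taken from Mahout, and it assumes the driver in question actually honours generic options.

```python
def with_reducers(job_args, num_reducers):
    """Prepend -Dmapred.reduce.tasks=<N> to a job's argument list."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    # Generic options such as -D have to come before the job's own options.
    return ["-Dmapred.reduce.tasks=%d" % num_reducers] + list(job_args)


# Hypothetical jar, driver class and job options, for illustration only.
command = ["hadoop", "jar", "my-job.jar", "com.example.MyDriver"]
command += with_reducers(["--input", "in", "--output", "out"], 8)
print(" ".join(command))
```

A driver that sets its reducer count in code will override whatever is passed this way, so the property is a request rather than a guarantee.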
Raising the reducer count will not necessarily improve performance, though. With kmeans and fuzzykmeans the combiners do the reducers' job, so increasing the number of reducers usually won't affect performance. With canopy, the distributed algorithm
<http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java?revision=1134456&view=markup>
has no combiners and has 1 reducer hardcoded, so trying to increase the number of reducers won't have any effect: the algorithm doesn't work with more than 1 reducer. My experience is that canopy won't scale to large data and needs improvement.

-- Konstantin

On Sun, Sep 18, 2011 at 10:50 AM, Paritosh Ranjan <[email protected]> wrote:

> Hi,
>
> I have been trying to cluster some hundreds of millions of records using
> Mahout Clustering techniques.
>
> The number of reducers is always one which I am not able to change. This is
> affecting the performance. I am using Mahout 0.5.
>
> In 0.6-SNAPSHOT, I see that the MeanShiftCanopyDriver has been changed to
> use any number of reducers. Will other ClusterDrivers also get changed to
> use any number of reducers in 0.6?
>
> Thanks and Regards,
> Paritosh Ranjan
>
>

-- ksh:
