Re: Clustering : Number of Reducers

Konstantin Shmakov Sun, 18 Sep 2011 13:18:46 -0700

For most tasks, one can force the number of reducers with
mapred.reduce.tasks=<N>
where <N> is the desired number of reducers.
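
For context on what <N> controls: Hadoop's default HashPartitioner sends each
map output key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
The Python sketch below mimics that rule under the assumption of integer keys
(like IntWritable, whose hashCode() is the value itself). The helper names are
hypothetical and this is not Mahout code. It shows one reason extra reducers may
sit idle: a job that emits only k distinct keys, such as one key per cluster,
can keep at most k reducers busy.

```python
def partition(key: int, num_reduce_tasks: int) -> int:
    """Mimic Hadoop's default HashPartitioner for an int key.

    Assumes IntWritable-style keys, whose hashCode() is the int value itself.
    """
    # Mask off the sign bit so negative hash codes still map to a valid index.
    return (key & 0x7FFFFFFF) % num_reduce_tasks


def reducers_used(keys, num_reduce_tasks: int) -> set:
    """Return the set of reducer indices that receive at least one key."""
    return {partition(k, num_reduce_tasks) for k in keys}


if __name__ == "__main__":
    cluster_ids = range(3)  # hypothetical: 3 clusters -> 3 distinct keys
    print(sorted(reducers_used(cluster_ids, 10)))  # [0, 1, 2]
```

With ten reduce tasks but only three cluster keys, seven reducers receive nothing.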


This will not necessarily improve performance, though: with kmeans and
fuzzykmeans, the combiners do the reducers' job, so increasing the number of
reducers usually won't affect performance.
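
A rough illustration of the combiner point above: in a k-means iteration each
mapper can pre-aggregate its points into one (vector sum, count) record per
cluster, so the reducer only merges a handful of partial sums. The sketch below
is a simplified in-memory Python model with hypothetical function names and
tuple-based points, not Mahout's actual mapper, combiner or reducer classes.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def map_and_combine(split, centroids):
    """Map each point to its nearest cluster, then combine locally.

    Emits one (cluster_id, (vector_sum, count)) record per cluster per split,
    instead of one record per point.
    """
    partial = defaultdict(lambda: ([0.0] * len(centroids[0]), 0))
    for point in split:
        cid = nearest(point, centroids)
        total, count = partial[cid]
        partial[cid] = ([t + p for t, p in zip(total, point)], count + 1)
    return list(partial.items())


def reduce_centroids(records):
    """Merge the combined partial sums and divide to get new centroids."""
    sums = {}
    for cid, (total, count) in records:
        acc, n = sums.get(cid, ([0.0] * len(total), 0))
        sums[cid] = ([a + t for a, t in zip(acc, total)], n + count)
    return {cid: [a / n for a in acc] for cid, (acc, n) in sums.items()}


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0), (0.0, 1.0), (11.0, 10.0)]]
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    records = [r for s in splits for r in map_and_combine(s, centroids)]
    print(len(records))  # 3 records reach the reducer for 5 points
    print(reduce_centroids(records))
```

With two splits and five points, only three combined records reach the reducer.
On real data the reducer input is bounded by (number of mappers) x k rather than
by the number of points, which is why adding reducers rarely helps here.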

With canopy, the distributed algorithm
<http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java?revision=1134456&view=markup>
has no combiners and has 1 reducer hardcoded, so trying to increase the number
of reducers won't have any effect: the algorithm doesn't work with more than 1
reducer. In my experience, canopy won't scale to large data and needs
improvement.
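
One way to picture why canopy is tied to a single reducer, assuming the common
two-phase MapReduce canopy design: each mapper builds canopies from its own
split, and one reducer re-runs canopy over all the mapper centers to get a
global set, so that final pass has to see every center at once. The Python
below is a simplified single-pass canopy (distances measured to each canopy's
first point, centers taken as member means, T1 > T2) with hypothetical names;
it is not the CanopyDriver code linked above.

```python
import math


def canopy_centers(points, t1, t2):
    """Single-pass canopy clustering (requires t1 > t2); returns canopy centers.

    Each point joins every canopy whose first point is within t1; a point starts
    a new canopy unless some existing canopy is within t2 of it. Centers are the
    mean of each canopy's member points.
    """
    canopies = []  # each canopy: [first_point, list_of_members]
    for p in points:
        strongly_bound = False
        for c in canopies:
            d = math.dist(p, c[0])
            if d < t1:
                c[1].append(p)
            strongly_bound = strongly_bound or d < t2
        if not strongly_bound:
            canopies.append([p, [p]])
    return [tuple(sum(xs) / len(xs) for xs in zip(*members))
            for _, members in canopies]


def two_phase_canopy(splits, t1, t2):
    # "Map" phase: each split is clustered independently.
    local_centers = [c for split in splits for c in canopy_centers(split, t1, t2)]
    # "Reduce" phase: all local centers go to ONE reducer for a global pass.
    return canopy_centers(local_centers, t1, t2)


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.5, 0.0), (9.0, 9.0)], [(0.2, 0.1), (9.5, 9.0)]]
    print(two_phase_canopy(splits, t1=3.0, t2=1.0))
```

Every local center from every split flows into one canopy_centers call in the
final step, which mirrors the single-reducer bottleneck once the number of
mapper centers gets large.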

-- Konstantin



On Sun, Sep 18, 2011 at 10:50 AM, Paritosh Ranjan <[email protected]> wrote:

> Hi,
>
> I have been trying to cluster some hundreds of millions of records using
> Mahout Clustering techniques.
>
> The number of reducers is always one, which I am not able to change. This is
> affecting the performance. I am using Mahout 0.5.
>
> In 0.6-SNAPSHOT, I see that the MeanShiftCanopyDriver has been changed to
> use any number of reducers. Will other ClusterDrivers also get changed to
> use any number of reducers in 0.6?
>
> Thanks and Regards,
> Paritosh Ranjan
>
>
>


-- 
ksh:
