The number of mappers is governed by the DFS block size. The default is 64 MB; what is the value on your cluster?
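As a rough illustration only (a sketch assuming a Hadoop 0.20-era configuration, not something taken from the Mahout wrappers themselves), the snippet below checks the configured block size and caps the maximum split size, which is what forces a 55-70 MB input to be divided among several map tasks instead of one:

```java
import org.apache.hadoop.conf.Configuration;

public class SplitSizeCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Effective block size for new files; 67108864 bytes (64 MB) is the default.
        long blockSize = conf.getLong("dfs.block.size", 64L * 1024 * 1024);
        System.out.println("dfs.block.size = " + blockSize + " bytes");

        // Capping the max split size below the block size produces more splits,
        // and therefore more mappers, for the same input. With a 16 MB cap,
        // a ~64 MB sequence file should yield roughly four map tasks.
        conf.setLong("mapred.max.split.size", 16L * 1024 * 1024);
    }
}
```

If the Mahout driver goes through ToolRunner, the same property can be passed on the command line, e.g. -Dmapred.max.split.size=16777216, which is essentially what Shawn suggests below; whether the wrapper forwards it to the clustering jobs depends on your Mahout version.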
On Sat, Aug 27, 2011 at 2:24 AM, Xiaomeng Wan <[email protected]> wrote:
> Hi Abhik,
>
> Looks like you need to set the hadoop job conf
> "-Dmapred.max.split.size=xxx (in bytes)" smaller than the block size, if it
> is supported in the mahout wrapper.
>
> Shawn
>
> On Thu, Aug 25, 2011 at 11:13 AM, Abhik Banerjee
> <[email protected]> wrote:
> > Hi,
> >
> > I hope you are doing fine. I had a clarification to make, and thought
> > I would shoot you a mail about it. I am running Canopy and K-means
> > clustering on my Hadoop dev cluster at my organization, but each time
> > I run these on my data set (which is around 55 MB to 70 MB of sequence
> > files), I only see 1 mapper and 1 reducer running in the job tracker,
> > both for Canopy and for K-means clustering (for each iteration).
> >
> > Is this dependent on the size of the data file being passed, or is
> > there any way I can configure the number of mappers used by these
> > algorithms? (Though I suspect I can't do this and it has to be decided
> > by the job tracker. With one mapper my canopy clustering takes around
> > 5-6 hours, and I am wondering whether it would speed up if it could
> > use multiple mappers somehow.)
> >
> > The K-means run also uses 1 mapper and 1 reducer but is comparatively
> > fast, as the centroid points are decided by the canopy output.
> >
> > Thanks and Regards,
> > Abhik Banerjee
> >
> > 513 364 6591
> >
