The number of mappers is governed by the DFS block size. The default is
64 MB; what is the value on your cluster?
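
If the split size is the issue, something along these lines should force
more map tasks. This is only a rough sketch: the paths, thresholds, and
distance measure are placeholders, and whether the Mahout wrapper passes
the -D option through to Hadoop depends on your version (as Shawn noted
below).

  # Check the HDFS block size of one of your input files (printed in bytes;
  # 67108864 is the 64 MB default).
  hadoop fs -stat %o /path/to/input/part-00000

  # Ask for ~16 MB splits so a ~64 MB input yields roughly 4 map tasks.
  mahout canopy \
    -Dmapred.max.split.size=16777216 \
    -i /path/to/input \
    -o /path/to/canopy-output \
    -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
    -t1 500 -t2 250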

On Sat, Aug 27, 2011 at 2:24 AM, Xiaomeng Wan <[email protected]> wrote:

> Hi Abhik,
>
> Looks like you need to set the Hadoop job conf
> "-Dmapred.max.split.size=xxx" (in bytes) to a value smaller than the
> block size, if it is supported in the Mahout wrapper.
>
> Shawn
>
> On Thu, Aug 25, 2011 at 11:13 AM, Abhik Banerjee
> <[email protected]> wrote:
> > Hi,
> >
> > I hope you are doing fine. I had a clarification to make and thought I
> > would shoot you a mail about it. I am running Canopy and KMeans
> > clustering on my Hadoop dev cluster at my organization, but each time I
> > run these on my data set (around 55 MB to 70 MB of sequence files), I
> > only see 1 mapper and 1 reducer running in the job tracker, both for
> > Canopy and for KMeans clustering (in each iteration).
> >
> > Is it dependent on the size of the data file being passed, or is there
> > any way I can configure the number of mappers used by these algorithms?
> > (Though I suspect I can't do this and the job tracker decides how many
> > mappers to spawn.) With one mapper it takes quite a while to run my
> > Canopy clustering, around 5-6 hours, and I am wondering whether it
> > could speed up if it used multiple mappers somehow.
> >
> > The KMeans job also uses 1 mapper and 1 reducer, but it is
> > comparatively fast, as the centroid points are determined by the Canopy
> > output.
> >
> > Thanks and Regards,
> > Abhik Banerjee
> >
> > 513 364 6591
> >
>
