The number of mappers is governed by the DFS block size. The default is 64 MB; what is the value on your cluster?
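As a rough illustration only (a sketch assuming a Hadoop 0.20-era configuration, not something taken from the Mahout wrappers themselves), the snippet below checks the configured block size and caps the maximum split size, which is what forces a 55-70 MB input to be divided among several map tasks instead of one:

```java
import org.apache.hadoop.conf.Configuration;

public class SplitSizeCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Effective block size for new files; 67108864 bytes (64 MB) is the default.
        long blockSize = conf.getLong("dfs.block.size", 64L * 1024 * 1024);
        System.out.println("dfs.block.size = " + blockSize + " bytes");

        // Capping the max split size below the block size produces more splits,
        // and therefore more mappers, for the same input. With a 16 MB cap,
        // a ~64 MB sequence file should yield roughly four map tasks.
        conf.setLong("mapred.max.split.size", 16L * 1024 * 1024);
    }
}
```

If the Mahout driver goes through ToolRunner, the same property can be passed on the command line, e.g. -Dmapred.max.split.size=16777216, which is essentially what Shawn suggests below; whether the wrapper forwards it to the clustering jobs depends on your Mahout version.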
On Sat, Aug 27, 2011 at 2:24 AM, Xiaomeng Wan <[email protected]> wrote:
> Hi Abhik,
>
> Looks like you need to set the hadoop job conf
> "-Dmapred.max.split.size=xxx (in bytes)" smaller than the block size, if it
> is supported in the mahout wrapper.
>
> Shawn
>
> On Thu, Aug 25, 2011 at 11:13 AM, Abhik Banerjee
> <[email protected]> wrote:
> > Hi,
> >
> > I hope you are doing fine. I had a clarification to make, and thought
> > I would shoot you a mail about it. I am running Canopy and K-means
> > clustering on my Hadoop dev cluster at my organization, but each time
> > I run these on my data set (which is around 55 MB to 70 MB of sequence
> > files), I only see 1 mapper and 1 reducer running in the job tracker,
> > both for Canopy and for K-means clustering (for each iteration).
> >
> > Is this dependent on the size of the data file being passed, or is
> > there any way I can configure the number of mappers used by these
> > algorithms? (Though I suspect I can't do this and it has to be decided
> > by the job tracker. With one mapper my canopy clustering takes around
> > 5-6 hours, and I am wondering whether it would speed up if it could
> > use multiple mappers somehow.)
> >
> > The K-means run also uses 1 mapper and 1 reducer but is comparatively
> > fast, as the centroid points are decided by the canopy output.
> >
> > Thanks and Regards,
> > Abhik Banerjee
> >
> > 513 364 6591
> >
