Hi Paritosh, I think the block size may be the problem too. Btw, do you mean the
block size of HDFS? I know its default size is 64MB, but I haven't tried any
other size.

Thanks
Ramon

> Date: Sun, 11 Mar 2012 13:18:52 +0530
> From: [email protected]
> To: [email protected]
> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means
> cluster
>
> Can you try reducing/increasing your block size and see the impact?
> I suspect the block size is the problem.
>
> I have faced the same problem once (for a different Hadoop job, and it
> was very hard to debug). In that case, CompositeInputFormat was being
> used as the input format, which fixed the block size at 64 MB, and
> hence only a few reducers were activated. So trying different block
> sizes might give some clue.
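> A minimal sketch of how the block size could be changed for that test
> (the value here is illustrative, not from this thread; dfs.block.size is
> the pre-Hadoop-2 property name, and only files written after the change
> pick up the new size):
>
> <!-- hdfs-site.xml: hypothetical example, 128 MB instead of the 64 MB default -->
> <property>
>   <name>dfs.block.size</name>
>   <value>134217728</value>
> </property>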
>
> On 11-03-2012 11:04, WangRamon wrote:
> > Here is the configuration:
> >
> > <property>
> > <name>mapred.tasktracker.map.tasks.maximum</name>
> > <value>14</value>
> > </property>
> > <property>
> > <name>mapred.tasktracker.reduce.tasks.maximum</name>
> > <value>14</value>
> > </property>
> > <property>
> > <name>mapred.reduce.tasks</name>
> > <value>73</value>
> > </property>
> >
> > Each node has 32GB of RAM, so I think the above configuration should
> > be fine.
> >> Date: Sat, 10 Mar 2012 22:31:44 -0700
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means
> >> cluster
> >>
> >> What's your Hadoop config in terms of the maximum number of reducers?
> >> It's a function of your available RAM on each node and numbers of nodes.
> >>
> >> On 3/10/12 8:55 PM, WangRamon wrote:
> >>> Hi Paritosh, I did the tests with 1 job and with 5 jobs, and they all
> >>> have the same problem. The job I'm running is the buildClusters one. I
> >>> can see there are 73 reduce tasks created from the monitor GUI, but
> >>> only 12 of them are running at any time (the rest are in pending
> >>> state). Each reduce task finishes very quickly, in no more than 18
> >>> seconds, so maybe that's the cause? Thanks
> >>>
> >>> Cheers
> >>> Ramon
> >>>> Date: Sun, 11 Mar 2012 09:14:15 +0530
> >>>> From: [email protected]
> >>>> To: [email protected]
> >>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means
> >>>> cluster
> >>>>
> >>>> And to answer the question about the KMeans configuration:
> >>>>
> >>>> KMeans has two jobs:
> >>>> 1) buildClusters: has a reducer and no limitation on the number of
> >>>> reducer tasks
> >>>> 2) clusterData: executes if runClustering = true; has no reducer tasks
> >>>>
> >>>> On 11-03-2012 09:10, Paritosh Ranjan wrote:
> >>>>> Can you run the K-Means jobs again (all with the same block size)
> >>>>> and give the same statistics for:
> >>>>>
> >>>>> a) only 1 job running
> >>>>> b) 2 jobs running simultaneously
> >>>>> c) 5 jobs running simultaneously
> >>>>>
> >>>>> On 10-03-2012 21:08, WangRamon wrote:
> >>>>>> Hi All, I submitted 5 K-Means jobs simultaneously. My Hadoop
> >>>>>> cluster has 42 map and 42 reduce slots configured, and I set the
> >>>>>> default number of reduce tasks per job to 73 (42 * 1.75). I find
> >>>>>> that only about 12 of the reduce tasks are running at any time,
> >>>>>> although 73 reduce tasks are created for each K-Means job and I do
> >>>>>> have 42 reduce slots; that means at any time about 30 reduce slots
> >>>>>> are free. So I tried the RecommenderJob from Mahout again; I
> >>>>>> remember that job used all my slots in my previous test, and YES,
> >>>>>> this time the "RowSimilarityJob-CooccurrencesMapper-Reducer" does
> >>>>>> use all the slots, 42 reduce and 42 map. So I'm wondering: is there
> >>>>>> something configured in Mahout which causes this strange behavior?
> >>>>>> Any suggestions? Thanks in advance. Btw, I'm using the mahout-0.6
> >>>>>> release.
> >>>>>>
> >>>>>> Cheers
> >>>>>> Ramon
> >>>>>>
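The reducer-count arithmetic in the message above can be checked with a quick sketch (42 total reduce slots and the 1.75 factor are taken from the thread; the factor is the common Hadoop rule of thumb for mapred.reduce.tasks):

```python
# Quick check of the reduce-task count used in the thread:
# total reduce slots * 1.75, truncated to an integer.
total_reduce_slots = 42
reduce_tasks = int(total_reduce_slots * 1.75)
print(reduce_tasks)  # 73, matching mapred.reduce.tasks in the config above
```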
> >>>
> >
>