Hi Paritosh,

I think the block size may be the problem too. By the way, do you mean the
HDFS block size? I know its default is 64 MB, but I haven't tried any other
sizes.

Thanks,
Ramon

> Date: Sun, 11 Mar 2012 13:18:52 +0530
> From: [email protected]
> To: [email protected]
> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means 
> cluster
> 
> Can you try reducing/increasing your block size and see the impact?
> I suspect the block size is the problem.
> 
> I faced the same problem once (for a different Hadoop job, and it was
> very hard to debug). In that case, CompositeInputFormat was being used
> as the input format, which fixed the block size at 64 MB, and hence
> only a few reducers were activated. So trying different block sizes
> might give some clue.
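> 
> For example (a rough, untested sketch; the 128 MB value and the paths
> are just placeholders), the block size can be lowered or raised
> cluster-wide in hdfs-site.xml:
> 
>     <property>
>         <name>dfs.block.size</name>
>         <value>134217728</value>
>     </property>
> 
> or per upload, via the generic -D option:
> 
>     hadoop fs -Ddfs.block.size=134217728 -put input.vec /kmeans/input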
> 
> On 11-03-2012 11:04, WangRamon wrote:
> > Here is the configuration:
> >
> >     <property>
> >         <name>mapred.tasktracker.map.tasks.maximum</name>
> >         <value>14</value>
> >     </property>
> >     <property>
> >         <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >         <value>14</value>
> >     </property>
> >     <property>
> >         <name>mapred.reduce.tasks</name>
> >         <value>73</value>
> >     </property>
> >
> > Each node has 32 GB of RAM, so I think the above configuration should be
> > fine.
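> >
> > (As a rough sanity check, assuming the default mapred.child.java.opts of
> > -Xmx200m: 14 map slots + 14 reduce slots at ~200 MB each is only about
> > 5.6 GB of task heap per node, well within 32 GB.)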
> >
> >> Date: Sat, 10 Mar 2012 22:31:44 -0700
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means 
> >> cluster
> >>
> >> What's your Hadoop config in terms of the maximum number of reducers?
> >> It's a function of the available RAM on each node and the number of nodes.
> >>
> >> On 3/10/12 8:55 PM, WangRamon wrote:
> >>> Hi Paritosh,
> >>>
> >>> I did the tests with 1 job and with 5 jobs; they all show the same
> >>> problem. The job I'm running is the buildClusters one. I can see 73
> >>> reduce tasks created in the monitoring GUI, but only 12 of them are
> >>> running at any time (the rest are in the pending state). Each reduce
> >>> task finishes very quickly, in no more than about 18 seconds, so maybe
> >>> that's the cause? Thanks.
> >>>
> >>> Cheers,
> >>> Ramon
> >>>
> >>>> Date: Sun, 11 Mar 2012 09:14:15 +0530
> >>>> From: [email protected]
> >>>> To: [email protected]
> >>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means 
> >>>> cluster
> >>>>
> >>>> And to answer the question about the K-Means configuration:
> >>>>
> >>>> K-Means runs two jobs:
> >>>> 1) buildClusters: has a reducer and places no limitation on the number
> >>>> of reducer tasks (see the sketch below)
> >>>> 2) clusterData: executes only if runClustering = true, and has no
> >>>> reducer tasks
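> >>>>
> >>>> As a sketch of (1): since buildClusters does not cap the reducer
> >>>> count, it should pick up whatever mapred.reduce.tasks is set to.
> >>>> Something like this (untested; the paths and the distance measure are
> >>>> placeholders) should pass it through on the command line:
> >>>>
> >>>>     bin/mahout kmeans -Dmapred.reduce.tasks=73 \
> >>>>         -i /kmeans/input -c /kmeans/initial-clusters -o /kmeans/output \
> >>>>         -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
> >>>>         -x 10 -cl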
> >>>>
> >>>> On 11-03-2012 09:10, Paritosh Ranjan wrote:
> >>>>> Can you run the K-Means jobs again (all with the same block size) and
> >>>>> give the same statistics for:
> >>>>>
> >>>>> a) only 1 job running
> >>>>> b) 2 jobs running simultaneously
> >>>>> c) 5 jobs running simultaneously
> >>>>>
> >>>>> On 10-03-2012 21:08, WangRamon wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I submit 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map
> >>>>>> and 42 reduce slots configured, and I set the default number of
> >>>>>> reduce tasks per job to 73 (42 * 1.75). Although 73 reduce tasks are
> >>>>>> created for each K-Means job and I do have 42 reduce slots, only
> >>>>>> about 12 of the reduce tasks are running at any time, which means
> >>>>>> about 30 reduce slots are always free. So I tried the RecommenderJob
> >>>>>> from Mahout again; I remember that job used all my slots in my
> >>>>>> previous test, and indeed, this time
> >>>>>> "RowSimilarityJob-CooccurrencesMapper-Reducer" does use all 42 reduce
> >>>>>> and 42 map slots. So I'm wondering: is something configured in Mahout
> >>>>>> causing this strange behavior? Any suggestions? Thanks in advance.
> >>>>>>
> >>>>>> Btw, I'm using the mahout-0.6 release.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Ramon
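> >>>>>>
> >>>>>> (For reference, 73 comes from the rule of thumb in the Hadoop docs of
> >>>>>> 1.75 * total reduce slots: 1.75 * 42 = 73.5, rounded down.)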