Re: Not all Mapper/Reducer slots are taken when running K-Means cluster

Paritosh Ranjan Sat, 10 Mar 2012 23:49:53 -0800

Can you try reducing or increasing your block size and see the impact?
I suspect the block size is the problem.


I faced the same problem once (for a different Hadoop job, and it was
very hard to debug). In that case, CompositeInputFormat was being used
as input, which fixed the block size at 64 MB, and hence only a few
reducers were activated. So trying different block sizes might give
some clue.
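
For reference, a minimal sketch of how the split size could be varied for
one test run, assuming the Hadoop 0.20/1.x property names.
`mapred.max.split.size` is read by input formats built on the new-API
FileInputFormat, and `dfs.block.size` only affects files written after the
change, so existing input would have to be re-uploaded. The 32 MB value is
an arbitrary illustration, not a recommendation:

```xml
<!-- Assumption: Hadoop 0.20/1.x property names; values are in bytes. -->
<property>
    <!-- Upper bound on input split size: 32 MB = 32 * 1024 * 1024 bytes -->
    <name>mapred.max.split.size</name>
    <value>33554432</value>
</property>
<property>
    <!-- HDFS block size, applied to newly written files only -->
    <name>dfs.block.size</name>
    <value>33554432</value>
</property>
```

Smaller splits mean more map tasks for the same input. Whether that also
changes how many reducers run concurrently here is what the experiment
would show.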

On 11-03-2012 11:04, WangRamon wrote:
> Here is the configuration:
>     <property>
>         <name>mapred.tasktracker.map.tasks.maximum</name>
>         <value>14</value>
>     </property>
>     <property>
>         <name>mapred.tasktracker.reduce.tasks.maximum</name>
>         <value>14</value>
>     </property>
>     <property>
>         <name>mapred.reduce.tasks</name>
>         <value>73</value>
>     </property>
>  
>   Each node has 32 GB of RAM, so I think the above configuration should be
> fine.
>> Date: Sat, 10 Mar 2012 22:31:44 -0700
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means 
>> cluster
>>
>> What's your Hadoop config in terms of the maximum number of reducers?
>> It's a function of your available RAM on each node and numbers of nodes.
>>
>> On 3/10/12 8:55 PM, WangRamon wrote:
>>> Hi Paritosh,
>>>
>>> I ran the tests with 1 job and with 5 jobs, and both show the same
>>> problem. The job I'm running is the buildClusters one. In the monitoring
>>> GUI I can see that 73 reduce tasks are created, but only 12 of them are
>>> running at any time (the rest are pending). The tasks finish very
>>> quickly: each reduce task takes no more than about 18 seconds. Maybe
>>> that's the cause?
>>>
>>> Thanks,
>>> Cheers, Ramon
>>>> Date: Sun, 11 Mar 2012 09:14:15 +0530
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means 
>>>> cluster
>>>>
>>>> And to answer the question about the K-Means configuration:
>>>>
>>>> K-Means has two jobs:
>>>> 1) buildClusters: has a reducer and has no limitation on the number of
>>>> reducer tasks
>>>> 2) clusterData: executes if runClustering = true, has no reducer tasks
>>>>
>>>> On 11-03-2012 09:10, Paritosh Ranjan wrote:
>>>>> Can you run the K-Means jobs again (all with the same block size) and
>>>>> give the same statistics for:
>>>>>
>>>>> a) only 1 job running
>>>>> b) 2 jobs running simultaneously
>>>>> c) 5 jobs running simultaneously
>>>>>
>>>>> On 10-03-2012 21:08, WangRamon wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I submitted 5 K-Means jobs simultaneously. My Hadoop cluster has 42
>>>>>> map and 42 reduce slots configured, and I set the default number of
>>>>>> reduce tasks per job to 73 (42 * 1.75). I find that only about 12
>>>>>> reduce tasks are running at any time, although 73 reduce tasks are
>>>>>> created for each K-Means job and I do have 42 reduce slots. That means
>>>>>> about 30 reduce slots are free at any time.
>>>>>>
>>>>>> So I tried RecommenderJob from Mahout again, since I remembered that it
>>>>>> used all my slots in a previous test, and yes, this time
>>>>>> "RowSimilarityJob-CooccurrencesMapper-Reducer" does use all the slots,
>>>>>> 42 reduce and 42 map. So I'm wondering whether something configured in
>>>>>> Mahout causes this strange behavior. Any suggestions? Thanks in
>>>>>> advance.
>>>>>>
>>>>>> By the way, I'm using the mahout-0.6 release.
>>>>>>
>>>>>> Cheers,
>>>>>> Ramon
>>>>>>                             
>>>                                       
>
