So, if I understand correctly, you want to limit the number of reducers executing in parallel only for this job?
On Tue, Apr 30, 2013 at 4:02 PM, Han JU <[email protected]> wrote:

> Thanks.
>
> In fact I don't want to set reducer or mapper numbers, they are fine.
> I want to set the reduce slot capacity of my cluster when it executes my
> specific job. Say I have 100 reduce tasks for this job; I want my cluster
> to execute 4 of them at the same time, not 8 of them, only for this
> specific job.
> So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
> This conf is well received by the job, but ignored by Hadoop...
>
> Any idea why this is?
>
>
> 2013/4/30 Nitin Pawar <[email protected]>
>
>> The mapred.tasktracker.reduce.tasks.maximum parameter sets the maximum
>> number of reduce tasks that may be run by an individual TaskTracker
>> server at one time. It is not a per-job configuration.
>>
>> The number of map tasks for a given job is driven by the number of input
>> splits, not by the mapred.map.tasks parameter. For each input split a
>> map task is spawned, so over the lifetime of a MapReduce job the number
>> of map tasks equals the number of input splits. mapred.map.tasks is just
>> a hint to the InputFormat about the number of maps.
>>
>> If you want to set the maximum number of maps or reducers per job, you
>> can set the hints using the job object you created:
>> job.setNumMapTasks()
>>
>> Note this is just a hint, and again the number will be decided by the
>> input split size.
>>
>>
>> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <[email protected]> wrote:
>>
>>> Thanks Nitin.
>>>
>>> What I need is to set the slots only for a specific job, not in the
>>> whole cluster conf.
>>> But what I did does NOT work... Have I done something wrong?
>>>
>>>
>>> 2013/4/30 Nitin Pawar <[email protected]>
>>>
>>>> The config you are setting is for the job only.
>>>>
>>>> But if you want to reduce the slots on the TaskTrackers, then you will
>>>> need to edit the TaskTracker conf and restart the TaskTrackers.
>>>> On Apr 30, 2013 3:30 PM, "Han JU" <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to change the cluster's capacity of reduce slots on a per-job
>>>>> basis. Originally I have 8 reduce slots per TaskTracker.
>>>>> I did:
>>>>>
>>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>>> ...
>>>>> Job job = new Job(conf, ...)
>>>>>
>>>>> And in the web UI I can see that for this job the max reduce tasks is
>>>>> exactly 4, as I set. However, Hadoop still launches 8 reducers per
>>>>> datanode... Why is this?
>>>>>
>>>>> How could I achieve this?
>>>>> --
>>>>> JU Han
>>>>>
>>>>> Software Engineer Intern @ KXEN Inc.
>>>>> UTC - Université de Technologie de Compiègne
>>>>> GI06 - Fouille de Données et Décisionnel
>>>>>
>>>>> +33 0619608888
>>
>> --
>> Nitin Pawar
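The distinction running through this thread is that mapred.tasktracker.reduce.tasks.maximum is a daemon-side setting, read by each TaskTracker at startup; that is why setting it on a job's Configuration shows up in the web UI but is ignored at runtime. A minimal sketch of where the setting actually takes effect, assuming classic Hadoop 1.x MapReduce:

```xml
<!-- mapred-site.xml on each TaskTracker node. The TaskTracker daemon reads
     this at startup, so a restart is required after changing it. It caps
     concurrent reduce tasks for ALL jobs on that node; it cannot be scoped
     to a single job. -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

Per-job control of the total number of reducers (though not their concurrency) is available through job.setNumReduceTasks(n) on the job object; limiting how many reducers of one job run simultaneously would typically fall to a scheduler such as the Fair or Capacity Scheduler rather than to TaskTracker slot configuration.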
