Re: knowing the nodes on which reduce tasks will run

Abhay Ratnaparkhi Mon, 03 Sep 2012 08:37:17 -0700

How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" on a
running TaskTracker?
It seems that I need to restart the TaskTracker, and in that case I'll lose
the map output already produced by that TaskTracker.


Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without
restarting the TaskTracker?

~Abhay

On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks wrote:

> Hi Abhay,
>
> The TaskTrackers on which reduce tasks are launched are effectively chosen
> at random, based on which nodes have free reduce slots. So if you don't
> want reduce tasks to be scheduled on particular nodes, you need to set
> 'mapred.tasktracker.reduce.tasks.maximum' to 0 on those nodes. The
> catch here is that this property is not a job-level setting; you need to
> set it at the cluster level, in the TaskTracker configuration of each node.
>
> A cleaner approach would be to configure each of your nodes with the right
> number of map and reduce slots based on the resources available on each
> machine.
>
>
> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi wrote:
>
>> Hello,
>>
>> How can one find out the nodes on which reduce tasks will run?
>>
>> One of my jobs is running, and its map tasks are completing.
>> My map tasks write a lot of intermediate data, and the intermediate
>> directory is getting full on all the nodes.
>> If a reduce task is scheduled on any of these nodes, it will try to copy
>> the map output to the same disk and will eventually fail with disk-space
>> exceptions.
>>
>> I have added a few more TaskTracker nodes to the cluster and now want to
>> run the reducers on the new nodes only.
>> Is it possible to choose the node on which a reducer will run? What
>> algorithm does Hadoop use to pick a node to run a reducer?
>>
>> Thanks in advance.
>>
>> Bye
>> Abhay
>>
>
>
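
Bejoy's description of reduce placement (any TaskTracker with a free reduce slot can receive a reduce task, so a node whose 'mapred.tasktracker.reduce.tasks.maximum' is 0 never gets one) can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Hadoop's actual JobTracker or scheduler code: `TaskTracker` and `assign_reduces` are hypothetical names, the order in which trackers heartbeat is simulated with a seeded random choice, and running tasks never finish, so reduces that don't fit are simply left unassigned.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical model of one node; reduce_max stands in for that node's
    mapred.tasktracker.reduce.tasks.maximum setting."""
    host: str
    reduce_max: int
    running_reduces: int = 0

    def free_reduce_slots(self) -> int:
        return self.reduce_max - self.running_reduces


def assign_reduces(trackers, num_reduces, seed=0):
    """Toy slot-based placement: repeatedly pick a tracker that has a free
    reduce slot (standing in for 'whichever one heartbeats next') and hand
    it the next pending reduce task. Returns {task_id: host}."""
    rng = random.Random(seed)
    pending = list(range(num_reduces))
    placement = {}
    while pending:
        eligible = [t for t in trackers if t.free_reduce_slots() > 0]
        if not eligible:
            break  # no free reduce slots left; remaining tasks stay unassigned
        tracker = rng.choice(eligible)
        task_id = pending.pop(0)
        tracker.running_reduces += 1
        placement[task_id] = tracker.host
    return placement


cluster = [
    TaskTracker("old-node-1", reduce_max=0),  # full disk: no reduce slots
    TaskTracker("old-node-2", reduce_max=0),
    TaskTracker("new-node-1", reduce_max=2),
    TaskTracker("new-node-2", reduce_max=2),
]
placement = assign_reduces(cluster, num_reduces=4)
print(sorted(set(placement.values())))  # ['new-node-1', 'new-node-2']
```

Whatever order the simulated heartbeats arrive in, the two nodes with a reduce maximum of 0 are never eligible, which is the per-node effect described in the thread.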

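Bejoy's cleaner alternative, sizing each node's map and reduce slots to its resources, could be approached along these lines. The rule of thumb here is hypothetical and only for illustration: total slots are capped both by the core count and by the memory left after a reserve, divided by a per-task heap size (in Hadoop 1 that heap is usually set through the -Xmx option in 'mapred.child.java.opts'), and about a third of the slots go to reducers. `suggest_slots`, `reserved_mb`, `reduce_share_pct` and the node specs are made-up names and values; the printed numbers correspond to 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum', which would go into each node's TaskTracker configuration.

```python
def suggest_slots(cores, memory_mb, task_heap_mb,
                  reserved_mb=2048, reduce_share_pct=33):
    """Hypothetical rule of thumb: cap total task slots by cores and by the
    memory left after a reserve for the OS and daemons, then give roughly
    reduce_share_pct percent of them to reducers (integer math, rounded down).
    Returns (map_slots, reduce_slots)."""
    if task_heap_mb <= 0:
        raise ValueError("task_heap_mb must be positive")
    usable_mb = max(memory_mb - reserved_mb, 0)
    total = min(cores, usable_mb // task_heap_mb)
    reduce_slots = total * reduce_share_pct // 100
    return total - reduce_slots, reduce_slots


# Made-up node specs: (cores, memory in MB)
nodes = {"old-node": (4, 4096), "new-node": (8, 16384)}
for host, (cores, memory_mb) in nodes.items():
    map_slots, reduce_slots = suggest_slots(cores, memory_mb, task_heap_mb=1024)
    print(host, {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    })
```

With these made-up numbers the small 4-core, 4 GB node gets (2, 0), i.e. no reduce slots, while the 8-core, 16 GB node gets (6, 2).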