How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" in a running tasktracker? It seems that I need to restart the tasktracker, and in that case I'll lose the output of the map tasks run by that particular tasktracker.
Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without restarting the tasktracker?

~Abhay

On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks <[email protected]> wrote:

> Hi Abhay
>
> The TaskTrackers on which the reduce tasks are triggered are chosen at
> random based on reduce slot availability. So if you don't want
> reduce tasks to be scheduled on some particular nodes, you need to set
> 'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The
> bottleneck here is that this property is not a job-level one; you need to
> set it at the cluster level.
>
> A cleaner approach would be to configure each of your nodes with the right
> number of map and reduce slots based on the resources available on each
> machine.
>
>
> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi <
> [email protected]> wrote:
>
>> Hello,
>>
>> How can one find out the nodes on which reduce tasks will run?
>>
>> One of my jobs is running and it's completing all the map tasks.
>> My map tasks write lots of intermediate data. The intermediate directory
>> is getting full on all the nodes.
>> If a reduce task takes any node from the cluster, it'll try to copy the
>> data to the same disk and it'll eventually fail due to disk space related
>> exceptions.
>>
>> I have added a few more tasktracker nodes to the cluster and now want to
>> run reducers on the new nodes only.
>> Is it possible to choose a node on which the reducer will run? What's the
>> algorithm Hadoop uses to pick a node to run a reducer?
>>
>> Thanks in advance.
>>
>> Bye
>> Abhay
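
For reference, the per-node setting Bejoy describes would look roughly like the fragment below. This is only a sketch, assuming a Hadoop 1.x (MRv1) setup where the property lives in that node's mapred-site.xml inside the existing <configuration> element; the file location on your installation, and whether the value can be picked up without restarting the TaskTracker, are assumptions to check against your version.

```xml
<!-- mapred-site.xml on a node that should not run any reduce tasks -->
<!-- (assumed location; place inside the existing <configuration> element) -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
</property>
```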
