All of my map tasks are about to complete, and there is not much processing to be done in the reducer. The job has been running for a week, so I don't want it to fail. Any other suggestion to tackle this is welcome.
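For reference, the property discussed in the thread below is a node-level setting in that tasktracker's conf/mapred-site.xml (MRv1). A sketch of what disabling reduce slots on one node would look like (as Hemanth notes below, the tasktracker must be restarted for the change to take effect):

```xml
<!-- conf/mapred-site.xml on each node that should run no reducers -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <description>Maximum number of reduce tasks this TaskTracker will
  run simultaneously; 0 leaves this node with no reduce slots.</description>
</property>
```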
~Abhay

On Mon, Sep 3, 2012 at 9:26 PM, Hemanth Yamijala <[email protected]> wrote:

> Hi,
>
> You are right that a change to mapred.tasktracker.reduce.tasks.maximum
> will require a restart of the tasktrackers. AFAIK, there is no way of
> modifying this property without restarting.
>
> On a different note, could you see if the amount of intermediate data
> can be reduced using a combiner, or some other form of local aggregation?
>
> Thanks
> hemanth
>
> On Mon, Sep 3, 2012 at 9:06 PM, Abhay Ratnaparkhi <[email protected]> wrote:
>
>> How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" on a
>> running tasktracker?
>> It seems that I need to restart the tasktracker, and in that case I'll
>> lose the output of the map tasks run by that tasktracker.
>>
>> Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without
>> restarting the tasktracker?
>>
>> ~Abhay
>>
>> On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks <[email protected]> wrote:
>>
>>> Hi Abhay
>>>
>>> The TaskTrackers on which the reduce tasks are triggered are chosen at
>>> random based on reduce slot availability. So if you don't want reduce
>>> tasks to be scheduled on some particular nodes, you need to set
>>> 'mapred.tasktracker.reduce.tasks.maximum' to 0 on those nodes. The
>>> bottleneck here is that this property is not a job-level one; you need
>>> to set it at the cluster level.
>>>
>>> A cleaner approach would be to configure each of your nodes with the
>>> right number of map and reduce slots, based on the resources available
>>> on each machine.
>>>
>>> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> How can one get to know the nodes on which reduce tasks will run?
>>>>
>>>> One of my jobs is running and has completed all of its map tasks.
>>>> My map tasks write a lot of intermediate data, and the intermediate
>>>> directory is getting full on all the nodes.
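To illustrate Hemanth's local-aggregation suggestion: the point of a combiner is that each map task emits one partial sum per distinct key instead of one record per occurrence, which shrinks the intermediate data before it ever hits disk. A minimal sketch of the idea in plain Java (no Hadoop dependencies; class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalAggregationSketch {
    // Without local aggregation, every word becomes its own (word, 1)
    // record; with it, each distinct word per map task collapses into a
    // single (word, partialCount) record.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new HashMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> words = List.of("disk", "space", "disk", "disk");
        Map<String, Integer> partial = combine(words);
        // 4 input records collapse to 2 intermediate records
        System.out.println(partial.size() + " records, disk=" + partial.get("disk"));
    }
}
```

In a real MRv1 job the same effect would come from registering a combiner class on the job (typically the reducer itself, when the reduce operation is associative and commutative).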
>>>> If a reduce task is scheduled on any existing node in the cluster, it
>>>> will try to copy the map output to the same disk and will eventually
>>>> fail with disk-space-related exceptions.
>>>>
>>>> I have added a few more tasktracker nodes to the cluster and now want
>>>> to run the reducers on the new nodes only.
>>>> Is it possible to choose the nodes on which the reducers will run?
>>>> What algorithm does Hadoop use to pick a node to run a reducer?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Bye
>>>> Abhay
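The slot-based assignment Bejoy describes in the thread can be pictured with a toy model: a tracker is only a candidate if it has a free reduce slot, so a node whose maximum is 0 is never picked. This is an illustration of the behaviour, not the actual JobTracker code, and the node names are invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ReduceSlotSketch {
    // Toy model: pick a tasktracker at random from those that still have
    // a free reduce slot. A node configured with a maximum of 0 reduce
    // slots never qualifies, which is why setting
    // mapred.tasktracker.reduce.tasks.maximum to 0 keeps reducers off it.
    static String pickTracker(Map<String, Integer> freeReduceSlots, Random rng) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freeReduceSlots.entrySet()) {
            if (e.getValue() > 0) {
                candidates.add(e.getKey());
            }
        }
        if (candidates.isEmpty()) {
            return null; // no free reduce slot anywhere; the task waits
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = Map.of("old-node", 0, "new-node", 2);
        System.out.println(pickTracker(slots, new Random())); // prints new-node
    }
}
```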
