Hi Himanshu,

Changing the ratio is definitely a reasonable thing to do. The capacities come from the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum TaskTracker configuration properties. The cluster-wide Map and Reduce Task Capacity figures are the sums of these per-node values across all TaskTrackers, so you can tweak them on your nodes to get your desired ratio.
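As a rough illustration of the arithmetic, here is a minimal sketch that assumes homogeneous nodes with a fixed per-node slot budget (for example, limited by the memory available for task JVMs). It picks the per-node map/reduce split whose ratio is closest to a target, then shows the resulting cluster-wide capacities. The function names, the 15-slot budget and the 100-node cluster are hypothetical and not taken from this thread.

```python
from typing import Tuple


def split_slots(total_slots_per_node: int, target_ratio: float) -> Tuple[int, int]:
    """Hypothetical helper: split a fixed per-node slot budget into
    (map_slots, reduce_slots) so map/reduce is as close as possible to
    target_ratio, keeping at least one slot of each kind."""
    best = None
    for reduce_slots in range(1, total_slots_per_node):
        map_slots = total_slots_per_node - reduce_slots
        error = abs(map_slots / reduce_slots - target_ratio)
        # Strict '<' keeps the first (fewest reduce slots) split on ties.
        if best is None or error < best[0]:
            best = (error, map_slots, reduce_slots)
    return best[1], best[2]


def cluster_capacity(nodes: int, map_slots: int, reduce_slots: int) -> Tuple[int, int, float]:
    """Cluster capacity is the per-node maximum summed over identical nodes."""
    map_cap = nodes * map_slots
    reduce_cap = nodes * reduce_slots
    return map_cap, reduce_cap, map_cap / reduce_cap


if __name__ == "__main__":
    m, r = split_slots(15, 4.0)  # hypothetical 15-slot budget, target ratio ~4
    print(f"mapred.tasktracker.map.tasks.maximum = {m}")
    print(f"mapred.tasktracker.reduce.tasks.maximum = {r}")
    print(cluster_capacity(100, m, r))  # hypothetical 100-node cluster
```

With a 15-slot budget this yields 12 map and 3 reduce slots per node, i.e. a ratio of exactly 4. Keeping the per-node total fixed, rather than only adding map slots, is a deliberate choice here: each slot runs its own task JVM, so the total is usually bounded by node memory and CPU. The values are set per node in mapred-site.xml and are typically picked up only when the TaskTracker restarts.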
-Sandy

On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <[email protected]> wrote:
> Hi,
>
> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in
> a ratio of 2.7. We run a wide variety of jobs, and we want to increase the
> throughput.
>
> My manual observation was that we hit the mapper capacity, and hence many
> jobs have to wait even though a lot of room is left in reduce capacity. I
> mined the JobTracker logs for the jobs that completed and saw that, on an
> hourly as well as a daily basis, the mapper:reducer ratio was 4-5.
>
> To increase the throughput, I was thinking of experimenting with changing
> the Map and Reduce Task Capacity so that the ratio increases from 2.7 to
> ~4.
>
> Does this sound like a correct approach? Is this something that I can
> control, or is it determined automatically by Hadoop?
>
> Have any of you done this kind of exercise? If yes, can you please tell me
> how to go about changing this ratio? I am not finding much literature on
> it.
>
> Note: Map and Reduce Task Capacity is the maximum total number of
> mappers/reducers you can run on the cluster at any point.
>
> Regards,
> -Himanshu Vijay
