Hi,

The reducer copies map outputs progressively (as and when they complete) unless configured otherwise. It is therefore normal for the reported copy speed, which is currently just an overall average (unfortunately), to show up lower than the actual transfer rate, since there are periods where the reducer sits idle waiting for further map task waves to complete.
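
To illustrate why idle time drags the figure down, here is a rough sketch. It assumes the displayed number is simply data copied divided by total elapsed time in the copy phase; the function name and all the values are hypothetical and not taken from Hadoop itself.

```python
def average_copy_rate(copied_mb, busy_s, idle_s):
    """Overall average in MB/s: data copied divided by total elapsed time,
    counting the time spent idle waiting for maps (assumed model)."""
    return copied_mb / (busy_s + idle_s)

# All values below are made up for illustration.
copied_mb = 1500.0              # data fetched by one reducer
link_rate = 40.0                # MB/s while a transfer is actually running
busy_s = copied_mb / link_rate  # 37.5 s of real copying
idle_s = 1462.5                 # waiting for later map waves to finish

print(average_copy_rate(copied_mb, busy_s, 0.0))     # no waiting
print(average_copy_rate(copied_mb, busy_s, idle_s))  # mostly waiting
```

With no waiting the average equals the link rate; once most of the elapsed time is spent waiting on maps, the same transfers average out to a small fraction of it.
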
You can tune mapred.reduce.slowstart.completed.maps (0.05, i.e. 5%, by default) to set the fraction of maps that must complete before the reducers begin copying outputs. A higher value, such as 0.8 (80%), lets your reducers copy data more continuously, since they do not have to wait as much for map output to become available.

On Thu, Nov 1, 2012 at 2:31 PM, john smith <[email protected]> wrote:
> Hi list,
>
> I have jobs that generate huge amount of intermediate data. For eg: One of
> my job generates almost 12 GB map output. I have 8 datanodes/TTs and 1
> master.
>
> My reduce progress shows that the copy speed in range 0.55 - 1 MBps , but
> normal file transfers between my datanodes generally go up to 40-50 MBps.
> Why is my shuffle speed so slow?
>
> Also how is that number calculated ? What exactly does that signify? (Is it
> the avg speed of all mappers to that particular reducer? or anything else?)
> Any suggestions?
>
> Thanks

--
Harsh J
