I don't see a direct question asked, but here's a condition in the source code that you may want to take a look at (*):
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobInProgress.java#L2316
(*) - Yet to appear in MRv2. See, or help out with, MAPREDUCE-2723.

On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee <[email protected]> wrote:
> Hi,
>
> I have one question related to the reduce phase of MR jobs.
>
> The intermediate outputs of map tasks are pulled in from the nodes which ran
> map tasks to the node where the reducer is going to run, and that intermediate
> data is written to the reducer's local fs. My question is that if there is a
> job processing a huge amount of data and it has multiple mappers but only one
> reducer, then it's possible that the job would never complete successfully,
> as the single host's disk might not be sufficient to hold all the map outputs
> of the job.
>
> The job essentially would fail after retrying the configured number of attempts.
>
> Thanks,
> Rahul

--
Harsh J
