Hi, I have one question related to the reduce phase of MR jobs.
The intermediate outputs of the map tasks are pulled from the nodes that ran the map tasks to the node where the reducer is going to run, and that intermediate data is written to the reducer's local file system. My question is this: if a job processes a huge amount of data and has multiple mappers but only one reducer, then it seems possible that the job would never complete successfully, because the single host's disk might not be large enough to hold all the map outputs of the job. The job would essentially fail after retrying the configured number of attempts. Is that correct?

Thanks,
Rahul
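
P.S. To put rough numbers on the concern, here is a back-of-the-envelope sketch in Python. It is not Hadoop code; the function names and all sizes are made up for illustration. It assumes map output is uncompressed, that keys partition evenly across reducers, and that each reducer host has the same free disk space.

```python
GIB = 1024 ** 3

def single_reducer_fits(num_map_tasks, output_bytes_per_map, reducer_free_disk_bytes):
    """True if all map output would fit on one reducer host's local disk.

    Hypothetical helper: ignores compression and in-memory merging.
    """
    total = num_map_tasks * output_bytes_per_map
    return total <= reducer_free_disk_bytes

def min_reducers_needed(total_map_output_bytes, free_disk_bytes_per_host):
    """Smallest reducer count so that each reducer's share fits on its host.

    Hypothetical helper: assumes perfectly even partitioning and one
    reducer per host.
    """
    # Ceiling division without floating point.
    return max(1, -(-total_map_output_bytes // free_disk_bytes_per_host))

# Illustrative numbers only: 2000 map tasks, 1 GiB of output each,
# and 500 GiB of free local disk on the reducer host.
fits = single_reducer_fits(2000, 1 * GIB, 500 * GIB)
reducers = min_reducers_needed(2000 * GIB, 500 * GIB)
print(fits, reducers)
```

With these made-up numbers the single reducer would need about 2000 GiB of local disk but has only 500 GiB, which is the situation I am asking about.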
