[ https://issues.apache.org/jira/browse/MAPREDUCE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Bean resolved MAPREDUCE-5323.
----------------------------------

    Resolution: Not A Problem

Misunderstood config mapreduce.map.combine.minspills as the number of spills to require before the first combine. Instead, it's the number of spills required for a second and subsequent combines on merge.

> Min Spills For Combine Ignored
> ------------------------------
>
>                 Key: MAPREDUCE-5323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Jeff Bean
>            Priority: Minor
>
> We've observed for some time that combiners always run when specified.
> However there is a config called mapreduce.map.combine.minspills which sort
> of implies that the developer or administrator ought to be able to control
> when combiners are invoked.
> I spelunked into the code and found this gem in MapTask.java:
>     if (combinerRunner == null || numSpills < minSpillsForCombine) {
>       Merger.writeFile(kvIter, writer, reporter, job);
>     } else {
>       combineCollector.setWriter(writer);
>       combinerRunner.combine(kvIter, combineCollector);
>     }
> That looks way buggy to me. If ( A || B ) is made false by A then B is never
> executed. I spelunked around the code some more and it looks like
> combinerRunner is never null except on reflection failure. So it looks like
> the intention is for minSpillsForCombine to be respected, but due to this
> logic error it's totally ignored.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
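
For reference, here is a minimal sketch of the condition quoted from MapTask.java, reduced to plain booleans. The class name MinSpillsSketch, the helper combinesOnMerge, and the threshold value 3 are hypothetical and used only for illustration; nothing here is read from a real job configuration. In Java, `A || B` skips B only when A is true, so with a combiner present (A false) the spill-count comparison is still evaluated. Consistent with the resolution above, the sketch models only the combine decision at merge time, not the combine that runs on each spill.

```java
public class MinSpillsSketch {

    // Hypothetical helper mirroring the shape of the quoted condition.
    // Returns true when the merge step would run the combiner.
    static boolean combinesOnMerge(boolean hasCombiner, int numSpills, int minSpillsForCombine) {
        // '||' short-circuits only when the left operand is true.
        // With a combiner configured the left operand is false,
        // so the spill-count comparison on the right is evaluated.
        if (!hasCombiner || numSpills < minSpillsForCombine) {
            return false; // plain merge (Merger.writeFile in the quoted code)
        } else {
            return true;  // combine during merge (combinerRunner.combine in the quoted code)
        }
    }

    public static void main(String[] args) {
        int minSpills = 3; // illustrative threshold, an assumption for this sketch
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println("spills=" + numSpills
                    + " combineOnMerge=" + combinesOnMerge(true, numSpills, minSpills));
        }
        System.out.println("no combiner: combineOnMerge="
                + combinesOnMerge(false, 10, minSpills));
    }
}
```

Under these assumptions, the combiner is skipped at merge time while the spill count is below the threshold and used once the count reaches it, which suggests the setting is respected for the merge-time combine.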