[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Bean resolved MAPREDUCE-5323.
----------------------------------

    Resolution: Not A Problem

I misunderstood the config mapreduce.map.combine.minspills as the number of
spills required before the first combine. Instead, it is the minimum number of
spills required for the combiner to run a second and subsequent time, during
the merge of the spill files.
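
To illustrate why the condition quoted in the report is not a logic error, here
is a minimal standalone sketch of the merge-time decision. It is not the actual
MapTask code: the class name, the method name, the boolean hasCombiner (standing
in for combinerRunner != null), and the threshold of 3 are all assumptions made
for illustration. The point is that || skips its right operand only when the
left operand is true, so whenever a combiner is configured the spill count is
still compared against the threshold.

```java
public class MinSpillsSketch {

    /**
     * Mirrors the shape of the quoted condition. Returns true when the
     * combiner would be applied again while merging spill files.
     * hasCombiner is a hypothetical stand-in for (combinerRunner != null).
     */
    static boolean combinesOnMerge(boolean hasCombiner, int numSpills,
                                   int minSpillsForCombine) {
        // With a combiner configured, !hasCombiner is false, so the
        // right-hand comparison is always evaluated.
        if (!hasCombiner || numSpills < minSpillsForCombine) {
            return false; // plain merge, no combiner pass
        }
        return true; // combiner runs again during the merge
    }

    public static void main(String[] args) {
        int minSpills = 3; // illustrative threshold, chosen for this sketch
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println(numSpills + " spills -> combine on merge: "
                    + combinesOnMerge(true, numSpills, minSpills));
        }
    }
}
```

Under these assumptions, the sketch reports no merge-time combine for 1 and 2
spills and a merge-time combine for 3 and 4 spills, which matches the reading
that the setting gates only the combine on merge.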
                
> Min Spills For Combine Ignored
> ------------------------------
>
>                 Key: MAPREDUCE-5323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Jeff Bean
>            Priority: Minor
>
> We've observed for some time that combiners always run when specified. 
> However, there is a config called mapreduce.map.combine.minspills, which 
> implies that the developer or administrator ought to be able to control 
> when combiners are invoked.
> I spelunked into the code and found this gem in MapTask.java:
> if (combinerRunner == null || numSpills < minSpillsForCombine) {
>   Merger.writeFile(kvIter, writer, reporter, job);
> } else {
>   combineCollector.setWriter(writer);
>   combinerRunner.combine(kvIter, combineCollector);
> }
> That looks way buggy to me. If ( A || B ) is made false by A then B is never 
> executed. I spelunked around the code some more and it looks like 
> combinerRunner is never null except on reflection failure. So it looks like 
> the intention is for minSpillsForCombine to be respected, but due to this 
> logic error it's totally ignored.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira