Re: Combiner applied on multiple map task outputs (like in Mahout SVD)

Sebastian Schelter Thu, 27 Sep 2012 02:19:01 -0700

Jake is absolutely right here: the combiner is also applied on the
reducers. I forgot to mention that.


The shuffle phase in Hadoop is basically a distributed merge-sort. When
the reducers start to merge the mapper outputs, they can also apply the
combiner. However, this doesn't help with reducing network traffic,
because by then the data has already been copied to the reducer.
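
As a rough illustration of the paragraph above, here is a toy,
single-process Python sketch. It is not Hadoop code: the function names
(combine, map_side, reduce_side_merge) and the sum combiner are
assumptions made up for this example. It only shows that the number of
records shipped is fixed by map-side combining, while the reduce-side
combine runs on segments that have already been fetched.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_pairs):
    """Hypothetical sum combiner: collapse runs of equal keys in a key-sorted stream."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def map_side(records):
    """Sort one map task's output and combine it before it crosses the network."""
    return list(combine(sorted(records)))


def reduce_side_merge(fetched_segments):
    """Merge already-fetched sorted segments, applying the combiner again while merging."""
    return list(combine(heapq.merge(*fetched_segments)))


# Two map tasks emitting (word, 1) pairs.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("c", 1)],
]

# Map-side combining shrinks what has to be transferred.
segments = [map_side(m) for m in map_outputs]
shipped = sum(len(s) for s in segments)

# Reduce-side combining happens after the transfer, during the merge.
merged = reduce_side_merge(segments)

print(shipped)  # records sent over the (simulated) network
print(merged)   # what the reducer would finally iterate over
```

The sketch relies on the combiner being safe to apply any number of
times, which holds for a sum but not for every reduce function.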

The 'Shuffle and Sort' section of 'Hadoop: The Definitive Guide'
describes this process in detail.

--sebastian


On 27.09.2012 11:11, Sigurd Spieckermann wrote:
> OK, I see. Makes sense. Thank you!
> 
> 2012/9/27 Sean Owen <[email protected]>
> 
>> I think he means that it is not only applied to the output of the
>> mapper, but to output of the combiners many times as well. It is not
>> used at the reducer.
>>
>> On Thu, Sep 27, 2012 at 9:56 AM, Sigurd Spieckermann
>> <[email protected]> wrote:
>>> @Jake: Could you please elaborate on how exactly the combiner can be
>>> called before the reducer gets the data? Do you mean the combiner is
>>> called at the datanode that instantiates reducer tasks? I thought the
>>> combiner is just called after the map task has finished and still on
>>> that datanode.
>>>
>>> 2012/9/26 Jake Mannix <[email protected]>
>>>
>>>> It should also be noted that the Combiner does not only run for the
>>>> mappers - they can be used one (or more) times after mapping, and then
>>>> one or more times before the reducer gets the results.  It's not quite
>>>> so simple as to say that you get combiners used only (and always) on
>>>> the outputs of each map task.
>>
>
