Let's say that you're running a MapReduce job with *M* map tasks, *R* reduce tasks, and *K* machines. Each map task will produce *R* shuffle outputs (so *M*×*R* shuffle blocks total). When the reduce phase starts, pending reduce tasks are pulled off a queue and scheduled on executors. Reduce tasks aren't assigned to particular machines in advance; they're scheduled as executors become free.
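To make the block assignment concrete, here's a minimal sketch of hash partitioning, the default way a map task decides which of its *R* shuffle blocks a record belongs to. This is an illustrative standalone function, not Spark's actual HashPartitioner source, though it follows the same idea (non-negative hashCode modulo the number of reducers):

```scala
// Sketch: each map task hash-partitions its output keys into R buckets,
// writing one shuffle block per reducer. Reducer r later fetches bucket r
// from every map task's output.
def bucketFor(key: Any, numReducers: Int): Int = {
  // hashCode can be negative, so normalize the modulo to [0, numReducers).
  val mod = key.hashCode % numReducers
  if (mod < 0) mod + numReducers else mod
}

// With 4 reduce tasks, a given key always hashes to the same bucket,
// no matter which map task emitted it -- so all values for that key
// end up at the same reducer.
val keys = Seq("apple", "banana", "cherry", "apple")
val assignments = keys.map(k => k -> bucketFor(k, 4))
```

So each reducer is responsible for a subset of shuffle blocks (one per map task), which in turn corresponds to a subset of the key space.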
If you have more reduce tasks than machines (*R* > *K*), then some machines will run multiple reduce tasks. You might want to run more reduce tasks than machines to (a) limit an individual reduce task's memory requirements, or (b) adapt to skew and stragglers. With smaller, more granular reduce tasks, slower machines simply run fewer tasks while the remaining work is divided among the other machines. The trade-off is increased scheduling overhead and more reduce output partitions, although the scheduling overhead may be negligible in many cases, and the small post-shuffle outputs can be combined using coalesce().

On Mon, Nov 11, 2013 at 8:54 AM, Umar Javed <[email protected]> wrote:

> Say that you have a taskSet of maps, each operating on one Hadoop
> partition. How does the scheduler decide which mapTask output (i.e., a
> shuffle block) goes to what reducer? Are the shuffle blocks evenly split
> among reducers?
>
>
> On Sun, Nov 10, 2013 at 9:50 PM, Aaron Davidson <[email protected]> wrote:
>
>> It is responsible for a subset of shuffle blocks. MapTasks split up their
>> data, creating one shuffle block for every reducer. During the shuffle
>> phase, the reducer will fetch all shuffle blocks that were intended for it.
>>
>>
>> On Sun, Nov 10, 2013 at 9:38 PM, Umar Javed <[email protected]> wrote:
>>
>>> I was wondering how does the scheduler assign the ShuffledRDD locations
>>> to the reduce tasks? Say that you have 4 reduce tasks, and a number of
>>> shuffle blocks across two machines. Is each reduce task responsible for a
>>> subset of individual keys or a subset of shuffle blocks?
>>>
>>> Umar
