Hi,

Have you considered using the in-mapper combining pattern? That is, inside
your Mapper object you create a Map object holding the intermediate
key-value pairs, whose state is preserved across multiple calls of the map
method. The values are emitted only periodically, when a certain threshold
is reached (threshold = ratio between block size and memory consumed). You
can use a counter to track how many key-value pairs have been processed.
This substantially reduces the problem you mention, of the reducer becoming
the bottleneck when there is a large volume of intermediate output: the
pairs are already aggregated in memory, so far fewer of them reach the
reducer, and the buffer is flushed whenever it reaches a specific size.
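
Roughly, the idea could look like the sketch below. It is plain Java for a
word count and deliberately does not use the Hadoop API, so the class name,
the `emitted` list (a stand-in for the framework's output) and the
`flushEvery` threshold are all made up for illustration. The flush policy
here is a simple count of processed pairs, which is an assumption; a
memory-based threshold would work the same way.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch of in-mapper combining for a word count.
// It does not use the Hadoop API; every name here is hypothetical.
class InMapperCombiningSketch {
    // Partial sums that survive across map() calls.
    private final Map<String, Integer> buffer = new HashMap<>();
    // Flush once this many key-value pairs have been processed (assumed policy).
    private final int flushEvery;
    // Counter of pairs processed since the last flush.
    private int processed = 0;
    // Stand-in for the framework's output: each entry is one emitted pair.
    final List<Map.Entry<String, Integer>> emitted = new ArrayList<>();

    InMapperCombiningSketch(int flushEvery) {
        this.flushEvery = flushEvery;
    }

    // Called once per input record, like a mapper's map method.
    void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            buffer.merge(word, 1, Integer::sum); // combine instead of emitting (word, 1)
            processed++;
        }
        if (processed >= flushEvery) {
            flush();
        }
    }

    // Called once after the last record, so nothing stays behind in the buffer.
    void cleanup() {
        flush();
    }

    // Emit every buffered partial sum, then start over with an empty buffer.
    private void flush() {
        for (Map.Entry<String, Integer> e : buffer.entrySet()) {
            emitted.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        buffer.clear();
        processed = 0;
    }

    public static void main(String[] args) {
        InMapperCombiningSketch mapper = new InMapperCombiningSketch(1000);
        mapper.map("a b a");
        mapper.map("b a");
        mapper.cleanup();
        // Five input words leave the mapper as two combined pairs.
        System.out.println(mapper.emitted);
    }
}
```

In a real job the same logic would presumably live in a subclass of
org.apache.hadoop.mapreduce.Mapper, with the final flush done in its
cleanup method. Note that a key can still be emitted more than once (once
per flush), so the reducer has to sum the partial counts.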


Thanks
Sambit Tripathy



On Thu, Sep 20, 2012 at 6:42 PM, Jason Yang <[email protected]> wrote:

> Hi, all
>
> I have a question: does all the intermediate output with the same key go
> to the same reducer or not?
>
> If it does, and the mapper generates only two keys but there are 3
> reducers running in this job, what would happen?
>
> If not, how could I do some processing over all the data, like counting? I
> think some would suggest setting the number of reducers to 1, but I think
> this would make the reducer the bottleneck when there is a large volume
> of intermediate output, wouldn't it?
>
> --
> YANG, Lin
>
>
