Re: Ability to specify a combiner (with different signature than reducer)

Chao Shi Wed, 25 Sep 2013 01:57:37 -0700

Hi Som,

This approach does not make use of combiners. If K is small, using
combiners may greatly reduce the shuffle traffic. (Correct me if I'm wrong.)
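
To make the point concrete, here is a minimal plain-Java sketch of the two-stage logic, with no Hadoop or Crunch dependencies. The class and method names (TopKSketch, topK, reduce) are hypothetical and chosen only for illustration. The combiner step maps ints to ints, so each map task ships at most K values per key into the shuffle; only the reducer step changes the type by joining the final values into a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of Crunch or Hadoop.
final class TopKSketch {

    // Combiner-side logic: keep only the K largest values seen locally.
    // Input and output are both ints, so it can run any number of times.
    static List<Integer> topK(Iterable<Integer> values, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            heap.add(v);
            if (heap.size() > k) {
                heap.poll(); // drop the current smallest value
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result); // ascending order, as in the "2, 3" example
        return result;
    }

    // Reducer-side logic: take the top K of the combined values again,
    // then join them into a single string.
    static String reduce(Iterable<Integer> values, int k) {
        return topK(values, k).stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
    }
}
```

With K=2, if one map task sees <"hello", 1> and <"hello", 2> and another sees <"hello", 3>, the combiners emit [1, 2] and [3], and calling reduce on the merged values 1, 2, 3 gives "2, 3", matching the example in the original question.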



2013/9/25 Som Satpathy <[email protected]>

> Hi Chao,
>
> You could do a groupBy and then a parallelDo that iterates over each
> key's values and emits the top K values per key via Pair<K,V>.
>
> Som
>
>
> On Tue, Sep 24, 2013 at 7:59 PM, Chao Shi <[email protected]> wrote:
>
>> Hi guys,
>>
>> I need Crunch to generate an MR pipeline with a combiner and a
>> reducer. My combiner and reducer have different logic. I wonder if this is
>> possible in Crunch.
>>
>> The problem can be simplified as follows:
>>
>> Given a series of <string, int> pairs, output the largest K values per
>> key, joined into a string. For example, with K=2, the output for
>> <"hello", 1>, <"hello", 2>, <"hello", 3>, <"world", 3> is <"hello", "2,
>> 3">, <"world", "3">.
>>
>> In raw MR, I would use a combiner to determine the locally largest K
>> values per key.
>>
>> class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
>>   void reduce(Text key, Iterable<IntWritable> values, Context context) {
>>     // go over "values" and keep top K in memory
>>     // emit top K
>>   }
>> }
>>
>> class MyReducer extends Reducer<Text, IntWritable, Text, Text> {
>>   void reduce(Text key, Iterable<IntWritable> values, Context context) {
>>     // go over "values" and keep top K in memory, saved to "int[] top"
>>     context.write(key, new Text(join(top, ", ")));
>>   }
>> }
>>
>> Could anyone give me a hint on how to do this in Crunch? I see
>> PGroupedTable#combineValues, but I think it requires the reducer and
>> combiner to have the same signature (generic types).
>>
>> Thanks,
>> Chao
>>
>
>
