I think I have tracked down the problem: each split contains only one big key-value pair, and a combiner is attached to a single map task. Please correct me if I'm wrong, but I assume each map task consumes one split and the combiner operates only on the key-value pairs produced from that split. That's why the combiner has no effect in my case. Is there a way to combine the mapper outputs of multiple splits before they are sent off to the reducer?
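To illustrate what I mean, here is a plain-Java sketch (not the Hadoop API; class and method names are made up for illustration) of why a combiner that runs per map task sees only single-value groups when every split holds one pair per key, while the reducer, which sees all map outputs after the shuffle, can still aggregate:

```java
import java.util.*;

// Hypothetical simulation of combiner scope: one combiner per split.
public class CombinerScopeDemo {
    // Stand-in for a combiner/reducer that sums the values per key.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            out.merge(p.getKey(), p.getValue(), Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        // Two splits, each holding a single pair for the SAME key "k".
        List<Map.Entry<String, Integer>> split1 = List.of(Map.entry("k", 1));
        List<Map.Entry<String, Integer>> split2 = List.of(Map.entry("k", 2));

        // Each combiner sees only its own split: 1 record in, 1 record out,
        // so there is nothing for it to merge.
        Map<String, Integer> c1 = combine(split1);
        Map<String, Integer> c2 = combine(split2);
        System.out.println(c1); // one entry, value 1
        System.out.println(c2); // one entry, value 2

        // Only after the shuffle does one task see both outputs for "k",
        // which is why the reduce phase aggregates correctly.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        c1.forEach((k, v) -> shuffled.add(Map.entry(k, v)));
        c2.forEach((k, v) -> shuffled.add(Map.entry(k, v)));
        System.out.println(combine(shuffled)); // one entry, value 3
    }
}
```

If this model is right, combiner input count equals output count exactly as in my job statistics, and no pre-reduce combining across splits is possible without merging splits into fewer map tasks.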
2012/9/25 Sigurd Spieckermann <[email protected]>

> Maybe one more note: the combiner and the reducer class are the same and
> in the reduce phase the values get aggregated correctly. Why is this not
> happening in the combiner phase?
>
>
> 2012/9/25 Sigurd Spieckermann <[email protected]>
>
>> Hi guys,
>>
>> I'm experiencing a strange behavior when I use the Hadoop join-package.
>> After running a job, the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears multiple
>> times and the values should be combinable before getting passed to the
>> reducer. I'm running my tests in pseudo-distributed mode with one or two
>> map tasks. Using the debugger, I noticed that each key-value pair is
>> processed by a combiner individually, so there's actually no list passed
>> into the combiner that it could aggregate. Can anyone think of a reason
>> for this undesired behavior?
>>
>> Thanks
>> Sigurd
>>
>
