Out of curiosity: did you change the partitioner or the comparators? And how did you implement the equals and hashCode methods of your objects?
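(For context on why this matters: with the default HashPartitioner, Hadoop routes each key by its hashCode(), so if equals and hashCode disagree, "equal" keys can land in different partitions and are never grouped for combining or reducing. A minimal Hadoop-free sketch of the contract, using a hypothetical composite key class, not anything from Sigurd's code:)

```java
import java.util.Objects;

// Hypothetical composite key. Equal keys MUST produce equal hash codes,
// otherwise a hash partitioner can send "equal" keys to different tasks
// and they are never grouped for combining or reducing.
final class TextPairKey {
    final String first;
    final String second;

    TextPairKey(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TextPairKey)) return false;
        TextPairKey other = (TextPairKey) o;
        return first.equals(other.first) && second.equals(other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);  // consistent with equals
    }

    // Mimics what a hash partitioner does: non-negative hash modulo the
    // number of reduce tasks.
    int partition(int numPartitions) {
        return (hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```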
Regards,

Bertrand

On Tue, Sep 25, 2012 at 3:32 PM, Björn-Elmar Macek <[email protected]> wrote:

> Hi,
>
> I had this problem once too. Did you properly override the reduce method
> with the @Override annotation? Does your reduce method use OutputCollector
> or Context for gathering outputs? If you are using the current version, it
> has to be Context.
>
> The thing is: if you do NOT override it, the standard (identity) reduce
> function is used, and of course that yields the same number of tuples as
> you read in as input.
>
> Good luck!
> Elmar
>
> On 25.09.2012 at 11:57, Sigurd Spieckermann <[email protected]> wrote:
>
> I think I have tracked down the problem to the point that each split only
> contains one big key-value pair and a combiner is connected to a map task.
> Please correct me if I'm wrong, but I assume each map task takes one split
> and the combiner operates only on the key-value pairs within one split.
> That's why the combiner has no effect in my case. Is there a way to combine
> the mapper outputs of multiple splits before they are sent off to the
> reducer?
>
> 2012/9/25 Sigurd Spieckermann <[email protected]>
>
>> Maybe one more note: the combiner and the reducer class are the same, and
>> in the reduce phase the values get aggregated correctly. Why is this not
>> happening in the combine phase?
>>
>> 2012/9/25 Sigurd Spieckermann <[email protected]>
>>
>>> Hi guys,
>>>
>>> I'm experiencing strange behavior when I use the Hadoop join package.
>>> After running a job, the result statistics show that my combiner has an
>>> input of 100 records and an output of 100 records. From the task I'm
>>> running and the way it's implemented, I know that each key appears
>>> multiple times and the values should be combinable before getting passed
>>> to the reducer. I'm running my tests in pseudo-distributed mode with one
>>> or two map tasks.
>>>
>>> Using the debugger, I noticed that each key-value pair is processed by
>>> the combiner individually, so there is actually no list passed into the
>>> combiner that it could aggregate. Can anyone think of a reason for this
>>> undesired behavior?
>>>
>>> Thanks
>>> Sigurd

--
Bertrand Dechoux
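(Elmar's point about @Override is easy to reproduce even outside Hadoop: if the subclass method's parameter types do not exactly match the parent's, Java silently treats it as an overload rather than an override, and the parent's identity implementation keeps running; the @Override annotation turns that mistake into a compile error. A stripped-down, Hadoop-free sketch with a hypothetical reducer-like base class:)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the framework's base reducer: the default
// reduce is the identity, re-emitting each value as its own record.
class BaseReducer {
    List<Integer> reduce(String key, List<Integer> values) {
        return new ArrayList<>(values);  // identity: N inputs -> N outputs
    }
}

// BUG: the parameter type is Iterable, not List, so this method
// OVERLOADS instead of OVERRIDES. When the framework calls reduce through
// a BaseReducer reference, the identity version above still runs.
// Adding @Override here would fail to compile, exposing the mistake.
class BrokenSumReducer extends BaseReducer {
    List<Integer> reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return Arrays.asList(sum);
    }
}

// Correct: the signature matches the base exactly and @Override checks it.
class FixedSumReducer extends BaseReducer {
    @Override
    List<Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return Arrays.asList(sum);  // N inputs -> 1 aggregated output
    }
}
```

(This matches the symptom in the thread: with the broken variant, record counts in equal record counts out, exactly as if no combining happened.)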
