I think I have tracked down the problem: each split contains only one big key-value pair, and a combiner is attached to a single map task. Please correct me if I'm wrong, but I assume each map task consumes one split and the combiner operates only on the key-value pairs produced from that split. That's why the combiner has no effect in my case. Is there a way to combine the mapper outputs of multiple splits before they are sent off to the reducer?
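To illustrate what I mean, here is a plain-Java sketch (not the Hadoop API; class and method names are made up for illustration) of why a combiner that runs per map task sees only single-value groups when every split holds one pair per key, while the reducer, which sees all map outputs after the shuffle, can still aggregate:

```java
import java.util.*;

// Hypothetical simulation of combiner scope: one combiner per split.
public class CombinerScopeDemo {
    // Stand-in for a combiner/reducer that sums the values per key.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            out.merge(p.getKey(), p.getValue(), Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        // Two splits, each holding a single pair for the SAME key "k".
        List<Map.Entry<String, Integer>> split1 = List.of(Map.entry("k", 1));
        List<Map.Entry<String, Integer>> split2 = List.of(Map.entry("k", 2));

        // Each combiner sees only its own split: 1 record in, 1 record out,
        // so there is nothing for it to merge.
        Map<String, Integer> c1 = combine(split1);
        Map<String, Integer> c2 = combine(split2);
        System.out.println(c1); // one entry, value 1
        System.out.println(c2); // one entry, value 2

        // Only after the shuffle does one task see both outputs for "k",
        // which is why the reduce phase aggregates correctly.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        c1.forEach((k, v) -> shuffled.add(Map.entry(k, v)));
        c2.forEach((k, v) -> shuffled.add(Map.entry(k, v)));
        System.out.println(combine(shuffled)); // one entry, value 3
    }
}
```

If this model is right, combiner input count equals output count exactly as in my job statistics, and no pre-reduce combining across splits is possible without merging splits into fewer map tasks.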
2012/9/25 Sigurd Spieckermann <[email protected]>

> Maybe one more note: the combiner and the reducer class are the same and
> in the reduce phase the values get aggregated correctly. Why is this not
> happening in the combiner phase?
>
>
> 2012/9/25 Sigurd Spieckermann <[email protected]>
>
>> Hi guys,
>>
>> I'm experiencing a strange behavior when I use the Hadoop join-package.
>> After running a job, the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears multiple
>> times and the values should be combinable before getting passed to the
>> reducer. I'm running my tests in pseudo-distributed mode with one or two
>> map tasks. Using the debugger, I noticed that each key-value pair is
>> processed by a combiner individually, so there's actually no list passed
>> into the combiner that it could aggregate. Can anyone think of a reason
>> for this undesired behavior?
>>
>> Thanks
>> Sigurd
>>
>
