When I log the calls to the combiner function and print the number of
elements iterated over, the count is always 1 during the spill-writing phase,
and the combiner is called very often. Is this normal behavior? According to
what was mentioned earlier, I would expect the combiner to combine all records with
the s
OK, I found the answer to one of my questions just now -- the location of
the spill files and their sizes. So, there's a discrepancy between what I
see and what you said about the compression. The total size of all spill
files of a single task matches what I estimate them to be
*without* c
OK, just wanted to confirm. Maybe there is another problem then. I just
looked at the task logs and there were ~200 spills recorded for a single
task, and only afterwards was there a merge phase. In my case, 200 spills are
about 2GB (uncompressed). One map output record easily fits into the
in-memory b
Yes, we do compress each spill output using the same codec as specified
for map (intermediate) output compression. However, the byte counters
may reflect the uncompressed size of the records written, and not
the post-compression size.
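
To make the distinction between the two sizes concrete, here is a minimal
sketch in plain Java. It is not Hadoop code: the class SpillByteCount, the
method writeSpill, and the use of gzip from the JDK are all assumptions made
for illustration, and a real job would use whatever codec is configured for
map output. It writes repeated records through a compressing stream and
tracks both the bytes handed to the writer and the bytes that come out the
other side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: a hypothetical stand-in for a spill writer, not Hadoop's implementation.
final class SpillByteCount {

    /** The two sizes one could report for the same simulated spill. */
    static final class Counts {
        final long rawBytes;        // record bytes handed to the writer, before compression
        final long compressedBytes; // bytes produced after compression
        Counts(long rawBytes, long compressedBytes) {
            this.rawBytes = rawBytes;
            this.compressedBytes = compressedBytes;
        }
    }

    /** Writes the same record `copies` times through gzip and returns both counts. */
    static Counts writeSpill(String record, int copies) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the spill file
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        long raw = 0;
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            for (int i = 0; i < copies; i++) {
                gz.write(bytes);
                raw += bytes.length; // counted before the codec sees the data
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so sink holds the complete compressed output.
        return new Counts(raw, sink.size());
    }

    public static void main(String[] args) {
        Counts c = writeSpill("key\tvalue\n", 100_000);
        System.out.println("bytes counted before compression: " + c.rawBytes);
        System.out.println("bytes after compression:          " + c.compressedBytes);
    }
}
```

A counter incremented the way rawBytes is here would report the uncompressed
size regardless of what ends up in the file, so comparing it against the
actual file sizes on disk is one way to tell which of the two a given counter
is measuring.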
On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann
wrote:
> Hi gu