Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
When I log the calls to the combiner function and print the number of elements iterated over, the count is always 1 during the spill-writing phase, and the combiner is called very often. Is this normal behavior? According to what was mentioned earlier, I would expect the combiner to combine all records with the s
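
A toy model may help frame the question above. It assumes the combiner runs once per distinct key within a single spill's sorted records. The names `combine_spill`, `distinct` and `repeated` are made up for illustration and are not part of Hadoop. Under that assumption, a spill whose keys are all distinct yields exactly one element per combiner call:

```python
from itertools import groupby
from operator import itemgetter


def combine_spill(records, combiner):
    """Sort one spill's (key, value) records and call the combiner once per key.

    Hypothetical helper: a simplified stand-in for per-spill combining,
    not Hadoop's actual implementation.
    """
    call_sizes = []  # number of values seen by each combiner call
    output = []
    ordered = sorted(records, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        call_sizes.append(len(values))
        output.append((key, combiner(values)))
    return output, call_sizes


# A spill in which every key occurs once: each combiner call sees one value.
distinct = [("a", 1), ("b", 1), ("c", 1)]
# A spill in which key "a" repeats: that call sees several values.
repeated = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]

out_distinct, sizes_distinct = combine_spill(distinct, sum)
out_repeated, sizes_repeated = combine_spill(repeated, sum)
print(sizes_distinct, sizes_repeated)
```

In this sketch, seeing only single-element calls would mean that no key repeats within any one spill. Whether that is what happens in the job described here cannot be determined from the message alone.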

Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
OK, I found the answer to one of my questions just now -- the location of the spill files and their sizes. So, there's a discrepancy between what I see and what you said about the compression. The total size of all spill files of a single task matches what I estimate for them to be *without* c

Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
OK, just wanted to confirm. Maybe there is another problem then. I just looked at the task logs and there were ~200 spills recorded for a single task, and only afterwards was there a merge phase. In my case, 200 spills are about 2GB (uncompressed). One map output record easily fits into the in-memory b

Re: Spill file compression

2012-11-07 Thread Harsh J
Yes, we do compress each spill output using the same codec as specified for map (intermediate) output compression. However, the byte counters may be counting the decompressed sizes of the records written, and not the post-compression ones. On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann wrote: > Hi gu
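
The distinction drawn in this reply, between bytes counted before compression and bytes that actually reach disk, can be sketched with a small standalone model. The class `CountingSpillWriter` is hypothetical, and `zlib` stands in for whichever codec is configured; this is not Hadoop code and makes no claim about how its counters are implemented.

```python
import zlib


class CountingSpillWriter:
    """Hypothetical writer that counts raw record bytes but stores compressed data."""

    def __init__(self):
        self.raw_bytes = 0  # counter incremented before compression
        self._compressor = zlib.compressobj()
        self._chunks = []

    def write(self, record: bytes) -> None:
        self.raw_bytes += len(record)
        self._chunks.append(self._compressor.compress(record))

    def close(self) -> bytes:
        # Flush the compressor and return what would be written to disk.
        self._chunks.append(self._compressor.flush())
        return b"".join(self._chunks)


writer = CountingSpillWriter()
for i in range(1000):
    # Repetitive records, so the stand-in codec has something to compress.
    writer.write(b"key-%04d\tvalue\n" % (i % 10))
on_disk = writer.close()

print("counted (uncompressed) bytes:", writer.raw_bytes)
print("bytes on disk (compressed):  ", len(on_disk))
```

In this model the counter would overstate the on-disk size, so comparing the counter against the file sizes on disk is one way to tell the two cases apart.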