Re: splits and maps

2012-09-19, Harsh J
Tim, it's pretty interesting reading; I once dug into this for another user around here. Check out this archive post: http://search-hadoop.com/m/cRmJ3gTtN32. Make sure to also read the LineReader sources (a layer under the LineRecordReader explained in that post), where we can also see the beyond-block-boundary

Re: splits and maps

2012-09-19, Tim Robertson
Thanks for the explanation, HJ. I always meant to look into that bit of code to work out how it did it.
Tim

On Wed, Sep 19, 2012 at 6:24 PM, Harsh J wrote:
> Hi Tim,
>
> Splits don't look at newlines in the TextInputFormat at least. So
> since the computed splits > default map numbers, I thin

Re: splits and maps

2012-09-19, Harsh J
Hi Tim, splits don't look at newlines, at least not in TextInputFormat. Since the number of computed splits is greater than the default number of maps, I think a perfect file of exactly 10 full blocks will spawn only 10 mappers. It is the mapper's record reader that reads until a newline, even past the last byte of its block.
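The division of labour described above (splits cut on byte offsets, the record reader fixing up line boundaries) can be sketched in Python. This is a minimal, hypothetical model, not Hadoop's code: `split_offsets` and `read_split` are illustrative names, the data is an in-memory `bytes` object rather than an HDFS file, only `\n` line endings are handled, and maximum line length and compression are ignored. The boundary rule is modeled loosely on what LineRecordReader does: every split except the first discards its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end offset, so a line that crosses a boundary is read whole by the earlier split.

```python
def split_offsets(total_size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, total_size) into (start, length) byte ranges, ignoring newlines."""
    return [(start, min(split_size, total_size - start))
            for start in range(0, total_size, split_size)]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines 'owned' by the split [start, start + length).

    Hypothetical model of a line record reader's boundary rule:
    - a split that does not begin at offset 0 skips bytes up to and including
      the first newline at or after `start` (the previous split owns that line);
    - a split keeps reading while the next line starts at or before `end`,
      even if that line finishes past `end`.
    """
    end = start + length
    pos = start
    if start != 0:
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl + 1
        records.append(data[pos:line_end].rstrip(b"\n"))
        pos = line_end
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
    for s, n in split_offsets(len(data), 10):
        print((s, n), read_split(data, s, n))
```

With the 26-byte sample and 10-byte splits, three splits are created: the first returns `alpha` and `bravo` (which straddles byte 10), the second `charlie` and `delta`, and the third returns nothing. Every line is read exactly once, and the number of splits depends only on byte sizes, not on where the newlines fall.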

Re: splits and maps

2012-09-19, Tim Robertson
I think the splitting recognises the end of line, so you might get 11, but otherwise that looks correct.

On Wed, Sep 19, 2012 at 5:42 PM, Pedro Sá da Costa wrote:
> If I have an input file of 640MB in size, and a split size of 64MB, this
> file will be partitioned into 10 splits, and each split
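Pedro's arithmetic, and Harsh's remark that the computed splits outnumber the default map count, can be checked with a small model of split sizing in the old `mapred` FileInputFormat. This sketch assumes the formula split size = max(minSize, min(goalSize, blockSize)), where goalSize is the total size divided by the requested number of maps, plus a 1.1 "slop" factor that lets the tail of a file fold into the last split. The function names are hypothetical, and the defaults (2 requested maps, a 1-byte minimum split size, slop 1.1) are assumptions to check against the Hadoop version in use. It models a single file and never looks at line boundaries.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the last split may run up to 10% over split_size


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    """Split size = max(min_size, min(goal_size, block_size))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(total_size: int, block_size: int,
                 num_map_hint: int = 2, min_size: int = 1) -> int:
    """Count the splits for one file; newlines are never consulted."""
    goal_size = total_size // max(num_map_hint, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    splits, remaining = 0, total_size
    # Emit full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # tail split (up to SPLIT_SLOP * split_size bytes)
    return splits


if __name__ == "__main__":
    print(count_splits(640 * MB, 64 * MB))                   # Pedro's example
    print(count_splits(640 * MB, 64 * MB, num_map_hint=20))  # hint > block count
```

Under this model Pedro's 640MB file gives 10 splits, since the default hint of 2 maps yields a 320MB goal size that the 64MB block size caps. Asking for 20 maps shrinks the goal size below the block size and gives 20 splits, and a 641MB file still yields 10 because the extra megabyte fits within the slop.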