Hi All,

I have implemented a custom CombineFileInputFormat for my job, and it works
well for small files, i.e. it combines them up to the block boundary. But
along with the small files, the input source also delivers a few very large
files, so the mapper that gets assigned one of these large files becomes a
straggler.
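
For context, here is roughly what my setup looks like (simplified; the class
names and the 128 MB size are placeholders, not my exact code):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

public class CombineInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    public CombineInputFormat() {
        // Pack small files together until a combined split reaches
        // roughly one HDFS block (128 MB on my cluster).
        setMaxSplitSize(128 * 1024 * 1024);
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader instantiates one delegate reader per file
        // chunk in the combined split; the delegate class must provide a
        // (CombineFileSplit, TaskAttemptContext, Integer) constructor.
        return new CombineFileRecordReader<LongWritable, Text>(
                (CombineFileSplit) split, context, CustomRecordReader.class);
    }
}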

I had overridden isSplitable to return false. I suspected that was the
reason, so I removed the override (i.e. let Hadoop fall back to its default
behaviour). Hadoop now splits the big files, which is fine, but then I see
inconsistencies in the output records.
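
The override I removed was essentially this; it forces one mapper per file,
which is why each big file went to a single mapper:

// (inside the input format above; JobContext is org.apache.hadoop.mapreduce.JobContext,
// Path is org.apache.hadoop.fs.Path)
@Override
protected boolean isSplitable(JobContext context, Path file) {
    // Returning false prevents Hadoop from splitting any input file,
    // no matter how large it is.
    return false;
}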

Is there anything related to my CustomRecordReader that I need to take care
of? I am not sure.
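
My current guess, and please correct me if I am wrong, is that the reader has
to handle split boundaries the way Hadoop's own LineRecordReader does. Here
is a minimal sketch of those two rules, assuming newline-delimited records
and a plain FileSplit for simplicity (in the combined setup the delegate
reader actually receives a CombineFileSplit plus a chunk index):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

// Sketch of split-boundary handling, modeled on Hadoop's LineRecordReader.
// Class and field names are placeholders, not my exact code.
public class CustomRecordReader extends RecordReader<LongWritable, Text> {
    private long start, pos, end;
    private LineReader in;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FSDataInputStream fileIn = file.getFileSystem(conf).open(file);
        fileIn.seek(start);
        in = new LineReader(fileIn, conf);
        // Rule 1: if the split starts mid-file, discard the first partial
        // record -- it belongs to the reader of the previous split.
        if (start != 0) {
            start += in.readLine(new Text());
        }
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        // Rule 2: keep reading while pos <= end, so a record that starts
        // inside this split but ends in the next one is still emitted here
        // (and only here).
        if (pos > end) {
            return false;
        }
        key.set(pos);
        int newSize = in.readLine(value);
        if (newSize == 0) {
            return false;
        }
        pos += newSize;
        return true;
    }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public Text getCurrentValue() { return value; }
    @Override public float getProgress() {
        return end == start ? 0.0f
                : Math.min(1.0f, (pos - start) / (float) (end - start));
    }
    @Override public void close() throws IOException { if (in != null) in.close(); }
}

If the reader misses either rule, records at split boundaries would be read
twice or not at all, which would explain the inconsistent output I am seeing.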

Please advise!

Thanks.
