Hi All, I have implemented a CombineFileInputFormat for my job, and it works well for small files, i.e., it combines them up to the block boundary. But along with the small files, the input source also delivers a few very large files, so the mapper that ends up with one of these large files becomes a straggler.
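
For context, my setup looks roughly like this. It's a simplified sketch, not my exact code: the class names, the LongWritable/Text types, and the 128 MB cap are illustrative, and CustomRecordReader is the per-file reader I mention below.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

public class MyCombineInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    public MyCombineInputFormat() {
        // Pack small files together, but cap each combined split at one
        // 128 MB block.
        setMaxSplitSize(128L * 1024 * 1024);
    }

    // This is the override I have since removed:
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // Hands each file chunk inside the combined split to a fresh
        // instance of my custom reader (sketched further down).
        return new CombineFileRecordReader<>(
                (CombineFileSplit) split, context, CustomRecordReader.class);
    }
}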
I had overridden isSplitable() to return false. I guessed that was the reason, so I removed the override (i.e., let Hadoop fall back to its default behaviour). Hadoop now splits the big files, which is fine, but I see inconsistencies in the output records. Is there anything in my CustomRecordReader that I need to take care of now that files can be split? I'm not sure. Please advise! Thanks.
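
My current suspicion is split boundaries: now that isSplitable() returns true, a chunk handed to my reader can start in the middle of a record. If that's the issue, I assume the reader needs the same convention LineRecordReader uses: if the chunk doesn't start at byte 0, skip the partial first record, and let the reader of the previous chunk run past its own end to emit that record in full. Here's a sketch of what I mean, written against a simple line-oriented format (my real record parsing is different). Is this the right idea?

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.util.LineReader;

public class CustomRecordReader extends RecordReader<LongWritable, Text> {

    private final Path file;
    private long start;   // first byte of this chunk
    private long end;     // one past the last byte of this chunk
    private long pos;
    private FSDataInputStream in;
    private LineReader reader;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    // CombineFileRecordReader instantiates the per-chunk reader via
    // reflection and requires exactly this constructor signature.
    public CustomRecordReader(CombineFileSplit split, TaskAttemptContext context,
                              Integer index) {
        file = split.getPath(index);
        start = split.getOffset(index);
        end = start + split.getLength(index);
    }

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException {
        Configuration conf = context.getConfiguration();
        FileSystem fs = file.getFileSystem(conf);
        in = fs.open(file);
        in.seek(start);
        reader = new LineReader(in, conf);
        // A chunk that doesn't begin at byte 0 almost certainly starts in
        // the middle of a record. Skip that fragment; the reader of the
        // previous chunk reads past its own end and emits it in full.
        if (start != 0) {
            start += reader.readLine(new Text());
        }
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        // Only records that START at or before `end` belong to this chunk;
        // the last record is allowed to run past `end`.
        if (pos > end) {
            return false;
        }
        key.set(pos);
        int consumed = reader.readLine(value);
        if (consumed == 0) {
            return false;   // end of file
        }
        pos += consumed;
        return true;
    }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public Text getCurrentValue() { return value; }

    @Override
    public float getProgress() {
        if (end == start) return 0.0f;
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    @Override
    public void close() throws IOException {
        if (reader != null) reader.close();
    }
}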
