Is there a reason CombineInputFormat doesn't work for small files unless hive.hadoop.supports.splittable.combineinputformat is set to true?
Additionally, when using this with enough LZO files, we run into errors of the form:

2013-08-02 15:02:43,553 WARN com.hadoop.compression.lzo.LzopInputStream: IOException in getCompressedData; likely LZO corruption.
java.io.IOException: Compressed length 1648850803 exceeds max block size 67108864 (probably corrupt file)
    at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:286)
    at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:256)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
    at java.io.InputStream.read(InputStream.java:85)
    .....

Thanks.