> Query is a simple group-by on top of sequence table.
> ...
> java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

I have seen this issue when mixing Sequence files written by Pig with Sequence files written by Hive, primarily because the data ingestion wasn't done properly via HCatalog writers.

In the last report, the first sequence file had as its header

M?.io.LongWritable"org.apache.hadoop.io.BytesWritable)org.apache.hadoop.io.compress.SnappyCodec??

and the second one had

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text)org.apache.hadoop.io.compress.SnappyCodec?

so the value class differs between the two files (BytesWritable in one, Text in the other).

You can cross-check the exception trace and make sure that the exception is coming from the RecordReader, since that is where it surfaces when the k-v pair types change between files.

This mostly doesn't show up in Hive-mr at small scale, but it can happen with both MR and Tez. To hit this via CombineInputFormat, you need a file which has been split up between machines, and two such files to generate a combined split of mismatched schema. Tez is more aggressive at splitting, since it relies on the file format splits, not HDFS locations.

If you confirm that this is indeed the cause of the issue, I might have an idea how to fix it.

Cheers,
Gopal
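
For confirming the mismatch, a rough way to compare headers across the table's files is sketched below in Python. It is only a sketch under a few assumptions: the files have been copied locally (for example with `hdfs dfs -get`), they use the version 6 SequenceFile header layout, and the class names are short enough that their length fits in a single byte. The function names `read_seq_header` and `mismatched` are made up for illustration.

```python
import sys

SEQ_VERSION = 6  # assumption: only the version 6 header layout is handled


def read_text(buf, pos):
    """Read a length-prefixed UTF-8 string; return (string, next position)."""
    length = buf[pos]
    if length > 127:
        # Longer strings use a multi-byte length, which this sketch skips.
        raise ValueError("multi-byte length prefix not handled")
    start = pos + 1
    return buf[start:start + length].decode("utf-8"), start + length


def read_seq_header(buf):
    """Return (key_class, value_class, codec_or_None) from header bytes."""
    if buf[:3] != b"SEQ":
        raise ValueError("not a SequenceFile")
    if buf[3] != SEQ_VERSION:
        raise ValueError("unsupported SequenceFile version: %d" % buf[3])
    key_class, pos = read_text(buf, 4)
    value_class, pos = read_text(buf, pos)
    compressed = buf[pos] != 0
    pos += 2  # skip the compression flag and the block-compression flag
    codec = None
    if compressed:
        codec, pos = read_text(buf, pos)
    return key_class, value_class, codec


def mismatched(headers):
    """Return paths whose key/value classes differ from the first entry."""
    items = list(headers.items())
    if not items:
        return []
    first = items[0][1][:2]
    return [path for path, hdr in items[1:] if hdr[:2] != first]


if __name__ == "__main__":
    headers = {}
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            headers[path] = read_seq_header(f.read(512))
    for path, hdr in headers.items():
        print(path, *hdr)
    for path in mismatched(headers):
        print("MISMATCH:", path)
```

Run it with the local copies of the table's files as arguments; any file reported as a mismatch has a different key or value class from the first file listed.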