I have been successfully using the SequenceFile source located here: https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
Recently, however, we started using block-level bzip2 compression on the SequenceFiles. Hadoop supports this out of the box. When reading the files back, most records parse just fine, but a handful of records throw:

Exception in thread "main" java.lang.IndexOutOfBoundsException: offs(1368) + len(1369) > dest.length(1467).
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)

I've gone in circles looking at this. It seems that the last record read from the SequenceFile in each thread is the one that hits this, and only when its value is retrieved: the key is read just fine, but reading the value throws this error. Any clues as to what this could be?

The file is KV<Text, Text>, aka "SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec".

Any help is appreciated!

- Shannon
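A note on the exception message itself: it says `CBZip2InputStream.read(dest, offs, len)` was asked to write 1369 bytes starting at offset 1368 of a 1467-byte array, although only 1467 - 1368 = 99 bytes fit. The `java.io.InputStream.read(byte[], int, int)` contract requires `off + len <= b.length`, so the codec is rejecting the arguments handed to it by its caller rather than failing to decode the data. One plausible (unconfirmed) reading is that some layer above the codec advances `off` without shrinking `len`. The sketch below shows the usual pattern for filling a buffer from a stream that may return short reads. The names `ReadFullySketch`, `readFully` and `ShortReadStream` are hypothetical and are not part of Hadoop or the Bigtable Beam import code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullySketch {

    // Hypothetical helper: reads exactly len bytes into buf starting at off.
    // 'remaining' shrinks as 'off' advances, so every call to in.read stays
    // inside the region [off, off + len) that was validated up front.
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException(
                "off(" + off + ") + len(" + len + ") > buf.length(" + buf.length + ")");
        }
        int remaining = len;
        while (remaining > 0) {
            int n = in.read(buf, off, remaining);
            if (n < 0) {
                throw new EOFException(remaining + " bytes still expected");
            }
            off += n;
            remaining -= n;
        }
    }

    // Hypothetical test stream that returns at most maxChunk bytes per read,
    // imitating a decompressor that hands back data in small pieces.
    static final class ShortReadStream extends FilterInputStream {
        private final int maxChunk;

        ShortReadStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[1467];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }

        // Fill a 1467-byte buffer through a stream that returns at most 100 bytes per call.
        byte[] target = new byte[1467];
        readFully(new ShortReadStream(new ByteArrayInputStream(source), 100), target, 0, target.length);
        System.out.println(Arrays.equals(source, target));

        // Same argument pattern as the exception: off=1368, len=1369, buffer of 1467 bytes.
        try {
            readFully(new ByteArrayInputStream(source), new byte[1467], 1368, 1369);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `true` and then `off(1368) + len(1369) > buf.length(1467)`. The bounds check in `readFully` is equivalent to the one the codec performs, which is why the same pair of numbers is rejected in both places.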
