[Java] Compressed SequenceFile

Shannon Duncan Wed, 04 Sep 2019 15:42:55 -0700

I have successfully been using the sequence file source located here:

https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java


However, we recently started using block-level compression with bzip2 on the
SequenceFile. This is supported out of the box on the Hadoop side of things.

When reading the files back in, most records parse just fine, but a handful
of records throw:

####
Exception in thread "main" java.lang.IndexOutOfBoundsException: offs(1368) + len(1369) > dest.length(1467).
    at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)
####
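
For anyone reading the message above: it looks like an argument check on a
read(dest, offs, len) call, where the requested offset plus length runs past
the end of the destination buffer. With these numbers only 1467 - 1368 = 99
bytes of room remain after the offset, yet 1369 bytes were requested. Below is
a minimal sketch of that kind of check, assuming it has the usual form. The
class and method names are hypothetical; this is not Hadoop's actual code.

```java
// Hypothetical illustration only: mimics the kind of bounds check that
// would produce the message in the trace. Not taken from Hadoop.
class ReadBoundsSketch {

    // Throws if the range [offs, offs + len) does not fit in a buffer of
    // destLength bytes. Long arithmetic avoids int overflow in the sum.
    static void checkBounds(int destLength, int offs, int len) {
        if (offs < 0) {
            throw new IndexOutOfBoundsException("offs(" + offs + ") < 0.");
        }
        if (len < 0) {
            throw new IndexOutOfBoundsException("len(" + len + ") < 0.");
        }
        if ((long) offs + len > destLength) {
            throw new IndexOutOfBoundsException(
                "offs(" + offs + ") + len(" + len + ") > dest.length(" + destLength + ").");
        }
    }

    // Returns "ok" when the range fits, otherwise the exception message.
    static String describe(int destLength, int offs, int len) {
        try {
            checkBounds(destLength, offs, len);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Values from the trace: 1368 + 1369 = 2737, which exceeds 1467.
        System.out.println(describe(1467, 1368, 1369));
        // The largest length that still fits at this offset is 99.
        System.out.println(describe(1467, 1368, 1467 - 1368));
    }
}
```

If the check really has this shape, the question becomes which caller computed
a length larger than the space left in the buffer, rather than whether the
compressed data itself is corrupt.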

I've gone in circles looking at this. It seems that the last record read
from the SequenceFile in each thread hits this on value retrieval (the key
is retrieved just fine, but the value throws this error).

Any clues as to what this could be?

The file is KV<Text, Text>, aka
"SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec"

Any help is appreciated!

- Shannon
