Sorry for the poor experience and thanks for sharing a solution with others. For anyone who hits this later, I've left a sketch of the version pin and a standalone round-trip check below the quoted thread.
On Thu, Sep 5, 2019 at 6:34 AM Shannon Duncan <[email protected]> wrote:

> FYI this was due to the Hadoop version. 3.2.0 was throwing this error, but
> rolling back to the version in Google's pom.xml (2.7.4) got it working
> fine again.
>
> Kind of annoying, because I wasted several hours jumping through hoops
> trying to get 3.2.0 working. :(
>
> On Wed, Sep 4, 2019 at 5:09 PM Shannon Duncan <[email protected]>
> wrote:
>
>> I have successfully been using the sequence file source located here:
>>
>> https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
>>
>> However, we recently started doing block-level compression with bzip2 on
>> the SequenceFile. This is supported out of the box on the Hadoop side of
>> things.
>>
>> When reading the files back in, though, most records parse out just fine
>> but a handful of records throw:
>>
>> ####
>> Exception in thread "main" java.lang.IndexOutOfBoundsException:
>> offs(1368) + len(1369) > dest.length(1467).
>> at
>> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)
>> ####
>>
>> I've gone in circles looking at this. It seems that the last record read
>> from the SequenceFile in each thread hits this on the value retrieval
>> (the key retrieves just fine, but the value throws this error).
>>
>> Any clues as to what this could be?
>>
>> The file is KV<Text, Text>, i.e. its header reads
>> "SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec"
>>
>> Any help is appreciated!
>>
>> - Shannon
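For anyone who finds this thread later: the fix Shannon describes boils down
to pinning the Hadoop dependency in your own pom.xml to the version the
import tooling was built against, instead of 3.2.0. A minimal sketch,
assuming the standard org.apache.hadoop:hadoop-client artifact is what your
pipeline pulls in (check your build's actual dependency tree):

    <!-- Pin Hadoop to the version referenced in the thread above (2.7.4).
         The artifact coordinates here are an assumption; verify which
         Hadoop artifacts your pipeline actually depends on. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.4</version>
    </dependency>

If something else on your classpath drags in a newer Hadoop transitively, a
<dependencyManagement> entry with the same coordinates is the usual way to
force the version everywhere.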
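And for narrowing down issues like the one above: below is a rough,
self-contained round trip against the plain Hadoop 2.x API that writes a
block-compressed bzip2 SequenceFile of KV<Text, Text> like the one described,
then reads every record back. The path and record contents are placeholders.
If this passes but the Beam source still fails on the same file, the problem
is more likely in the split/read path than in the file itself:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.BZip2Codec;

    public class Bzip2SequenceFileRoundTrip {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq"); // placeholder path

        // Write side: block-level compression with BZip2Codec, matching the
        // "SEQ...Text...Text...BZip2Codec" header shown in the thread.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(path),
            SequenceFile.Writer.keyClass(Text.class),
            SequenceFile.Writer.valueClass(Text.class),
            SequenceFile.Writer.compression(
                CompressionType.BLOCK, new BZip2Codec()))) {
          for (int i = 0; i < 100_000; i++) {
            writer.append(new Text("key-" + i), new Text("value-" + i));
          }
        }

        // Read side: iterate every record; a bad value decode would surface
        // here as the same CBZip2InputStream IndexOutOfBoundsException.
        Text key = new Text();
        Text value = new Text();
        long count = 0;
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
            SequenceFile.Reader.file(path))) {
          while (reader.next(key, value)) {
            count++;
          }
        }
        System.out.println("Read " + count + " records successfully");
      }
    }

One more diagnostic angle: CompressionType.BLOCK compresses many records
together (separate compressed streams for keys and values), while RECORD
compresses each value on its own. Rewriting a sample file with RECORD can be
a quick way to test whether only the block-decompression path is affected.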
