Sorry for the poor experience and thanks for sharing a solution with others. For anyone who hits this later, I've left a sketch of the version pin and a standalone round-trip check below the quoted thread.
On Thu, Sep 5, 2019 at 6:34 AM Shannon Duncan <[email protected]> wrote:

> FYI this was due to the Hadoop version. 3.2.0 was throwing this error, but
> rolling back to the version in Google's pom.xml (2.7.4) got it working
> fine again.
>
> Kind of annoying, because I wasted several hours jumping through hoops
> trying to get 3.2.0 working. :(
>
> On Wed, Sep 4, 2019 at 5:09 PM Shannon Duncan <[email protected]>
> wrote:
>
>> I have successfully been using the sequence file source located here:
>>
>> https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
>>
>> However, we recently started doing block-level compression with bzip2 on
>> the SequenceFile. This is supported out of the box on the Hadoop side of
>> things.
>>
>> When reading the files back in, though, most records parse out just fine
>> but a handful of records throw:
>>
>> ####
>> Exception in thread "main" java.lang.IndexOutOfBoundsException:
>> offs(1368) + len(1369) > dest.length(1467).
>> at
>> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)
>> ####
>>
>> I've gone in circles looking at this. It seems that the last record read
>> from the SequenceFile in each thread hits this on the value retrieval
>> (the key retrieves just fine, but the value throws this error).
>>
>> Any clues as to what this could be?
>>
>> The file is KV<Text, Text>, i.e. its header reads
>> "SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec"
>>
>> Any help is appreciated!
>>
>> - Shannon
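For anyone who finds this thread later: the fix Shannon describes boils down
to pinning the Hadoop dependency in your own pom.xml to the version the
import tooling was built against, instead of 3.2.0. A minimal sketch,
assuming the standard org.apache.hadoop:hadoop-client artifact is what your
pipeline pulls in (check your build's actual dependency tree):

    <!-- Pin Hadoop to the version referenced in the thread above (2.7.4).
         The artifact coordinates here are an assumption; verify which
         Hadoop artifacts your pipeline actually depends on. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.4</version>
    </dependency>

If something else on your classpath drags in a newer Hadoop transitively, a
<dependencyManagement> entry with the same coordinates is the usual way to
force the version everywhere.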
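And for narrowing down issues like the one above: below is a rough,
self-contained round trip against the plain Hadoop 2.x API that writes a
block-compressed bzip2 SequenceFile of KV<Text, Text> like the one described,
then reads every record back. The path and record contents are placeholders.
If this passes but the Beam source still fails on the same file, the problem
is more likely in the split/read path than in the file itself:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.BZip2Codec;

    public class Bzip2SequenceFileRoundTrip {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq"); // placeholder path

        // Write side: block-level compression with BZip2Codec, matching the
        // "SEQ...Text...Text...BZip2Codec" header shown in the thread.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(path),
            SequenceFile.Writer.keyClass(Text.class),
            SequenceFile.Writer.valueClass(Text.class),
            SequenceFile.Writer.compression(
                CompressionType.BLOCK, new BZip2Codec()))) {
          for (int i = 0; i < 100_000; i++) {
            writer.append(new Text("key-" + i), new Text("value-" + i));
          }
        }

        // Read side: iterate every record; a bad value decode would surface
        // here as the same CBZip2InputStream IndexOutOfBoundsException.
        Text key = new Text();
        Text value = new Text();
        long count = 0;
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
            SequenceFile.Reader.file(path))) {
          while (reader.next(key, value)) {
            count++;
          }
        }
        System.out.println("Read " + count + " records successfully");
      }
    }

One more diagnostic angle: CompressionType.BLOCK compresses many records
together (separate compressed streams for keys and values), while RECORD
compresses each value on its own. Rewriting a sample file with RECORD can be
a quick way to test whether only the block-decompression path is affected.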
