Strangely enough, another version of my reader works:
https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad

The difference is that I have to re-open the file each time I read a new column. The re-opening happens through the following line:

ParquetFileReader fileReader = new ParquetFileReader(conf, filePath, blocks, schema.getColumns());
which I call inside a loop over the column descriptors.

~Pratik

On Thu, Aug 28, 2014 at 11:49 AM, pratik khadloya <[email protected]> wrote:

> This issue only occurs for some columns, and only after reading a few
> thousand records.
>
> ~Pratik
>
>
> On Thu, Aug 28, 2014 at 11:48 AM, pratik khadloya <[email protected]>
> wrote:
>
>> Hello,
>>
>> I am facing the following exception when reading a parquet file exported
>> by sqoop.
>> My parquet column reader code is at
>> https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>>
>> Exception in thread "main" parquet.io.ParquetDecodingException: Can't read value in column [description] BINARY at value 44899 out of 57096, 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
>>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
>>     at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:398)
>>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.load(RfiParquetFileReader.java:147)
>>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.<init>(RfiParquetFileReader.java:87)
>>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.main(RfiParquetFileReader.java:114)
>> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
>>     at parquet.Preconditions.checkArgument(Preconditions.java:47)
>>     at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>>     at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>>     at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:82)
>>     at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
>>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
>>     ... 4 more
>>
>>
>> Does anyone know what this could be related to? What could I be doing
>> wrong?
>>
>>
>> Thanks,
>> ~Pratik
>>
>
>
