Hello,

I have a Parquet 2.0 file that contains serialised Avro records. The records'
Avro schema is simple but contains a couple of optional string fields:

{
    "namespace" : “proto.avro.v1",
    "type" : "record",
    "name" : “FactEntity",
    "fields" : [
        {"name" : “sensorName", "type" : "string"},
        {"name" : “sensorDesc", "type" : "string”},
        {"name" : "firstDeployed", "type" : "long"},
        {"name" : "lastRenewed", "type" : "long"},
        {"name" : “errMsg", "type" : ["null", "string"]},
        {"name" : “errDetails", "type" : ["null", "string"]}
    ]
}
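
In case it helps to reproduce: the file layout reported in the trace below
(Parquet 2.0 pages, GZIP compression) is what a parquet-avro writer roughly
like the sketch below produces. This is not my exact writer code; the class
name, the schema file name and the record values are placeholders, and only
the path matches the query further down.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class WriteFactEntities {
    public static void main(String[] args) throws Exception {
        // Parse the FactEntity schema shown above.
        Schema schema = new Schema.Parser().parse(new File("FactEntity.avsc"));

        // Parquet 2.0 pages with GZIP compression, as in the failing file.
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("/path/to/file"))
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.GZIP)
                .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
                .build()) {

            // Record with the optional errMsg/errDetails left as null,
            // like the all-null errMsg column in the page header below.
            GenericRecord record = new GenericData.Record(schema);
            record.put("sensorName", "sensor-1");
            record.put("sensorDesc", "test sensor");
            record.put("firstDeployed", 1546300800000L);
            record.put("lastRenewed", 1551916800000L);

            writer.write(record);
        }
    }
}

As far as I understand, the DELTA_BYTE_ARRAY encoding shown for the errMsg
column chunk below is simply the parquet-mr default for binary columns once
the writer version is PARQUET_2_0.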

When I try to query entities in this file with

SELECT 
t1.sensorName, 
t1.sensorDesc, 
t1.lastRenewed, 
t1.errMsg
FROM dfs.`/path/to/file` t1
LIMIT 10;

I get this error:

2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG 
o.a.d.e.w.fragment.FragmentExecutor - Starting fragment 0:0 on xxx:31010
2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG 
o.a.d.e.s.p.DrillParquetReader - Requesting schema message 
proto.avro.v1.FactEntity {
  required binary sensorName (UTF8);
  required binary sensorDesc (UTF8);
  required int64 firstDeployed;
  required int64 lastRenewed;
  optional binary errMsg (UTF8);
  optional binary errDetails (UTF8);
}

2019-03-07 12:07:30,615 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO  
o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in drill 
parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null (Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null)
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Error 
in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null


Please, refer to logs for more information.

[Error Id: 2b5a06a0-fa8e-497b-848d-01aae15874ee ]
        at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) 
[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:101)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at java.security.AccessController.doPrivileged(Native Method) 
[na:1.8.0_161]
        at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
 [hadoop-common-2.7.4.jar:na]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.15.0.jar:1.15.0]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in 
drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null
        at 
org.apache.drill.exec.store.parquet2.DrillParquetReader.handleAndRaise(DrillParquetReader.java:273)
 ~[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:265)
 ~[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:321)
 [drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:216) 
[drill-java-exec-1.15.0.jar:1.15.0]
        at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:271) 
[drill-java-exec-1.15.0.jar:1.15.0]
        ... 27 common frames omitted
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error 
reading page.
File path: /filepath/xxx
Row count: 3730439
Column Chunk Metadata: ColumnMetaData{GZIP [errMsg] optional binary errMsg 
(UTF8)  [DELTA_BYTE_ARRAY], 16876631}
Page Header: PageHeader(type:DATA_PAGE_V2, uncompressed_page_size:15, 
compressed_page_size:32, 
data_page_header_v2:DataPageHeaderV2(num_values:3730439, num_nulls:3730439, 
num_rows:3730439, encoding:DELTA_BYTE_ARRAY, definition_levels_byte_length:5, 
repetition_levels_byte_length:0, statistics:Statistics(null_count:3730439)))
File offset: 16876631
Size: 69
Value read so far: 3730439
        at 
org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:226)
 ~[drill-java-exec-1.15.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:525)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:638)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:353)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147) 
~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109) 
~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 ~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109) 
~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:80) 
~[parquet-column-1.10.0.jar:1.10.0]
        at 
org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:262)
 ~[drill-java-exec-1.15.0.jar:1.15.0]
        ... 30 common frames omitted
Caused by: java.io.IOException: not a gzip file
        at 
org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:496)
 ~[hadoop-common-2.7.4.jar:na]
        at 
org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:257)
 ~[hadoop-common-2.7.4.jar:na]
        at 
org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:186)
 ~[hadoop-common-2.7.4.jar:na]
        at 
org.apache.parquet.hadoop.DirectCodecFactory$IndirectDecompressor.decompress(DirectCodecFactory.java:162)
 ~[parquet-hadoop-1.10.0.jar:1.10.0]
        at 
org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:188)
 ~[drill-java-exec-1.15.0.jar:1.10.0]
        ... 43 common frames omitted
2019-03-07 12:07:30,616 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 237f20ac-b634-5300-06f5-6c731a8a97f2:0:0: 
State change requested RUNNING --> FAILED 


I'm running the queries via sqlline with the session option "set
`store.parquet.use_new_reader` = true;" (otherwise the query fails even when
no optional binary columns are selected).
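
For what it's worth, an obvious cross-check outside Drill would be reading the
errMsg column back with plain parquet-avro, along these lines (again only a
sketch, with the same placeholder path):

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadFactEntities {
    public static void main(String[] args) throws Exception {
        try (ParquetReader<GenericRecord> reader = AvroParquetReader
                .<GenericRecord>builder(new Path("/path/to/file"))
                .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                // errMsg should come back null for every record in this file.
                System.out.println(record.get("errMsg"));
            }
        }
    }
}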

Is there a workaround for this problem?

Thanks,
Denis
