[jira] [Resolved] (DRILL-7276) xss (bug) in apache drill Web UI latest version 1.16.0 when authenticated
[ https://issues.apache.org/jira/browse/DRILL-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche resolved DRILL-7276.
-----------------------------------
    Resolution: Fixed

> xss (bug) in apache drill Web UI latest version 1.16.0 when authenticated
> -------------------------------------------------------------------------
>
>                 Key: DRILL-7276
>                 URL: https://issues.apache.org/jira/browse/DRILL-7276
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Web Server
>    Affects Versions: 1.16.0
>            Reporter: shuiboye
>            Assignee: Anton Gozhiy
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0
>
>         Attachments: 1.png, 2.png, 4.png
>
> In the query page, I select the "SQL" option of the "Query Type" and in the "Query" field I input "select '' FROM cp.`employee.json`".
> !1.png!
> After submitting, I get the Query Profile whose URL is http://127.0.0.1:8047/profiles/231beb11-4b43-0762-8b90-76a9af2edd24.
> !2.png!
> Any user who visits the profile page and clicks "JSON profile" at the bottom to see the full JSON profile will see two alert boxes as shown below.
> !4.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
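The report above describes a stored XSS: user-supplied query text is echoed into the profile page without escaping (the archive has stripped the actual script payload from the quoted example, leaving only empty quotes). As a hedged illustration of the standard remedy, here is a minimal HTML-escaping helper; this is generic, hypothetical code, not the patch that actually fixed DRILL-7276:

```java
// Minimal HTML-escaping sketch: neutralize the five characters that let
// user input break out of an HTML text context. Hypothetical helper,
// not the actual Drill Web UI fix.
final class HtmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With such a helper applied to the query text before it is embedded in the profile or JSON-profile page, a `<script>` payload renders as inert text instead of executing.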
[jira] [Updated] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero
[ https://issues.apache.org/jira/browse/DRILL-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-7130:
----------------------------------
    Reviewer: Timothy Farkas

> IllegalStateException: Read batch count [0] should be greater than zero
> -----------------------------------------------------------------------
>
>                 Key: DRILL-7130
>                 URL: https://issues.apache.org/jira/browse/DRILL-7130
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.15.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.17.0
>
> The following exception is being hit when reading parquet data:
> Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
>     at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
>     ... 29 common frames omitted
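The top non-library frame (VarLenNullableFixedEntryReader.getEntry, line 49) is a Guava-style Preconditions.checkState guard. Below is a minimal, hypothetical reconstruction of how a fixed-width entry reader can arrive at a zero batch count and trip such a guard; the arithmetic and names are assumptions for illustration, not Drill's actual logic:

```java
// Hypothetical sketch: a fixed-entry reader sizes its batch from the
// bytes remaining in the page. If the entry width exceeds the remaining
// bytes, integer division yields 0 and the state check fires with the
// same message seen in DRILL-7130.
final class FixedEntryReader {
    static int computeBatchCount(int pageRemainingBytes, int entrySize) {
        int batchCount = pageRemainingBytes / entrySize;
        if (batchCount <= 0) {
            throw new IllegalStateException(
                "Read batch count [" + batchCount + "] should be greater than zero");
        }
        return batchCount;
    }
}
```

The usual fix for this class of bug is to clamp the batch count to the actual number of values left in the page rather than deriving it purely from byte arithmetic.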
[jira] [Created] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero
salim achouche created DRILL-7130:
-------------------------------------

             Summary: IllegalStateException: Read batch count [0] should be greater than zero
                 Key: DRILL-7130
                 URL: https://issues.apache.org/jira/browse/DRILL-7130
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.15.0
            Reporter: salim achouche
            Assignee: salim achouche
             Fix For: 1.17.0

The following exception is being hit when reading parquet data:

Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
    at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    ... 29 common frames omitted
[jira] [Updated] (DRILL-7100) parquet RecordBatchSizerManager : IllegalArgumentException: the requested size must be non-negative
[ https://issues.apache.org/jira/browse/DRILL-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-7100:
----------------------------------
    Reviewer: Timothy Farkas

> parquet RecordBatchSizerManager : IllegalArgumentException: the requested size must be non-negative
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7100
>                 URL: https://issues.apache.org/jira/browse/DRILL-7100
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.15.0
>            Reporter: Khurram Faraaz
>            Assignee: salim achouche
>            Priority: Major
>
> The table has string columns that can range from 1024 bytes to 32MB in length; we should be able to handle such wide string columns in parquet when querying from Drill.
> Hive Version 2.3.3
> Drill Version 1.15
> {noformat}
> CREATE TABLE temp.cust_bhsf_ce_blob_parquet (
>   event_id DECIMAL,
>   valid_until_dt_tm string,
>   blob_seq_num DECIMAL,
>   valid_from_dt_tm string,
>   blob_length DECIMAL,
>   compression_cd DECIMAL,
>   blob_contents string,
>   updt_dt_tm string,
>   updt_id DECIMAL,
>   updt_task DECIMAL,
>   updt_cnt DECIMAL,
>   updt_applctx DECIMAL,
>   last_utc_ts string,
>   ccl_load_dt_tm string,
>   ccl_updt_dt_tm string )
> STORED AS PARQUET;
> {noformat}
> The source table is stored in ORC format.
> Failing query:
> {noformat}
> SELECT event_id, BLOB_CONTENTS FROM hive.temp.cust_bhsf_ce_blob_parquet WHERE event_id = 3443236037
> 2019-03-07 14:40:17,886 [237e8c79-0e9b-45d6-9134-0da95dba462f:frag:1:269] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: the requested size must be non-negative (the requested size must be non-negative)
> org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: the requested size must be non-negative
> {noformat}
> Snippet from drillbit.log file:
> {noformat}
> [Error Id: 41a4d597-f54d-42a6-be6d-5dbeb7f642ba ]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
>     at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>     a
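Given that the table carries string columns up to 32MB wide, one plausible way a "requested size must be non-negative" error arises is 32-bit overflow when batch sizing multiplies a record count by a very wide average column width, producing a negative int by the time it reaches the allocator. The following is a hedged illustration of that failure mode, not the actual RecordBatchSizerManager logic:

```java
// Illustration of how a per-batch allocation size can go negative:
// recordCount * columnWidth overflows int before the allocator's
// non-negative precondition ever sees it. Widening to long first avoids it.
final class SizeOverflow {
    static int naiveSize(int recordCount, int columnWidth) {
        return recordCount * columnWidth;          // can silently wrap negative
    }

    static long safeSize(int recordCount, int columnWidth) {
        return (long) recordCount * columnWidth;   // widen before multiplying
    }
}
```

For example, 4000 records of a 16MB-wide column overflow a 32-bit product, which is exactly the kind of value Guava's checkArgument("the requested size must be non-negative") rejects.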
[jira] [Closed] (DRILL-7101) IllegalArgumentException when reading parquet data
[ https://issues.apache.org/jira/browse/DRILL-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche closed DRILL-7101.
---------------------------------
    Resolution: Duplicate

> IllegalArgumentException when reading parquet data
> --------------------------------------------------
>
>                 Key: DRILL-7101
>                 URL: https://issues.apache.org/jira/browse/DRILL-7101
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.15.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.16.0
>
> The Parquet reader fails with the below stack trace:
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
>     at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalArgumentException: the requested size must be non-negative
>     at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkArgument(Preconditions.java:135) ~[drill-shaded-guava-23.0.jar:23.0]
>     at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:224) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:211) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:394) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:250) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:41) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:54) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
>     at org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.allocate(RecordBatchSizerManager.java:165) ~[drill-
[jira] [Created] (DRILL-7101) IllegalArgumentException when reading parquet data
salim achouche created DRILL-7101:
-------------------------------------

             Summary: IllegalArgumentException when reading parquet data
                 Key: DRILL-7101
                 URL: https://issues.apache.org/jira/browse/DRILL-7101
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.15.0
            Reporter: salim achouche
            Assignee: salim achouche
             Fix For: 1.16.0

The Parquet reader fails with the below stack trace:

    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.lang.IllegalArgumentException: the requested size must be non-negative
    at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkArgument(Preconditions.java:135) ~[drill-shaded-guava-23.0.jar:23.0]
    at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:224) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:211) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:394) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:250) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:41) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:54) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.allocate(RecordBatchSizerManager.java:165) ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.allocate(ParquetRecordReader.java:276) ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:221) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
    at or
[jira] [Updated] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37
[ https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-7018:
----------------------------------
    Reviewer: Vitalii Diravka  (was: Boaz Ben-Zvi)

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7018
>                 URL: https://issues.apache.org/jira/browse/DRILL-7018
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.16.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> alter system set `store.parquet.reader.int96_as_timestamp` = true
> Run a query which projects a column of type Parquet INT96 timestamp with 31 nulls.
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
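The numbers in the error above are consistent with the INT96 physical width: a Parquet INT96 value is 12 bytes, and 31 values at 12 bytes each is exactly 372 bytes, written past a buffer with a 256-byte capacity. A hedged sketch of that capacity arithmetic (hypothetical names, not Drill's reader code):

```java
// INT96 values are 12 bytes wide. Writing 31 of them needs 372 bytes,
// which exceeds a 256-byte buffer -- matching the reported
// writerIndex: 372 vs capacity(256). Names here are illustrative.
final class Int96Sizing {
    static final int INT96_WIDTH_BYTES = 12;

    static int bytesNeeded(int valueCount) {
        return valueCount * INT96_WIDTH_BYTES;
    }

    static boolean fits(int valueCount, int bufferCapacity) {
        return bytesNeeded(valueCount) <= bufferCapacity;
    }
}
```

This suggests the bug class: the buffer was sized without accounting for the full 12-byte INT96 width (or for null slots) before the writes were issued.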
[jira] [Created] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37
salim achouche created DRILL-7018:
-------------------------------------

             Summary: Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
                 Key: DRILL-7018
                 URL: https://issues.apache.org/jira/browse/DRILL-7018
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: salim achouche
[jira] [Assigned] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 3
[ https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche reassigned DRILL-7018:
-------------------------------------
             Assignee: salim achouche
    Affects Version/s: 1.14.0
   Remaining Estimate: 24h
    Original Estimate: 24h
        Fix Version/s: 1.16.0
          Component/s: Storage - Parquet
          Description:
alter system set `store.parquet.reader.int96_as_timestamp` = true
Run a query which projects a column of type Parquet INT96 timestamp with 31 nulls.
The following exception will be thrown:
java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7018
>                 URL: https://issues.apache.org/jira/browse/DRILL-7018
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.16.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> alter system set `store.parquet.reader.int96_as_timestamp` = true
> Run a query which projects a column of type Parquet INT96 timestamp with 31 nulls.
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
[jira] [Updated] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
[ https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-6853:
----------------------------------
    Labels: pull-request-available  (was: )

> Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6853
>                 URL: https://issues.apache.org/jira/browse/DRILL-6853
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Nitin Sharma
>            Assignee: salim achouche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
> The Parquet Complex reader, while fetching a nested schema, should have configurable memory or a configurable max records to fetch, and not default to 4000 records.
> While scanning terabytes of data with wide columns, this could easily cause OOM issues.
[jira] [Updated] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
[ https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-6853:
----------------------------------
         Reviewer: Timothy Farkas
    Fix Version/s: 1.15.0

> Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6853
>                 URL: https://issues.apache.org/jira/browse/DRILL-6853
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Nitin Sharma
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.15.0
>
> The Parquet Complex reader, while fetching a nested schema, should have configurable memory or a configurable max records to fetch, and not default to 4000 records.
> While scanning terabytes of data with wide columns, this could easily cause OOM issues.
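The change requested in DRILL-6853, replacing the hard-coded 4000-record batch with a configurable cap, can be sketched as follows. The option key and helper class are hypothetical illustrations, not Drill's actual session option:

```java
import java.util.Properties;

// Sketch of a configurable record cap with the old 4000-record behavior
// as the default. The option key name is hypothetical.
final class BatchConfig {
    static final int DEFAULT_MAX_RECORDS = 4000;
    static final String OPTION_KEY = "store.parquet.complex.batch.max_records";

    static int maxRecordsPerBatch(Properties opts) {
        String v = opts.getProperty(OPTION_KEY);
        return v == null ? DEFAULT_MAX_RECORDS : Integer.parseInt(v);
    }
}
```

A scan over very wide columns could then be tuned down (say, to a few hundred records per batch) to bound per-batch memory instead of risking OOM at the fixed default.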
[jira] [Assigned] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
[ https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche reassigned DRILL-6853:
-------------------------------------
    Assignee: salim achouche

> Parquet Complex Reader for nested schema should have configurable memory or max records to fetch
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6853
>                 URL: https://issues.apache.org/jira/browse/DRILL-6853
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Nitin Sharma
>            Assignee: salim achouche
>            Priority: Major
>
> The Parquet Complex reader, while fetching a nested schema, should have configurable memory or a configurable max records to fetch, and not default to 4000 records.
> While scanning terabytes of data with wide columns, this could easily cause OOM issues.
[jira] [Resolved] (DRILL-6410) Memory leak in Parquet Reader during cancellation
[ https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche resolved DRILL-6410.
-----------------------------------
    Resolution: Fixed
      Reviewer: Timothy Farkas  (was: Parth Chandra)

> Memory leak in Parquet Reader during cancellation
> -------------------------------------------------
>
>                 Key: DRILL-6410
>                 URL: https://issues.apache.org/jira/browse/DRILL-6410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.15.0
>
> Occasionally, a memory leak is observed within the flat Parquet reader when query cancellation is invoked.
[jira] [Updated] (DRILL-6246) Build Failing in jdbc-all artifact
[ https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche updated DRILL-6246:
----------------------------------
    Labels: pull-request-available  (was: )

> Build Failing in jdbc-all artifact
> ----------------------------------
>
>                 Key: DRILL-6246
>                 URL: https://issues.apache.org/jira/browse/DRILL-6246
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - JDBC
>    Affects Versions: 1.13.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>              Labels: pull-request-available
>
> * It was noticed that the build was failing because of the jdbc-all artifact
> * The maximum compressed jar size was set to 32MB, but we are currently creating a JAR a bit larger than 32MB
> * I compared apache drill-1.10.0, drill-1.12.0, and drill-1.13.0 (on my MacOS)
> ** jdbc-all-1.10.0 jar size: 21MB
> ** jdbc-all-1.12.0 jar size: 27MB
> ** jdbc-all-1.13.0 jar size: 34MB (on Linux this size is roughly 32MB)
> * Compared jdbc-all-1.12.0 and jdbc-all-1.13.0 in more detail
> ** The bulk of the increase is attributed to the calcite artifact
> ** It used to be 2MB (uncompressed) and is now 22MB (uncompressed)
> * It is likely an exclusion problem
> ** The jdbc-all-1.12.0 version has only two top packages: calcite/avatica/utils and calcite/avatica/remote
> ** The jdbc-all-1.13.0 includes new packages (within calcite/avatica): metrics, proto, org/apache/, com/fasterxml, com/google
>
> I am planning to exclude these new sub-packages.
[jira] [Reopened] (DRILL-6246) Build Failing in jdbc-all artifact
[ https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

salim achouche reopened DRILL-6246:
-----------------------------------

Re-opening this issue as PR #1168 has been successfully tested and thus should provide a more optimal solution.

> Build Failing in jdbc-all artifact
> ----------------------------------
>
>                 Key: DRILL-6246
>                 URL: https://issues.apache.org/jira/browse/DRILL-6246
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - JDBC
>    Affects Versions: 1.13.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>
> * It was noticed that the build was failing because of the jdbc-all artifact
> * The maximum compressed jar size was set to 32MB, but we are currently creating a JAR a bit larger than 32MB
> * I compared apache drill-1.10.0, drill-1.12.0, and drill-1.13.0 (on my MacOS)
> ** jdbc-all-1.10.0 jar size: 21MB
> ** jdbc-all-1.12.0 jar size: 27MB
> ** jdbc-all-1.13.0 jar size: 34MB (on Linux this size is roughly 32MB)
> * Compared jdbc-all-1.12.0 and jdbc-all-1.13.0 in more detail
> ** The bulk of the increase is attributed to the calcite artifact
> ** It used to be 2MB (uncompressed) and is now 22MB (uncompressed)
> * It is likely an exclusion problem
> ** The jdbc-all-1.12.0 version has only two top packages: calcite/avatica/utils and calcite/avatica/remote
> ** The jdbc-all-1.13.0 includes new packages (within calcite/avatica): metrics, proto, org/apache/, com/fasterxml, com/google
>
> I am planning to exclude these new sub-packages.
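The exclusion plan described in the issue would typically land as maven-shade-plugin filters in the jdbc-all POM. The following is a hedged sketch of such a filter; the artifact coordinates and package paths are assumptions for illustration, not the actual change made in PR #1168:

```xml
<!-- Hypothetical shade-plugin filter excluding the new shaded-in
     sub-packages within calcite/avatica; the real jdbc-all POM may
     use different coordinates and paths. -->
<filter>
  <artifact>org.apache.calcite.avatica:*</artifact>
  <excludes>
    <exclude>org/apache/calcite/avatica/metrics/**</exclude>
    <exclude>org/apache/calcite/avatica/proto/**</exclude>
    <exclude>com/fasterxml/**</exclude>
    <exclude>com/google/**</exclude>
  </excludes>
</filter>
```

Shrinking the shaded jar this way would bring it back under the 32MB cap the build enforces.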
[jira] [Updated] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6706: -- Labels: pull-request-available (was: ) > Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Labels: pull-request-available > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 
network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY > N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = > {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, > 4.8577056E7 memory}, id = 515364 > 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, > ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY > N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, > cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, > 1.74592E11 network, 4.8577056E7 memory}, id = 515363 > 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): > rowcount = 6001215.0, cumulative cost = {2.164373E7 
rows, 1.995334E8 cpu, > 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353 > 01-08 HashJoin(condition=[=($2, $3)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, > 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348 > 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], >
[jira] [Updated] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6706: -- Reviewer: Timothy Farkas > Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Labels: pull-request-available > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 network, 
4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY > N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = > {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, > 4.8577056E7 memory}, id = 515364 > 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, > ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY > N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, > cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, > 1.74592E11 network, 4.8577056E7 memory}, id = 515363 > 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): > rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 
1.995334E8 cpu, > 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353 > 01-08 HashJoin(condition=[=($2, $3)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, > 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348 > 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], > groupscan=[Parq
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592997#comment-16592997 ] salim achouche commented on DRILL-6706: --- Got this code from Aggregator. > Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 
1.9917297664E11 network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY > N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = > {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, > 4.8577056E7 memory}, id = 515364 > 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, > ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY > N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, > cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, > 1.74592E11 network, 4.8577056E7 memory}, id = 515363 > 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): > rowcount = 6001215.0, cumulative cost = 
{2.164373E7 rows, 1.995334E8 cpu, > 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353 > 01-08 HashJoin(condition=[=($2, $3)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, > 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348 > 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], > g
[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592807#comment-16592807 ] salim achouche edited comment on DRILL-6706 at 8/26/18 5:28 AM: Ran all functional and advanced tests successfully; though there was one test-suite test which failed "TestParquetFilterPushDown.testBooleanPredicate". Debugged the issue and found an intriguing use-case: * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE XYZ = 'a'; ** This query doesn't fail ** The code generation doesn't find it so it then generates default code which assumes that both right and left values are strings * When I fixed the ParquetSchema to insert the correct column name, then ** The code generator knows the column type to be an INT ** It then tried to cast the constant 'a' to an integer which throws an exception This somehow corroborates [~vvysotskyi]'s comment which indicated he didn't want the column to be found. If this is the case, I feel there is a better fix which is to add a new indicator in the MetadataField to indicate this condition. This can give an opportunity to operators to better handle such cases. [~timothyfarkas] and [~vvysotskyi] what do you guys think? was (Author: sachouche): Ran all functional and advanced tests successfully; though there was one test-suite test which failed "TestParquetFilterPushDown.testBooleanPredicate". 
Debugged the issue and found an intriguing use-case: * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE XYZ = 'a'; ** This query doesn't fail ** The code generation doesn't find it so it then generates default code which assumes that both right and left values are strings * When I fixed the ParquetSchema to insert the correct column name, then ** The code generator knows the column type to be an INT ** It then tried to cast the constant 'a' to an integer which throws an exception This somehow collaborates [~vvysotskyi] comment which indicated he didn't want the column to be found. If this is the case, I feel there is a better fix which is to add a new indicator in the MetadataField to indicate this condition. This can give an opportunity to operators to better handle. [~timothyfarkas] and [~vvysotskyi] what do you guys think? > Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = 
C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4]
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592807#comment-16592807 ] salim achouche commented on DRILL-6706: --- Ran all functional and advanced tests successfully; though there was one test-suite test which failed "TestParquetFilterPushDown.testBooleanPredicate". Debugged the issue and found an intriguing use-case: * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE XYZ = 'a'; ** This query doesn't fail ** The code generation doesn't find it so it then generates default code which assumes that both right and left values are strings * When I fixed the ParquetSchema to insert the correct column name, then ** The code generator knows the column type to be an INT ** It then tried to cast the constant 'a' to an integer which throws an exception This somehow corroborates [~vvysotskyi]'s comment which indicated he didn't want the column to be found. If this is the case, I feel there is a better fix which is to add a new indicator in the MetadataField to indicate this condition. This can give an opportunity to operators to better handle such cases. [~timothyfarkas] and [~vvysotskyi] what do you guys think? 
> Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 
4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY > N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = > {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, > 4.8577056E7 memory}, id = 515364 > 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, > ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY > N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, > cumulative cost = {3.55
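The failure mode described in the comment above can be shown in miniature: while the column type is unknown, the generated default code compares both sides as strings and never throws, but once the column is known to be an INT the constant 'a' must be cast, which fails. A toy illustration in plain Java — this is a sketch of the described behavior, not Drill's actual generated code:

```java
public class CastFailureSketch {

    // Mimics the "unknown type" fallback: compare both sides as strings.
    static boolean compareAsStrings(String column, String constant) {
        return column.equals(constant);  // simply false on a mismatch, no exception
    }

    // Mimics the typed path: the column is known to be an INT, so the
    // string constant must be cast before comparison.
    static boolean compareAsInt(int column, String constant) {
        return column == Integer.parseInt(constant);  // throws for 'a'
    }

    public static void main(String[] args) {
        System.out.println(compareAsStrings("123", "a"));  // false; the query "succeeds"
        try {
            compareAsInt(123, "a");
        } catch (NumberFormatException e) {
            System.out.println("casting constant 'a' to INT failed");
        }
    }
}
```

This is why fixing the column name in ParquetSchema exposed the latent cast error in TestParquetFilterPushDown.testBooleanPredicate: the filter finally reached the typed comparison path.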
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592741#comment-16592741 ] salim achouche commented on DRILL-6706: --- *Findings after Code Inspection -* * Looking at the code, having back-ticks within SchemaPath is not necessary (this is my understanding) ** Back-ticks are useful in the context of a compound name such as T.`column.with.a.dot`.another_column ** As soon as *individual parts are parsed* then the back-ticks can be omitted * Then why was the Aggregator able to handle columns having back-ticks (and whatever I could throw at it :))? ** I found logic to strip the extra back-ticks within the aggregator code *Suggested Fix -* * Stripping the back-ticks from the MaterializedField names within Parquet seems to fix the HashJoin issue * The Aggregator didn't seem bothered by this change either * I am currently running the test-suite and the Apache pre-tests * If they pass, I'll push this PR for review > Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND 
S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY > N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = > {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, > 4.8577056E7 memory}, id = 515364 > 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY, ANY 
L_PARTKEY, ANY L_SUPPKEY, ANY > S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY > R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, > ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY > N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, > cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, > 1.74592E11 network, 4.8577056E7 memory}, id = 515363 > 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : > rowType = RecordType(ANY L_ORDERKEY,
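The suggested fix in the comment above amounts to normalizing already-parsed name parts by stripping their enclosing back-ticks before they become MaterializedField names. A minimal sketch of that normalization — stripBackTicks is a hypothetical helper for illustration, not Drill's actual implementation:

```java
public class NamePartUtil {

    // Back-ticks only matter while parsing compound names such as
    // T.`column.with.a.dot`.another_column; once the individual parts are
    // parsed, the quoting can be dropped so schema lookups compare cleanly.
    static String stripBackTicks(String part) {
        if (part != null && part.length() >= 2
                && part.charAt(0) == '`'
                && part.charAt(part.length() - 1) == '`') {
            return part.substring(1, part.length() - 1);
        }
        return part;
    }

    public static void main(String[] args) {
        System.out.println(stripBackTicks("`L_ORDERKEY`"));       // L_ORDERKEY
        System.out.println(stripBackTicks("column.with.a.dot"));  // unchanged
    }
}
```

Applying this kind of normalization on the Parquet side mirrors the stripping logic already found in the aggregator code, which is why the Aggregator tolerated back-ticked columns while HashJoin did not.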
[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592731#comment-16592731 ] salim achouche edited comment on DRILL-6706 at 8/25/18 11:24 PM: - I say it is as-designed because of the following points: * This comment seems to imply that we want to treat not found columns differently: _*{color:#ff}// col.toExpr() is used here as field name since we don't want to see these fields in the existing maps{color}*_ * {color:#33}The rest of the code seems to work just fine (including sqlline); I have a hard time to believe that such an obvious bug would not be found{color} [~timothyfarkas], I could be wrong but this is what happens when the code doesn't have adequate documentation; for example, I looked at the toExpr() method and couldn't find any useful documentation. Now, we are left to guess what was the intended functionality. [~aj_09] reviewed DRILL-4264, we should inquire with him about whether leaving the backtick is a bug or as-designed. was (Author: sachouche): I say it is as-designed because of the following points: * This comments seems to imply that we want to treat not found columns differently: _*{color:#ff}// col.toExpr() is used here as field name since we don't want to see these fields in the existing maps{color}*_ * {color:#33}The rest of the code seems to work just find (including sqlline); I have a hard time to believe that such an obvious bug would not be found{color} [~timothyfarkas], I could be wrong but this is what happens when the code doesn't have adequate documentation; for example, I looked at the toExpr() method and couldn't find any useful documentation. Now, we are left to guess what was the intended functionality. [~aj_09] reviewed DRILL-4264, we should inquiry with him about whether leaving the backtick is a bug or as-designed. 
> Query with 10-way hash join fails with NullPointerException > --- > > Key: DRILL-6706 > URL: https://issues.apache.org/jira/browse/DRILL-6706 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.15.0 >Reporter: Abhishek Girish >Assignee: salim achouche >Priority: Critical > Attachments: drillbit.log.zip > > > {code} > SELECT C.C_CUSTKEY AS C_CUSTKEY > FROM si.tpch_sf1_parquet.customer C, > si.tpch_sf1_parquet.orders O, > si.tpch_sf1_parquet.lineitem L, > si.tpch_sf1_parquet.part P, > si.tpch_sf1_parquet.supplier S, > si.tpch_sf1_parquet.partsupp PS, > si.tpch_sf1_parquet.nation S_N, > si.tpch_sf1_parquet.region S_R, > si.tpch_sf1_parquet.nation C_N, > si.tpch_sf1_parquet.region C_R > WHEREC.C_CUSTKEY = O.O_CUSTKEY > AND O.O_ORDERKEY = L.L_ORDERKEY > AND L.L_PARTKEY = P.P_PARTKEY > AND L.L_SUPPKEY = S.S_SUPPKEY > AND P.P_PARTKEY = PS.PS_PARTKEY > AND P.P_SUPPKEY = PS.PS_SUPPKEY > AND S.S_NATIONKEY = S_N.N_NATIONKEY > AND S_N.N_REGIONKEY = S_R.R_REGIONKEY > AND C.C_NATIONKEY = C_N.N_NATIONKEY > AND C_N.N_REGIONKEY = C_R.R_REGIONKEY > {code} > Plan > {code} > 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, > cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, > 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368 > 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, > 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367 > 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = > 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 > io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366 > 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): > rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, > 2.3323755E7 io, 1.74592E11 network, 
4.8577056E7 memory}, id = 515365 > 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], > O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], > P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], > PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], > R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) > : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY > O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY > P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, > ANY N_NATIONKEY, ANY N_REGION
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592731#comment-16592731 ] salim achouche commented on DRILL-6706: --- I say it is as-designed for the following reasons: * This comment seems to imply that we want to treat not-found columns differently: {{// col.toExpr() is used here as field name since we don't want to see these fields in the existing maps}} * The rest of the code seems to work just fine (including sqlline); I have a hard time believing that such an obvious bug would have gone unnoticed. [~timothyfarkas], I could be wrong, but this is what happens when the code lacks adequate documentation; for example, I looked at the toExpr() method and couldn't find any useful documentation, so we are left to guess at the intended functionality. [~aj_09] reviewed DRILL-4264; we should check with him whether leaving the back-tick is a bug or as-designed.
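For readers following the thread, the naming mismatch under discussion can be shown with a small self-contained sketch. The class below only mimics the quoted behavior of Drill's SchemaPath.toExpr() (wrapping the name in back-ticks); ColumnPath and its method names are hypothetical stand-ins, not Drill APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the relevant behavior of Drill's SchemaPath:
// toExpr() renders the column as a quoted expression (wrapped in back-ticks),
// while the plain root name carries no quoting.
class ColumnPath {
    private final String rootName;

    ColumnPath(String rootName) { this.rootName = rootName; }

    // Expression form: back-ticked, as createMissingColumn() uses for the field name.
    String toExpr() { return "`" + rootName + "`"; }

    // Plain form, as downstream operators typically use for lookups.
    String rootSegmentPath() { return rootName; }
}

public class MissingColumnNameDemo {
    public static void main(String[] args) {
        ColumnPath missing = new ColumnPath("xyz");

        // A schema map keyed by the expression form (back-ticked name).
        Map<String, String> schema = new HashMap<>();
        schema.put(missing.toExpr(), "NULLABLE_INT");

        // A lookup by the plain name misses the entry: the kind of mismatch
        // that can surface downstream as a NullPointerException.
        System.out.println(schema.containsKey("`xyz`")); // true
        System.out.println(schema.containsKey("xyz"));   // false
    }
}
```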
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592707#comment-16592707 ] salim achouche commented on DRILL-6706: --- It seems that [~timothyfarkas] is not available; I'll take ownership of this JIRA. I am looking at the downstream operators and their ability to cope with columns carrying the back-tick (``) syntax; I will try to mimic that logic within the BatchSizer (preferably) or the HashJoin code.
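The "cope with back-tick syntax" logic mentioned in the comment above could amount to normalizing names before comparison. A minimal sketch of that idea, assuming nothing about Drill's actual BatchSizer or HashJoin internals (the class and method names here are hypothetical):

```java
public class ColumnNameUtil {
    // Strip a single pair of enclosing back-ticks, if present, so that
    // `xyz` and xyz compare equal during column lookups.
    static String normalize(String name) {
        if (name != null && name.length() >= 2
                && name.startsWith("`") && name.endsWith("`")) {
            return name.substring(1, name.length() - 1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(normalize("`xyz`")); // xyz
        System.out.println(normalize("xyz"));   // xyz
        System.out.println(normalize("`"));     // ` (too short to be a quoted pair)
    }
}
```

Applying such a normalization on both sides of a lookup would make the back-ticked placeholder names produced for missing columns match their plain counterparts.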
[jira] [Assigned] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6706: - Assignee: salim achouche (was: Timothy Farkas)
[jira] [Assigned] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6706: - Assignee: Timothy Farkas (was: salim achouche)
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592314#comment-16592314 ] salim achouche commented on DRILL-6706: --- Yes, this is as-designed: * The change was done by commit id: d105950a7a9fb2ff3acd072ee65a51ef1fca120e * JIRA: [DRILL-4264: Allow field names to include dots|https://github.com/apache/drill/commit/d105950a7a9fb2ff3acd072ee65a51ef1fca120e#diff-cdcf7a999bb6a806125da3fa1d4a78b2]
[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592303#comment-16592303 ] salim achouche edited comment on DRILL-6706 at 8/24/18 11:16 PM: - * This condition of having columns with a back-tick occurs when a selected column is missing * I located the code that does this, and it seems to be on purpose (?) * There are also tests that expect this kind of behavior (TestExternalSortExec) * I also ran queries with missing columns and they worked fine: ** SELECT count(*) from dfs.`.../part.*` P where P.xyz is null --> 2,000 ** SELECT P.XYZ from dfs.`.../part.*` P /* SQLLINE is able to print the correct column name */ Tim, it seems the code expects such behavior. Can you, for now, just ignore missing columns, since they will contain only nulls?
{code}
private NullableIntVector createMissingColumn(SchemaPath col, OutputMutator output) throws SchemaChangeException {
  // col.toExpr() is used here as field name since we don't want to see these fields in the existing maps
  MaterializedField field = MaterializedField.create(col.toExpr(), Types.optional(TypeProtos.MinorType.INT));
  return (NullableIntVector) output.addField(field, TypeHelper.getValueVectorClass(TypeProtos.MinorType.INT, DataMode.OPTIONAL));
}
{code}
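The "just ignore missing columns" suggestion could be sketched as follows. This assumes, per the createMissingColumn() snippet quoted above, that a back-ticked field name marks a null-filled placeholder for a column that was selected but not found; the classes and the marker convention are illustrative assumptions, not Drill's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipMissingColumnsDemo {
    // Per the quoted convention, a back-ticked field name denotes a
    // null-filled placeholder for a missing column.
    static boolean isMissingColumnPlaceholder(String fieldName) {
        return fieldName.length() >= 2
                && fieldName.startsWith("`") && fieldName.endsWith("`");
    }

    // A sizing pass could simply skip the placeholders.
    static List<String> columnsToSize(List<String> fieldNames) {
        List<String> result = new ArrayList<>();
        for (String name : fieldNames) {
            if (!isMissingColumnPlaceholder(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("C_CUSTKEY", "`xyz`", "O_ORDERKEY");
        System.out.println(columnsToSize(batch)); // [C_CUSTKEY, O_ORDERKEY]
    }
}
```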
[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592303#comment-16592303 ] salim achouche commented on DRILL-6706: ---
[jira] [Updated] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6709: -- Labels: pull-request-available (was: ) > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators
[ https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6709: -- Reviewer: Timothy Farkas > Batch statistics logging utility needs to be extended to mid-stream operators > - > > Key: DRILL-6709 > URL: https://issues.apache.org/jira/browse/DRILL-6709 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Fix For: 1.15.0 > > > A new batch logging utility has been created to log batch sizing messages to > drillbit.log. It is being used by the Parquet reader. It needs to be enhanced > so it can be used by mid-stream operators. In particular, mid-stream > operators have both incoming batches and outgoing batches, while Parquet only > has outgoing batches. So the utility needs to support incoming batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
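The DRILL-6709 enhancement described above (extending the batch-sizing logger so mid-stream operators can report both incoming and outgoing batches, where the Parquet reader reports only outgoing ones) can be sketched as a small logging helper. This is a minimal illustrative sketch; the class, enum, and method names below are assumptions, not Drill's actual batch-stats API.

```java
// Hypothetical sketch of a batch-sizing logger that handles both
// incoming and outgoing batches; all names here are illustrative,
// not Drill's real RecordBatchStats utility.
public class BatchStatsLogger {

    public enum BatchKind { INCOMING, OUTGOING }

    // Formats one drillbit.log-style line for a single batch.
    public static String format(String operator, BatchKind kind,
                                int rowCount, long batchBytes) {
        long avgRowBytes = rowCount == 0 ? 0 : batchBytes / rowCount;
        return String.format("%s: %s batch: rows=%d, bytes=%d, avg_row_bytes=%d",
                operator, kind, rowCount, batchBytes, avgRowBytes);
    }

    public static void main(String[] args) {
        // A mid-stream operator logs both sides; a scan-side operator
        // such as the Parquet reader would log only OUTGOING batches.
        System.out.println(format("HASH_JOIN", BatchKind.INCOMING, 4096, 1 << 20));
        System.out.println(format("HASH_JOIN", BatchKind.OUTGOING, 2048, 1 << 19));
    }
}
```

The key design point from the Jira is simply that the log entry carries a direction tag, so one utility serves readers (outgoing only) and mid-stream operators (both directions).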
[jira] [Updated] (DRILL-6685) Error in parquet record reader
[ https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6685: -- Reviewer: Boaz Ben-Zvi > Error in parquet record reader > -- > > Key: DRILL-6685 > URL: https://issues.apache.org/jira/browse/DRILL-6685 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: drillbit.log.6685 > > > This is the query: > select VarbinaryValue1 from > dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit > 36; > It appears to be caused by this commit: > DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader > aee899c1b26ebb9a5781d280d5a73b42c273d4d5 > This is the stack trace: > {noformat} > Error: INTERNAL_ERROR ERROR: Error in parquet record reader. > Message: > Hadoop path: > /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet > Total records read: 0 > Row group index: 0 > Records in row group: 1250 > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root { > optional int64 Index; > optional binary VarbinaryValue1; > optional int64 BigIntValue; > optional boolean BooleanValue; > optional int32 DateValue (DATE); > optional float FloatValue; > optional binary VarcharValue1 (UTF8); > optional double DoubleValue; > optional int32 IntegerValue; > optional int32 TimeValue (TIME_MILLIS); > optional int64 TimestampValue (TIMESTAMP_MILLIS); > optional binary VarbinaryValue2; > optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL); > optional binary VarcharValue2 (UTF8); > } > , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: > [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] 
optional > int64 Index [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED > [VarbinaryValue1] optional binary VarbinaryValue1 [PLAIN, RLE, BIT_PACKED], > 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue > [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED > [BooleanValue] optional boolean BooleanValue [PLAIN, RLE, BIT_PACKED], > 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue > (DATE) [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED > [FloatValue] optional float FloatValue [PLAIN, RLE, BIT_PACKED], 8184959}, > ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 > (UTF8) [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED > [DoubleValue] optional double DoubleValue [PLAIN, RLE, BIT_PACKED], > 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 > IntegerValue [PLAIN, RLE, BIT_PACKED], 10240111}, > ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue > (TIME_MILLIS) [PLAIN, RLE, BIT_PACKED], 10245154}, > ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue > (TIMESTAMP_MILLIS) [PLAIN, RLE, BIT_PACKED], 10250197}, > ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 > [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED > [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED > [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED > [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED > [VarcharValue2] optional binary VarcharValue2 (UTF8) [PLAIN, RLE, > BIT_PACKED], 19677568}]}]} > Fragment 0:0 > [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] > 
(state=,code=0) > {noformat} > Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet > sys.version is: > 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac Changed > PROJECT_OUTPUT_BATCH_SIZE to System + Session 30.07.2018 @ 17:12:53 PDT > r...@mapr.com 30.07.2018 @ 17:25:21 PDT > fourvarchar_asc_nulls70.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6685) Error in parquet record reader
[ https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581620#comment-16581620 ] salim achouche commented on DRILL-6685: --- Fixed a regression introduced when addressing DRILL-6570: * When fixing DRILL-6570, we unified a bulk entry's maximum values so that a false positive (fixed-length data treated as variable-length) could be handled smoothly * The regression was that the fixed-length algorithm relied on the previous bulk-entry max-value constraint Fix - * I have re-introduced the constraint within the fixed-length reader * Added a test suite using Robert Hou's parquet data to prevent such regressions > Error in parquet record reader > -- > > Key: DRILL-6685 > URL: https://issues.apache.org/jira/browse/DRILL-6685 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: drillbit.log.6685 > > > This is the query: > select VarbinaryValue1 from > dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit > 36; > It appears to be caused by this commit: > DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader > aee899c1b26ebb9a5781d280d5a73b42c273d4d5 > This is the stack trace: > {noformat} > Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: > Hadoop path: > /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet > Total records read: 0 > Row group index: 0 > Records in row group: 1250 > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root { > optional int64 Index; > optional binary VarbinaryValue1; > optional int64 BigIntValue; > optional boolean BooleanValue; > optional int32 DateValue (DATE); > optional float FloatValue; > optional binary VarcharValue1 (UTF8); > optional double DoubleValue; > optional int32 IntegerValue; > optional int32 TimeValue (TIME_MILLIS); > optional int64 TimestampValue (TIMESTAMP_MILLIS); > optional binary VarbinaryValue2; > optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL); > optional binary VarcharValue2 (UTF8); > } > , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: > [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] optional > int64 Index [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED > [VarbinaryValue1] optional binary VarbinaryValue1 [PLAIN, RLE, BIT_PACKED], > 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue > [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED > [BooleanValue] optional boolean BooleanValue [PLAIN, RLE, BIT_PACKED], > 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue > (DATE) [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED > [FloatValue] optional float FloatValue [PLAIN, RLE, BIT_PACKED], 8184959}, > ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 > (UTF8) [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED > [DoubleValue] optional double DoubleValue [PLAIN, RLE, BIT_PACKED], > 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 > IntegerValue [PLAIN, RLE, BIT_PACKED], 
10240111}, > ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue > (TIME_MILLIS) [PLAIN, RLE, BIT_PACKED], 10245154}, > ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue > (TIMESTAMP_MILLIS) [PLAIN, RLE, BIT_PACKED], 10250197}, > ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 > [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED > [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED > [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED > [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED > [VarcharValue2] optional binary VarcharValue2 (UTF8) [PLAIN, RLE, > BIT_PACKED], 19677568}]}]} > Fragment 0:0 > [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] > (state=,code=0) > {noformat} > Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet > sys.version is: > 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac
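The DRILL-6685 comment above describes re-introducing a max-value constraint in the fixed-length bulk reader so the computed read count can never reach zero. A rough, simplified sketch of that kind of constraint follows; it is an illustration only and does not reproduce the actual VarLenNullableFixedEntryReader code, and the method names and parameters are assumptions.

```java
// Simplified sketch of the constraint the fix re-introduces: the
// fixed-length bulk reader must never request more values per bulk
// entry than the entry's capacity allows; without the upper bound,
// the computed read batch count could end up invalid and trip the
// "Read batch count [0] should be greater than zero" check.
public class FixedEntryReaderSketch {

    // Hypothetical capacity of one bulk entry, expressed in values.
    public static int maxValuesPerEntry(int entryCapacityBytes, int valueWidthBytes) {
        return entryCapacityBytes / valueWidthBytes;
    }

    // Clamp the requested value count to the entry's capacity; the
    // regression was effectively the absence of this upper bound.
    public static int valuesToRead(int requested, int entryCapacityBytes, int valueWidthBytes) {
        int max = maxValuesPerEntry(entryCapacityBytes, valueWidthBytes);
        int result = Math.min(requested, max);
        if (result <= 0) {
            throw new IllegalStateException(
                "Read batch count [" + result + "] should be greater than zero");
        }
        return result;
    }

    public static void main(String[] args) {
        // 4096-byte entry, 8-byte fixed values -> at most 512 values per read.
        System.out.println(valuesToRead(1250, 4096, 8)); // prints 512
    }
}
```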
[jira] [Updated] (DRILL-6685) Error in parquet record reader
[ https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6685: -- Labels: pull-request-available (was: ) > Error in parquet record reader > -- > > Key: DRILL-6685 > URL: https://issues.apache.org/jira/browse/DRILL-6685 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: drillbit.log.6685 > > > This is the query: > select VarbinaryValue1 from > dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit > 36; > It appears to be caused by this commit: > DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader > aee899c1b26ebb9a5781d280d5a73b42c273d4d5 > This is the stack trace: > {noformat} > Error: INTERNAL_ERROR ERROR: Error in parquet record reader. > Message: > Hadoop path: > /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet > Total records read: 0 > Row group index: 0 > Records in row group: 1250 > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root { > optional int64 Index; > optional binary VarbinaryValue1; > optional int64 BigIntValue; > optional boolean BooleanValue; > optional int32 DateValue (DATE); > optional float FloatValue; > optional binary VarcharValue1 (UTF8); > optional double DoubleValue; > optional int32 IntegerValue; > optional int32 TimeValue (TIME_MILLIS); > optional int64 TimestampValue (TIMESTAMP_MILLIS); > optional binary VarbinaryValue2; > optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL); > optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL); > optional binary VarcharValue2 (UTF8); > } > , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: > [BlockMetaData{1250, 23750308 
[ColumnMetaData{UNCOMPRESSED [Index] optional > int64 Index [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED > [VarbinaryValue1] optional binary VarbinaryValue1 [PLAIN, RLE, BIT_PACKED], > 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue > [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED > [BooleanValue] optional boolean BooleanValue [PLAIN, RLE, BIT_PACKED], > 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue > (DATE) [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED > [FloatValue] optional float FloatValue [PLAIN, RLE, BIT_PACKED], 8184959}, > ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 > (UTF8) [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED > [DoubleValue] optional double DoubleValue [PLAIN, RLE, BIT_PACKED], > 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 > IntegerValue [PLAIN, RLE, BIT_PACKED], 10240111}, > ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue > (TIME_MILLIS) [PLAIN, RLE, BIT_PACKED], 10245154}, > ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue > (TIMESTAMP_MILLIS) [PLAIN, RLE, BIT_PACKED], 10250197}, > ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 > [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED > [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED > [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED > [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue > (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED > [VarcharValue2] optional binary VarcharValue2 (UTF8) [PLAIN, RLE, > BIT_PACKED], 19677568}]}]} > Fragment 0:0 > [Error Id: 
25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] > (state=,code=0) > {noformat} > Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet > sys.version is: > 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac Changed > PROJECT_OUTPUT_BATCH_SIZE to System + Session 30.07.2018 @ 17:12:53 PDT > r...@mapr.com 30.07.2018 @ 17:25:21 PDT > fourvarchar_asc_nulls70.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows
[ https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6664: -- Labels: ready-to-commit (was: pull-request-available) > Parquet reader should not allow batches with more than 64k rows > --- > > Key: DRILL-6664 > URL: https://issues.apache.org/jira/browse/DRILL-6664 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > > The Drill configuration allows the Parquet reader to handle batches larger > than 64k rows. We should limit this setting to 64k as several operators assume a > maximum batch size of 64k. > NOTE - This Jira is precautionary as the default is 32k rows maximum -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows
[ https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6664: -- Reviewer: Boaz Ben-Zvi > Parquet reader should not allow batches with more than 64k rows > --- > > Key: DRILL-6664 > URL: https://issues.apache.org/jira/browse/DRILL-6664 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > The Drill configuration allows the Parquet reader to handle batches larger > than 64k rows. We should limit this setting to 64k as several operators assume a > maximum batch size of 64k. > NOTE - This Jira is precautionary as the default is 32k rows maximum -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows
[ https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6664: -- Labels: pull-request-available (was: ) > Parquet reader should not allow batches with more than 64k rows > --- > > Key: DRILL-6664 > URL: https://issues.apache.org/jira/browse/DRILL-6664 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > The Drill configuration allows the Parquet reader to handle batches larger > than 64k rows. We should limit this setting to 64k as several operators assume a > maximum batch size of 64k. > NOTE - This Jira is precautionary as the default is 32k rows maximum -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows
salim achouche created DRILL-6664: - Summary: Parquet reader should not allow batches with more than 64k rows Key: DRILL-6664 URL: https://issues.apache.org/jira/browse/DRILL-6664 Project: Apache Drill Issue Type: Improvement Reporter: salim achouche Assignee: salim achouche The Drill configuration allows the Parquet reader to handle batches larger than 64k rows. We should limit this setting to 64k as several operators assume a maximum batch size of 64k. NOTE - This Jira is precautionary as the default is 32k rows maximum -- This message was sent by Atlassian JIRA (v7.6.3#76005)
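The 64k ceiling described in DRILL-6664 amounts to clamping the configured Parquet batch row count to 65536. A minimal sketch follows; the class, constant, and method names are illustrative assumptions, not Drill's actual option-handling code.

```java
// Sketch of clamping a configured Parquet batch row count to the
// 64k ceiling; per the Jira, several Drill operators assume at most
// 64k (65536) rows per batch, and the default is 32k rows.
public class BatchRowLimit {

    // Hard ceiling assumed by downstream operators.
    public static final int MAX_BATCH_ROWS = 1 << 16; // 65536

    public static int effectiveBatchRows(int configured) {
        // Guard against non-positive settings, then clamp to the ceiling.
        if (configured <= 0) {
            throw new IllegalArgumentException("batch row count must be positive");
        }
        return Math.min(configured, MAX_BATCH_ROWS);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBatchRows(32 * 1024)); // default 32k stays as-is
        System.out.println(effectiveBatchRows(100_000));   // clamped to 65536
    }
}
```

This matches the precautionary nature of the Jira: the default (32k) is already below the ceiling, so the clamp only matters for oversized user settings.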
[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing
[ https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6101: -- Fix Version/s: 1.15.0 > Optimize Implicit Columns Processing > > > Key: DRILL-6101 > URL: https://issues.apache.org/jira/browse/DRILL-6101 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.12.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > > Problem Description - > * Apache Drill allows users to specify columns even for SELECT STAR queries > * From my discussion with [~paul-rogers], Apache Calcite has a limitation > where the extra columns are not provided > * The workaround has been to always include all implicit columns for SELECT > STAR queries > * Unfortunately, the current implementation is very inefficient as implicit > column values get duplicated; this leads to substantial performance > degradation when the number of rows is large > Suggested Optimization - > * The NullableVarChar vector should be enhanced to efficiently store > duplicate values > * This will not only address the current Calcite limitations (for SELECT > STAR queries) but also optimize all queries with implicit columns > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing
[ https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6101: -- Labels: ready-to-commit (was: pull-request-available) > Optimize Implicit Columns Processing > > > Key: DRILL-6101 > URL: https://issues.apache.org/jira/browse/DRILL-6101 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.12.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Critical > Labels: ready-to-commit > Fix For: 1.15.0 > > > Problem Description - > * Apache Drill allows users to specify columns even for SELECT STAR queries > * From my discussion with [~paul-rogers], Apache Calcite has a limitation > where the extra columns are not provided > * The workaround has been to always include all implicit columns for SELECT > STAR queries > * Unfortunately, the current implementation is very inefficient as implicit > column values get duplicated; this leads to substantial performance > degradation when the number of rows is large > Suggested Optimization - > * The NullableVarChar vector should be enhanced to efficiently store > duplicate values > * This will not only address the current Calcite limitations (for SELECT > STAR queries) but also optimize all queries with implicit columns > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing
[ https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6101: -- Reviewer: Timothy Farkas > Optimize Implicit Columns Processing > > > Key: DRILL-6101 > URL: https://issues.apache.org/jira/browse/DRILL-6101 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.12.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Critical > Labels: pull-request-available > > Problem Description - > * Apache Drill allows users to specify columns even for SELECT STAR queries > * From my discussion with [~paul-rogers], Apache Calcite has a limitation > where the extra columns are not provided > * The workaround has been to always include all implicit columns for SELECT > STAR queries > * Unfortunately, the current implementation is very inefficient as implicit > column values get duplicated; this leads to substantial performance > degradation when the number of rows is large > Suggested Optimization - > * The NullableVarChar vector should be enhanced to efficiently store > duplicate values > * This will not only address the current Calcite limitations (for SELECT > STAR queries) but also optimize all queries with implicit columns > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing
[ https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6101: -- Labels: pull-request-available (was: ) > Optimize Implicit Columns Processing > > > Key: DRILL-6101 > URL: https://issues.apache.org/jira/browse/DRILL-6101 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Relational Operators >Affects Versions: 1.12.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Critical > Labels: pull-request-available > > Problem Description - > * Apache Drill allows users to specify columns even for SELECT STAR queries > * From my discussion with [~paul-rogers], Apache Calcite has a limitation > where the extra columns are not provided > * The workaround has been to always include all implicit columns for SELECT > STAR queries > * Unfortunately, the current implementation is very inefficient as implicit > column values get duplicated; this leads to substantial performance > degradation when the number of rows is large > Suggested Optimization - > * The NullableVarChar vector should be enhanced to efficiently store > duplicate values > * This will not only address the current Calcite limitations (for SELECT > STAR queries) but also optimize all queries with implicit columns > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
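The DRILL-6101 suggestion (a NullableVarChar vector that stores duplicate values efficiently) amounts to run-length-style deduplication: implicit columns such as the source filename repeat the same value for every row. The sketch below is a generic illustration of that idea under those assumptions, not Drill's actual value-vector implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Generic sketch of storing a heavily duplicated column (e.g. an
// implicit filename column) as (value, runLength) pairs instead of
// one physical entry per row.
public class RunLengthColumn {
    private final List<String> values = new ArrayList<>();
    private final List<Integer> runs = new ArrayList<>();

    public void append(String v) {
        int last = values.size() - 1;
        if (last >= 0 && Objects.equals(values.get(last), v)) {
            runs.set(last, runs.get(last) + 1); // extend the current run
        } else {
            values.add(v);
            runs.add(1);
        }
    }

    // Number of logical rows represented.
    public int rowCount() {
        return runs.stream().mapToInt(Integer::intValue).sum();
    }

    // Number of physical value entries actually stored.
    public int storedValues() {
        return values.size();
    }

    public static void main(String[] args) {
        RunLengthColumn col = new RunLengthColumn();
        for (int i = 0; i < 1_000_000; i++) {
            col.append("/data/file1.parquet"); // implicit column repeated per row
        }
        System.out.println(col.rowCount() + " rows, " + col.storedValues() + " stored");
    }
}
```

One stored value standing in for a million rows is exactly the duplication the Jira says degrades performance today.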
[jira] [Commented] (DRILL-6660) Exchange operators Analysis
[ https://issues.apache.org/jira/browse/DRILL-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565708#comment-16565708 ] salim achouche commented on DRILL-6660: --- During the analysis it was realized that the Batch Sizing functionality (for exchange operators) is not enough on its own: * Exchanges usually involve MxN communication; using a 16MB (default) batch size for each output / input batch will not scale * Instead, the analysis should include the exchange topology, performance implications, communication timing (should we fill an output batch even if it means a long delay?) Thus, our recommendation is to combine Resource Management, Batch Sizing, and Performance Tuning into a single initiative to avoid regressions and achieve the desired goal of greater scalability. > Exchange operators Analysis > --- > > Key: DRILL-6660 > URL: https://issues.apache.org/jira/browse/DRILL-6660 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.15.0 > > > Analysis of what it will take to apply batch sizing to the exchange operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
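The MxN scaling concern in the comment above can be made concrete with simple arithmetic: if every sender/receiver fragment pair can hold one 16MB batch in flight, an exchange with M senders and N receivers implies up to M x N x 16MB of buffered data cluster-wide. The fragment counts below are illustrative, not taken from the Jira.

```java
// Worked example of why a fixed 16MB batch per sender/receiver pair
// does not scale for MxN exchanges; the 100x100 topology is an
// illustrative assumption.
public class ExchangeMemorySketch {

    public static long bufferedBytes(int senders, int receivers, long batchBytes) {
        return (long) senders * receivers * batchBytes;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // 100 sending fragments x 100 receiving fragments x 16MB each
        long total = bufferedBytes(100, 100, 16 * mb);
        System.out.println(total / (1024 * mb) + " GB buffered"); // ~156 GB
    }
}
```

At that scale the fixed batch size alone dominates cluster memory, which is why the comment folds batch sizing into the broader resource-management effort instead of treating it in isolation.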
[jira] [Created] (DRILL-6660) Exchange operators Analysis
salim achouche created DRILL-6660: - Summary: Exchange operators Analysis Key: DRILL-6660 URL: https://issues.apache.org/jira/browse/DRILL-6660 Project: Apache Drill Issue Type: Sub-task Components: Execution - Flow Reporter: salim achouche Assignee: salim achouche Fix For: 1.15.0 Analysis of what it will take to apply batch sizing to the exchange operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6659) Batch sizing functionality for exchange operators
salim achouche created DRILL-6659: - Summary: Batch sizing functionality for exchange operators Key: DRILL-6659 URL: https://issues.apache.org/jira/browse/DRILL-6659 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Reporter: salim achouche Assignee: salim achouche Fix For: 1.15.0 This task aims at controlling memory usage within Drill's exchange operators. This is a continuation of the Drill Resource Management effort. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling
[ https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6626: -- Labels: pull-request-available (was: ) > Hash Aggregate: Index out of bounds with small output batch size and spilling > - > > Key: DRILL-6626 > URL: https://issues.apache.org/jira/browse/DRILL-6626 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Boaz Ben-Zvi >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > >This new IOOB failure was seen while trying to recreate the NPE failure in > DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does > not seem to make a difference. > This IOOB can easily be created with other large Hash-Agg queries that need > to spill. > The IOOB was caused after restricting the output batch size (to force many), > and the Hash Aggr memory (to force a spill): > {code} > 0: jdbc:drill:zk=local> alter system set > `drill.exec.memory.operator.output_batch_size` = 262144; > +---++ > | ok |summary | > +---++ > | true | drill.exec.memory.operator.output_batch_size updated. | > +---++ > 1 row selected (0.106 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true; > +---+---+ > | ok |summary| > +---+---+ > | true | exec.errors.verbose updated. | > +---+---+ > 1 row selected (0.081 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216; > +---+--+ > | ok | summary | > +---+--+ > | true | exec.hashagg.mem_limit updated. | > +---+--+ > 1 row selected (0.089 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> SELECT c_customer_id FROM > dfs.`/data/tpcds/sf1/parquet/customer` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT ca_address_id FROM > dfs.`/data/tpcds/sf1/parquet/customer_address` > . . . . . . . . . . . > UNION > . . . . . . . 
. . . . > SELECT cd_credit_rating FROM > dfs.`/data/tpcds/sf1/parquet/customer_demographics` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT hd_buy_potential FROM > dfs.`/data/tpcds/sf1/parquet/household_demographics` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT i_item_id FROM > dfs.`/data/tpcds/sf1/parquet/item` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT p_promo_id FROM > dfs.`/data/tpcds/sf1/parquet/promotion` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT t_time_id FROM > dfs.`/data/tpcds/sf1/parquet/time_dim` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT d_date_id FROM > dfs.`/data/tpcds/sf1/parquet/date_dim` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT s_store_id FROM > dfs.`/data/tpcds/sf1/parquet/store` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT w_warehouse_id FROM > dfs.`/data/tpcds/sf1/parquet/warehouse` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT sm_ship_mode_id FROM > dfs.`/data/tpcds/sf1/parquet/ship_mode` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT r_reason_id FROM > dfs.`/data/tpcds/sf1/parquet/reason` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT cc_call_center_id FROM > dfs.`/data/tpcds/sf1/parquet/call_center` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT web_site_id FROM > dfs.`/data/tpcds/sf1/parquet/web_site` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT wp_web_page_id FROM > dfs.`/data/tpcds/sf1/parquet/web_page` > . . . . . . . . . . . > UNION > . . . . . . . . . . . 
> SELECT cp_catalog_page_id FROM > dfs.`/data/tpcds/sf1/parquet/catalog_page`; > Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7 > Fragment 4:0 > [Error Id: d44e64ea-f474-436e-94b0-61c61eec2227 on 172.30.8.176:31020] > (java.lang.IndexOutOfBoundsException) Index: 26474, Size: 7 > java.util.ArrayList.rangeCheck():653 > java.util.ArrayList.get():429 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.rehash():293 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1300():120 > > org.
[jira] [Assigned] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling
[ https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6626: - Assignee: salim achouche > Hash Aggregate: Index out of bounds with small output batch size and spilling > - > > Key: DRILL-6626 > URL: https://issues.apache.org/jira/browse/DRILL-6626 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Boaz Ben-Zvi >Assignee: salim achouche >Priority: Major > >This new IOOB failure was seen while trying to recreate the NPE failure in > DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does > not seem to make a difference. > This IOOB can easily be created with other large Hash-Agg queries that need > to spill. > The IOOB was caused after restricting the output batch size (to force many), > and the Hash Aggr memory (to force a spill): > {code} > 0: jdbc:drill:zk=local> alter system set > `drill.exec.memory.operator.output_batch_size` = 262144; > +---++ > | ok |summary | > +---++ > | true | drill.exec.memory.operator.output_batch_size updated. | > +---++ > 1 row selected (0.106 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true; > +---+---+ > | ok |summary| > +---+---+ > | true | exec.errors.verbose updated. | > +---+---+ > 1 row selected (0.081 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216; > +---+--+ > | ok | summary | > +---+--+ > | true | exec.hashagg.mem_limit updated. | > +---+--+ > 1 row selected (0.089 seconds) > 0: jdbc:drill:zk=local> > 0: jdbc:drill:zk=local> SELECT c_customer_id FROM > dfs.`/data/tpcds/sf1/parquet/customer` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT ca_address_id FROM > dfs.`/data/tpcds/sf1/parquet/customer_address` > . . . . . . . . . . . > UNION > . . . . . . . . . . . 
> SELECT cd_credit_rating FROM > dfs.`/data/tpcds/sf1/parquet/customer_demographics` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT hd_buy_potential FROM > dfs.`/data/tpcds/sf1/parquet/household_demographics` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT i_item_id FROM > dfs.`/data/tpcds/sf1/parquet/item` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT p_promo_id FROM > dfs.`/data/tpcds/sf1/parquet/promotion` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT t_time_id FROM > dfs.`/data/tpcds/sf1/parquet/time_dim` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT d_date_id FROM > dfs.`/data/tpcds/sf1/parquet/date_dim` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT s_store_id FROM > dfs.`/data/tpcds/sf1/parquet/store` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT w_warehouse_id FROM > dfs.`/data/tpcds/sf1/parquet/warehouse` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT sm_ship_mode_id FROM > dfs.`/data/tpcds/sf1/parquet/ship_mode` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT r_reason_id FROM > dfs.`/data/tpcds/sf1/parquet/reason` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT cc_call_center_id FROM > dfs.`/data/tpcds/sf1/parquet/call_center` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT web_site_id FROM > dfs.`/data/tpcds/sf1/parquet/web_site` > . . . . . . . . . . . > UNION > . . . . . . . . . . . > SELECT wp_web_page_id FROM > dfs.`/data/tpcds/sf1/parquet/web_page` > . . . . . . . . . . . > UNION > . . . . . . . . . . . 
> SELECT cp_catalog_page_id FROM > dfs.`/data/tpcds/sf1/parquet/catalog_page`; > Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7 > Fragment 4:0 > [Error Id: d44e64ea-f474-436e-94b0-61c61eec2227 on 172.30.8.176:31020] > (java.lang.IndexOutOfBoundsException) Index: 26474, Size: 7 > java.util.ArrayList.rangeCheck():653 > java.util.ArrayList.get():429 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.rehash():293 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1300():120 > > org.apache.drill.exec.physical.impl.common.HashTableTempla
[jira] [Commented] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling
[ https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553377#comment-16553377 ] salim achouche commented on DRILL-6626: --- The IndexOutOfBoundsException was happening during Hash Table rehash:
* There was a regression introduced by the batch sizing work
* Each outgoing batch needed to fix up its hash values (which were cached)
* Each outgoing batch should start at index (num-out-batches - 1) * MAX_BATCH_SZ; this is based on the insertion logic
* The code used the real row count instead of MAX_BATCH_SZ
Fix - Put back the original code, since the indexing scheme didn't change
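The fixed-stride indexing described in the comment can be sketched in a few lines. This is a hypothetical illustration, not Drill's actual HashTableTemplate code; the MAX_BATCH_SZ constant value and the helper names are assumptions:

```java
// Sketch: each batch holder stores up to MAX_BATCH_SZ entries; a global
// entry index maps to a holder via fixed-stride arithmetic, so lookups
// must stride by MAX_BATCH_SZ even when a holder is only partially full.
public class BatchIndexing {
    static final int MAX_BATCH_SZ = 1 << 16; // 65536 entries per holder (assumed)

    // Global start index of the n-th holder (0-based): fixed stride.
    static int batchStartIndex(int batchIdx) {
        return batchIdx * MAX_BATCH_SZ;
    }

    // Decompose a global index the same way the insertion logic composed it.
    static int holderOf(int globalIdx) { return globalIdx / MAX_BATCH_SZ; }
    static int offsetIn(int globalIdx) { return globalIdx % MAX_BATCH_SZ; }

    public static void main(String[] args) {
        // An entry inserted as the 10th row of the 3rd holder:
        int globalIdx = batchStartIndex(2) + 9;
        System.out.println(holderOf(globalIdx)); // 2
        System.out.println(offsetIn(globalIdx)); // 9
        // The regression: striding by a holder's *actual* row count (say 7)
        // instead of MAX_BATCH_SZ maps the same entry to a holder index far
        // beyond the holder list size, producing an IndexOutOfBoundsException
        // like the reported "Index: 26474, Size: 7".
        int wrongHolder = globalIdx / 7;
        System.out.println(wrongHolder);
    }
}
```

The fix in the JIRA simply restores the fixed-stride computation, since the insertion-side indexing scheme never changed.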
[jira] [Updated] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6622: -- Labels: pull-request-available (was: ) > UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException > --- > > Key: DRILL-6622 > URL: https://issues.apache.org/jira/browse/DRILL-6622 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen >Affects Versions: 1.14.0 >Reporter: Vitalii Diravka >Assignee: salim achouche >Priority: Blocker > Labels: pull-request-available > Fix For: 1.14.0 > > Attachments: > MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, > MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json > > > {code} > SELECT c_customer_id FROM customer > UNION > SELECT ca_address_id FROM customer_address > UNION > SELECT cd_credit_rating FROM customer_demographics > UNION > SELECT hd_buy_potential FROM household_demographics > UNION > SELECT i_item_id FROM item > UNION > SELECT p_promo_id FROM promotion > UNION > SELECT t_time_id FROM time_dim > UNION > SELECT d_date_id FROM date_dim > UNION > SELECT s_store_id FROM store > UNION > SELECT w_warehouse_id FROM warehouse > UNION > SELECT sm_ship_mode_id FROM ship_mode > UNION > SELECT r_reason_id FROM reason > UNION > SELECT cc_call_center_id FROM call_center > UNION > SELECT web_site_id FROM web_site > UNION > SELECT wp_web_page_id FROM web_page > UNION > SELECT cp_catalog_page_id FROM catalog_page; > {code} > hit the following error: > {code} > Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96) > ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171) > ~[na:na] > at > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218) > 
~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372) > ~[na:na] > at > org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599) > ~[na:na] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > {code} > [~dechanggu] found that the issue is absent in Drill 1.13. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6622: -- Reviewer: Boaz Ben-Zvi
[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551888#comment-16551888 ] salim achouche commented on DRILL-6622: --- Alright, just fixed this issue; there were two bugs in the Aggregator batch sizing logic:
Issue I
* The aggregator runs in a loop to consume all input batches
* The loop was updating the batch sizing stats only after the batches had been consumed
* Assume the output row count is 1 and we receive a batch with at least 32k + 1 records
* The code would create 32k output batches (one per incoming record) and then fail because of overflow
* Fix - Update the batch sizing stats when a non-empty batch is received, before the processing loop
Issue II
* The Aggregator has two main modules: the AggregatorBatch and Aggregator objects
* Both share the same "incoming" record batch instance
* There is logic to spill incoming batches when under memory pressure
* The batch sizing logic was not aware that when batches are spilled, the shared "incoming" reference diverges; that is, the Aggregator object mutates the incoming object
* The batch sizer was being invoked with a stale "incoming" object (the one from the AggregatorBatch)
* Fix - Update the Aggregator code to always pass the active incoming object explicitly
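The stale shared-reference problem in Issue II can be illustrated with a small sketch. The classes, method names, and numbers below are invented for illustration and are not Drill's actual AggregatorBatch/Aggregator code:

```java
// Sketch: a sizer computed against a cached upstream batch while the
// operator is actually processing a batch reloaded from spill produces
// wrong row budgets; the fix is to pass the active batch explicitly.
class RecordBatch {
    final int recordCount;
    RecordBatch(int recordCount) { this.recordCount = recordCount; }
}

class BatchSizer {
    // Correct pattern: size whatever batch the caller is actually consuming.
    static int outputRowsFor(RecordBatch active, int batchBudgetBytes, int rowWidthBytes) {
        int rows = batchBudgetBytes / rowWidthBytes;
        return Math.max(1, Math.min(rows, active.recordCount));
    }
}

public class SpillSizing {
    public static void main(String[] args) {
        RecordBatch original = new RecordBatch(4096); // batch from upstream
        RecordBatch reloaded = new RecordBatch(1024); // batch read back from spill

        // Buggy pattern: always sizing against the cached upstream batch,
        // even while the aggregator is processing the reloaded spill batch.
        int buggy = BatchSizer.outputRowsFor(original, 256 * 1024, 64);
        // Fixed pattern: the aggregator passes its *active* incoming batch.
        int fixed = BatchSizer.outputRowsFor(reloaded, 256 * 1024, 64);

        System.out.println(buggy); // 4096, budgeted against the wrong batch
        System.out.println(fixed); // 1024
    }
}
```

Passing the active batch as an explicit parameter, rather than reading a shared field, is what makes the divergence impossible by construction.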
[jira] [Comment Edited] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551497#comment-16551497 ] salim achouche edited comment on DRILL-6622 at 7/21/18 1:55 AM: This looks like a serious bug:
* The batch memory managers somehow think that most incoming batches are empty
* The aggregator used to create outgoing batches with exactly 2**16 max capacity
* The memory manager's erroneous stats make it so the Aggregator is getting a max capacity of 1
* This means that every unique group is being stored in its own outgoing batch
* The Aggregator limits the max number of outgoing batches to 64k (since previously a batch could contain 64k entries); a 32-bit indexing scheme subdivides this space into a pair (out-batch-idx, idx-within-batch)
* A NullPointerException happens when this indexing scheme fails because of the large number of outgoing batches (overflow)
* The bug had been there for a while (since the Aggregator was modified to support batch sizing) but manifested itself only with a large number of unique groups
I now have to reverse-engineer the reason for the erroneous batch sizer stats.
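The 32-bit composite indexing scheme mentioned above, a pair (out-batch-idx, idx-within-batch) over at most 64k batches of 64k entries each, can be sketched as follows. The exact bit layout is an assumption for illustration, not Drill's actual code:

```java
// Sketch: high 16 bits select the outgoing batch, low 16 bits the row
// within it. With a batch capacity of 1 row, more than 64k unique groups
// pushes the batch index past 16 bits and the composite index overflows.
public class CompositeIndex {
    static final int BATCH_BITS = 16;
    static final int ROW_MASK = (1 << BATCH_BITS) - 1; // 0xFFFF

    static int compose(int batchIdx, int rowIdx) {
        return (batchIdx << BATCH_BITS) | (rowIdx & ROW_MASK);
    }
    static int batchOf(int composite) { return composite >>> BATCH_BITS; }
    static int rowOf(int composite)   { return composite & ROW_MASK; }

    public static void main(String[] args) {
        int idx = compose(3, 42);
        System.out.println(batchOf(idx)); // 3
        System.out.println(rowOf(idx));   // 42
        // Overflow: if every group lands in its own one-row batch, group
        // 65536 needs batch index 65536, which no longer fits in 16 bits.
        int overflowed = compose(65536, 0);
        System.out.println(batchOf(overflowed)); // 0, the index wrapped
    }
}
```

A wrapped index like this dereferences the wrong (or a missing) batch, which is consistent with the NullPointerException surfacing far from the actual overflow.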
[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551497#comment-16551497 ] salim achouche commented on DRILL-6622: --- This looks like a serious bug:
* The batch memory managers somehow think that most incoming batches are empty
* The aggregator used to create outgoing batches with exactly 2**16 max capacity
* The memory manager's erroneous stats make it so the Aggregator is getting a max capacity of 1
* This meant that every unique group was being stored in its own outgoing batch
* The Aggregator limits the max number of outgoing batches to 64k (since previously a batch could contain 64k entries); a 32-bit indexing scheme subdivides this space into a pair (out-batch-idx, idx-within-batch)
* A NullPointerException happens when this indexing scheme fails because of the large number of outgoing batches (overflow)
* The bug had been there for a while (since the Aggregator was modified to support batch sizing) but manifested itself only with a large number of unique groups
I now have to reverse-engineer the reason for the erroneous batch sizer stats.
[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551462#comment-16551462 ] salim achouche commented on DRILL-6622: --- [~priteshm],
* Being able to reproduce this bug really helps
* There is a regression in the aggregator's output batch management
* Fixing the bug required that I spend some time getting familiar with the code (which I have done now)
* Hopefully, I am closing in on the reason for the NullPointerException
* Will update my findings ASAP
[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551168#comment-16551168 ] salim achouche commented on DRILL-6622: --- Realized that I was running with debug mode enabled, which somehow changed the repro behavior of this bug; I am now able to observe the failure with debug off.
[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551095#comment-16551095 ] salim achouche commented on DRILL-6622: --- * Tried the query (on my Mac) with SF10 and SF100, but hit no crash as reported in this Jira * Hit multiple GC failures (with 4 GB and 8 GB of JVM heap memory); my guess is that this is due to the overhead of spilling data (avoiding spilling makes the query succeed) * The query succeeded as soon as I bumped the query direct memory per node from 2 GB to 6 GB I'll now try to reproduce this issue on my 4-node cluster.
[jira] [Assigned] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6622: - Assignee: salim achouche (was: Boaz Ben-Zvi)
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Labels: ready-to-commit (was: pull-request-available) > Add sanity checks to Parquet Reader > > > Key: DRILL-6579 > URL: https://issues.apache.org/jira/browse/DRILL-6579 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
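The DRILL-6579 summary is terse; in a batch reader, a "sanity check to avoid infinite loops" generally means failing fast when an iteration makes no progress. A minimal sketch of the idea, where the BatchReader interface and readAll helper are hypothetical illustrations and not Drill's actual Parquet reader API:

```java
// Hedged sketch: BatchReader and readAll are illustrative names, not
// Drill's Parquet reader API. The point is the fail-fast progress guard.
class SanityCheckSketch {
    interface BatchReader {
        boolean hasMore();
        /** Returns the number of values loaded by this call. */
        int readBatch();
    }

    static int readAll(BatchReader reader) {
        int total = 0;
        while (reader.hasMore()) {
            int count = reader.readBatch();
            // Sanity check: a zero-progress batch would otherwise make
            // this loop spin forever instead of surfacing the bug.
            if (count <= 0) {
                throw new IllegalStateException(
                    "Read batch count [" + count + "] should be greater than zero");
            }
            total += count;
        }
        return total;
    }
}
```

This is the same class of guard whose message ("Read batch count [0] should be greater than zero") later surfaced a real reader bug in DRILL-7130: the check converts a silent infinite loop into a diagnosable failure.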
[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation
[ https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6578: -- Labels: ready-to-commit (was: pull-request-available) > Ensure the Flat Parquet Reader can handle query cancellation > > > Key: DRILL-6578 > URL: https://issues.apache.org/jira/browse/DRILL-6578 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > * The optimized Parquet reader uses an iterator style to load column data > * We need to ensure the code can properly handle query cancellation even in > the presence of bugs within the hasNext() .. next() calls -- This message was sent by Atlassian JIRA (v7.6.3#76005)
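DRILL-6578 asks that an iterator-style loader stay responsive to cancellation even when hasNext()/next() misbehave. One common pattern is to check the thread's interrupt flag at the top of each iterator call; the sketch below assumes cancellation is delivered as thread interruption (as it is for Drill fragment threads), and the class itself is illustrative rather than Drill's actual reader:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch, not Drill's column reader: an iterator-style loader
// that honors the interrupt flag so a cancelled query cannot keep spinning
// inside hasNext()/next() even if the loading logic itself has a bug.
class CancellableBatchIterator implements Iterator<int[]> {
    private final int totalBatches;
    private int produced;

    CancellableBatchIterator(int totalBatches) {
        this.totalBatches = totalBatches;
    }

    @Override
    public boolean hasNext() {
        // Query cancellation interrupts the executing thread; stop promptly
        // instead of continuing to load batches.
        if (Thread.currentThread().isInterrupted()) {
            return false;
        }
        return produced < totalBatches;
    }

    @Override
    public int[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no batch available (done or cancelled)");
        }
        produced++;
        return new int[] { produced };  // stand-in for a loaded column batch
    }
}
```

Checking the flag inside the iterator itself, rather than only between batches, bounds how long a cancelled fragment can run regardless of what the inner loading code does.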
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Labels: pull-request-available (was: )
[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/
[ https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539048#comment-16539048 ] salim achouche commented on DRILL-6569: --- Robert, According to the original comment: * Using the DFS command is successful; this invokes the Parquet reader * Running the complex query (without the explicit DFS clause) fails; the stack trace indicates the Hive reader was invoked ** org.apache.drill.exec.store.hive.readers.*HiveParquetReader*.next():54 ** org.apache.drill.exec.physical.impl.ScanBatch.next():172 > Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not > read value at 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > -- > > Key: DRILL-6569 > URL: https://issues.apache.org/jira/browse/DRILL-6569 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Robert Hou >Priority: Critical > Fix For: 1.15.0 > > > This is TPCDS Query 19. > I am able to scan the parquet file using: >select * from > dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet` > and I get 3,349,279 rows selected. 
> Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql > SELECT i_brand_id brand_id, > i_brand brand, > i_manufact_id, > i_manufact, > Sum(ss_ext_sales_price) ext_price > FROM date_dim, > store_sales, > item, > customer, > customer_address, > store > WHERE d_date_sk = ss_sold_date_sk > AND ss_item_sk = i_item_sk > AND i_manager_id = 38 > AND d_moy = 12 > AND d_year = 1998 > AND ss_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) > AND ss_store_sk = s_store_sk > GROUP BY i_brand, > i_brand_id, > i_manufact_id, > i_manufact > ORDER BY ext_price DESC, > i_brand, > i_brand_id, > i_manufact_id, > i_manufact > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block > 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > Fragment 4:26 > [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] > (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at > 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > > hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 > hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 > > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 > org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 > org.apache.drill.exec.physical.impl.ScanBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.
[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538974#comment-16538974 ] salim achouche commented on DRILL-6517: --- [~khfaraaz], when queries are cancelled, we anticipate exceptions to be thrown (e.g., an interrupted thread will receive an exception on a blocking call). The questions I am trying to figure out are: * Is the IllegalStateException thrown only on query cancellation? * Is there a more important bug causing the foreman to cancel the query? So I'll use your real cluster to debug this issue. > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: salim achouche >Priority: Critical > Fix For: 1.14.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. 
> {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:690) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:662) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at >
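The root Preconditions.checkState failure in the trace above comes from reading a vector container whose record count was never set by its producer. The contract can be reduced to a tiny sketch; RecordCountSketch is a hypothetical miniature for illustration, not Drill's actual VectorContainer:

```java
// Hypothetical miniature of the contract behind the failure above: the
// producer must set the record count before any consumer may read it.
// This is NOT Drill's VectorContainer, just the fail-fast pattern it uses.
class RecordCountSketch {
    private static final int NOT_SET = -1;
    private int recordCount = NOT_SET;

    void setRecordCount(int count) {
        this.recordCount = count;
    }

    int getRecordCount() {
        // Mirrors the Preconditions.checkState call in the stack trace:
        // reading an unset count is a programming error, so fail loudly.
        if (recordCount == NOT_SET) {
            throw new IllegalStateException(
                "Record count not set for this vector container");
        }
        return recordCount;
    }
}
```

The value of the guard is visible in this Jira: the exception surfaces a producer (here, upstream of RemovingRecordBatch during HashJoin prefetch) that handed off a container without finalizing it.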
[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532150#comment-16532150 ] salim achouche commented on DRILL-6517: --- * I ran the query around 10 times and it succeeded each time (running in 29 minutes) * Bounced the Drillbit cluster and immediately one of the nodes became unresponsive * I launched a script to gather jstacks each minute; somehow the jstack failed and I got the kernel messages below * VMware blogs indicated the VM is running out of resources * The interesting part is that the IllegalStateException showed up again when cancellation happened: Caused by: java.lang.IllegalStateException: Record count not set for this vector container Message from syslogd@mfs133 at Jul 3 18:48:27 ... kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! [java:12219] Message from syslogd@mfs133 at Jul 3 18:48:27 ... kernel:NMI watchdog: BUG: soft lockup - CPU#3 stuck for 25s! [java:16991] Message from syslogd@mfs133 at Jul 3 18:48:27 ... kernel:NMI watchdog: BUG: soft lockup - CPU#4 stuck for 25s! [java:17633] Message from syslogd@mfs133 at Jul 3 18:48:27 ... kernel:NMI watchdog: BUG: soft lockup - CPU#5 stuck for 25s! [java:27059]
[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532019#comment-16532019 ] salim achouche commented on DRILL-6517: --- After debugging this issue, I noticed the thrown exception was masking the real problem: * Launched the query the first time on a 4-node cluster (made up of VMs) * Query memory per node was 10 GB; spilling not enabled (at least not explicitly) * The query ran in 35 min and succeeded * Re-launched the same query, but this time node-3 was unresponsive * After one hour the query failed; the client error was that node-3 was lost * Within the Drillbit logs, the record-count-not-set error was thrown, though after the foreman cancelled the query I'll now focus on understanding why the system gets into this state when running for the second time; the fact that I am using VMs is not helping, as network issues are very common.
[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container
[ https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531837#comment-16531837 ] salim achouche commented on DRILL-6517: --- If this is the case, then I'll fix that; though my impression is that the exception is thrown in the last HJ where both inputs came from non-parquet. I am currently re-running the test with new instrumentation.. > IllegalStateException: Record count not set for this vector container > - > > Key: DRILL-6517 > URL: https://issues.apache.org/jira/browse/DRILL-6517 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.14.0 >Reporter: Khurram Faraaz >Assignee: salim achouche >Priority: Critical > Fix For: 1.14.0 > > Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill > > > TPC-DS query is Canceled after 2 hrs and 47 mins and we see an > IllegalStateException: Record count not set for this vector container, in > drillbit.log > Steps to reproduce the problem, query profile > (24d7b377-7589-7928-f34f-57d02061acef) is attached here. 
> {noformat} > In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"} > and set these options from sqlline, > alter system set `planner.memory.max_query_memory_per_node` = 10737418240; > alter system set `drill.exec.hashagg.fallback.enabled` = true; > To run the query (replace IP-ADDRESS with your foreman node's IP address) > cd /opt/mapr/drill/drill-1.14.0/bin > ./sqlline -u > "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f > /root/query72.sql > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Record count not set for this vector container > Fragment 4:49 > [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) > ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) > [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.IllegalStateException: Record count not set for this > vector container > at com.google.common.base.Preconditions.checkState(Preconditions.java:173) > ~[guava-18.0.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:690) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:662) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218) > ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] > at > org.apache.drill.exec.r
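The failure in the thread above boils down to a read-before-write guard: `VectorContainer.getRecordCount()` calls `Preconditions.checkState(...)`, which throws `IllegalStateException` when an upstream operator hands over a container without ever setting its record count. A minimal sketch of that guard pattern, with illustrative names (`SimpleContainer` is a stand-in, not Drill's actual class):

```java
public class RecordCountGuard {
    static final int NOT_SET = -1;

    static class SimpleContainer {
        private int recordCount = NOT_SET;

        void setRecordCount(int count) {
            if (count < 0) {
                throw new IllegalArgumentException("count must be >= 0: " + count);
            }
            this.recordCount = count;
        }

        int getRecordCount() {
            // Mirrors the Preconditions.checkState(...) frame in the stack trace:
            // reading the count before any batch has set it is a programming error.
            if (recordCount == NOT_SET) {
                throw new IllegalStateException("Record count not set for this vector container");
            }
            return recordCount;
        }
    }

    public static void main(String[] args) {
        SimpleContainer container = new SimpleContainer();
        boolean threw = false;
        try {
            container.getRecordCount();   // upstream operator forgot to set the count
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println("guard fired: " + threw);

        container.setRecordCount(4096);   // e.g. after loading a batch
        System.out.println("count: " + container.getRecordCount());
    }
}
```

The point of the guard is to surface the bug at the operator boundary (here, `RemovingRecordBatch` feeding `RecordBatchSizer`) rather than letting a stale or garbage count propagate into memory-sizing math.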
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Labels: pull-request-available ready-to-commit (was: pull-request-available) > Add sanity checks to Parquet Reader > > > Key: DRILL-6579 > URL: https://issues.apache.org/jira/browse/DRILL-6579 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available, ready-to-commit > Fix For: 1.14.0 > > > Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Reviewer: Boaz Ben-Zvi > Add sanity checks to Parquet Reader > > > Key: DRILL-6579 > URL: https://issues.apache.org/jira/browse/DRILL-6579 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Labels: pull-request-available (was: ) > Add sanity checks to Parquet Reader > > > Key: DRILL-6579 > URL: https://issues.apache.org/jira/browse/DRILL-6579 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6579: -- Summary: Add sanity checks to Parquet Reader (was: Sanity checks to avoid infinite loops) > Add sanity checks to Parquet Reader > > > Key: DRILL-6579 > URL: https://issues.apache.org/jira/browse/DRILL-6579 > Project: Apache Drill > Issue Type: Improvement >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > > Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6579) Sanity checks to avoid infinite loops
salim achouche created DRILL-6579: - Summary: Sanity checks to avoid infinite loops Key: DRILL-6579 URL: https://issues.apache.org/jira/browse/DRILL-6579 Project: Apache Drill Issue Type: Improvement Reporter: salim achouche Assignee: salim achouche Add sanity checks to the Parquet reader to avoid infinite loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
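The kind of sanity check DRILL-6579 asks for can be sketched as a progress bound on the decode loop: if a page decoder repeatedly reports zero progress, fail fast instead of spinning forever. `PageDecoder` and the stall limit below are illustrative stand-ins, not Drill's actual interfaces:

```java
public class BoundedReadLoop {
    /** A decoder call reports how many values it produced (0 = no progress). */
    interface PageDecoder { int decodeSome(); }

    static int readBatch(PageDecoder decoder, int target) {
        int read = 0;
        int zeroProgressCalls = 0;
        final int maxZeroProgressCalls = 3;   // sanity bound on stalled iterations
        while (read < target) {
            int n = decoder.decodeSome();
            if (n <= 0) {
                if (++zeroProgressCalls >= maxZeroProgressCalls) {
                    throw new IllegalStateException(
                        "Sanity check: decoder made no progress after "
                        + maxZeroProgressCalls + " calls");
                }
            } else {
                zeroProgressCalls = 0;
                read += n;
            }
        }
        return read;
    }

    public static void main(String[] args) {
        // A healthy decoder that yields 100 values per call.
        System.out.println("read " + readBatch(() -> 100, 250));
        // A broken decoder that never makes progress trips the guard.
        try {
            readBatch(() -> 0, 10);
        } catch (IllegalStateException e) {
            System.out.println("guard fired");
        }
    }
}
```

The guard costs one counter per loop and turns a hang (an infinite loop on malformed input) into a diagnosable failure.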
[jira] [Assigned] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1
[ https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6569: - Assignee: Robert Hou (was: salim achouche) > Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not > read value at 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > -- > > Key: DRILL-6569 > URL: https://issues.apache.org/jira/browse/DRILL-6569 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: Robert Hou >Priority: Critical > Fix For: 1.14.0 > > > This is TPCDS Query 19. > I am able to scan the parquet file using: >select * from > dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet` > and I get 3,349,279 rows selected. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql > SELECT i_brand_id brand_id, > i_brand brand, > i_manufact_id, > i_manufact, > Sum(ss_ext_sales_price) ext_price > FROM date_dim, > store_sales, > item, > customer, > customer_address, > store > WHERE d_date_sk = ss_sold_date_sk > AND ss_item_sk = i_item_sk > AND i_manager_id = 38 > AND d_moy = 12 > AND d_year = 1998 > AND ss_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) > AND ss_store_sk = s_store_sk > GROUP BY i_brand, > i_brand_id, > i_manufact_id, > i_manufact > ORDER BY ext_price DESC, > i_brand, > i_brand_id, > i_manufact_id, > i_manufact > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block > 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > Fragment 4:26 > [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] > 
(hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at > 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > > hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 > hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 > > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 > org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 > org.apache.drill.exec.physical.impl.ScanBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordB
[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/
[ https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530733#comment-16530733 ] salim achouche commented on DRILL-6569: --- [~rhou], This is a Hive Parquet reader issue (not the native Drill Parquet reader). > Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not > read value at 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > -- > > Key: DRILL-6569 > URL: https://issues.apache.org/jira/browse/DRILL-6569 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.14.0 >Reporter: Robert Hou >Assignee: salim achouche >Priority: Critical > Fix For: 1.14.0 > > > This is TPCDS Query 19. > I am able to scan the parquet file using: >select * from > dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet` > and I get 3,349,279 rows selected. > Query: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql > SELECT i_brand_id brand_id, > i_brand brand, > i_manufact_id, > i_manufact, > Sum(ss_ext_sales_price) ext_price > FROM date_dim, > store_sales, > item, > customer, > customer_address, > store > WHERE d_date_sk = ss_sold_date_sk > AND ss_item_sk = i_item_sk > AND i_manager_id = 38 > AND d_moy = 12 > AND d_year = 1998 > AND ss_customer_sk = c_customer_sk > AND c_current_addr_sk = ca_address_sk > AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) > AND ss_store_sk = s_store_sk > GROUP BY i_brand, > i_brand_id, > i_manufact_id, > i_manufact > ORDER BY ext_price DESC, > i_brand, > i_brand_id, > i_manufact_id, > i_manufact > LIMIT 100; > Here is the stack trace: > 2018-06-29 07:00:32 INFO DrillTestLogger:348 - > Exception: > java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block > 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > Fragment 4:26 > 
[Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] > (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at > 2 in block 0 in file > maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet > > hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 > hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 > > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 > > org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 > org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 > org.apache.drill.exec.physical.impl.ScanBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > > 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 > org.apache.drill.exec.record.AbstractRecordBatch.next():152 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecor
[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation
[ https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6578: -- Labels: pull-request-available (was: ) > Ensure the Flat Parquet Reader can handle query cancellation > > > Key: DRILL-6578 > URL: https://issues.apache.org/jira/browse/DRILL-6578 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > * The optimized Parquet reader uses an iterator style to load column data > * We need to ensure the code can properly handle query cancellation even in > the presence of bugs within the hasNext() .. next() calls -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation
[ https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6578: -- Reviewer: Vlad Rozov > Ensure the Flat Parquet Reader can handle query cancellation > > > Key: DRILL-6578 > URL: https://issues.apache.org/jira/browse/DRILL-6578 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > > * The optimized Parquet reader uses an iterator style to load column data > * We need to ensure the code can properly handle query cancellation even in > the presence of bugs within the hasNext() .. next() calls -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation
salim achouche created DRILL-6578: - Summary: Ensure the Flat Parquet Reader can handle query cancellation Key: DRILL-6578 URL: https://issues.apache.org/jira/browse/DRILL-6578 Project: Apache Drill Issue Type: Improvement Components: Storage - Parquet Reporter: salim achouche Assignee: salim achouche * The optimized Parquet reader uses an iterator style to load column data * We need to ensure the code can properly handle query cancellation even in the presence of bugs within the hasNext() .. next() calls -- This message was sent by Atlassian JIRA (v7.6.3#76005)
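The cancellation concern in DRILL-6578 can be sketched as an iterator that observes the thread's interrupt status inside both `hasNext()` and `next()`, so a cancelled query cannot loop forever even if the decode logic between those calls is buggy. This assumes cancellation is delivered as a thread interrupt; all names are illustrative, not Drill's:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CancellableColumnIterator implements Iterator<int[]> {
    private final int totalBatches;
    private int produced;

    CancellableColumnIterator(int totalBatches) { this.totalBatches = totalBatches; }

    private void checkCancelled() {
        // Cancellation is modeled as a thread interrupt in this sketch.
        if (Thread.currentThread().isInterrupted()) {
            throw new RuntimeException("query cancelled while loading column data");
        }
    }

    @Override public boolean hasNext() {
        checkCancelled();               // honored even if next() misbehaves
        return produced < totalBatches;
    }

    @Override public int[] next() {
        checkCancelled();
        if (produced >= totalBatches) throw new NoSuchElementException();
        produced++;
        return new int[1024];           // stand-in for a decoded column batch
    }

    public static void main(String[] args) {
        CancellableColumnIterator it = new CancellableColumnIterator(2);
        System.out.println(it.hasNext());       // no cancellation yet
        Thread.currentThread().interrupt();     // simulate query cancellation
        try {
            it.hasNext();
        } catch (RuntimeException e) {
            System.out.println("cancelled: " + e.getMessage());
        } finally {
            Thread.interrupted();               // clear the flag
        }
    }
}
```

Checking in both methods matters because a caller's `while (hasNext()) next()` loop only terminates promptly if at least one of the two calls per iteration notices the cancellation.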
[jira] [Updated] (DRILL-6560) Allow options for controlling the batch size per operator
[ https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6560: -- Reviewer: Karthikeyan Manivannan > Allow options for controlling the batch size per operator > - > > Key: DRILL-6560 > URL: https://issues.apache.org/jira/browse/DRILL-6560 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Minor > Labels: pull-request-available > Fix For: 1.14.0 > > > This Jira is for internal Drill DEV use; the following capabilities are > needed for automating the batch sizing functionality testing: > * Control the enablement of batch sizing statistics at session (per query) > and server level (all queries) > * Control the granularity of batch sizing statistics (summary or verbose) > * Control the set of operators that should log batch statistics -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6560) Allow options for controlling the batch size per operator
[ https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6560: -- Labels: pull-request-available (was: ) > Allow options for controlling the batch size per operator > - > > Key: DRILL-6560 > URL: https://issues.apache.org/jira/browse/DRILL-6560 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Minor > Labels: pull-request-available > Fix For: 1.14.0 > > > This Jira is for internal Drill DEV use; the following capabilities are > needed for automating the batch sizing functionality testing: > * Control the enablement of batch sizing statistics at session (per query) > and server level (all queries) > * Control the granularity of batch sizing statistics (summary or verbose) > * Control the set of operators that should log batch statistics -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6570: -- Remaining Estimate: 2h Original Estimate: 2h Reviewer: Kunal Khatua Issue Type: Bug (was: Improvement) > IndexOutOfBoundsException when using Flat Parquet Reader > - > > Key: DRILL-6570 > URL: https://issues.apache.org/jira/browse/DRILL-6570 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > * The Parquet Reader creates a reusable bulk entry based on the column > precision > * It uses the column precision for optimizing the intermediary heap buffers > * It first detected the column was fixed length but then it reverted this > assumption when the column changed precision > * This step was fine except the bulk entry memory requirement changed though > the code didn't update the bulk entry intermediary buffers > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6570: -- Labels: pull-request-available (was: ) > IndexOutOfBoundsException when using Flat Parquet Reader > - > > Key: DRILL-6570 > URL: https://issues.apache.org/jira/browse/DRILL-6570 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > * The Parquet Reader creates a reusable bulk entry based on the column > precision > * It uses the column precision for optimizing the intermediary heap buffers > * It first detected the column was fixed length but then it reverted this > assumption when the column changed precision > * This step was fine except the bulk entry memory requirement changed though > the code didn't update the bulk entry intermediary buffers > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader
salim achouche created DRILL-6570: - Summary: IndexOutOfBoundsException when using Flat Parquet Reader Key: DRILL-6570 URL: https://issues.apache.org/jira/browse/DRILL-6570 Project: Apache Drill Issue Type: Improvement Components: Storage - Parquet Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 * The Parquet Reader creates a reusable bulk entry based on the column precision * It uses the column precision for optimizing the intermediary heap buffers * It first detected the column was fixed length but then it reverted this assumption when the column changed precision * This step was fine except the bulk entry memory requirement changed though the code didn't update the bulk entry intermediary buffers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
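The DRILL-6570 failure mode described above reduces to this: a reusable bulk entry sizes its intermediate buffer from an assumed column precision, and when the observed precision grows, writing without resizing the buffer overruns it (the `IndexOutOfBoundsException`). A minimal sketch of the fix, with illustrative names (`BulkEntry` here is a stand-in, not Drill's actual class):

```java
import java.util.Arrays;

public class BulkEntry {
    private int precision;    // assumed bytes per value
    private byte[] buffer;    // reusable intermediate buffer

    BulkEntry(int batchSize, int precision) {
        this.precision = precision;
        this.buffer = new byte[batchSize * precision];
    }

    /** Revise the precision assumption and keep the buffer in sync. */
    void updatePrecision(int batchSize, int newPrecision) {
        if (newPrecision > precision) {
            // Without this reallocation, subsequent writes overrun the buffer.
            buffer = Arrays.copyOf(buffer, batchSize * newPrecision);
        }
        precision = newPrecision;
    }

    void write(int index, byte[] value) {
        System.arraycopy(value, 0, buffer, index * precision, value.length);
    }

    int capacity() { return buffer.length; }

    public static void main(String[] args) {
        BulkEntry entry = new BulkEntry(1024, 4);
        entry.updatePrecision(1024, 8);   // column turned out wider than assumed
        entry.write(1023, new byte[8]);   // safe only because the buffer grew
        System.out.println("capacity: " + entry.capacity());
    }
}
```

The bug class is worth naming: whenever a cached sizing assumption is revised, every derived resource (here, the heap buffer) must be revalidated in the same step.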
[jira] [Created] (DRILL-6560) Allow options for controlling the batch size per operator
salim achouche created DRILL-6560: - Summary: Allow options for controlling the batch size per operator Key: DRILL-6560 URL: https://issues.apache.org/jira/browse/DRILL-6560 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 This Jira is for internal Drill DEV use; the following capabilities are needed for automating the batch sizing functionality testing: * Control the enablement of batch sizing statistics at session (per query) and server level (all queries) * Control the granularity of batch sizing statistics (summary or verbose) * Control the set of operators that should log batch statistics -- This message was sent by Atlassian JIRA (v7.6.3#76005)
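The option surface DRILL-6560 describes has three axes: an on/off switch (session or server level), a granularity (summary vs. verbose), and a set of operators that should log. A sketch of how such options might gate logging decisions; the `Granularity` enum and every name here are hypothetical stand-ins, not Drill's actual option names:

```java
import java.util.Set;

public class BatchStatsOptions {
    enum Granularity { SUMMARY, VERBOSE }

    private final boolean enabled;           // session- or server-level switch
    private final Granularity granularity;   // summary vs verbose output
    private final Set<String> operators;     // which operators should log

    BatchStatsOptions(boolean enabled, Granularity granularity, Set<String> operators) {
        this.enabled = enabled;
        this.granularity = granularity;
        this.operators = operators;
    }

    /** True when the given operator should emit batch statistics. */
    boolean shouldLog(String operator) {
        return enabled && operators.contains(operator);
    }

    boolean verbose() { return granularity == Granularity.VERBOSE; }

    public static void main(String[] args) {
        BatchStatsOptions opts = new BatchStatsOptions(
            true, Granularity.SUMMARY, Set.of("PARQUET_SCAN", "HASH_JOIN"));
        System.out.println(opts.shouldLog("PARQUET_SCAN"));
        System.out.println(opts.shouldLog("SORT"));
        System.out.println(opts.verbose());
    }
}
```

Scoping the operator set per query keeps automated batch-sizing tests from drowning in log output from operators they are not exercising.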
[jira] [Commented] (DRILL-6539) Record count not set for this vector container error
[ https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523959#comment-16523959 ] salim achouche commented on DRILL-6539: --- I have been trying to reproduce this issue on my mac os but Khurram' TPCDS test succeeded. [~ppenumarthy] Do you have another repro case? > Record count not set for this vector container error > - > > Key: DRILL-6539 > URL: https://issues.apache.org/jira/browse/DRILL-6539 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > This error is randomly seen when executing queries. > [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010] > (java.lang.IllegalStateException) Record count not set for this vector > container > com.google.common.base.Preconditions.checkState():173 > org.apache.drill.exec.record.VectorContainer.getRecordCount():394 > org.apache.drill.exec.record.RecordBatchSizer.():681 > org.apache.drill.exec.record.RecordBatchSizer.():665 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891 > > org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.physical.impl.BaseRootExec.next():103 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():93 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 >
[jira] [Created] (DRILL-6528) Planner setting the wrong number of records to read (Parquet Reader)
salim achouche created DRILL-6528: - Summary: Planner setting the wrong number of records to read (Parquet Reader) Key: DRILL-6528 URL: https://issues.apache.org/jira/browse/DRILL-6528 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: salim achouche - Recently fixed the Flat Parquet reader to honor the number of records to read - Though a few tests failed: TestUnionDistinct.testUnionDistinctEmptySides:356 Different number of records returned expected:<5> but was:<1> TestUnionAll.testUnionAllEmptySides:355 Different number of records returned expected:<5> but was:<1> - I debugged one of them and realized the Planner was setting the wrong number of rows to read (in this case, one) - You can put a breakpoint and see this happening: Class: ParquetGroupScan Method: updateRowGroupInfo(long maxRecords) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6147) Limit batch size for Flat Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6147: -- Labels: pull-request-available (was: ) > Limit batch size for Flat Parquet Reader > > > Key: DRILL-6147 > URL: https://issues.apache.org/jira/browse/DRILL-6147 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > The Parquet reader currently uses a hard-coded batch size limit (32k rows) > when creating scan batches; there is no parameter nor any logic for > controlling the amount of memory used. This enhancement will allow Drill to > take an extra input parameter to control direct memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
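The enhancement above replaces a fixed 32k-row cap with a memory-driven limit. One way to sketch the arithmetic: derive the rows-per-batch limit from a configurable memory budget and an estimated row width, keeping the legacy row cap as an upper bound. Constants and names are illustrative, not Drill's actual values:

```java
public class BatchSizer {
    static final int MAX_ROWS_PER_BATCH = 32 * 1024;   // legacy hard-coded cap

    /** Rows per batch given a memory budget (bytes) and an estimated row width. */
    static int rowsPerBatch(long memoryBudgetBytes, int estimatedRowWidthBytes) {
        if (estimatedRowWidthBytes <= 0) {
            throw new IllegalArgumentException("row width must be positive");
        }
        long byBudget = memoryBudgetBytes / estimatedRowWidthBytes;
        // Never below 1 row, never above the legacy cap.
        return (int) Math.max(1, Math.min(byBudget, MAX_ROWS_PER_BATCH));
    }

    public static void main(String[] args) {
        // 16 MB budget, 1 KB rows -> limited by memory, not the row cap.
        System.out.println(rowsPerBatch(16L << 20, 1024));   // 16384
        // Tiny rows -> the legacy 32k cap still applies.
        System.out.println(rowsPerBatch(16L << 20, 8));      // 32768
    }
}
```

With wide variable-length columns, the memory term dominates and the batch shrinks; with narrow fixed-width columns, the row cap keeps per-batch overhead bounded.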
[jira] [Updated] (DRILL-6513) Drill should only allow valid values when users set planner.memory.max_query_memory_per_node
[ https://issues.apache.org/jira/browse/DRILL-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6513: -- Labels: pull-request-available (was: ) > Drill should only allow valid values when users set > planner.memory.max_query_memory_per_node > > > Key: DRILL-6513 > URL: https://issues.apache.org/jira/browse/DRILL-6513 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > The "planner.memory.max_query_memory_per_node" configuration can be currently > set to values higher than the Drillbit Direct Memory configuration. The goal > of this Jira is to fail queries with such an erroneous configuration to avoid > runtime OOM. > NOTE - The current semantic of the maximum query memory per node > configuration is that the end user has computed valid values especially > knowing the current Drill limitations. Such values have to account for > Netty's overhead (memory pools), shared pools (e.g., network exchanges), and > concurrent query execution. This Jira should not be used to also cover such > use-cases. The Drill Resource Management feature has the means to automate > query quotas and the associated validation. We should create another Jira > requesting the enhanced validations contracts under the umbrella of the > Resource Management feature. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6513) Drill should only allow valid values when users set planner.memory.max_query_memory_per_node
salim achouche created DRILL-6513: - Summary: Drill should only allow valid values when users set planner.memory.max_query_memory_per_node Key: DRILL-6513 URL: https://issues.apache.org/jira/browse/DRILL-6513 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 The "planner.memory.max_query_memory_per_node" configuration can be currently set to values higher than the Drillbit Direct Memory configuration. The goal of this Jira is to fail queries with such an erroneous configuration to avoid runtime OOM. NOTE - The current semantic of the maximum query memory per node configuration is that the end user has computed valid values especially knowing the current Drill limitations. Such values have to account for Netty's overhead (memory pools), shared pools (e.g., network exchanges), and concurrent query execution. This Jira should not be used to also cover such use-cases. The Drill Resource Management feature has the means to automate query quotas and the associated validation. We should create another Jira requesting the enhanced validations contracts under the umbrella of the Resource Management feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
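The validation DRILL-6513 asks for can be sketched as a bounds check at option-set time: reject a per-query memory setting that exceeds the Drillbit's direct-memory allotment, so the query fails immediately with a clear message instead of with a runtime OOM. Names below are illustrative, not Drill's actual validator classes:

```java
public class MemoryOptionValidator {
    /** Throws if the requested per-query maximum cannot fit in direct memory. */
    static void validateMaxQueryMemory(long requestedBytes, long directMemoryBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("max query memory must be positive");
        }
        if (requestedBytes > directMemoryBytes) {
            throw new IllegalArgumentException(String.format(
                "max query memory per node (%d bytes) exceeds direct memory (%d bytes)",
                requestedBytes, directMemoryBytes));
        }
    }

    public static void main(String[] args) {
        long directMemory = 12L << 30;   // e.g. DRILL_MAX_DIRECT_MEMORY set to 12G
        validateMaxQueryMemory(10L << 30, directMemory);     // 10G: accepted
        try {
            validateMaxQueryMemory(16L << 30, directMemory); // 16G: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Per the issue's own note, this is only a sanity bound, not resource management: it does not account for Netty pool overhead, shared exchange buffers, or concurrent queries, which the Resource Management feature is meant to handle.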
[jira] [Closed] (DRILL-6246) Build Failing in jdbc-all artifact
[ https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche closed DRILL-6246. - Resolution: Fixed > Build Failing in jdbc-all artifact > -- > > Key: DRILL-6246 > URL: https://issues.apache.org/jira/browse/DRILL-6246 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.13.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > > * It was noticed that the build was failing because of the > jdbc-all artifact > * The maximum compressed jar size was set to 32MB but we are > currently creating a JAR a bit larger than 32MB > * I compared apache drill-1.10.0, drill-1.12.0, and > drill-1.13.0 (on my macOS) > * jdbc-all-1.10.0 jar size: 21MB > * jdbc-all-1.12.0 jar size: 27MB > * jdbc-all-1.13.0 jar size: 34MB (on Linux this size is > roughly 32MB) > * Then compared jdbc-all-1.12.0 and > jdbc-all-1.13.0 in more detail > * The bulk of the increase is attributed to the calcite > artifact > * Used to be 2MB (uncompressed) and now 22MB > (uncompressed) > * It is likely an exclusion problem > * The jdbc-all-1.12.0 version has only two top packages > calcite/avatica/utils and calcite/avatica/remote > * The jdbc-all-1.13.0 includes new packages (within > calcite/avatica) metrics, proto, org/apache/, com/fasterxml, com/google > > I am planning to exclude these new sub-packages -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6447) Unsupported Operation when reading parquet data
[ https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494167#comment-16494167 ] salim achouche commented on DRILL-6447: --- I have fixed this issue, but [~vrozov] indicated he incorporated the fix as part of the Parquet upgrade. [~vrozov], can you please close my PR once this issue has been fixed? > Unsupported Operation when reading parquet data > --- > > Key: DRILL-6447 > URL: https://issues.apache.org/jira/browse/DRILL-6447 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: salim achouche >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > An exception is thrown when reading Parquet data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-6447) Unsupported Operation when reading parquet data
[ https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche reassigned DRILL-6447: - Assignee: Vlad Rozov (was: salim achouche) > Unsupported Operation when reading parquet data > --- > > Key: DRILL-6447 > URL: https://issues.apache.org/jira/browse/DRILL-6447 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: salim achouche >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > An exception is thrown when reading Parquet data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6147) Limit batch size for Flat Parquet Reader
[ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6147: -- Reviewer: Parth Chandra > Limit batch size for Flat Parquet Reader > > > Key: DRILL-6147 > URL: https://issues.apache.org/jira/browse/DRILL-6147 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Parquet reader currently uses a hard-coded batch size limit (32k rows) > when creating scan batches; there is no parameter nor any logic for > controlling the amount of memory used. This enhancement will allow Drill to > take an extra input parameter to control direct memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
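The memory-driven batch sizing DRILL-6147 describes can be sketched as dividing a configured batch memory budget by an estimated row width, clamped to the historical 32k-row cap. This is an illustration of the idea only, not Drill's actual reader code.

```java
// Illustrative sketch: size a scan batch from a memory budget instead of
// always using the hard-coded 32k-row limit. Names are hypothetical.
public class BatchSizer {
    private static final int MAX_ROWS = 32 * 1024; // historical hard cap
    private static final int MIN_ROWS = 1;

    /** Rows per batch that fit the budget, clamped to [MIN_ROWS, MAX_ROWS]. */
    public static int computeRowLimit(long batchMemoryBudgetBytes, int estimatedRowWidthBytes) {
        long rows = batchMemoryBudgetBytes / Math.max(1, estimatedRowWidthBytes);
        return (int) Math.min(MAX_ROWS, Math.max(MIN_ROWS, rows));
    }
}
```

With a 1 MB budget and 100-byte rows this yields batches of roughly ten thousand rows, while wide rows naturally shrink the batch, bounding direct memory usage.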
[jira] [Updated] (DRILL-6447) Unsupported Operation when reading parquet data
[ https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6447: -- Reviewer: Arina Ielchiieva > Unsupported Operation when reading parquet data > --- > > Key: DRILL-6447 > URL: https://issues.apache.org/jira/browse/DRILL-6447 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.14.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > An exception is thrown when reading Parquet data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6447) Unsupported Operation when reading parquet data
salim achouche created DRILL-6447: - Summary: Unsupported Operation when reading parquet data Key: DRILL-6447 URL: https://issues.apache.org/jira/browse/DRILL-6447 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Affects Versions: 1.14.0 Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 An exception is thrown when reading Parquet data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-5847) Flat Parquet Reader Performance Analysis
[ https://issues.apache.org/jira/browse/DRILL-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche resolved DRILL-5847. --- Resolution: Fixed Fix Version/s: 1.14.0 > Flat Parquet Reader Performance Analysis > > > Key: DRILL-5847 > URL: https://issues.apache.org/jira/browse/DRILL-5847 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: performance > Fix For: 1.14.0 > > Attachments: Drill Framework Enhancements.pdf, Flat Parquet Scanner > Enhancements Presentation.pdf > > > This task is to analyze the Flat Parquet Reader logic looking for performance > improvements opportunities. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-5848) Implement Parquet Columnar Processing & Use Bulk APIs for processing
[ https://issues.apache.org/jira/browse/DRILL-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche resolved DRILL-5848. --- Resolution: Fixed Fix Version/s: 1.14.0 > Implement Parquet Columnar Processing & Use Bulk APIs for processing > > > Key: DRILL-5848 > URL: https://issues.apache.org/jira/browse/DRILL-5848 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > * Change Flat Parquet Reader processing from row based to columnar > * Use Bulk APIs during the parsing and data loading phase -- This message was sent by Atlassian JIRA (v7.6.3#76005)
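The row-based vs. bulk distinction in DRILL-5848 can be shown with a minimal sketch: instead of one call (and its per-call overhead) per value, a bulk API copies a whole column run at once. This only mirrors the idea; Drill's real readers operate on value vectors, not plain arrays.

```java
// Hedged sketch of row-at-a-time vs. bulk columnar processing.
public class BulkCopy {
    // Row-at-a-time: per-value call overhead on every element.
    public static int[] copyRowWise(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Bulk: a single columnar copy over the entire run of values.
    public static int[] copyBulk(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```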
[jira] [Updated] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
[ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-5846: -- Labels: performance ready-to-commit (was: performance) > Improve Parquet Reader Performance for Flat Data types > --- > > Key: DRILL-5846 > URL: https://issues.apache.org/jira/browse/DRILL-5846 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: performance, ready-to-commit > Fix For: 1.14.0 > > Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, > 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill > > > The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to > further improve the Parquet Reader performance, as several users reported that > Parquet parsing represents the lion's share of the overall query execution. It > tracks Flat Data types only, as Nested DTs might involve functional and > processing enhancements (e.g., a nested column can be seen as a Document; a > user might want to perform operations scoped at the document level, that is, with no > need to span all rows). Another JIRA will be created to handle the nested > columns use-case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6348: -- [~parthc], can you please review this task? Thanks! > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
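Fixing the reporting gap described above amounts to having the operator record its allocator's usage into its stats so the profile can account for it. The sketch below is a hypothetical stand-in, not Drill's OperatorStats API.

```java
// Minimal sketch: track an operator's peak allocated bytes so the query
// profile can account for its memory. Names are hypothetical.
public class OperatorMemoryStats {
    private long peakBytes;

    /** Call whenever the operator's allocator changes its allocation. */
    public void recordAllocation(long currentAllocatedBytes) {
        if (currentAllocatedBytes > peakBytes) {
            peakBytes = currentAllocatedBytes;
        }
    }

    /** Peak memory to surface in the profile for OOM analysis. */
    public long getPeakBytes() {
        return peakBytes;
    }
}
```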
[jira] [Updated] (DRILL-6410) Memory leak in Parquet Reader during cancellation
[ https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche updated DRILL-6410: -- Reviewer: Parth Chandra Created pull request [1257|https://github.com/apache/drill/pull/1257/commits] to address this bug. > Memory leak in Parquet Reader during cancellation > - > > Key: DRILL-6410 > URL: https://issues.apache.org/jira/browse/DRILL-6410 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > > Occasionally, a memory leak is observed within the flat Parquet reader when > query cancellation is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6410) Memory leak in Parquet Reader during cancellation
salim achouche created DRILL-6410: - Summary: Memory leak in Parquet Reader during cancellation Key: DRILL-6410 URL: https://issues.apache.org/jira/browse/DRILL-6410 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet Reporter: salim achouche Assignee: salim achouche Occasionally, a memory leak is observed within the flat Parquet reader when query cancellation is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
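Cancellation leaks like DRILL-6410's are typically fixed by guaranteeing buffer release on every exit path, including interrupts. The sketch below shows the general try/finally pattern with a stand-in `Buffer` type; it is not the actual DrillBuf-based fix.

```java
import java.util.List;

// Hedged sketch: release every buffer in a finally block so a cancel
// mid-read cannot strand direct memory. Buffer is a stand-in for DrillBuf.
public class LeakSafeReader {
    /** Stand-in for a reference-counted direct buffer. */
    public static class Buffer {
        boolean released;
        void release() { released = true; }
    }

    /** Runs processing, releasing all buffers even on failure/cancellation. */
    public static void readAll(List<Buffer> buffers, Runnable process) {
        try {
            process.run();
        } finally {
            for (Buffer b : buffers) {
                b.release(); // guaranteed cleanup on cancellation paths
            }
        }
    }
}
```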
[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
[ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466463#comment-16466463 ] salim achouche commented on DRILL-5846: --- [~parthc], can you please review this Jira's PR now that I have provided a detailed performance analysis (DRILL-6301)? > Improve Parquet Reader Performance for Flat Data types > --- > > Key: DRILL-5846 > URL: https://issues.apache.org/jira/browse/DRILL-5846 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: performance > Fix For: 1.14.0 > > Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, > 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill > > > The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to > further improve the Parquet Reader performance, as several users reported that > Parquet parsing represents the lion's share of the overall query execution. It > tracks Flat Data types only, as Nested DTs might involve functional and > processing enhancements (e.g., a nested column can be seen as a Document; a > user might want to perform operations scoped at the document level, that is, with no > need to span all rows). Another JIRA will be created to handle the nested > columns use-case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-6301) Parquet Performance Analysis
[ https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche resolved DRILL-6301. --- Resolution: Fixed Reviewer: Pritesh Maker This is an analytical task. > Parquet Performance Analysis > > > Key: DRILL-6301 > URL: https://issues.apache.org/jira/browse/DRILL-6301 > Project: Apache Drill > Issue Type: Task > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > _*Description -*_ > * DRILL-5846 is meant to improve the Flat Parquet reader performance > * The associated implementation resulted in a 2x - 4x performance improvement > * Though during the review process ([pull > request|https://github.com/apache/drill/pull/1060]) a few key questions arose > > *_Intermediary Processing via Direct Memory vs Byte Arrays_* > * The main reasons for using byte arrays for intermediary processing are to > a) avoid the high cost of the DrillBuf checks (especially the reference > counting) and b) benefit from some observed Java optimizations when accessing > byte arrays > * Starting with version 1.12.0, the DrillBuf enablement checks have been > refined so that memory access and reference counting checks can be enabled > independently > * Benchmarking of Java's Direct Memory unsafe methods using JMH indicates the > performance gap between heap and direct memory is very narrow except for a few > use-cases > * There are also concerns that the extra copy step (from direct memory into > byte arrays) will have a negative effect on performance; note that this > overhead was not observed using Intel's Vtune as the intermediary buffers were > a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1 > cache during columnar processing. 
> _*Goal*_ > * The Flat Parquet reader is amongst the few Drill columnar operators > * It is imperative that we agree on the most optimal processing pattern so > that the decisions that we take within this Jira are not only applied to > Parquet but to all Drill columnar operators > _*Methodology*_ > # Assess the performance impact of using intermediary byte arrays (as > described above) > # Prototype a solution using Direct Memory and DrillBuf checks off, access > checks on, all checks on > # Make an educated decision on which processing pattern should be adopted > # Decide whether it is ok to use Java's unsafe API (and through what > mechanism) on byte arrays (when the use of byte arrays is a necessity) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
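The intermediary-byte-array pattern discussed in DRILL-6301 can be sketched as follows: values are bulk-copied from a direct `ByteBuffer` into a small, reused heap array and processed there, so the staging buffer stays hot in the L1 cache. This illustrates the trade-off only; it is not Drill's implementation, which works through DrillBuf.

```java
import java.nio.ByteBuffer;

// Hedged sketch: stage chunks of a direct buffer into a small reused
// byte[] (L1-cache-sized) before columnar processing on the heap.
public class StagedColumnReader {
    private static final int STAGE_SIZE = 4 * 1024; // small enough for L1
    private final byte[] stage = new byte[STAGE_SIZE]; // reused across chunks

    /** Sums all bytes of a direct buffer via the reused staging array. */
    public long sum(ByteBuffer direct) {
        long total = 0;
        direct.rewind();
        while (direct.hasRemaining()) {
            int n = Math.min(STAGE_SIZE, direct.remaining());
            direct.get(stage, 0, n); // one bulk copy out of direct memory
            for (int i = 0; i < n; i++) {
                total += stage[i]; // columnar processing on the heap copy
            }
        }
        return total;
    }
}
```

Because `stage` is reused and fits in L1, the extra copy step can cost less than per-value direct-memory accesses with full DrillBuf checks enabled, which matches the Vtune observation above.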