[jira] [Resolved] (DRILL-7276) xss(bug) in apache drill Web UI latest version 1.16.0 when authenticated

2019-05-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-7276.
---
Resolution: Fixed

> xss(bug) in apache drill Web UI latest version 1.16.0 when authenticated 
> 
>
> Key: DRILL-7276
> URL: https://issues.apache.org/jira/browse/DRILL-7276
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: shuiboye
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: 1.png, 2.png, 4.png
>
>
> On the Query page, I select "SQL" as the "Query Type" and in the "Query" 
> field I enter "*select ''  FROM cp.`employee.json`*".
> !1.png!
> After submitting, I get a Query Profile whose URL is 
> "*[http://127.0.0.1:8047/profiles/231beb11-4b43-0762-8b90-76a9af2edd24]*".
> !2.png!
> Any user who visits the profile page and clicks "JSON profile" at the bottom 
> to see the full JSON profile will see two alert boxes as shown below.
> !4.png!
>  
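A minimal sketch of the kind of output-escaping fix such a report calls for,
using Guava's HtmlEscapers (which Drill already shades); the helper class and
method here are hypothetical, not the actual DRILL-7276 patch:

{code}
import com.google.common.html.HtmlEscapers;

// Hypothetical helper: HTML-escape user-supplied query text before the
// Web UI inlines it into a rendered profile page.
public class ProfileTextSanitizer {
  static String sanitize(String userSql) {
    return HtmlEscapers.htmlEscaper().escape(userSql);
  }

  public static void main(String[] args) {
    // A script tag survives only as inert text instead of executing.
    System.out.println(sanitize("<script>alert(1)</script>"));
    // -> &lt;script&gt;alert(1)&lt;/script&gt;
  }
}
{code}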



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-22 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-7130:
--
Reviewer: Timothy Farkas

> IllegalStateException: Read batch count [0] should be greater than zero
> ---
>
> Key: DRILL-7130
> URL: https://issues.apache.org/jira/browse/DRILL-7130
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.15.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.17.0
>
>
> The following exception is being hit when reading parquet data:
> {noformat}
> Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
> at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
> ... 29 common frames omitted
> {noformat}
>
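Per the trace, the guard that throws here is Guava's Preconditions.checkState.
A minimal sketch of that kind of check, reconstructed from the message format
(assumed, not copied from Drill's source):

{code}
import com.google.common.base.Preconditions;

public class BatchGuard {
  // Throws IllegalStateException("Read batch count [0] should be greater
  // than zero") whenever the reader computes an empty batch.
  static void validateBatch(int readCount) {
    Preconditions.checkState(readCount > 0,
        "Read batch count [%s] should be greater than zero", readCount);
  }

  public static void main(String[] args) {
    validateBatch(0); // reproduces the exception message above
  }
}
{code}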



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread salim achouche (JIRA)
salim achouche created DRILL-7130:
-

 Summary: IllegalStateException: Read batch count [0] should be 
greater than zero
 Key: DRILL-7130
 URL: https://issues.apache.org/jira/browse/DRILL-7130
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.17.0


The following exception is being hit when reading parquet data:

{noformat}
Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
... 29 common frames omitted
{noformat}

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7100) parquet RecordBatchSizerManager : IllegalArgumentException: the requested size must be non-negative

2019-03-12 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-7100:
--
Reviewer: Timothy Farkas

> parquet RecordBatchSizerManager : IllegalArgumentException: the requested 
> size must be non-negative
> ---
>
> Key: DRILL-7100
> URL: https://issues.apache.org/jira/browse/DRILL-7100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.15.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Major
>
> The table has string columns that range from 1024 bytes to 32MB in length; 
> Drill should be able to handle such wide string columns when querying 
> Parquet.
> Hive version: 2.3.3
> Drill version: 1.15
> {noformat}
> CREATE TABLE temp.cust_bhsf_ce_blob_parquet (
>  event_id DECIMAL, 
>  valid_until_dt_tm string, 
>  blob_seq_num DECIMAL, 
>  valid_from_dt_tm string, 
>  blob_length DECIMAL, 
>  compression_cd DECIMAL, 
>  blob_contents string, 
>  updt_dt_tm string, 
>  updt_id DECIMAL, 
>  updt_task DECIMAL, 
>  updt_cnt DECIMAL, 
>  updt_applctx DECIMAL, 
>  last_utc_ts string, 
>  ccl_load_dt_tm string, 
>  ccl_updt_dt_tm string )
>  STORED AS PARQUET;
> {noformat}
>  
> The source table is stored in ORC format.
> Failing query:
> {noformat}
> SELECT event_id, BLOB_CONTENTS FROM hive.temp.cust_bhsf_ce_blob_parquet WHERE 
> event_id = 3443236037
> 2019-03-07 14:40:17,886 [237e8c79-0e9b-45d6-9134-0da95dba462f:frag:1:269] 
> INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: the requested 
> size must be non-negative (the requested size must be non-negative)
> org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: the 
> requested size must be non-negative
> {noformat}
> Snippet from drillbit.log file
> {noformat}
> [Error Id: 41a4d597-f54d-42a6-be6d-5dbeb7f642ba ]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
> at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
> a
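The "requested size must be non-negative" failure in BaseAllocator.buffer() is
consistent with a 32-bit size computation overflowing for very wide columns;
that cause is a guess, not confirmed by the ticket. A minimal sketch of the
arithmetic, with an assumed record count:

{code}
public class AllocOverflow {
  public static void main(String[] args) {
    int valueCount = 100;                        // assumed records per batch
    int widthBytes = 32 * 1024 * 1024;           // 32MB strings, as in the report
    int requested = valueCount * widthBytes;     // overflows int
    System.out.println(requested);               // -939524096: trips the check
    long safe = (long) valueCount * widthBytes;  // widening avoids the overflow
    System.out.println(safe);                    // 3355443200
  }
}
{code}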

[jira] [Closed] (DRILL-7101) IllegalArgumentException when reading parquet data

2019-03-12 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche closed DRILL-7101.
-
Resolution: Duplicate

> IllegalArgumentException when reading parquet data
> --
>
> Key: DRILL-7101
> URL: https://issues.apache.org/jira/browse/DRILL-7101
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.15.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.16.0
>
>
> The Parquet reader fails with the below stack trace:
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
> at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalArgumentException: the requested size must be non-negative
> at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkArgument(Preconditions.java:135) ~[drill-shaded-guava-23.0.jar:23.0]
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:224) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:211) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:394) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:250) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:41) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:54) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
> at org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.allocate(RecordBatchSizerManager.java:165) ~[drill-

[jira] [Created] (DRILL-7101) IllegalArgumentException when reading parquet data

2019-03-12 Thread salim achouche (JIRA)
salim achouche created DRILL-7101:
-

 Summary: IllegalArgumentException when reading parquet data
 Key: DRILL-7101
 URL: https://issues.apache.org/jira/browse/DRILL-7101
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.16.0


The Parquet reader fails with the below stack trace:

at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:69) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) [hadoop-common-2.7.0-mapr-1808.jar:na]
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.lang.IllegalArgumentException: the requested size must be non-negative
at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkArgument(Preconditions.java:135) ~[drill-shaded-guava-23.0.jar:23.0]
at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:224) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:211) ~[drill-memory-base-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:394) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:250) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:41) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:54) ~[vector-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.allocate(RecordBatchSizerManager.java:165) ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.allocate(ParquetRecordReader.java:276) ~[drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:221) [drill-java-exec-1.15.0.0-mapr.jar:1.15.0.0-mapr]
or

[jira] [Updated] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37

2019-01-31 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-7018:
--
Reviewer: Vitalii Diravka  (was: Boaz Ben-Zvi)

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet 
> File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 
> 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(256))
> 
>
> Key: DRILL-7018
> URL: https://issues.apache.org/jira/browse/DRILL-7018
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> alter system set `store.parquet.reader.int96_as_timestamp`= true
> Run a query which projects a column of type Parquet INT96 timestamp with 31 
> nulls.
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
> (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
>  
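For background, Parquet INT96 timestamps pack 8 bytes of nanos-of-day followed
by a 4-byte Julian day, little-endian, so each value is 12 bytes wide;
plausibly the 372 in the error is 31 values x 12 bytes overflowing a 256-byte
buffer. A minimal decoding sketch of the standard INT96 layout (illustrative,
not Drill's actual reader code):

{code}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Timestamp {
  private static final long JULIAN_DAY_OF_EPOCH = 2440588L; // 1970-01-01
  private static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

  // Decode one 12-byte INT96 value into epoch milliseconds.
  static long toEpochMillis(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong(); // bytes 0-7
    long julianDay = buf.getInt();   // bytes 8-11
    return ((julianDay - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanosOfDay)
        / 1_000_000L;
  }
}
{code}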



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 37

2019-01-30 Thread salim achouche (JIRA)
salim achouche created DRILL-7018:
-

 Summary: Drill Query (when 
store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: 
SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
(expected: 0 <= readerIndex <= writerIndex <= capacity(256))
 Key: DRILL-7018
 URL: https://issues.apache.org/jira/browse/DRILL-7018
 Project: Apache Drill
  Issue Type: Improvement
Reporter: salim achouche






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7018) Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 3

2019-01-30 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-7018:
-

  Assignee: salim achouche
 Affects Version/s: 1.14.0
Remaining Estimate: 24h
 Original Estimate: 24h
 Fix Version/s: 1.16.0
   Description: 
alter system set `store.parquet.reader.int96_as_timestamp`= true

Run a query which projects a column of type Parquet INT96 timestamp with 31 nulls.

The following exception will be thrown:

java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
(expected: 0 <= readerIndex <= writerIndex <= capacity(256))

 
   Component/s: Storage - Parquet

> Drill Query (when store.parquet.reader.int96_as_timestamp=true) on Parquet 
> File fails with Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 
> 0, writerIndex: 372 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(256))
> 
>
> Key: DRILL-7018
> URL: https://issues.apache.org/jira/browse/DRILL-7018
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> alter system set `store.parquet.reader.int96_as_timestamp`= true
> Run a query which projects a column of type Parquet INT96 timestamp with 31 
> nulls.
> The following exception will be thrown:
> java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 372 
> (expected: 0 <= readerIndex <= writerIndex <= capacity(256))
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-15 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6853:
--
Labels: pull-request-available  (was: )

> Parquet Complex Reader for nested schema should have configurable memory or 
> max records to fetch
> 
>
> Key: DRILL-6853
> URL: https://issues.apache.org/jira/browse/DRILL-6853
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Nitin Sharma
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> The Parquet complex reader, while fetching nested schema, should have 
> configurable memory or a maximum number of records to fetch, not a hard 
> default of 4000 records.
> While scanning TBs of data with wide columns, this could easily cause OOM 
> issues.
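A back-of-the-envelope sketch of why a fixed 4000-record batch is risky for
wide columns; the row width and memory budget are assumed numbers, not from
the ticket:

{code}
public class BatchBudget {
  public static void main(String[] args) {
    int batchRecords = 4000;      // the hard-coded default
    long avgRowBytes = 1L << 20;  // assume ~1 MB per nested row
    long batchBytes = batchRecords * avgRowBytes;
    System.out.println("batch needs ~" + (batchBytes >> 20) + " MB"); // ~4000 MB
    long budgetBytes = 256L << 20; // with a 256 MB memory budget instead,
    System.out.println("records that fit: " + budgetBytes / avgRowBytes); // 256
  }
}
{code}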



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-15 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6853:
--
 Reviewer: Timothy Farkas
Fix Version/s: 1.15.0

> Parquet Complex Reader for nested schema should have configurable memory or 
> max records to fetch
> 
>
> Key: DRILL-6853
> URL: https://issues.apache.org/jira/browse/DRILL-6853
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Nitin Sharma
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> The Parquet complex reader, while fetching nested schema, should have 
> configurable memory or a maximum number of records to fetch, not a hard 
> default of 4000 records.
> While scanning TBs of data with wide columns, this could easily cause OOM 
> issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6853) Parquet Complex Reader for nested schema should have configurable memory or max records to fetch

2018-11-15 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6853:
-

Assignee: salim achouche

> Parquet Complex Reader for nested schema should have configurable memory or 
> max records to fetch
> 
>
> Key: DRILL-6853
> URL: https://issues.apache.org/jira/browse/DRILL-6853
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Nitin Sharma
>Assignee: salim achouche
>Priority: Major
>
> The Parquet complex reader, while fetching nested schema, should have 
> configurable memory or a maximum number of records to fetch, not a hard 
> default of 4000 records.
> While scanning TBs of data with wide columns, this could easily cause OOM 
> issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6410) Memory leak in Parquet Reader during cancellation

2018-10-08 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-6410.
---
Resolution: Fixed
  Reviewer: Timothy Farkas  (was: Parth Chandra)

> Memory leak in Parquet Reader during cancellation
> -
>
> Key: DRILL-6410
> URL: https://issues.apache.org/jira/browse/DRILL-6410
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> Occasionally, a memory leak is observed within the flat Parquet reader when 
> query cancellation is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6246) Build Failing in jdbc-all artifact

2018-09-28 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6246:
--
Labels: pull-request-available  (was: )

> Build Failing in jdbc-all artifact
> --
>
> Key: DRILL-6246
> URL: https://issues.apache.org/jira/browse/DRILL-6246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.13.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> * It was noticed that the build was failing because of the jdbc-all 
> artifact
>  * The maximum compressed jar size was set to 32MB, but we are currently 
> creating a JAR a bit larger than 32MB
>  * I compared apache drill-1.10.0, drill-1.12.0, and drill-1.13.0 (on my 
> MacOS)
>  * jdbc-all-1.10.0 jar size: 21MB
>  * jdbc-all-1.12.0 jar size: 27MB
>  * jdbc-all-1.13.0 jar size: 34MB (on Linux this size is roughly 32MB)
>  * I then compared jdbc-all-1.12.0 and jdbc-all-1.13.0 in more detail
>  * The bulk of the increase is attributed to the calcite artifact
>  * It used to be 2MB (uncompressed) and is now 22MB (uncompressed)
>  * It is likely an exclusion problem
>  * The jdbc-all-1.12.0 version has only two top packages, 
> calcite/avatica/utils and calcite/avatica/remote
>  * The jdbc-all-1.13.0 includes new packages (within calcite/avatica): 
> metrics, proto, org/apache/, com/fasterxml, com/google
> I am planning to exclude these new sub-packages (see the sketch below).
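A hedged sketch of the kind of maven-shade-plugin filter this describes; the
artifact pattern and paths are assumptions for illustration, not the actual
DRILL-6246 change:

{code}
<!-- Illustrative only: keep the new Avatica sub-packages out of jdbc-all. -->
<filter>
  <artifact>org.apache.calcite.avatica:*</artifact>
  <excludes>
    <exclude>org/apache/calcite/avatica/metrics/**</exclude>
    <exclude>org/apache/calcite/avatica/proto/**</exclude>
    <exclude>com/fasterxml/**</exclude>
    <exclude>com/google/**</exclude>
  </excludes>
</filter>
{code}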



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (DRILL-6246) Build Failing in jdbc-all artifact

2018-09-28 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reopened DRILL-6246:
---

Re-opening this issue: PR #1168 has been successfully tested and should 
provide a more optimal solution.

> Build Failing in jdbc-all artifact
> --
>
> Key: DRILL-6246
> URL: https://issues.apache.org/jira/browse/DRILL-6246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.13.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>
> * It was noticed that the build was failing because of the jdbc-all 
> artifact
>  * The maximum compressed jar size was set to 32MB, but we are currently 
> creating a JAR a bit larger than 32MB
>  * I compared apache drill-1.10.0, drill-1.12.0, and drill-1.13.0 (on my 
> MacOS)
>  * jdbc-all-1.10.0 jar size: 21MB
>  * jdbc-all-1.12.0 jar size: 27MB
>  * jdbc-all-1.13.0 jar size: 34MB (on Linux this size is roughly 32MB)
>  * I then compared jdbc-all-1.12.0 and jdbc-all-1.13.0 in more detail
>  * The bulk of the increase is attributed to the calcite artifact
>  * It used to be 2MB (uncompressed) and is now 22MB (uncompressed)
>  * It is likely an exclusion problem
>  * The jdbc-all-1.12.0 version has only two top packages, 
> calcite/avatica/utils and calcite/avatica/remote
>  * The jdbc-all-1.13.0 includes new packages (within calcite/avatica): 
> metrics, proto, org/apache/, com/fasterxml, com/google
> I am planning to exclude these new sub-packages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-26 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6706:
--
Labels: pull-request-available  (was: )

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
>  Labels: pull-request-available
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], 
> 

[jira] [Updated] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-26 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6706:
--
Reviewer: Timothy Farkas

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
>  Labels: pull-request-available
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], 
> groupscan=[Parq

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-26 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592997#comment-16592997
 ] 

salim achouche commented on DRILL-6706:
---

Got this code from the Aggregator.

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], 
> g

[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592807#comment-16592807
 ] 

salim achouche edited comment on DRILL-6706 at 8/26/18 5:28 AM:


Ran all functional and advanced tests successfully, though one test-suite 
test failed: "TestParquetFilterPushDown.testBooleanPredicate". 
Debugged the issue and found an intriguing use-case:
 * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE 
XYZ = 'a';
 ** This query doesn't fail
 ** The code generator doesn't find the column, so it generates default code 
which assumes that both the right and left values are strings
 * When I fixed the ParquetSchema to insert the correct column name, then
 ** The code generator knows the column type to be an INT
 ** It then tries to cast the constant 'a' to an integer, which throws an 
exception

This somewhat corroborates [~vvysotskyi]'s comment, which indicated he didn't 
want the column to be found. If this is the case, I feel there is a better 
fix, which is to add a new indicator in the MetadataField to flag this 
condition. That would give operators an opportunity to better handle such 
cases.

 

[~timothyfarkas] and [~vvysotskyi] what do you guys think?


was (Author: sachouche):
Ran all functional and advanced tests successfully, though one test-suite 
test failed: "TestParquetFilterPushDown.testBooleanPredicate". 
Debugged the issue and found an intriguing use-case:
 * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE 
XYZ = 'a';
 ** This query doesn't fail
 ** The code generator doesn't find the column, so it generates default code 
which assumes that both the right and left values are strings
 * When I fixed the ParquetSchema to insert the correct column name, then
 ** The code generator knows the column type to be an INT
 ** It then tries to cast the constant 'a' to an integer, which throws an 
exception

This somewhat corroborates [~vvysotskyi]'s comment, which indicated he didn't 
want the column to be found. If this is the case, I feel there is a better 
fix, which is to add a new indicator in the MetadataField to flag this 
condition. That would give operators an opportunity to better handle.

 

[~timothyfarkas] and [~vvysotskyi] what do you guys think?

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4]

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592807#comment-16592807
 ] 

salim achouche commented on DRILL-6706:
---

Ran all functional and advanced tests successfully, though one test-suite 
test failed: "TestParquetFilterPushDown.testBooleanPredicate". 
Debugged the issue and found an intriguing use-case:
 * Assume the following query before the fix: SELECT XYZ FROM MY_TABLE WHERE 
XYZ = 'a';
 ** This query doesn't fail
 ** The code generator doesn't find the column, so it generates default code 
which assumes that both the right and left values are strings
 * When I fixed the ParquetSchema to insert the correct column name, then
 ** The code generator knows the column type to be an INT
 ** It then tries to cast the constant 'a' to an integer, which throws an 
exception

This somewhat corroborates [~vvysotskyi]'s comment, which indicated he didn't 
want the column to be found. If this is the case, I feel there is a better 
fix, which is to add a new indicator in the MetadataField to flag this 
condition. That would give operators an opportunity to better handle.

 

[~timothyfarkas] and [~vvysotskyi] what do you guys think?
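A tiny sketch of the regression described above (illustrative; XYZ and
MY_TABLE are the comment's own hypotheticals): once the generated code knows
XYZ is an INT, the string constant 'a' effectively has to be converted to an
integer, which fails at runtime.

{code}
public class CastRepro {
  public static void main(String[] args) {
    // Generated comparison code would effectively do this for XYZ = 'a':
    int value = Integer.parseInt("a"); // throws NumberFormatException
    System.out.println(value);
  }
}
{code}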

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.55

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592741#comment-16592741
 ] 

salim achouche commented on DRILL-6706:
---

*Findings after Code Inspection -*
 * Looking at the code, having back-ticks within SchemaPath is not necessary 
(this is my understanding)
 ** Back-ticks are useful in the context of a compound name such as 
T.`column.with.a.dot`.another_column
 ** As soon as *individual parts are parsed*, the back-ticks can be omitted


 * Then why was the Aggregator able to handle columns with back-ticks (and 
whatever else I could throw at it :))?
 ** I found logic that strips the extra back-ticks within the aggregator code

*Suggested Fix -*
 * Stripping the back-ticks from the MaterializedField names within Parquet 
seems to fix the HashJoin issue (a rough sketch of the stripping logic follows 
below)
 * The Aggregator didn't seem bothered by this change either
 * I am currently running the test-suite and the Apache pre-tests
 * If they pass, I'll push this PR for review
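
For reference, a minimal sketch of the kind of stripping logic meant above 
(illustrative only; the actual change operates on Drill's MaterializedField 
names):

{code}
// Illustrative only -- not the actual Drill patch. A simple name arriving as
// `L_ORDERKEY` (wrapped in back-ticks) is normalized before use.
public class BacktickStripSketch {

  static String stripBackticks(String name) {
    if (name != null && name.length() >= 2
        && name.charAt(0) == '`' && name.charAt(name.length() - 1) == '`') {
      return name.substring(1, name.length() - 1);
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(stripBackticks("`L_ORDERKEY`")); // L_ORDERKEY
    System.out.println(stripBackticks("L_ORDERKEY"));   // unchanged
  }
}
{code}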

 

 

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY,

[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592731#comment-16592731
 ] 

salim achouche edited comment on DRILL-6706 at 8/25/18 11:24 PM:
-

I say it is as-designed because of the following points:
 * This comment seems to imply that we want to treat not-found columns 
differently: _// col.toExpr() is used here as field name since we don't want 
to see these fields in the existing maps_
 * The rest of the code seems to work just fine (including sqlline); I have a 
hard time believing that such an obvious bug would go unnoticed

[~timothyfarkas], I could be wrong, but this is what happens when the code 
doesn't have adequate documentation; for example, I looked at the toExpr() 
method and couldn't find any useful documentation. Now we are left to guess 
what the intended functionality was. [~aj_09] reviewed DRILL-4264; we should 
inquire with him about whether leaving the back-tick is a bug or as-designed.


was (Author: sachouche):
I say it is as-designed because of the following points:
 * This comments seems to imply that we want to treat not found columns 
differently: _// col.toExpr() is used here as field name since we don't want 
to see these fields in the existing maps_
 * The rest of the code seems to work just find (including sqlline); I have a 
hard time to believe that such an obvious bug would not be found

[~timothyfarkas], I could be wrong but this is what happens when the code 
doesn't have adequate documentation; for example, I looked at the toExpr() 
method and couldn't find any useful documentation. Now, we are left to guess 
what was the intended functionality. [~aj_09] reviewed DRILL-4264, we should 
inquiry with him about whether leaving the backtick is a bug or as-designed.

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGION

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592731#comment-16592731
 ] 

salim achouche commented on DRILL-6706:
---

I say it is as-designed because of the following points:
 * This comments seems to imply that we want to treat not found columns 
differently: _// col.toExpr() is used here as field name since we don't want 
to see these fields in the existing maps_
 * The rest of the code seems to work just find (including sqlline); I have a 
hard time to believe that such an obvious bug would not be found

[~timothyfarkas], I could be wrong but this is what happens when the code 
doesn't have adequate documentation; for example, I looked at the toExpr() 
method and couldn't find any useful documentation. Now, we are left to guess 
what was the intended functionality. [~aj_09] reviewed DRILL-4264, we should 
inquiry with him about whether leaving the backtick is a bug or as-designed.

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY 

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592707#comment-16592707
 ] 

salim achouche commented on DRILL-6706:
---

It seems that [~timothyfarkas] is not available; I'll take ownership of this 
JIRA. I am looking at the downstream operators and their ability to cope with 
columns using back-tick (``) syntax; I will try to mimic such logic within the 
BatchSizer (preferably) or the HashJoin code.

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, AN

[jira] [Assigned] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-25 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6706:
-

Assignee: salim achouche  (was: Timothy Farkas)

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], 
> groupscan=[ParquetGroupScan [en

[jira] [Assigned] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-24 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6706:
-

Assignee: Timothy Farkas  (was: salim achouche)

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10Scan(table=[[si, tpch_sf1_parquet, lineitem]], 
> groupscan=[ParquetGroupScan [en

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-24 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592314#comment-16592314
 ] 

salim achouche commented on DRILL-6706:
---

Yes, this is as-designed:
 * The change was done by commit id: d105950a7a9fb2ff3acd072ee65a51ef1fca120e
 * JIRA: [DRILL-4264: Allow field names to include 
dots|https://github.com/apache/drill/commit/d105950a7a9fb2ff3acd072ee65a51ef1fca120e#diff-cdcf7a999bb6a806125da3fa1d4a78b2]

 

 

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY 
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, 
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05HashJoin(condition=[=($1, $10)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): 
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08  HashJoin(condition=[=($2, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, A

[jira] [Comment Edited] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-24 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592303#comment-16592303
 ] 

salim achouche edited comment on DRILL-6706 at 8/24/18 11:16 PM:
-

* This condition of having columns with back-ticks occurs when a selected 
column is missing
 * I located the code which does that, and it seems as though it is on purpose 
(?)
 * There are also tests which look for this kind of behavior 
(TestExternalSortExec)
 * I also ran queries with missing columns and they worked fine:
 ** SELECT count(*) from dfs.`.../part.*` P where P.xyz is null --> 2,000
 ** SELECT P.XYZ from dfs.`.../part.*` P  /* SQLLINE is able to print the 
correct column name */

Tim, it seems the code expects such behavior. Can you, for now, just ignore 
missing columns, as they will contain only nulls?

 

private NullableIntVector createMissingColumn(SchemaPath col, OutputMutator 
output) throws SchemaChangeException {
  // col.toExpr() is used here as field name since we don't want to see
  // these fields in the existing maps
  MaterializedField field = MaterializedField.create(col.toExpr(),
      Types.optional(TypeProtos.MinorType.INT));
  return (NullableIntVector) output.addField(field,
      TypeHelper.getValueVectorClass(TypeProtos.MinorType.INT, DataMode.OPTIONAL));
}


was (Author: sachouche):
* This condition of having columns with back-tick occurs when a selected column 
is missing
 * I located the code which does that and it seems as though it is on purpose 
(?)
 * There are also tests which look for this kind of behavior 
(TestExternalSortExec)
 * I also ran queries with missing columns and they worked fine:
 ** SELECT  count(*) from dfs.`.../part.*` P where P.xyz is null --> 2,000
 ** SELECT P.XYZ  from dfs.`.../part.*` P  /* SQLLINE is able to print the 
correct column name */

Tim, it seems the code expects such behavior. Can you for now just ignore 
missing columns as they will have only nulls?

 

private NullableIntVector createMissingColumn(SchemaPath col, OutputMutator 
output) throws SchemaChangeException {
  // col.toExpr() is used here as field name since we don't want to see
  // these fields in the existing maps
  MaterializedField field = MaterializedField.create(col.toExpr(),
      Types.optional(TypeProtos.MinorType.INT));
  return (NullableIntVector) output.addField(field,
      TypeHelper.getValueVectorClass(TypeProtos.MinorType.INT, DataMode.OPTIONAL));
}

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02   

[jira] [Commented] (DRILL-6706) Query with 10-way hash join fails with NullPointerException

2018-08-24 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592303#comment-16592303
 ] 

salim achouche commented on DRILL-6706:
---

* This condition of having columns with back-tick occurs when a selected column 
is missing
 * I located the code which does that and it seems as though it is on purpose 
(?)
 * There are also tests which look for this kind of behavior 
(TestExternalSortExec)
 * I also ran queries with missing columns and they worked fine:
 ** SELECT  count(*) from dfs.`.../part.*` P where P.xyz is null --> 2,000
 ** SELECT P.XYZ  from dfs.`.../part.*` P  /* SQLLINE is able to print the 
correct column name */

Tim, it seems the code expects such behavior. Can you for now just ignore 
missing columns as they will have only nulls?

 

private NullableIntVector createMissingColumn(SchemaPath col, OutputMutator 
output) throws SchemaChangeException {
  // col.toExpr() is used here as field name since we don't want to see
  // these fields in the existing maps
  MaterializedField field = MaterializedField.create(col.toExpr(),
      Types.optional(TypeProtos.MinorType.INT));
  return (NullableIntVector) output.addField(field,
      TypeHelper.getValueVectorClass(TypeProtos.MinorType.INT, DataMode.OPTIONAL));
}

> Query with 10-way hash join fails with NullPointerException
> ---
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.15.0
>Reporter: Abhishek Girish
>Assignee: salim achouche
>Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>  si.tpch_sf1_parquet.orders O,
>  si.tpch_sf1_parquet.lineitem L,
>  si.tpch_sf1_parquet.part P,
>  si.tpch_sf1_parquet.supplier S,
>  si.tpch_sf1_parquet.partsupp PS,
>  si.tpch_sf1_parquet.nation S_N,
>  si.tpch_sf1_parquet.region S_R,
>  si.tpch_sf1_parquet.nation C_N,
>  si.tpch_sf1_parquet.region C_R
> WHEREC.C_CUSTKEY = O.O_CUSTKEY 
> AND  O.O_ORDERKEY = L.L_ORDERKEY
> AND  L.L_PARTKEY = P.P_PARTKEY
> AND  L.L_SUPPKEY = S.S_SUPPKEY
> AND  P.P_PARTKEY = PS.PS_PARTKEY
> AND  P.P_SUPPKEY = PS.PS_SUPPKEY
> AND  S.S_NATIONKEY = S_N.N_NATIONKEY
> AND  S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND  C.C_NATIONKEY = C_N.N_NATIONKEY
> AND  C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, 
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01  Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): 
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], 
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], 
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], 
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], 
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) 
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY 
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY 
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, 
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY 
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = 
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 
> 4.8577056E7 memory}, id = 515364
> 01-03  HashJoin(condition=[=($13, $0)], joinType=[inner]) : 
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY 
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY 
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, 
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, A

[jira] [Updated] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6709:
--
Labels: pull-request-available  (was: )

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.
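
A minimal sketch of the requested extension follows; the names are purely 
illustrative and are not Drill's actual batch-logging API.

{code}
// Illustrative sketch only -- these names are not Drill's actual batch
// logging API. It shows the requested extension: one utility that can tag a
// batch as INCOMING or OUTGOING when logging its sizing statistics.
public class BatchStatsLoggerSketch {

  enum BatchIoType { INCOMING, OUTGOING }

  static void logBatch(String operatorName, BatchIoType ioType,
                       int recordCount, long batchBytes) {
    // Mid-stream operators would log both directions; a scan-side reader
    // such as Parquet only logs OUTGOING batches.
    System.out.printf("%s %s batch: %d records, %d bytes (avg %d bytes/row)%n",
        operatorName, ioType, recordCount, batchBytes,
        recordCount == 0 ? 0 : batchBytes / recordCount);
  }

  public static void main(String[] args) {
    logBatch("HASH_JOIN", BatchIoType.INCOMING, 4096, 1L << 20);
    logBatch("HASH_JOIN", BatchIoType.OUTGOING, 4096, 2L << 20);
  }
}
{code}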



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-24 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6709:
--
Reviewer: Timothy Farkas

> Batch statistics logging utility needs to be extended to mid-stream operators
> -
>
> Key: DRILL-6709
> URL: https://issues.apache.org/jira/browse/DRILL-6709
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> A new batch logging utility has been created to log batch sizing messages to 
> drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
> so it can be used by mid-stream operators. In particular, mid-stream 
> operators have both incoming batches and outgoing batches, while Parquet only 
> has outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6685) Error in parquet record reader

2018-08-15 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6685:
--
Reviewer: Boaz Ben-Zvi

> Error in parquet record reader
> --
>
> Key: DRILL-6685
> URL: https://issues.apache.org/jira/browse/DRILL-6685
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6685
>
>
> This is the query:
> select VarbinaryValue1 from 
> dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit 
> 36;
> It appears to be caused by this commit:
> DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
> aee899c1b26ebb9a5781d280d5a73b42c273d4d5
> This is the stack trace:
> {noformat}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: 
> Hadoop path: 
> /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 1250
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 Index;
>   optional binary VarbinaryValue1;
>   optional int64 BigIntValue;
>   optional boolean BooleanValue;
>   optional int32 DateValue (DATE);
>   optional float FloatValue;
>   optional binary VarcharValue1 (UTF8);
>   optional double DoubleValue;
>   optional int32 IntegerValue;
>   optional int32 TimeValue (TIME_MILLIS);
>   optional int64 TimestampValue (TIMESTAMP_MILLIS);
>   optional binary VarbinaryValue2;
>   optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL);
>   optional binary VarcharValue2 (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] optional 
> int64 Index  [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED 
> [VarbinaryValue1] optional binary VarbinaryValue1  [PLAIN, RLE, BIT_PACKED], 
> 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue  
> [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED 
> [BooleanValue] optional boolean BooleanValue  [PLAIN, RLE, BIT_PACKED], 
> 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue 
> (DATE)  [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED 
> [FloatValue] optional float FloatValue  [PLAIN, RLE, BIT_PACKED], 8184959}, 
> ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 
> (UTF8)  [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED 
> [DoubleValue] optional double DoubleValue  [PLAIN, RLE, BIT_PACKED], 
> 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 
> IntegerValue  [PLAIN, RLE, BIT_PACKED], 10240111}, 
> ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue 
> (TIME_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10245154}, 
> ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue 
> (TIMESTAMP_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10250197}, 
> ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 
>  [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED 
> [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED 
> [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED 
> [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED 
> [VarcharValue2] optional binary VarcharValue2 (UTF8)  [PLAIN, RLE, 
> BIT_PACKED], 19677568}]}]}
> Fragment 0:0
> [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet
> sys.version is:
> 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bacChanged 
> PROJECT_OUTPUT_BATCH_SIZE to System + Session   30.07.2018 @ 17:12:53 PDT 
>   r...@mapr.com   30.07.2018 @ 17:25:21 PDT^M
> fourvarchar_asc_nulls70.q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6685) Error in parquet record reader

2018-08-15 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581620#comment-16581620
 ] 

salim achouche commented on DRILL-6685:
---

Fixed a regression when addressing DRILL-6570:
 * When fixing DRILL-6570, we unified a bulk entry's max-values so that a 
false positive (a column predicted to be fixed-length turning out to be 
variable-length) could be handled smoothly
 * The regression was that the fixed-length algorithm was still relying on the 
previous bulk-entry max-value constraint

Fix -
 * I have re-introduced the constraint within the fixed-length reader (a 
minimal sketch of the idea follows below)
 * Added a test suite using Robert Hou's parquet data to prevent such 
regressions
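
For illustration, a minimal sketch of the re-introduced constraint, under 
assumed names (the real fix lives in Drill's Parquet bulk-entry readers):

{code}
// Sketch only, under assumed names. The point of the fix: the fixed-length
// read path must honor the per-entry max-value constraint itself rather than
// assume an upstream component already enforced it.
public class MaxValuesConstraintSketch {

  static int numValuesToRead(int valuesAvailable, int maxValuesPerEntry) {
    // Re-introduced constraint: never exceed the caller's per-entry maximum.
    return Math.min(valuesAvailable, maxValuesPerEntry);
  }

  public static void main(String[] args) {
    System.out.println(numValuesToRead(1250, 64)); // prints 64, not 1250
  }
}
{code}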

> Error in parquet record reader
> --
>
> Key: DRILL-6685
> URL: https://issues.apache.org/jira/browse/DRILL-6685
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6685
>
>
> This is the query:
> select VarbinaryValue1 from 
> dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit 
> 36;
> It appears to be caused by this commit:
> DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
> aee899c1b26ebb9a5781d280d5a73b42c273d4d5
> This is the stack trace:
> {noformat}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: 
> Hadoop path: 
> /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 1250
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 Index;
>   optional binary VarbinaryValue1;
>   optional int64 BigIntValue;
>   optional boolean BooleanValue;
>   optional int32 DateValue (DATE);
>   optional float FloatValue;
>   optional binary VarcharValue1 (UTF8);
>   optional double DoubleValue;
>   optional int32 IntegerValue;
>   optional int32 TimeValue (TIME_MILLIS);
>   optional int64 TimestampValue (TIMESTAMP_MILLIS);
>   optional binary VarbinaryValue2;
>   optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL);
>   optional binary VarcharValue2 (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] optional 
> int64 Index  [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED 
> [VarbinaryValue1] optional binary VarbinaryValue1  [PLAIN, RLE, BIT_PACKED], 
> 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue  
> [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED 
> [BooleanValue] optional boolean BooleanValue  [PLAIN, RLE, BIT_PACKED], 
> 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue 
> (DATE)  [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED 
> [FloatValue] optional float FloatValue  [PLAIN, RLE, BIT_PACKED], 8184959}, 
> ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 
> (UTF8)  [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED 
> [DoubleValue] optional double DoubleValue  [PLAIN, RLE, BIT_PACKED], 
> 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 
> IntegerValue  [PLAIN, RLE, BIT_PACKED], 10240111}, 
> ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue 
> (TIME_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10245154}, 
> ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue 
> (TIMESTAMP_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10250197}, 
> ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 
>  [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED 
> [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED 
> [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED 
> [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED 
> [VarcharValue2] optional binary VarcharValue2 (UTF8)  [PLAIN, RLE, 
> BIT_PACKED], 19677568}]}]}
> Fragment 0:0
> [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet
> sys.version is:
> 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac

[jira] [Updated] (DRILL-6685) Error in parquet record reader

2018-08-15 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6685:
--
Labels: pull-request-available  (was: )

> Error in parquet record reader
> --
>
> Key: DRILL-6685
> URL: https://issues.apache.org/jira/browse/DRILL-6685
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6685
>
>
> This is the query:
> select VarbinaryValue1 from 
> dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit 
> 36;
> It appears to be caused by this commit:
> DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
> aee899c1b26ebb9a5781d280d5a73b42c273d4d5
> This is the stack trace:
> {noformat}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: 
> Hadoop path: 
> /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 1250
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 Index;
>   optional binary VarbinaryValue1;
>   optional int64 BigIntValue;
>   optional boolean BooleanValue;
>   optional int32 DateValue (DATE);
>   optional float FloatValue;
>   optional binary VarcharValue1 (UTF8);
>   optional double DoubleValue;
>   optional int32 IntegerValue;
>   optional int32 TimeValue (TIME_MILLIS);
>   optional int64 TimestampValue (TIMESTAMP_MILLIS);
>   optional binary VarbinaryValue2;
>   optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL);
>   optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL);
>   optional binary VarcharValue2 (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] optional 
> int64 Index  [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED 
> [VarbinaryValue1] optional binary VarbinaryValue1  [PLAIN, RLE, BIT_PACKED], 
> 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue  
> [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED 
> [BooleanValue] optional boolean BooleanValue  [PLAIN, RLE, BIT_PACKED], 
> 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue 
> (DATE)  [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED 
> [FloatValue] optional float FloatValue  [PLAIN, RLE, BIT_PACKED], 8184959}, 
> ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1 
> (UTF8)  [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED 
> [DoubleValue] optional double DoubleValue  [PLAIN, RLE, BIT_PACKED], 
> 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32 
> IntegerValue  [PLAIN, RLE, BIT_PACKED], 10240111}, 
> ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue 
> (TIME_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10245154}, 
> ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue 
> (TIMESTAMP_MILLIS)  [PLAIN, RLE, BIT_PACKED], 10250197}, 
> ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2 
>  [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED 
> [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED 
> [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED 
> [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue 
> (INTERVAL)  [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED 
> [VarcharValue2] optional binary VarcharValue2 (UTF8)  [PLAIN, RLE, 
> BIT_PACKED], 19677568}]}]}
> Fragment 0:0
> [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet
> sys.version is:
> 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-03 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6664:
--
Labels: ready-to-commit  (was: pull-request-available)

> Parquet reader should not allow batches with more than 64k rows
> ---
>
> Key: DRILL-6664
> URL: https://issues.apache.org/jira/browse/DRILL-6664
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
>
> The Drill configuration allows the Parquet reader to handle batches larger 
> than 64k. We should limit this setting to 64k as several operators assume a 
> maximum batch size of 64k.
> NOTE - This Jira is precautionary as the default is 32k rows maximum



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-03 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6664:
--
Reviewer: Boaz Ben-Zvi

> Parquet reader should not allow batches with more than 64k rows
> ---
>
> Key: DRILL-6664
> URL: https://issues.apache.org/jira/browse/DRILL-6664
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> The Drill configuration allows the Parquet reader to handle batches larger 
> than 64k. We should limit this setting to 64k as several operators assume a 
> maximum batch size of 64k.
> NOTE - This Jira is precautionary as the default is 32k rows maximum



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-03 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6664:
--
Labels: pull-request-available  (was: )

> Parquet reader should not allow batches with more than 64k rows
> ---
>
> Key: DRILL-6664
> URL: https://issues.apache.org/jira/browse/DRILL-6664
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> The Drill configuration allows the Parquet reader to handle batches larger 
> than 64k. We should limit this setting to 64k as several operators assume a 
> maximum batch size of 64k.
> NOTE - This Jira is precautionary as the default is 32k rows maximum



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-03 Thread salim achouche (JIRA)
salim achouche created DRILL-6664:
-

 Summary: Parquet reader should not allow batches with more than 
64k rows
 Key: DRILL-6664
 URL: https://issues.apache.org/jira/browse/DRILL-6664
 Project: Apache Drill
  Issue Type: Improvement
Reporter: salim achouche
Assignee: salim achouche


The Drill configuration allows the Parquet reader to handle batches larger than 
64k. We should limit this setting to 64k as several operators assume a maximum 
batch size of 64k.

NOTE - This Jira is precautionary as the default is 32k rows maximum
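
As a concrete illustration, here is a minimal sketch of the kind of clamp being 
proposed; the class and constant names are hypothetical, not Drill's actual 
code:

{code}
// Hypothetical sketch: cap a configured batch row count at 64k, since
// several downstream operators address rows with 16-bit indices.
public final class BatchRowCountLimit {

  public static final int MAX_BATCH_ROW_COUNT = 1 << 16; // 65,536 rows

  public static int clamp(int configuredRowCount) {
    if (configuredRowCount <= 0) {
      throw new IllegalArgumentException("row count must be positive");
    }
    return Math.min(configuredRowCount, MAX_BATCH_ROW_COUNT);
  }
}
{code}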



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Fix Version/s: 1.15.0

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  
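
To picture the suggested optimization, here is a small sketch (illustrative 
only, not Drill's actual NullableVarChar implementation): an implicit column 
such as the file name repeats the same value for every row of a batch, so a 
single stored copy plus a row count suffices.

{code}
// Hypothetical sketch: a VarChar-like column that stores one value for N rows
// instead of materializing N duplicate copies.
public final class ConstantVarCharColumn {

  private final byte[] value;  // the single stored copy (may be null)
  private final int rowCount;  // logical number of rows it represents

  public ConstantVarCharColumn(byte[] value, int rowCount) {
    this.value = value;
    this.rowCount = rowCount;
  }

  public byte[] get(int rowIndex) {
    if (rowIndex < 0 || rowIndex >= rowCount) {
      throw new IndexOutOfBoundsException("row " + rowIndex);
    }
    return value; // same bytes for every row: O(1) memory for N rows
  }
}
{code}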



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Labels: ready-to-commit  (was: pull-request-available)

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-01 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Reviewer: Timothy Farkas

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: pull-request-available
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-01 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Labels: pull-request-available  (was: )

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: pull-request-available
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6660) Exchange operators Analysis

2018-08-01 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565708#comment-16565708
 ] 

salim achouche commented on DRILL-6660:
---

During the analysis it became clear that the Batch Sizing functionality (for 
exchange operators) is not enough on its own:
 * Exchanges usually involve MxN communication; using a 16MB (default) batch 
size for each output / input batch will not scale 
 * Instead, the analysis should take in the exchange topology, its performance 
implications, and communication timing (should we fill an output batch even if 
doing so means a long delay?)

Thus, our recommendation is to combine Resource Management, Batch Sizing, and 
Performance Tuning within a single initiative, to avoid regressions and achieve 
the desired goal of greater scalability.
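
A back-of-the-envelope illustration of the scaling concern, using assumed 
numbers that are not from this Jira:

{code}
// Hypothetical estimate: in an MxN exchange, each sender buffers one
// outgoing batch per receiver.
public class ExchangeMemoryEstimate {
  public static void main(String[] args) {
    int receivers = 100;                   // N receiving fragments (assumed)
    long batchBytes = 16L * 1024 * 1024;   // 16MB default output batch
    long perSenderBytes = receivers * batchBytes;
    // 100 x 16MB = 1600MB of in-flight output buffers per sender, before any
    // operator memory is counted - hence a fixed 16MB batch does not scale.
    System.out.println((perSenderBytes / (1024 * 1024)) + " MB per sender");
  }
}
{code}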

> Exchange operators Analysis
> ---
>
> Key: DRILL-6660
> URL: https://issues.apache.org/jira/browse/DRILL-6660
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.15.0
>
>
> Analysis of what will it take to batch size the exchange operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6660) Exchange operators Analysis

2018-08-01 Thread salim achouche (JIRA)
salim achouche created DRILL-6660:
-

 Summary: Exchange operators Analysis
 Key: DRILL-6660
 URL: https://issues.apache.org/jira/browse/DRILL-6660
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Flow
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.15.0


Analysis of what will it take to batch size the exchange operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6659) Batch sizing functionality for exchange operators

2018-08-01 Thread salim achouche (JIRA)
salim achouche created DRILL-6659:
-

 Summary: Batch sizing functionality for exchange operators
 Key: DRILL-6659
 URL: https://issues.apache.org/jira/browse/DRILL-6659
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.15.0


This task aims at controlling memory usage within Drill's exchange operators. 
This is a continuation of the Drill Resource Management effort. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling

2018-07-23 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6626:
--
Labels: pull-request-available  (was: )

> Hash Aggregate: Index out of bounds with small output batch size and spilling
> -
>
> Key: DRILL-6626
> URL: https://issues.apache.org/jira/browse/DRILL-6626
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
>This new IOOB failure was seen while trying to recreate the NPE failure in 
> DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does 
> not seem to make a difference.
> This IOOB can easily be created with other large Hash-Agg queries that need 
> to spill. 
> The IOOB was caused after restricting the output batch size (to force many 
> output batches) and the Hash Aggr memory (to force a spill):
> {code}
> 0: jdbc:drill:zk=local> alter system set 
> `drill.exec.memory.operator.output_batch_size` = 262144;
> +---++
> |  ok   |summary |
> +---++
> | true  | drill.exec.memory.operator.output_batch_size updated.  |
> +---++
> 1 row selected (0.106 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | exec.errors.verbose updated.  |
> +---+---+
> 1 row selected (0.081 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | exec.hashagg.mem_limit updated.  |
> +---+--+
> 1 row selected (0.089 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> SELECT c_customer_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT ca_address_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_address`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cd_credit_rating FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT hd_buy_potential FROM 
> dfs.`/data/tpcds/sf1/parquet/household_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT i_item_id FROM 
> dfs.`/data/tpcds/sf1/parquet/item`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT p_promo_id FROM 
> dfs.`/data/tpcds/sf1/parquet/promotion`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT t_time_id FROM 
> dfs.`/data/tpcds/sf1/parquet/time_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT d_date_id FROM 
> dfs.`/data/tpcds/sf1/parquet/date_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT s_store_id FROM 
> dfs.`/data/tpcds/sf1/parquet/store`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT w_warehouse_id FROM 
> dfs.`/data/tpcds/sf1/parquet/warehouse`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT sm_ship_mode_id FROM 
> dfs.`/data/tpcds/sf1/parquet/ship_mode`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT r_reason_id FROM 
> dfs.`/data/tpcds/sf1/parquet/reason`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cc_call_center_id FROM 
> dfs.`/data/tpcds/sf1/parquet/call_center`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT web_site_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_site`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT wp_web_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_page`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cp_catalog_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/catalog_page`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7
> Fragment 4:0
> [Error Id: d44e64ea-f474-436e-94b0-61c61eec2227 on 172.30.8.176:31020]
>   (java.lang.IndexOutOfBoundsException) Index: 26474, Size: 7
> java.util.ArrayList.rangeCheck():653
> java.util.ArrayList.get():429
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.rehash():293
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1300():120
> 
> org.

[jira] [Assigned] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling

2018-07-23 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6626:
-

Assignee: salim achouche

> Hash Aggregate: Index out of bounds with small output batch size and spilling
> -
>
> Key: DRILL-6626
> URL: https://issues.apache.org/jira/browse/DRILL-6626
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: salim achouche
>Priority: Major
>
>This new IOOB failure was seen while trying to recreate the NPE failure in 
> DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does 
> not seem to make a difference.
> This IOOB can easily be created with other large Hash-Agg queries that need 
> to spill. 
> The IOOB was caused after restricting the output batch size (to force many 
> output batches) and the Hash Aggr memory (to force a spill):
> {code}
> 0: jdbc:drill:zk=local> alter system set 
> `drill.exec.memory.operator.output_batch_size` = 262144;
> +---++
> |  ok   |summary |
> +---++
> | true  | drill.exec.memory.operator.output_batch_size updated.  |
> +---++
> 1 row selected (0.106 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | exec.errors.verbose updated.  |
> +---+---+
> 1 row selected (0.081 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | exec.hashagg.mem_limit updated.  |
> +---+--+
> 1 row selected (0.089 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> SELECT c_customer_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT ca_address_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_address`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cd_credit_rating FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT hd_buy_potential FROM 
> dfs.`/data/tpcds/sf1/parquet/household_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT i_item_id FROM 
> dfs.`/data/tpcds/sf1/parquet/item`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT p_promo_id FROM 
> dfs.`/data/tpcds/sf1/parquet/promotion`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT t_time_id FROM 
> dfs.`/data/tpcds/sf1/parquet/time_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT d_date_id FROM 
> dfs.`/data/tpcds/sf1/parquet/date_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT s_store_id FROM 
> dfs.`/data/tpcds/sf1/parquet/store`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT w_warehouse_id FROM 
> dfs.`/data/tpcds/sf1/parquet/warehouse`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT sm_ship_mode_id FROM 
> dfs.`/data/tpcds/sf1/parquet/ship_mode`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT r_reason_id FROM 
> dfs.`/data/tpcds/sf1/parquet/reason`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cc_call_center_id FROM 
> dfs.`/data/tpcds/sf1/parquet/call_center`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT web_site_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_site`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT wp_web_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_page`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cp_catalog_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/catalog_page`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7
> Fragment 4:0
> [Error Id: d44e64ea-f474-436e-94b0-61c61eec2227 on 172.30.8.176:31020]
>   (java.lang.IndexOutOfBoundsException) Index: 26474, Size: 7
> java.util.ArrayList.rangeCheck():653
> java.util.ArrayList.get():429
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.rehash():293
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1300():120
> 
> org.apache.drill.exec.physical.impl.common.HashTableTempla

[jira] [Commented] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling

2018-07-23 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553377#comment-16553377
 ] 

salim achouche commented on DRILL-6626:
---

The IndexOutOfBoundsException was happening during Hash Table rehash:
 * There was a regression introduced by the batch sizing work
 * Each outgoing batch needed to fix its hash values (which were cached)
 * Each outgoing batch should start at index (num-out-batches - 1) * 
MAX_BATCH_SZ; this is based on the insertion logic
 * The code used the real row count instead of MAX_BATCH_SZ

Fix - Put back the original code, since the indexing scheme didn't change
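
A minimal sketch of the start-index computation described above; the names are 
illustrative, not the actual HashTableTemplate code:

{code}
final class RehashIndexing {

  // Correct: each outgoing batch starts at a fixed stride of MAX_BATCH_SZ,
  // matching the insertion logic.
  static int batchStartIndex(int numOutBatches, int maxBatchSize) {
    return (numOutBatches - 1) * maxBatchSize;
  }

  // The regression effectively used the batch's real row count as the stride,
  // which diverges from the insertion logic once rowCount != MAX_BATCH_SZ.
  static int buggyBatchStartIndex(int numOutBatches, int realRowCount) {
    return (numOutBatches - 1) * realRowCount;
  }
}
{code}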

> Hash Aggregate: Index out of bounds with small output batch size and spilling
> -
>
> Key: DRILL-6626
> URL: https://issues.apache.org/jira/browse/DRILL-6626
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Priority: Major
>
>This new IOOB failure was seen while trying to recreate the NPE failure in 
> DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does 
> not seem to make a difference.
> This IOOB can easily be created with other large Hash-Agg queries that need 
> to spill. 
> The IOOB was caused after restricting the output batch size (to force many 
> output batches) and the Hash Aggr memory (to force a spill):
> {code}
> 0: jdbc:drill:zk=local> alter system set 
> `drill.exec.memory.operator.output_batch_size` = 262144;
> +---++
> |  ok   |summary |
> +---++
> | true  | drill.exec.memory.operator.output_batch_size updated.  |
> +---++
> 1 row selected (0.106 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | exec.errors.verbose updated.  |
> +---+---+
> 1 row selected (0.081 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | exec.hashagg.mem_limit updated.  |
> +---+--+
> 1 row selected (0.089 seconds)
> 0: jdbc:drill:zk=local>
> 0: jdbc:drill:zk=local> SELECT c_customer_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT ca_address_id FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_address`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cd_credit_rating FROM 
> dfs.`/data/tpcds/sf1/parquet/customer_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT hd_buy_potential FROM 
> dfs.`/data/tpcds/sf1/parquet/household_demographics`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT i_item_id FROM 
> dfs.`/data/tpcds/sf1/parquet/item`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT p_promo_id FROM 
> dfs.`/data/tpcds/sf1/parquet/promotion`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT t_time_id FROM 
> dfs.`/data/tpcds/sf1/parquet/time_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT d_date_id FROM 
> dfs.`/data/tpcds/sf1/parquet/date_dim`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT s_store_id FROM 
> dfs.`/data/tpcds/sf1/parquet/store`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT w_warehouse_id FROM 
> dfs.`/data/tpcds/sf1/parquet/warehouse`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT sm_ship_mode_id FROM 
> dfs.`/data/tpcds/sf1/parquet/ship_mode`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT r_reason_id FROM 
> dfs.`/data/tpcds/sf1/parquet/reason`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cc_call_center_id FROM 
> dfs.`/data/tpcds/sf1/parquet/call_center`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT web_site_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_site`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT wp_web_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/web_page`
> . . . . . . . . . . . > UNION
> . . . . . . . . . . . > SELECT cp_catalog_page_id FROM 
> dfs.`/data/tpcds/sf1/parquet/catalog_page`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7
> Fragment 4:0
> [Error Id: d44e64ea-f474-436e-94b0-61c6

[jira] [Updated] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-21 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6622:
--
Labels: pull-request-available  (was: )

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-21 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6622:
--
Reviewer: Boaz Ben-Zvi

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-21 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551888#comment-16551888
 ] 

salim achouche commented on DRILL-6622:
---

Alright, just fixed this issue; there were two bugs in the Aggregator batch 
sizing logic:

Issue I
 * The aggregator runs in a loop to consume all input batches
 * The loop was updating the batch sizing stats only after the batches were 
consumed
 * Assume the output-row-count is 1 and we receive a batch with at least 32k + 
1 records
 * The code would create 32k output batches (one per incoming record) and then 
fail because of overflow
 * Fix - The batch sizing stats are now updated when a non-empty batch is 
received, before the processing loop (see the sketch below)

Issue II
 * The Aggregator has two main modules: the AggregatorBatch and Aggregator 
objects
 * Both share the same "incoming" record batch instance
 * There is, though, logic to spill incoming batches when under memory pressure
 * The batch sizing logic was not aware that when batches are spilled the 
shared "incoming" object instance will diverge; that is, the Aggregator object 
will mutate the incoming object
 * The batch sizer was being invoked with a stale "incoming" object (the one 
from the AggregatorBatch)
 * Fix - Update the Aggregator code to always pass the active incoming object 
explicitly
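
A minimal, self-contained sketch of the Issue I fix under assumed names (this 
is not the actual HashAggTemplate code):

{code}
// Hypothetical sketch: refresh the batch sizing stats from a non-empty
// incoming batch *before* the consume loop, so output capacities are not
// derived from stale data.
final class BatchSizingExample {

  private int avgRowWidthBytes = 1; // stale until updated

  void consume(int[] incomingRowWidths, int targetOutputBytes) {
    if (incomingRowWidths.length > 0) {
      long total = 0;
      for (int w : incomingRowWidths) {
        total += w;
      }
      // Fix: update the stats before processing the batch.
      avgRowWidthBytes = (int) Math.max(1, total / incomingRowWidths.length);
    }
    // With fresh stats this no longer collapses to one output row per batch.
    int outputRowCount = Math.max(1, targetOutputBytes / avgRowWidthBytes);
    // ... copy incoming rows into output batches of outputRowCount rows ...
  }
}
{code}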

 

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-20 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551497#comment-16551497
 ] 

salim achouche edited comment on DRILL-6622 at 7/21/18 1:55 AM:


This looks like a serious bug:
 * The batch memory managers somehow think that most incoming batches are empty
 * The aggregator used to create outgoing batches with exactly 2**16 max 
capacity
 * The memory manager's erroneous stats make it so the Aggregator gets a max 
capacity of 1
 * This means that every unique group is stored in its own outgoing batch
 * The Aggregator limits the max number of outgoing batches to 64k (since 
previously a batch could contain 64k entries); a 32-bit indexing scheme 
subdivides this space into a pair (out-batch-idx, idx-within-batch)
 * A NullPointerException happens when this indexing scheme fails because of 
the large number of outgoing batches (overflow)
 * The bug has been there for a while (since the Aggregator was modified to 
support batch sizing) but it manifested itself only with a large number of 
unique groups

I now have to reverse engineer the reason for the erroneous batch sizer stats.

 

 

 


was (Author: sachouche):
This looks like a serious bug:
 * The batch memory managers are somehow thinking that most incoming batches 
are empty
 * The aggregator used to create outgoing batches with exactly 2**16 max 
capacity
 * The memory manager erroneous stats make it so the Aggregator is getting a 
max capacity of 1
 * This meant that every unique group is being stored in its own outgoing batch
 * The Aggregator limits the max number of outgoing batches to 64k (since 
previously a batch could contain 64k entries); a 32bits indexing scheme 
subdivides this space into a couple (out-batch-idx, idx-within-batch)
 * A NullpointException happens when this indexing scheme fails becomes of the 
large number of outgoing batches  (overflow)
 * The bug was there for awhile (when Aggregator was modified to support batch 
sizing) but the bug manifested itself only on a large number of unique groups

I am having now to reverse engineer the reason for the erroneous batch sizer 
stats.

 

 

 

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNe

[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-20 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551497#comment-16551497
 ] 

salim achouche commented on DRILL-6622:
---

This looks like a serious bug:
 * The batch memory managers somehow think that most incoming batches are empty
 * The aggregator used to create outgoing batches with exactly 2**16 max 
capacity
 * The memory manager's erroneous stats make it so the Aggregator gets a max 
capacity of 1
 * This means that every unique group is stored in its own outgoing batch
 * The Aggregator limits the max number of outgoing batches to 64k (since 
previously a batch could contain 64k entries); a 32-bit indexing scheme 
subdivides this space into a pair (out-batch-idx, idx-within-batch)
 * A NullPointerException happens when this indexing scheme fails because of 
the large number of outgoing batches (overflow)
 * The bug has been there for a while (since the Aggregator was modified to 
support batch sizing) but it manifested itself only with a large number of 
unique groups

I now have to reverse engineer the reason for the erroneous batch sizer stats.
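
The 32-bit indexing scheme referred to above can be sketched as follows 
(hypothetical code, not the actual aggregator implementation): the high 16 
bits select the outgoing batch and the low 16 bits select the row within it, 
which is why more than 64k outgoing batches overflows the scheme.

{code}
// Hypothetical sketch: pack (out-batch-idx, idx-within-batch) into one
// 32-bit int.
final class CompositeIndex {

  static final int BITS = 16;
  static final int MASK = (1 << BITS) - 1; // 0xFFFF

  static int pack(int batchIdx, int rowIdx) {
    if (batchIdx > MASK || rowIdx > MASK) {
      throw new IllegalStateException("more than 64k batches or 64k rows");
    }
    return (batchIdx << BITS) | rowIdx;
  }

  static int batchIndex(int composite) { return composite >>> BITS; }

  static int rowIndex(int composite) { return composite & MASK; }
}
{code}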

 

 

 

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-20 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551462#comment-16551462
 ] 

salim achouche commented on DRILL-6622:
---

[~priteshm],
 * Being able to reproduce this bug really helps
 * There is a regression in the aggregator's output batch management
 * Fixing the bug required that I spend some time getting familiar with the 
code (which I have done now)
 * Hopefully, I am closing in on the reason for the NullPointerException
 * Will update my findings ASAP

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-20 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551168#comment-16551168
 ] 

salim achouche commented on DRILL-6622:
---

Realized that I was running with debug mode enabled, which somehow changed this 
bug's repro; I am now able to observe the failure with debug off.

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-20 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551095#comment-16551095
 ] 

salim achouche commented on DRILL-6622:
---

* Tried the query (on my Mac) with SF10 and SF100, but saw no crash as reported by 
this Jira
 * Hit multiple GC failures (with 4GB and 8GB of JVM heap memory); my guess is 
that this is due to the overhead of spilling data (avoiding spilling makes the 
query succeed)
 * The query succeeded as soon as I bumped up the query direct memory per node 
from 2GB to 6GB

 

I'll now try to reproduce this issue on my 4-node cluster.

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6622) UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException

2018-07-19 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6622:
-

Assignee: salim achouche  (was: Boaz Ben-Zvi)

> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> ---
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: salim achouche
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json, 
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer 
> UNION
> SELECT ca_address_id FROM customer_address 
> UNION
> SELECT cd_credit_rating FROM customer_demographics 
> UNION
> SELECT hd_buy_potential FROM household_demographics 
> UNION
> SELECT i_item_id FROM item 
> UNION
> SELECT p_promo_id FROM promotion 
> UNION
> SELECT t_time_id FROM time_dim 
> UNION
> SELECT d_date_id FROM date_dim 
> UNION
> SELECT s_store_id FROM store 
> UNION
> SELECT w_warehouse_id FROM warehouse 
> UNION
> SELECT sm_ship_mode_id FROM ship_mode 
> UNION
> SELECT r_reason_id FROM reason 
> UNION
> SELECT cc_call_center_id FROM call_center 
> UNION
> SELECT web_site_id FROM web_site 
> UNION
> SELECT wp_web_page_id FROM web_page 
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-11 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Labels: ready-to-commit  (was: pull-request-available)

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation

2018-07-11 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6578:
--
Labels: ready-to-commit  (was: pull-request-available)

> Ensure the Flat Parquet Reader can handle query cancellation
> 
>
> Key: DRILL-6578
> URL: https://issues.apache.org/jira/browse/DRILL-6578
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> * The optimized Parquet reader uses an iterator style to load column data 
>  * We need to ensure the code can properly handle query cancellation even in 
> the presence of bugs within the hasNext() .. next() calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-11 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Labels: pull-request-available  (was: )

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-07-10 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539048#comment-16539048
 ] 

salim achouche commented on DRILL-6569:
---

Robert,

According to the original comment:
 * Using the DFS command succeeds; this invokes the native Drill Parquet reader
 * Running the complex query (without the explicit DFS clause) fails; the stack 
trace indicates the Hive reader was invoked:
 ** 
org.apache.drill.exec.store.hive.readers.{color:#d04437}*HiveParquetReader.*{color}next():54
 ** org.apache.drill.exec.physical.impl.ScanBatch.next():172

 

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538974#comment-16538974
 ] 

salim achouche commented on DRILL-6517:
---

[~khfaraaz], when queries are cancelled, we anticipate exceptions to be thrown 
(e.g., an interrupted thread will receive an exception on a blocking call); see 
the illustration below. The questions I am trying to answer:
 * Is the IllegalStateException thrown only on query cancellation?
 * Is there a more fundamental bug causing the foreman to cancel the query?

 

So I'll use your real cluster to debug this issue.
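
For illustration, a minimal, hedged sketch of the point above (plain Java, not 
Drill code; the queue stands in for whatever blocking call the fragment thread 
happens to be in):

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CancellationDemo {
  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Object> incoming = new ArrayBlockingQueue<>(1);
    Thread fragment = new Thread(() -> {
      try {
        incoming.take(); // blocks, waiting for a batch that never arrives
      } catch (InterruptedException e) {
        // Expected on cancellation; by itself this is not a bug.
        System.out.println("fragment interrupted by cancellation");
      }
    });
    fragment.start();
    fragment.interrupt(); // what query cancellation effectively does to the thread
    fragment.join();
  }
}
{code}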

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-03 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532150#comment-16532150
 ] 

salim achouche commented on DRILL-6517:
---

* I ran the query around 10 times and it succeeded each time (running in 29 
minutes)
 * Bounced the Drillbit cluster, and immediately one of the nodes became 
unresponsive
 * I launched a script to gather jstacks each minute; somehow jstack failed, 
and I got the kernel messages below
 * VMware blogs indicate this happens when the VM is running out of resources
 * The interesting part is that the IllegalStateException showed up again when 
cancellation happened: {color:#f79232}Caused by: java.lang.IllegalStateException: 
Record count not set for this vector container{color}

{color:#FF0000}Message from syslogd@mfs133 at Jul  3 18:48:27 ...{color}

{color:#FF0000} kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! 
[java:12219]{color}

{color:#FF0000}Message from syslogd@mfs133 at Jul  3 18:48:27 ...{color}

{color:#FF0000} kernel:NMI watchdog: BUG: soft lockup - CPU#3 stuck for 25s! 
[java:16991]{color}

{color:#FF0000}Message from syslogd@mfs133 at Jul  3 18:48:27 ...{color}

{color:#FF0000} kernel:NMI watchdog: BUG: soft lockup - CPU#4 stuck for 25s! 
[java:17633]{color}

{color:#FF0000}Message from syslogd@mfs133 at Jul  3 18:48:27 ...{color}

{color:#FF0000} kernel:NMI watchdog: BUG: soft lockup - CPU#5 stuck for 25s! 
[java:27059]{color}

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRe

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-03 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532019#comment-16532019
 ] 

salim achouche commented on DRILL-6517:
---

After debugging this issue, I noticed the thrown exception was masking the real 
problem:
 * Launched the query the first time on a 4-node cluster (made up of VMs)
 * Query memory per node was 10GB; spilling was not enabled (at least not 
explicitly)
 * The query ran in 35 minutes and succeeded
 * Re-launched the same query, but this time node-3 was unresponsive
 * After one hour the query failed; the client error was that node-3 was lost
 * Within the Drillbit logs, the record-count-not-set error was thrown, though 
only after the foreman cancelled the query

I'll now focus on understanding why the system gets into this state when 
running for the second time; the fact that I am using VMs is not helping, as 
network issues are very common.

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-03 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531837#comment-16531837
 ] 

salim achouche commented on DRILL-6517:
---

If this is the case, then I'll fix that, though my impression is that the 
exception is thrown in the last hash join, where both inputs came from 
non-Parquet readers. I am currently re-running the test with new 
instrumentation.

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.r

[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-03 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Labels: pull-request-available ready-to-commit  (was: 
pull-request-available)

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available, ready-to-commit
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Reviewer: Boaz Ben-Zvi

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Labels: pull-request-available  (was: )

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6579:
--
Summary: Add sanity checks to Parquet Reader   (was: Sanity checks to avoid 
infinite loops)

> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6579) Sanity checks to avoid infinite loops

2018-07-02 Thread salim achouche (JIRA)
salim achouche created DRILL-6579:
-

 Summary: Sanity checks to avoid infinite loops
 Key: DRILL-6579
 URL: https://issues.apache.org/jira/browse/DRILL-6579
 Project: Apache Drill
  Issue Type: Improvement
Reporter: salim achouche
Assignee: salim achouche


Add sanity checks to the Parquet reader to avoid infinite loops.
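
A minimal sketch of the kind of guard intended here; this is not the actual 
patch, and the names (MAX_LOOPS, Batch, Reader, drain) are hypothetical:

{code}
public class BoundedReaderLoop {
  // Upper bound on iterations; a correct reader terminates long before
  // this, so exceeding it signals a bug rather than a very large file.
  private static final int MAX_LOOPS = 1 << 16;

  interface Batch { int recordCount(); boolean isLast(); }
  interface Reader { Batch readBatch(); }

  static long drain(Reader reader) {
    long total = 0;
    for (int numLoops = 0; ; numLoops++) {
      if (numLoops > MAX_LOOPS) {
        // Sanity check: fail fast instead of looping forever.
        throw new IllegalStateException("reader exceeded " + MAX_LOOPS + " iterations");
      }
      Batch batch = reader.readBatch();
      // A non-final batch claiming zero records would let the loop spin
      // without making progress; fail fast here too.
      if (batch.recordCount() == 0 && !batch.isLast()) {
        throw new IllegalStateException("read batch count should be greater than zero");
      }
      total += batch.recordCount();
      if (batch.isLast()) {
        return total;
      }
    }
  }
}
{code}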



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6569:
-

Assignee: Robert Hou  (was: salim achouche)


> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordB

[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-07-02 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530733#comment-16530733
 ] 

salim achouche commented on DRILL-6569:
---

[~rhou],

This is a Hive Parquet reader issue (not the native Drill Parquet reader).

 

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecor

[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6578:
--
Labels: pull-request-available  (was: )

> Ensure the Flat Parquet Reader can handle query cancellation
> 
>
> Key: DRILL-6578
> URL: https://issues.apache.org/jira/browse/DRILL-6578
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> * The optimized Parquet reader uses an iterator style to load column data 
>  * We need to ensure the code can properly handle query cancellation even in 
> the presence of bugs within the hasNext() .. next() calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation

2018-07-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6578:
--
Reviewer: Vlad Rozov

> Ensure the Flat Parquet Reader can handle query cancellation
> 
>
> Key: DRILL-6578
> URL: https://issues.apache.org/jira/browse/DRILL-6578
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
>
> * The optimized Parquet reader uses an iterator style to load column data 
>  * We need to ensure the code can properly handle query cancellation even in 
> the presence of bugs within the hasNext() .. next() calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation

2018-07-02 Thread salim achouche (JIRA)
salim achouche created DRILL-6578:
-

 Summary: Ensure the Flat Parquet Reader can handle query 
cancellation
 Key: DRILL-6578
 URL: https://issues.apache.org/jira/browse/DRILL-6578
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Reporter: salim achouche
Assignee: salim achouche


* The optimized Parquet reader uses an iterator style to load column data
 * We need to ensure the code can properly handle query cancellation even in 
the presence of bugs within the hasNext()/next() calls (a sketch follows below)
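
A hedged sketch of what such handling can look like; ColumnBulkInput here is a 
stand-in name, not the actual change:

{code}
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ColumnBulkInput implements Iterator<int[]> {
  private int remainingPages = 1_000;

  @Override
  public boolean hasNext() {
    // Cancellation interrupts the fragment thread; checking the flag here
    // guarantees the iteration ends even if remainingPages is mismanaged
    // by a bug elsewhere in the hasNext()/next() logic.
    return !Thread.currentThread().isInterrupted() && remainingPages > 0;
  }

  @Override
  public int[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    remainingPages--;
    return new int[64]; // stand-in for a decoded page of column data
  }
}
{code}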



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6560) Allow options for controlling the batch size per operator

2018-06-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6560:
--
Reviewer: Karthikeyan Manivannan

> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6560) Allow options for controlling the batch size per operator

2018-06-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6560:
--
Labels: pull-request-available  (was: )

> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader

2018-06-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6570:
--
Remaining Estimate: 2h
 Original Estimate: 2h
  Reviewer: Kunal Khatua
Issue Type: Bug  (was: Improvement)

> IndexOutOfBoundsException when using Flat Parquet  Reader
> -
>
> Key: DRILL-6570
> URL: https://issues.apache.org/jira/browse/DRILL-6570
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> * The Parquet Reader creates a reusable bulk entry based on the column 
> precision
>  * It uses the column precision for optimizing the intermediary heap buffers
>  * It first detected the column was fixed length but then it reverted this 
> assumption when the column changed precision
>  * This step was fine except the bulk entry memory requirement changed though 
> the code didn't update the bulk entry intermediary buffers
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader

2018-06-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6570:
--
Labels: pull-request-available  (was: )

> IndexOutOfBoundsException when using Flat Parquet  Reader
> -
>
> Key: DRILL-6570
> URL: https://issues.apache.org/jira/browse/DRILL-6570
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> * The Parquet Reader creates a reusable bulk entry based on the column 
> precision
>  * It uses the column precision for optimizing the intermediary heap buffers
>  * It first detected the column was fixed length but then it reverted this 
> assumption when the column changed precision
>  * This step was fine except the bulk entry memory requirement changed though 
> the code didn't update the bulk entry intermediary buffers
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader

2018-06-29 Thread salim achouche (JIRA)
salim achouche created DRILL-6570:
-

 Summary: IndexOutOfBoundsException when using Flat Parquet  Reader
 Key: DRILL-6570
 URL: https://issues.apache.org/jira/browse/DRILL-6570
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


* The Parquet Reader creates a reusable bulk entry based on the column precision
 * It uses the column precision for optimizing the intermediary heap buffers
 * It first detected the column as fixed-length but then reverted this 
assumption when the column's precision changed
 * This step was fine, except that the bulk entry's memory requirement changed 
while the code didn't update the bulk entry's intermediary buffers (see the 
sketch below)
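
A minimal sketch of the failure mode and the fix, with hypothetical names 
(BulkEntry, revisePrecision), not the actual Drill classes:

{code}
public class BulkEntry {
  private byte[] buffer;  // intermediary heap buffer
  private int precision;  // assumed per-value width

  BulkEntry(int initialPrecision, int batchSize) {
    this.precision = initialPrecision;
    this.buffer = new byte[initialPrecision * batchSize];
  }

  void revisePrecision(int newPrecision, int batchSize) {
    if (newPrecision > precision) {
      // The missing step: once the fixed-length assumption is revised,
      // the intermediary buffer must grow as well; keeping the old,
      // smaller buffer is what leads to the IndexOutOfBoundsException.
      buffer = new byte[newPrecision * batchSize];
    }
    precision = newPrecision;
  }
}
{code}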

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6560) Allow options for controlling the batch size per operator

2018-06-29 Thread salim achouche (JIRA)
salim achouche created DRILL-6560:
-

 Summary: Allow options for controlling the batch size per operator
 Key: DRILL-6560
 URL: https://issues.apache.org/jira/browse/DRILL-6560
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


This Jira is for internal Drill DEV use; the following capabilities are needed 
for automating testing of the batch sizing functionality:
 * Control the enablement of batch sizing statistics at the session (per query) 
and server (all queries) levels
 * Control the granularity of batch sizing statistics (summary or verbose)
 * Control the set of operators that should log batch statistics (a sketch 
follows below)
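
A hedged sketch of how these three controls might compose; the names are 
illustrative only, not Drill's actual option or class names:

{code}
import java.util.Set;

public class BatchStatsContext {
  private final boolean enabled;        // session- or server-level switch
  private final boolean verbose;        // summary vs. verbose granularity
  private final Set<String> operators;  // operators that should log

  public BatchStatsContext(boolean enabled, boolean verbose, Set<String> operators) {
    this.enabled = enabled;
    this.verbose = verbose;
    this.operators = operators;
  }

  void logBatchStats(String operator, String summary, String details) {
    if (!enabled || !operators.contains(operator)) {
      return; // statistics disabled globally or for this operator
    }
    System.out.println(operator + ": " + summary);
    if (verbose) {
      System.out.println(details);
    }
  }
}
{code}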



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6539) Record count not set for this vector container error

2018-06-26 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523959#comment-16523959
 ] 

salim achouche commented on DRILL-6539:
---

I have been trying to reproduce this issue on my macOS machine, but Khurram's 
TPC-DS test succeeded. [~ppenumarthy] Do you have another repro case?

> Record count not set for this vector container error 
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> This error is randomly seen when executing queries.
> [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector 
> container
> com.google.common.base.Preconditions.checkState():173
> org.apache.drill.exec.record.VectorContainer.getRecordCount():394
> org.apache.drill.exec.record.RecordBatchSizer.<init>():681
> org.apache.drill.exec.record.RecordBatchSizer.<init>():665
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891
> 
> org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
>   
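> 
> A simplified, hypothetical sketch of the invariant that trips here (the real 
> check lives in VectorContainer.getRecordCount(), per the stack trace above); 
> this is an illustration, not Drill's actual code:
> {code:java}
> // A container's record count must be set by the producing operator before
> // any consumer (such as RecordBatchSizer) reads it.
> class VectorContainerSketch {
>   private int recordCount = -1; // -1 means "not yet set"
> 
>   void setRecordCount(int count) { this.recordCount = count; }
> 
>   int getRecordCount() {
>     if (recordCount < 0) {
>       // Mirrors the Preconditions.checkState() failure in the stack trace.
>       throw new IllegalStateException(
>           "Record count not set for this vector container");
>     }
>     return recordCount;
>   }
> }
> {code}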

[jira] [Created] (DRILL-6528) Planner setting the wrong number of records to read (Parquet Reader)

2018-06-22 Thread salim achouche (JIRA)
salim achouche created DRILL-6528:
-

 Summary: Planner setting the wrong number of records to read 
(Parquet Reader)
 Key: DRILL-6528
 URL: https://issues.apache.org/jira/browse/DRILL-6528
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: salim achouche


- Recently fixed the Flat Parquet reader to honor the number of records to read
 - However, a few tests then failed:
TestUnionDistinct.testUnionDistinctEmptySides:356 Different number of records 
returned expected:<5> but was:<1>
TestUnionAll.testUnionAllEmptySides:355 Different number of records returned 
expected:<5> but was:<1>

 - I debugged one of them and realized the Planner was setting the wrong number 
of rows to read (in this case, one)
 - You can set a breakpoint and see this happening:
Class: ParquetGroupScan
Method: updateRowGroupInfo(long maxRecords)
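
A hypothetical debugging aid (not part of any fix) that makes the pushed-down 
limit visible without a debugger; the class name is made up for illustration:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Wraps the row-count limit so the value the planner pushes into the Parquet
// group scan can be traced from the logs.
public class RowLimitTracer {
  private static final Logger logger =
      LoggerFactory.getLogger(RowLimitTracer.class);

  public static long trace(long maxRecords) {
    // With the failing union queries, this logs 1 although 5 rows are expected.
    logger.info("Planner pushed maxRecords={}", maxRecords);
    return maxRecords;
  }
}
{code}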



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6147) Limit batch size for Flat Parquet Reader

2018-06-20 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6147:
--
Labels: pull-request-available  (was: )

> Limit batch size for Flat Parquet Reader
> 
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is neither a parameter nor any logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct memory usage.
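> 
> A minimal sketch of the idea (hypothetical names, not Drill's code): derive 
> the scan batch row limit from a configurable memory budget, while keeping 
> the old 32k-row cap as an upper bound:
> {code:java}
> class BatchSizerSketch {
>   static final int HARD_CODED_LIMIT = 32 * 1024; // old behavior: 32k rows
> 
>   // rowWidth: estimated bytes per row; memoryBudget: configured bytes/batch
>   static int computeBatchRowLimit(int rowWidth, long memoryBudget) {
>     long byMemory = memoryBudget / Math.max(rowWidth, 1);
>     return (int) Math.min(HARD_CODED_LIMIT, byMemory); // never exceed old cap
>   }
> }
> {code}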



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6513) Drill should only allow valid values when users set planner.memory.max_query_memory_per_node

2018-06-19 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6513:
--
Labels: pull-request-available  (was: )

> Drill should only allow valid values when users set 
> planner.memory.max_query_memory_per_node
> 
>
> Key: DRILL-6513
> URL: https://issues.apache.org/jira/browse/DRILL-6513
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> The "planner.memory.max_query_memory_per_node" configuration can be currently 
> set to values higher than the Drillbit Direct Memory configuration. The goal 
> of this Jira is to fail queries with such an erroneous configuration to avoid 
> runtime OOM.
> NOTE - The current semantic of the maximum query memory per node 
> configuration is that the end user has computed valid values especially 
> knowing the current Drill limitations. Such values have to account for 
> Netty's overhead (memory pools), shared pools (e.g., network exchanges), and 
> concurrent query execution. This Jira should not be used to also cover such 
> use-cases. The Drill Resource Management feature has the means to automate 
> query quotas and the associated validation. We should create another Jira 
> requesting the enhanced validations contracts under the umbrella of the 
> Resource Management feature.   
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6513) Drill should only allow valid values when users set planner.memory.max_query_memory_per_node

2018-06-18 Thread salim achouche (JIRA)
salim achouche created DRILL-6513:
-

 Summary: Drill should only allow valid values when users set 
planner.memory.max_query_memory_per_node
 Key: DRILL-6513
 URL: https://issues.apache.org/jira/browse/DRILL-6513
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


The "planner.memory.max_query_memory_per_node" configuration can be currently 
set to values higher than the Drillbit Direct Memory configuration. The goal of 
this Jira is to fail queries with such an erroneous configuration to avoid 
runtime OOM.

NOTE - The current semantic of the maximum query memory per node configuration 
is that the end user has computed valid values especially knowing the current 
Drill limitations. Such values have to account for Netty's overhead (memory 
pools), shared pools (e.g., network exchanges), and concurrent query execution. 
This Jira should not be used to also cover such use-cases. The Drill Resource 
Management feature has the means to automate query quotas and the associated 
validation. We should create another Jira requesting the enhanced validations 
contracts under the umbrella of the Resource Management feature.   
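
A minimal sketch of the intended fail-fast validation (an illustration, not 
the actual patch); both values are assumed to be supplied in bytes:
{code:java}
public class QueryMemoryValidator {
  static void validate(long maxQueryMemoryPerNode, long drillbitDirectMemory) {
    if (maxQueryMemoryPerNode > drillbitDirectMemory) {
      throw new IllegalArgumentException(String.format(
          "planner.memory.max_query_memory_per_node (%d bytes) exceeds the "
              + "Drillbit direct memory (%d bytes)",
          maxQueryMemoryPerNode, drillbitDirectMemory));
    }
  }
}
{code}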

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6246) Build Failing in jdbc-all artifact

2018-06-18 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche closed DRILL-6246.
-
Resolution: Fixed

> Build Failing in jdbc-all artifact
> --
>
> Key: DRILL-6246
> URL: https://issues.apache.org/jira/browse/DRILL-6246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.13.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>
> * It was noticed that the build was failing because of the jdbc-all artifact
>  * The maximum compressed jar size was set to 32MB, but we are currently 
> creating a JAR slightly larger than 32MB
>  * I compared apache drill-1.10.0, drill-1.12.0, and drill-1.13.0 (on my 
> MacOS)
>  * jdbc-all-1.10.0 jar size: 21MB
>  * jdbc-all-1.12.0 jar size: 27MB
>  * jdbc-all-1.13.0 jar size: 34MB (on Linux this size is roughly 32MB)
>  * I then compared jdbc-all-1.12.0 and jdbc-all-1.13.0 in more detail
>  * The bulk of the increase is attributed to the calcite artifact
>  * The calcite footprint used to be 2MB (uncompressed) and is now 22MB 
> (uncompressed)
>  * It is likely an exclusion problem
>  * The jdbc-all-1.12.0 version has only two top packages, 
> calcite/avatica/utils and calcite/avatica/remote
>  * The jdbc-all-1.13.0 includes new packages (within calcite/avatica): 
> metrics, proto, org/apache/, com/fasterxml, com/google
> I am planning to exclude these new sub-packages



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6447) Unsupported Operation when reading parquet data

2018-05-29 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494167#comment-16494167
 ] 

salim achouche commented on DRILL-6447:
---

I have fixed this issue, but [~vrozov] indicated he incorporated the fix as 
part of the Parquet upgrade. [~vrozov], can you please close my PR once this 
issue is resolved?

> Unsupported Operation when reading parquet data
> ---
>
> Key: DRILL-6447
> URL: https://issues.apache.org/jira/browse/DRILL-6447
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> An exception is thrown when reading Parquet data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6447) Unsupported Operation when reading parquet data

2018-05-29 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche reassigned DRILL-6447:
-

Assignee: Vlad Rozov  (was: salim achouche)

> Unsupported Operation when reading parquet data
> ---
>
> Key: DRILL-6447
> URL: https://issues.apache.org/jira/browse/DRILL-6447
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> An exception is thrown when reading Parquet data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6147) Limit batch size for Flat Parquet Reader

2018-05-25 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6147:
--
Reviewer: Parth Chandra

> Limit batch size for Flat Parquet Reader
> 
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is neither a parameter nor any logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6447) Unsupported Operation when reading parquet data

2018-05-25 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6447:
--
Reviewer: Arina Ielchiieva

> Unsupported Operation when reading parquet data
> ---
>
> Key: DRILL-6447
> URL: https://issues.apache.org/jira/browse/DRILL-6447
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> An exception is thrown when reading Parquet data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6447) Unsupported Operation when reading parquet data

2018-05-25 Thread salim achouche (JIRA)
salim achouche created DRILL-6447:
-

 Summary: Unsupported Operation when reading parquet data
 Key: DRILL-6447
 URL: https://issues.apache.org/jira/browse/DRILL-6447
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.14.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


An exception is thrown when reading Parquet data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5847) Flat Parquet Reader Performance Analysis

2018-05-22 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-5847.
---
   Resolution: Fixed
Fix Version/s: 1.14.0

> Flat Parquet Reader Performance Analysis
> 
>
> Key: DRILL-5847
> URL: https://issues.apache.org/jira/browse/DRILL-5847
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: Drill Framework Enhancements.pdf, Flat Parquet Scanner 
> Enhancements Presentation.pdf
>
>
> This task is to analyze the Flat Parquet Reader logic looking for performance 
> improvements opportunities.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5848) Implement Parquet Columnar Processing & Use Bulk APIs for processing

2018-05-22 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-5848.
---
   Resolution: Fixed
Fix Version/s: 1.14.0

> Implement Parquet Columnar Processing & Use Bulk APIs for processing
> 
>
> Key: DRILL-5848
> URL: https://issues.apache.org/jira/browse/DRILL-5848
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> * Change Flat Parquet Reader processing from row-based to columnar
> * Use Bulk APIs during the parsing and data loading phase
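> 
> An illustrative sketch (hypothetical names, not the reader's code) of the 
> access-pattern change from row-at-a-time to columnar processing:
> {code:java}
> class LoadingPatterns {
>   // Row based: touch every column once per row (poor cache locality).
>   static void rowBased(int[][] columns, int numRows) {
>     for (int row = 0; row < numRows; row++) {
>       for (int[] column : columns) {
>         consume(column[row]);
>       }
>     }
>   }
> 
>   // Columnar/bulk: drain one column across all rows before moving on; this
>   // keeps a single buffer hot in cache and enables bulk copies.
>   static void columnar(int[][] columns, int numRows) {
>     for (int[] column : columns) {
>       for (int row = 0; row < numRows; row++) {
>         consume(column[row]);
>       }
>     }
>   }
> 
>   static void consume(int v) { /* stand-in for vector writes */ }
> }
> {code}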



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-05-16 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-5846:
--
Labels: performance ready-to-commit  (was: performance)

> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance, ready-to-commit
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, 
> 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported 
> that Parquet parsing represents the lion's share of the overall query 
> execution time. It tracks Flat Data types only, as Nested data types might 
> involve functional and processing enhancements (e.g., a nested column can be 
> seen as a Document; a user might want to perform operations scoped at the 
> document level, that is, with no need to span all rows). Another JIRA will 
> be created to handle the nested columns use-case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-05-11 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6348:
--

[~parthc], can you please review this task?

 

Thanks!

> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions, since we cannot account for all of a query's memory usage. This Jira 
> is to fix memory reporting for the Unordered Receiver operator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6410) Memory leak in Parquet Reader during cancellation

2018-05-11 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6410:
--
Reviewer: Parth Chandra

Created pull request [1257|https://github.com/apache/drill/pull/1257/commits] 
to address this bug.

> Memory leak in Parquet Reader during cancellation
> -
>
> Key: DRILL-6410
> URL: https://issues.apache.org/jira/browse/DRILL-6410
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>
> Occasionally, a memory leak is observed within the flat Parquet reader when 
> query cancellation is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6410) Memory leak in Parquet Reader during cancellation

2018-05-11 Thread salim achouche (JIRA)
salim achouche created DRILL-6410:
-

 Summary: Memory leak in Parquet Reader during cancellation
 Key: DRILL-6410
 URL: https://issues.apache.org/jira/browse/DRILL-6410
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Reporter: salim achouche
Assignee: salim achouche


Occasionally, a memory leak is observed within the flat Parquet reader when 
query cancellation is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-05-07 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466463#comment-16466463
 ] 

salim achouche commented on DRILL-5846:
---

[~parthc],

Can you please review this Jira's PR now that I have provided a detailed 
performance analysis (DRILL-6301)?

> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, 
> 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to 
> further improve the Parquet Reader performance, as several users reported 
> that Parquet parsing represents the lion's share of the overall query 
> execution time. It tracks Flat Data types only, as Nested data types might 
> involve functional and processing enhancements (e.g., a nested column can be 
> seen as a Document; a user might want to perform operations scoped at the 
> document level, that is, with no need to span all rows). Another JIRA will 
> be created to handle the nested columns use-case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6301) Parquet Performance Analysis

2018-05-07 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-6301.
---
Resolution: Fixed
  Reviewer: Pritesh Maker

This is an analytical task.

> Parquet Performance Analysis
> 
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
>  Issue Type: Task
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> _*Description -*_
>  * DRILL-5846 is meant to improve the Flat Parquet reader performance
>  * The associated implementation resulted in a 2x - 4x performance improvement
>  * Though during the review process ([pull 
> request|https://github.com/apache/drill/pull/1060]), a few key questions arose
>  
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
>  * The main reasons for using byte arrays for intermediary processing are to 
> a) avoid the high cost of the DrillBuf checks (especially the reference 
> counting) and b) benefit from some observed Java optimizations when accessing 
> byte arrays
>  * Starting with version 1.12.0, the DrillBuf enablement checks have been 
> refined so that memory access and reference counting checks can be enabled 
> independently
>  * Benchmarking Java's Direct Memory unsafe methods using JMH indicates the 
> performance gap between heap and direct memory is very narrow except for a 
> few use-cases (see the benchmark sketch below)
>  * There are also concerns that the extra copy step (from direct memory into 
> byte arrays) will have a negative effect on performance; note that this 
> overhead was not observed using Intel's VTune, as the intermediary buffers 
> were a) pinned to a single CPU, b) reused, and c) small enough to remain in 
> the L1 cache during columnar processing.
> _*Goal*_ 
>  * The Flat Parquet reader is amongst the few Drill columnar operators
>  * It is imperative that we agree on the optimal processing pattern so that 
> the decisions we take within this Jira are applied not only to Parquet but 
> to all Drill columnar operators
> _*Methodology*_ 
>  # Assess the performance impact of using intermediary byte arrays (as 
> described above)
>  # Prototype a solution using Direct Memory under three configurations: all 
> DrillBuf checks off, access checks only, and all checks on
>  # Make an educated decision on which processing pattern should be adopted
>  # Decide whether it is ok to use Java's unsafe API (and through what 
> mechanism) on byte arrays (when the use of byte arrays is a necessity)
>  
>  
>  
>  
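> 
> A hedged sketch of the kind of JMH micro-benchmark referenced above, 
> comparing heap byte[] access with direct memory access; this is an 
> illustration, not the benchmark actually used for the analysis:
> {code:java}
> import java.nio.ByteBuffer;
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> @State(Scope.Thread)
> public class HeapVsDirectBenchmark {
>   private static final int SIZE = 4096; // small enough to stay in L1 cache
>   private byte[] heap;
>   private ByteBuffer direct;
> 
>   @Setup
>   public void setup() {
>     heap = new byte[SIZE];
>     direct = ByteBuffer.allocateDirect(SIZE);
>   }
> 
>   @Benchmark
>   public long readHeap() { // heap byte array: sequential byte reads
>     long sum = 0;
>     for (int i = 0; i < SIZE; i++) { sum += heap[i]; }
>     return sum;
>   }
> 
>   @Benchmark
>   public long readDirect() { // direct memory: same access pattern
>     long sum = 0;
>     for (int i = 0; i < SIZE; i++) { sum += direct.get(i); }
>     return sum;
>   }
> }
> {code}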



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

