[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539560#comment-16539560
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

sohami commented on a change in pull request #1373: DRILL-6517: Hash-Join: If 
not OK, exit early from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373#discussion_r201563764
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -258,6 +251,13 @@ protected boolean prefetchFirstBatchFromBothSides() {
   return false;
 
 Review comment:
   AFAIK NONE is not supposed to have a valid incoming container associated 
with it 
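To make the review point concrete, here is a minimal, hypothetical Java sketch (not Drill's actual code) of the guard this PR adds: when the first upstream outcome is not OK/OK_NEW_SCHEMA — e.g. NONE, which carries no valid incoming container — exit early instead of reading a record count that was never set. All names below are invented for illustration.

```java
import java.util.Optional;

class PrefetchGuard {
    enum IterOutcome { OK_NEW_SCHEMA, OK, NONE, STOP, OUT_OF_MEMORY }

    // Early-exit guard: only OK/OK_NEW_SCHEMA guarantee a container whose
    // record count has been set; anything else returns empty immediately,
    // avoiding "Record count not set for this vector container".
    static Optional<Integer> recordCountIfValid(IterOutcome outcome, Integer containerCount) {
        if (outcome != IterOutcome.OK && outcome != IterOutcome.OK_NEW_SCHEMA) {
            return Optional.empty();  // e.g. NONE: no valid incoming container
        }
        return Optional.of(containerCount);
    }
}
```

Under this sketch's assumptions, a NONE outcome is handled before any container access, which is the shape of the early exit discussed in the PR.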


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539461#comment-16539461
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

Ben-Zvi commented on issue #1373: DRILL-6517: Hash-Join: If not OK, exit early 
from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373#issuecomment-404023176
 
 
   Thanks @ppadma ; let's have @ilooner look at this too, since he made some of 
the relevant changes (e.g., *getRecordCount()* for *RemovingRecordBatch*).
   The initial thesis was that this bug was triggering DRILL-6453 (the query 
failing after 2h11m). Unfortunately, with this fix it now seems to be the other 
way around: that failure caused those STOP outcomes.





[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-07-10 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539458#comment-16539458
 ] 

Boaz Ben-Zvi commented on DRILL-6453:
-

The jstacks before the failure look like normal work; the upper hash joins are 
"sniffing", and the lower ones are doing actual work (building the inner, or 
probing with the outer). None of the logs shows any spilling (though spilling 
did happen).
  We thought that DRILL-6517 was the trigger for this failure (the cancel after 
2:11); but now, with the fix, it looks the other way around - this failure was 
causing DRILL-6517 to appear.


> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018.txt, 
> jstack_29173_June_10_2018_b.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_c.txt, 
> jstack_29173_June_10_2018_d.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt, jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed, query profile for the case where it 
> Canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On, Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type
> On, Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to 
> Cancel it by stopping the Foreman drillbit.
> As a result several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on UI.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6453) TPC-DS query 72 has regressed

2018-07-10 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539436#comment-16539436
 ] 

Khurram Faraaz edited comment on DRILL-6453 at 7/11/18 1:38 AM:


Query 72 fails (it is marked as Canceled); it cancels itself after running for 
2 hrs and 11 mins. 

Drill 1.14.0 git.commit.id.abbrev=eb946b0

jstack output is attached here, from a few minutes just before the query 
canceled itself and after it was canceled. [^jstack_29173_June_10_2018_b.txt]


was (Author: khfaraaz):
Query 72 fails (is marked as Canceled) , it cancels it self after running for 
2hrs and 11 mins. 

jstack is attached here, from few mins just before the query canceled itself 
and after it was canceled.[^jstack_29173_June_10_2018_b.txt]






[jira] [Updated] (DRILL-6453) TPC-DS query 72 has regressed

2018-07-10 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-6453:
--
Attachment: jstack_29173_June_10_2018.txt
jstack_29173_June_10_2018_e.txt
jstack_29173_June_10_2018_d.txt
jstack_29173_June_10_2018_c.txt
jstack_29173_June_10_2018_b.txt






[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-07-10 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539436#comment-16539436
 ] 

Khurram Faraaz commented on DRILL-6453:
---

Query 72 fails (it is marked as Canceled); it cancels itself after running for 
2 hrs and 11 mins. 

jstack output is attached here, from a few minutes just before the query 
canceled itself and after it was canceled. [^jstack_29173_June_10_2018_b.txt]

> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed, query profile for the case where it 
> Canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On, Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type
> On, Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to 
> Cancel it by stopping the Foreman drillbit.
> As a result several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on UI.
> {noformat}





[jira] [Updated] (DRILL-6453) TPC-DS query 72 has regressed

2018-07-10 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-6453:
--
Attachment: jstack_29173_June_10_2018.txt
jstack_29173_June_10_2018_e.txt
jstack_29173_June_10_2018_d.txt
jstack_29173_June_10_2018_c.txt
jstack_29173_June_10_2018_b.txt






[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539434#comment-16539434
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

sachouche commented on issue #1355: DRILL-6560: Enhanced the batch statistics 
logging enablement
URL: https://github.com/apache/drill/pull/1355#issuecomment-404015591
 
 
   @bitblender and @ilooner 
   I have responded to and implemented Karthik's feedback. Could you please 
approve before the Drill 1.14 deadline?
   
   Thanks!




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics
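The session/server scoping described above can be illustrated with a small hedged Java sketch (assumed semantics only, not Drill's option-manager code): a session-level setting applies to one query's session and overrides the server-wide value, and both fall back to a hard default. The option name used below is invented.

```java
import java.util.HashMap;
import java.util.Map;

class OptionScopes {
    final Map<String, String> system = new HashMap<>();   // server level: all queries
    final Map<String, String> session = new HashMap<>();  // session level: per query

    // Session overrides system; both fall back to the hard-coded default.
    String effective(String name, String hardDefault) {
        return session.getOrDefault(name, system.getOrDefault(name, hardDefault));
    }
}
```

For example, a system-wide "summary" granularity would be shadowed by a session-level "verbose" for the duration of that session only.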





[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539428#comment-16539428
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

ppadma commented on issue #1373: DRILL-6517: Hash-Join: If not OK, exit early 
from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373#issuecomment-404013626
 
 
   +1. LGTM. Thanks for fixing this.




> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539427#comment-16539427
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

superbstreak commented on issue #1366: [DRILL-6581] C++ Client SSL 
Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#issuecomment-404013127
 
 
   @sohami updated. Please review.




> Improve C++ Client SSL Implementation
> -
>
> Key: DRILL-6581
> URL: https://issues.apache.org/jira/browse/DRILL-6581
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.12.0
>Reporter: Rob Wu
>Assignee: Rob Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> # Fix: Hostname verification doesn't function as expected: the host and port 
> in the SSL hostname verification callback are always empty.
>  # Fix: Certificate load verification exceptions are swallowed and not 
> propagated.
>  # Improvement: SSL v3 is not disabled.
>  # Improvement: The hostname verification failure exception is the same as 
> other certificate verification failures; we should separate them.
>  # Improvement: Create individual error messages to allow error handling by 
> the application using the client, following the standard of the rest of the 
> errors.
>  # Improvement: Add SSL hostname verification support for the ZooKeeper 
> connection mode.
>  # Added support for custom SSL CTX options.
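For illustration only, here is a hedged JVM-side analogue of two of the C++ client fixes listed above — enforcing hostname verification and keeping legacy protocol versions (e.g. SSLv3) out of the negotiated set. This is standard `javax.net.ssl` usage, not Drill client code.

```java
import javax.net.ssl.SSLParameters;

class SslHardening {
    static SSLParameters harden(SSLParameters params) {
        // Verify that the server certificate matches the host we dialed
        // (the C++ fix makes the equivalent callback see a non-empty host/port).
        params.setEndpointIdentificationAlgorithm("HTTPS");
        // Restrict the protocol list to modern TLS so SSLv3 is never negotiated.
        params.setProtocols(new String[] { "TLSv1.2", "TLSv1.3" });
        return params;
    }
}
```

These parameters would then be applied to an `SSLSocket` or `SSLEngine` before the handshake; the C++ client achieves the same goals through its own SSL context options.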





[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539425#comment-16539425
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201539246
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/BitToUserConnectionIterator.java
 ##
 @@ -99,14 +102,18 @@ public void remove() {
 
   public static class ConnectionInfo {
 public String user;
+@Nonnull
 
 Review comment:
   Thanks for pointing out the availability of these annotations. Didn't want 
to reinvent the wheel. :)




> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable as shown in example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several column values are null: 
> {noformat}
> +----------------------------------------------------+----------+------------------+-------------+--------+---------+------------+----------+-----------+
> |                        name                        |   kind   | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> +----------------------------------------------------+----------+------------------+-------------+--------+---------+------------+----------+-----------+
> | drill.exec.options.exec.udf.enable_dynamic_support | BOOLEAN  | BOOT             | BOOT        | BOOT   | null    | null       | true     | null      |
> +----------------------------------------------------+----------+------------------+-------------+--------+---------+------------+----------+-----------+{noformat}
>  
> Because of the not-null metadata, predicates on these tables such as 
> `WHERE  IS NULL` evaluate to FALSE, which is incorrect. 
>  
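
For illustration only (this is not Drill's planner code): the toy model below shows why wrong NOT NULL metadata breaks `IS NULL` filters. A planner that trusts the declared nullability can constant-fold the predicate to FALSE before ever looking at the data:

```java
// Toy model of metadata-driven constant folding of `col IS NULL`.
public class NullabilityFoldDemo {

    // Evaluate `value IS NULL` the way a metadata-trusting planner would:
    // consult the declared nullability first, only then look at the data.
    public static boolean isNullPredicate(boolean declaredNullable, Object value) {
        if (!declaredNullable) {
            // Folded away before execution -- wrong whenever the metadata lies.
            return false;
        }
        return value == null;
    }

    public static void main(String[] args) {
        // num_val really is null, but sys tables declared it NOT NULL:
        System.out.println(isNullPredicate(false, null)); // false (incorrect)
        System.out.println(isNullPredicate(true, null));  // true  (correct)
    }
}
```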





[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539424#comment-16539424
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201539183
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordDataType.java
 ##
 @@ -47,17 +48,22 @@
* @return the constructed type
*/
   public final RelDataType getRowType(RelDataTypeFactory factory) {
-final List types = getFieldSqlTypeNames();
+final List> types = 
getFieldSqlTypeNames();
 
 Review comment:
   There are just 2 lists : `names` and `types`. 
   The third list (`fields`) is used as `factory.createStructType(fields, 
names)`
   I'm reluctant to change abstract methods because they might have usage 
beyond the current SystemTable context.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable as shown in example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +---++--+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +---++--+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +---++--+{noformat}
>  
> Note that several columns are nulls: 
> {noformat}
> +---+--+--+-++-++--+---+
> |                       name                        |   kind   | 
> accessibleScopes | optionScope | status | num_val | string_val | bool_val | 
> float_val |
> +---+--+--+-++-++--+---+
> drill.exec.options.exec.udf.enable_dynamic_support | BOOLEAN | BOOT | BOOT | 
> BOOT | null | null | true | null |{noformat}
>  
> Because of the not-null metadata, the predicates on these tables such as 
> `WHERE  IS NULL` evaluate to FALSE which is incorrect. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539401#comment-16539401
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201533864
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordDataType.java
 ##
 @@ -47,17 +48,22 @@
* @return the constructed type
*/
   public final RelDataType getRowType(RelDataTypeFactory factory) {
-final List types = getFieldSqlTypeNames();
+final List> types = 
getFieldSqlTypeNames();
 
 Review comment:
   Better, but we still have three correlated lists. Can you combine the four 
fields into a single struct (class) and have a single list?
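
The refactoring suggested here — collapsing correlated name/type/nullability lists into one list of a small descriptor class — could be sketched as follows. The `ColumnDef` class and `describe` helper are hypothetical stand-ins, not Drill's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one list of descriptors instead of several parallel lists.
public class ColumnDefDemo {

    // One record per column carries all per-column attributes together,
    // so the attributes can never get out of step with each other.
    public static final class ColumnDef {
        public final String name;
        public final String sqlType;    // stand-in for Calcite's SqlTypeName
        public final boolean nullable;

        public ColumnDef(String name, String sqlType, boolean nullable) {
            this.name = name;
            this.sqlType = sqlType;
            this.nullable = nullable;
        }
    }

    // A single loop replaces the paired iterators over parallel lists.
    public static List<String> describe(List<ColumnDef> columns) {
        List<String> rows = new ArrayList<>();
        for (ColumnDef c : columns) {
            rows.add(c.name + " " + c.sqlType + (c.nullable ? " NULL" : " NOT NULL"));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<ColumnDef> cols = new ArrayList<>();
        cols.add(new ColumnDef("name", "VARCHAR", true));
        cols.add(new ColumnDef("num_val", "BIGINT", true));
        System.out.println(describe(cols));
    }
}
```

The same shape would let `getRowType` build the row type in one pass without zipping separate type and nullability lists.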




[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539400#comment-16539400
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201533958
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/BitToUserConnectionIterator.java
 ##
 @@ -99,14 +102,18 @@ public void remove() {
 
   public static class ConnectionInfo {
 public String user;
+@Nonnull
 
 Review comment:
   Nice use of the Java native annotation!




[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539399#comment-16539399
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201533730
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordDataType.java
 ##
 @@ -49,15 +55,25 @@
   public final RelDataType getRowType(RelDataTypeFactory factory) {
 final List<SqlTypeName> types = getFieldSqlTypeNames();
 final List<String> names = getFieldNames();
+final List<Boolean> nullables = getFieldNullability();
 final List<RelDataType> fields = Lists.newArrayList();
-for (final SqlTypeName typeName : types) {
+Iterator<SqlTypeName> typesIter = types.listIterator();
+Iterator<Boolean> nullabilityIter = nullables.listIterator();
 
 Review comment:
   This is, in fact, the gist of the comment. While it is understandable to 
want to leave the existing code as-is, adding more lists to the existing 
parallel lists is not an improvement: it has caused a semi-wrong initial 
implementation to become overly cumbersome.
   
   If the implementation is local to this class or module, then leaving the 
code cleaner after your change than when you found it is generally a good thing.




[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539397#comment-16539397
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on issue #1371: DRILL-6588: Make Sys tables of nullable 
datatypes
URL: https://github.com/apache/drill/pull/1371#issuecomment-404005272
 
 
   @paul-rogers I've made the changes to use Java's Annotation class `Nonnull`
   Please review the rest of the changes.




[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539388#comment-16539388
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

Ben-Zvi commented on a change in pull request #1373: DRILL-6517: Hash-Join: If 
not OK, exit early from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373#discussion_r201531287
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -258,6 +251,13 @@ protected boolean prefetchFirstBatchFromBothSides() {
   return false;
 
 Review comment:
   The case of only one side receiving NONE is legit -- for those 
RIGHT/LEFT/FULL OUTER joins, where the other side is still returned.
   As for the update() on a NONE -- probably any setting of NONE also 
initializes the container's record count to zero.
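
A minimal sketch of the guard being discussed — skipping batch-size accounting for a side whose iteration outcome is NONE, so an unset container record count is never read. The enum and helper below are illustrative, not the actual `HashJoinBatch` code:

```java
// Sketch of guarding memory-manager accounting against a NONE outcome.
public class PrefetchGuardDemo {

    // Simplified stand-in for Drill's RecordBatch.IterOutcome.
    public enum IterOutcome { OK_NEW_SCHEMA, OK, NONE, STOP }

    // Returns the record count a BatchMemoryManager-style update would see.
    // A NONE outcome contributes zero instead of reading a container whose
    // record count may never have been set.
    public static int recordsToAccount(IterOutcome outcome, int containerRecordCount) {
        if (outcome == IterOutcome.NONE) {
            return 0;   // no incoming batch to measure
        }
        return containerRecordCount;
    }

    public static void main(String[] args) {
        System.out.println(recordsToAccount(IterOutcome.NONE, -1));  // 0
        System.out.println(recordsToAccount(IterOutcome.OK, 4096));  // 4096
    }
}
```

This mirrors the one-sided-NONE case raised in the review: an outer join legitimately continues with the other side, but its sizing logic must not touch the exhausted side's container.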
   




> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539382#comment-16539382
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

Ben-Zvi commented on a change in pull request #1358:  DRILL-6516: EMIT support 
in streaming agg
URL: https://github.com/apache/drill/pull/1358#discussion_r201529291
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -394,4 +394,47 @@ public void testLateral_HashAgg_with_nulls() throws 
Exception {
   .baselineValues("dd",222L)
   .build().run();
   }
+
+  @Test
+  public void testMultipleBatchesLateral_WithStreamingAgg() throws Exception {
+String sql = "SELECT t2.maxprice FROM (SELECT customer.c_orders AS 
c_orders FROM "
++ "dfs.`lateraljoin/multipleFiles/` customer) t1, LATERAL (SELECT 
CAST(MAX(t.ord.o_totalprice)"
++ " AS int) AS maxprice FROM UNNEST(t1.c_orders) t(ord) GROUP BY 
t.ord.o_orderstatus) t2";
+
+testBuilder()
+.optionSettingQueriesForTestQuery("alter session set `%s` = true",
+PlannerSettings.STREAMAGG.getOptionName())
+.sqlQuery(sql)
+.unOrdered()
+.baselineColumns("maxprice")
+.baselineValues(367190)
+.baselineValues(316347)
+.baselineValues(146610)
+.baselineValues(306996)
+.baselineValues(235695)
+.baselineValues(177819)
+.build().run();
+  }
+
+  @Test
+  public void testLateral_StreamingAgg_with_nulls() throws Exception {
+String sql = "SELECT key, t3.dsls FROM cp.`lateraljoin/with_nulls.json` t 
LEFT OUTER "
++ "JOIN LATERAL (SELECT DISTINCT t2.sls AS dsls FROM UNNEST(t.sales) 
t2(sls)) t3 ON TRUE";
+
+testBuilder()
+.optionSettingQueriesForTestQuery("alter session set `%s` = true",
+PlannerSettings.STREAMAGG.getOptionName())
+.sqlQuery(sql)
+.unOrdered()
+.baselineColumns("key","dsls")
+.baselineValues("aa",null)
+.baselineValues("bb",100L)
+.baselineValues("bb",200L)
+.baselineValues("bb",300L)
+.baselineValues("bb",400L)
+.baselineValues("cc",null)
+.baselineValues("dd",111L)
+.baselineValues("dd",222L)
+.build().run();
+  }
 }
 
 Review comment:
   That's MAX, not COUNT; though the difference should only be in the generated 
code for the aggregation function. 




> Support for EMIT outcome in streaming agg
> -
>
> Key: DRILL-6516
> URL: https://issues.apache.org/jira/browse/DRILL-6516
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.14.0
>
>
> Update the streaming aggregator to recognize the EMIT outcome





[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6589:
-
Reviewer: Vitalii Diravka

> Push transitive closure generated predicates past aggregates/projects
> -
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> Here the transitive predicate a2 = 5 would be pushed past the aggregate due 
> to this optimization.





[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539376#comment-16539376
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

superbstreak commented on a change in pull request #1366: [DRILL-6581] C++ 
Client SSL Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201528448
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.hpp
 ##
 @@ -21,6 +21,17 @@
 #include "drill/common.hpp"
 #include "drill/drillClient.hpp"
 #include "streamSocket.hpp"
+#include "errmsgs.hpp"
+
+#if defined(IS_SSL_ENABLED)
+#include 
+#endif
+
+namespace
+{
+// The error message to indicate certificate verification failure.
+#define DRILL_BOOST_SSL_CERT_VERIFY_FAILED  "handshake: certificate verify 
failed\0"
 
 Review comment:
   Yup, I was concern about this bit, too. Will do, thanks!




> Improve C++ Client SSL Implementation
> -
>
> Key: DRILL-6581
> URL: https://issues.apache.org/jira/browse/DRILL-6581
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.12.0
>Reporter: Rob Wu
>Assignee: Rob Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> # Fix: Hostname verification doesn't function as expected: host and port in 
> the SSL hostname verification callback are always empty.
>  # Fix: Certificate load verification exceptions are swallowed and not 
> propagated.
>  # Improvement: SSL v3 is not disabled.
>  # Improvement: The hostname verification failure exception is the same as 
> other certificate verification failures; we should separate them.
>  # Improvement: Create individual error messages that allow error handling by 
> the client application and follow the standard of the rest of the 
> errors.
>  # Improvement: Add SSL Hostname verification with zookeeper connection mode 
> support
>  # Added support for custom SSL CTX Options





[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539377#comment-16539377
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

sohami commented on a change in pull request #1373: DRILL-6517: Hash-Join: If 
not OK, exit early from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373#discussion_r201528460
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -258,6 +251,13 @@ protected boolean prefetchFirstBatchFromBothSides() {
   return false;
 
 Review comment:
   `checkForEarlyFinish` checks for NONE from both sides. What if only one side 
receives NONE? In that case, too, `BatchMemoryManager.update()` should not be 
called.



[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539372#comment-16539372
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201527711
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoDataType.java
 ##
 @@ -36,6 +36,7 @@
 
  private final List<SqlTypeName> types = Lists.newArrayList();
  private final List<String> names = Lists.newArrayList();
+  private final List<Boolean> nullables = Lists.newArrayList();
 
 Review comment:
   RecordDataType uses the `names` and `types` lists separately, so I'll need at least two lists.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable as shown in example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several columns are nulls: 
> {noformat}
> +----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> | name                                               | kind    | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> +----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> | drill.exec.options.exec.udf.enable_dynamic_support | BOOLEAN | BOOT             | BOOT        | BOOT   | null    | null       | true     | null      |
> +----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+{noformat}
>  
> Because of the not-null metadata, predicates on these tables such as
> `WHERE <column> IS NULL` evaluate to FALSE, which is incorrect.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539371#comment-16539371
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201527575
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/Nullability.java
 ##
 @@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pojo;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Indicates Nullability
+ */
+@Retention(RetentionPolicy.RUNTIME)
+@Target(ElementType.FIELD)
+public @interface Nullability {
 
 Review comment:
   That would help. Let me take a look. 






[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539368#comment-16539368
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201527356
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoDataType.java
 ##
 @@ -48,6 +49,11 @@ public PojoDataType(Class<?> pojoClass) {
   Class<?> type = f.getType();
   names.add(f.getName());
 
+  // Look up @Nullability for the nullable property
+  Nullability nullability = f.getDeclaredAnnotation(Nullability.class);
+  nullables.add(nullability == null ? // Absence of annotation => (isNullable=true)
 
 Review comment:
   It seemed very verbose, but that makes sense. I'll make this change.
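For context, the lookup being reviewed can be sketched outside Drill with plain reflection. This is a hypothetical stand-in, not Drill's actual code: the annotation element `nullable` and the sample POJO are assumptions for illustration; the rule mirrored is "absence of the annotation means nullable".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class NullabilityLookupSketch {

  // Hypothetical stand-in for the @Nullability annotation in this PR.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  public @interface Nullability {
    boolean nullable() default true;
  }

  // Sample POJO: one field marked non-nullable, one left unannotated.
  public static class SamplePojo {
    @Nullability(nullable = false)
    public String name;
    public Long numVal;
  }

  // Mirrors the reviewed loop: no annotation on a field means it is nullable.
  public static List<Boolean> nullabilities(Class<?> pojoClass) {
    List<Boolean> nullables = new ArrayList<>();
    for (Field f : pojoClass.getDeclaredFields()) {
      Nullability ann = f.getDeclaredAnnotation(Nullability.class);
      nullables.add(ann == null || ann.nullable());
    }
    return nullables;
  }
}
```

Running `nullabilities(SamplePojo.class)` yields one `false` (the annotated field) and one `true` (the unannotated one), regardless of field ordering.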






[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539367#comment-16539367
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

superbstreak commented on a change in pull request #1366: [DRILL-6581] C++ 
Client SSL Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201527169
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.hpp
 ##
 @@ -199,6 +248,29 @@ class UserProperties;
 :Channel(ioService, host, port){
 }
 connectionStatus_t init();
+protected:
+/// @brief Handle protocol handshake exceptions for SSL-specific failures.
+///
+/// @param in_errmsg The error message.
+///
+/// @return the connectionStatus.
+connectionStatus_t HandleProtocolHandshakeException(const char* errmsg) {
+if (!(((SSLChannelContext_t *)m_pContext)->GetCertificateHostnameVerificationStatus())){
+return handleError(
+CONN_HANDSHAKE_FAILED,
+getMessage(ERR_CONN_SSL_CN));
 
 Review comment:
   Will do Thanks!




> Improve C++ Client SSL Implementation
> -
>
> Key: DRILL-6581
> URL: https://issues.apache.org/jira/browse/DRILL-6581
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.12.0
>Reporter: Rob Wu
>Assignee: Rob Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> # Fix: Hostname verification doesn't function as expected: host and port in 
> the SSL hostname verification callback are always empty.
>  # Fix: Certificate load verification exceptions are swallowed and not 
> propagated.
>  # Improvement: SSL V3 is not disabled.
>  # Improvement: Hostname verification failure exception is the same as other 
> certificate verification failures, we should separate them
>  # Improvement: Create individual error messages to allow error handling of 
> the application using the client and follows the standard of the rest of the 
> errors
>  # Improvement: Add SSL Hostname verification with zookeeper connection mode 
> support
>  # Added support for custom SSL CTX Options





[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539366#comment-16539366
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201527129
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordDataType.java
 ##
 @@ -49,15 +55,25 @@
   public final RelDataType getRowType(RelDataTypeFactory factory) {
 final List<SqlTypeName> types = getFieldSqlTypeNames();
 final List<String> names = getFieldNames();
+final List<Boolean> nullables = getFieldNullability();
 final List<RelDataType> fields = Lists.newArrayList();
-for (final SqlTypeName typeName : types) {
+Iterator<SqlTypeName> typesIter = types.listIterator();
+Iterator<Boolean> nullabilityIter = nullables.listIterator();
 
 Review comment:
   I wanted to avoid changing existing implementation, so I added an extra list 
and iterated in parallel.  






[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539365#comment-16539365
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

superbstreak commented on a change in pull request #1366: [DRILL-6581] C++ 
Client SSL Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201527041
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.cpp
 ##
 @@ -211,6 +211,21 @@ ChannelContext* 
ChannelFactory::getChannelContext(channelType_t t, DrillUserProp
 }
 
 pChannelContext = new SSLChannelContext(props, tlsVersion, 
verifyMode);
+
+if (props->isPropSet(USERPROP_CUSTOM_SSLCTXOPTIONS)){
 
 Review comment:
   This is an optional setting that the client application can set, since the 
default configuration from Boost is somewhat limited. Also, depending on the 
version of Boost the client is compiled against, some configurations are not 
available. 






[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539364#comment-16539364
 ] 

ASF GitHub Bot commented on DRILL-6346:
---

paul-rogers commented on issue #1348: DRILL-6346: Create an Official Drill 
Docker Container
URL: https://github.com/apache/drill/pull/1348#issuecomment-403997520
 
 
   If running a server, we may want to let the Drillbit shutdown gracefully. To 
do this, intercept the `SIGTERM` coming into the container and forward it to 
the process. If the `start.sh` script `exec`s the Drillbit, then the Drillbit 
may handle the shutdown. Otherwise, you can have the `start.sh` [handle the 
SIGTERM](https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86).
   
   The normal `drillbit.sh` script appears to do an `exec`, so the Drill server 
should end up as pid 1 (if your `start.sh` script does an `exec` to 
`drillbit.sh`.) But, check this in case anything has changed since we looked 
into this stuff for Drill-on-YARN. 
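On the Java side, graceful termination hinges on the JVM actually receiving `SIGTERM` (hence the `exec` advice above): the JVM then runs its shutdown hooks. A minimal, hypothetical sketch of that mechanism — not Drill's actual shutdown path, and the hook body is an illustrative placeholder:

```java
public class ShutdownHookSketch {

  // Registers a hook that the JVM runs on normal termination,
  // including termination triggered by SIGTERM. If a shell parent
  // sits between Docker and the JVM without forwarding the signal,
  // this hook never fires -- which is why the entrypoint should exec.
  public static Thread installHook() {
    Thread hook = new Thread(() ->
        System.out.println("draining in-flight work before exit"));
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  public static void main(String[] args) {
    installHook();
    System.out.println("server running; send SIGTERM to stop");
  }
}
```

`Runtime.removeShutdownHook` returns `true` only if the hook was registered, which makes the registration easy to verify in a test.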




> Create an Official Drill Docker Container
> -
>
> Key: DRILL-6346
> URL: https://issues.apache.org/jira/browse/DRILL-6346
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>






[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539362#comment-16539362
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

superbstreak commented on a change in pull request #1366: [DRILL-6581] C++ 
Client SSL Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201526439
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.cpp
 ##
 @@ -211,6 +211,21 @@ ChannelContext* 
ChannelFactory::getChannelContext(channelType_t t, DrillUserProp
 }
 
 pChannelContext = new SSLChannelContext(props, tlsVersion, 
verifyMode);
+
+if (props->isPropSet(USERPROP_CUSTOM_SSLCTXOPTIONS)){
 
 Review comment:
   Thanks! 






[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539361#comment-16539361
 ] 

ASF GitHub Bot commented on DRILL-6346:
---

paul-rogers commented on issue #1348: DRILL-6346: Create an Official Drill 
Docker Container
URL: https://github.com/apache/drill/pull/1348#issuecomment-403997146
 
 
   Note also that if any of the processes can launch child processes, something 
in the container must reap zombies. Bash will do it as will a utility called 
[`tini`](https://github.com/krallin/tini).






[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539360#comment-16539360
 ] 

ASF GitHub Bot commented on DRILL-6346:
---

paul-rogers commented on issue #1348: DRILL-6346: Create an Official Drill 
Docker Container
URL: https://github.com/apache/drill/pull/1348#issuecomment-403997092
 
 
   Another thought, FWIW. The container offered here is for the embedded 
Drillbit. A previous comment suggested it would be handy to run an actual Drill 
server (Drillbit). It would be further handy to run a Sqlline that connects to 
a remote Drillbit.
   
   These can be done by creating a `start.sh` or `entrypoint.sh` script in the 
container that parses arguments, then does what is requested. For example:
   
   ```
   start.sh [drillbit|embedded|sqlline]  ...
   ```
   
   Then, have `start.sh` do a switch on the first arg (with a pattern that 
accepts d*, e* or s*) and passes the args to that process. (For Sqlline, we 
need the JDBC arguments.)
   
   Might be helpful to see the Kubernetes Dockerfile provided with the Spark 
2.3/2.4 release since they had to wrestle with these same issues (launch Driver 
or Executor, later extended to History Server, etc.)
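The prefix dispatch described above (`d*`, `e*`, `s*`) would live in the shell entrypoint itself; as a hedged illustration of the same matching logic, here it is in Java (method and mode names are hypothetical, not part of any Drill script):

```java
public class EntrypointDispatchSketch {

  // Maps the first container argument to a launch mode, accepting any
  // prefix of drillbit / embedded / sqlline, as the comment suggests.
  public static String mode(String arg) {
    if (arg != null && !arg.isEmpty()) {
      if ("drillbit".startsWith(arg) || arg.startsWith("d")) return "drillbit";
      if ("embedded".startsWith(arg) || arg.startsWith("e")) return "embedded";
      if ("sqlline".startsWith(arg)  || arg.startsWith("s")) return "sqlline";
    }
    throw new IllegalArgumentException("unknown mode: " + arg);
  }
}
```

Remaining arguments would then be passed through unchanged to the selected process (for Sqlline, that includes the JDBC connection arguments).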






[jira] [Updated] (DRILL-6542) IndexOutOfBoundsException for multilevel lateral queries with schema changed partitioned complex data

2018-07-10 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6542:
-
Reviewer: Parth Chandra

> IndexOutOfBoundsException for multilevel lateral queries with schema changed 
> partitioned complex data
> -
>
> Key: DRILL-6542
> URL: https://issues.apache.org/jira/browse/DRILL-6542
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Kedar Sankar Behera
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> IndexOutOfBoundsException for multilevel lateral queries with schema changed 
> partitioned complex data
> query:
> {code}
> select customer.c_custkey, customer.c_name, orders.orderkey, 
> orders.totalprice, olineitems.l_partkey, olineitems.l_linenumber, 
> olineitems.l_quantity from customer, 
> lateral (select t1.o.o_orderkey as orderkey, t1.o.o_totalprice as totalprice, 
> t1.o.o_lineitems as lineitems from unnest(customer.c_orders) t1(o)) orders, 
> lateral (select t2.l.l_partkey as l_partkey, t2.l.l_linenumber as 
> l_linenumber, t2.l.l_quantity as l_quantity from unnest(orders.lineitems) 
> t2(l)) olineitems 
> order by customer.c_custkey, orders.orderkey, orders.totalprice, 
> olineitems.l_partkey, olineitems.l_linenumber, olineitems.l_quantity limit 50;
> {code}
> Error:
> {code}
> [Error Id: 7427fa7e-af4a-4f11-acd9-ced71848a1ed on drill182:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: writerIndex: 1 (expected: readerIndex(0) <= 
> writerIndex <= capacity(0))
> Fragment 0:0
> [Error Id: 7427fa7e-af4a-4f11-acd9-ced71848a1ed on drill182:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IndexOutOfBoundsException: writerIndex: 1 (expected: 
> readerIndex(0) <= writerIndex <= capacity(0))
>  at io.netty.buffer.AbstractByteBuf.writerIndex(AbstractByteBuf.java:104) 
> ~[netty-buffer-4.0.48.Final.jar:4.0.48.Final]
>  at 
> org.apache.drill.exec.vector.UInt1Vector.splitAndTransferTo(UInt1Vector.java:329)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.vector.NullableBigIntVector.splitAndTransferTo(NullableBigIntVector.java:312)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.splitAndTransfer(NullableBigIntVector.java:339)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$SingleMapTransferPair.splitAndTransfer(RepeatedMapVector.java:298)
>  ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.unnest.UnnestImpl.unnestRecords(UnnestImpl.java:101)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.doWork(UnnestRecordBatch.java:283)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:236)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6542) IndexOutOfBoundsException for multilevel lateral queries with schema changed partitioned complex data

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539358#comment-16539358
 ] 

ASF GitHub Bot commented on DRILL-6542:
---

sohami commented on issue #1374: DRILL-6542 : IndexOutOfBoundsException for 
multilevel lateral queries…
URL: https://github.com/apache/drill/pull/1374#issuecomment-403996831
 
 
   @parthchandra - Please help to review this.




>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.doWork(UnnestRecordBatch.java:283)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext(UnnestRecordBatch.java:236)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
>  

[jira] [Commented] (DRILL-6542) IndexOutOfBoundsException for multilevel lateral queries with schema changed partitioned complex data

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539357#comment-16539357
 ] 

ASF GitHub Bot commented on DRILL-6542:
---

sohami opened a new pull request #1374: DRILL-6542 : IndexOutOfBoundsException 
for multilevel lateral queries…
URL: https://github.com/apache/drill/pull/1374
 
 
   … with schema changed partitioned complex data





[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539354#comment-16539354
 ] 

ASF GitHub Bot commented on DRILL-6517:
---

Ben-Zvi opened a new pull request #1373: DRILL-6517: Hash-Join: If not OK, exit 
early from prefetchFirstBatchFromBothSides
URL: https://github.com/apache/drill/pull/1373
 
 
When a running query is cancelled (e.g., when Disk is full while spilling), 
many next() calls return a STOP outcome. Any Hash-Join that is "sniffing" its 
incoming batches (by repeatedly calling next()) would proceed to update the 
memory manager regardless of the outcomes returned.
  In some abnormal outcomes (like STOP), the batch's container is **not 
initialized**. The update call needs the batch's row count, which for 
*RemovingRecordBatch* is implemented (differently) by invoking the container's 
getRowCount() -- so in this case the call fails.
 The Fix: Move all the checks for the abnormal conditions immediately after 
the "sniffing", thus returning early and avoiding the update of the memory 
manager.
 Longer term question: How to implement getRowCount() for the record batch 
consistently.
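The early-exit flow described above can be sketched as follows. This is a simplified stand-in, not Drill's actual HashJoinBatch code: the `IterOutcome` values and the memory-manager call are illustrative only.

```java
// A minimal sketch of the early-exit pattern: check for abnormal outcomes
// *before* touching the batch containers, because on STOP the containers are
// not initialized and getRecordCount() on them would throw.
public class PrefetchSketch {
    public enum IterOutcome { OK, OK_NEW_SCHEMA, NONE, STOP, OUT_OF_MEMORY }

    public static boolean prefetchFirstBatchFromBothSides(IterOutcome left, IterOutcome right) {
        if (isAbnormal(left) || isAbnormal(right)) {
            return false; // exit early; skip the memory-manager update entirely
        }
        updateMemoryManager(left, right); // safe: row counts are set by now
        return true;
    }

    private static boolean isAbnormal(IterOutcome o) {
        return o == IterOutcome.STOP || o == IterOutcome.OUT_OF_MEMORY;
    }

    private static void updateMemoryManager(IterOutcome l, IterOutcome r) {
        // stand-in for the memory-manager update that reads batch row counts
    }
}
```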
   




> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539349#comment-16539349
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

sohami commented on a change in pull request #1366: [DRILL-6581] C++ Client SSL 
Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201521602
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.hpp
 ##
 @@ -199,6 +248,29 @@ class UserProperties;
 :Channel(ioService, host, port){
 }
 connectionStatus_t init();
+protected:
+/// @brief Handle protocol handshake exceptions for SSL specific 
failures.
+/// 
+/// @param in_errmsg  The error message.
+/// 
+/// @return the connectionStatus.
+connectionStatus_t HandleProtocolHandshakeException(const char* 
errmsg) {
+if (!(((SSLChannelContext_t 
*)m_pContext)->GetCertificateHostnameVerificationStatus())){
+return handleError(
+CONN_HANDSHAKE_FAILED,
+getMessage(ERR_CONN_SSL_CN));
 
 Review comment:
   Would be useful to preserve the original error message in this case as well. 
Please change to `getMessage(ERR_CONN_SSL_CN, errmsg)`




> Improve C++ Client SSL Implementation
> -
>
> Key: DRILL-6581
> URL: https://issues.apache.org/jira/browse/DRILL-6581
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.12.0
>Reporter: Rob Wu
>Assignee: Rob Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> # Fix: Hostname verification doesn't function as expected: the host and port in 
> the SSL hostname verification callback are always empty.
>  # Fix: Certificate load verification exceptions are swallowed and not 
> propagated.
>  # Improvement: SSLv3 is not disabled.
>  # Improvement: The hostname verification failure exception is the same as other 
> certificate verification failures; we should separate them.
>  # Improvement: Create individual error messages so the application using the 
> client can handle errors, following the standard set by the rest of the errors.
>  # Improvement: Add SSL hostname verification support for the ZooKeeper 
> connection mode.
>  # Added support for custom SSL CTX options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539350#comment-16539350
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

sohami commented on a change in pull request #1366: [DRILL-6581] C++ Client SSL 
Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201524731
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.hpp
 ##
 @@ -21,6 +21,17 @@
 #include "drill/common.hpp"
 #include "drill/drillClient.hpp"
 #include "streamSocket.hpp"
+#include "errmsgs.hpp"
+
+#if defined(IS_SSL_ENABLED)
+#include 
+#endif
+
+namespace
+{
+// The error message to indicate certificate verification failure.
+#define DRILL_BOOST_SSL_CERT_VERIFY_FAILED  "handshake: certificate verify 
failed\0"
 
 Review comment:
   I don't think we can rely on this error string. Instead, it would be good to 
use something like the approach below for decoding SSL errors.
   
https://stackoverflow.com/questions/9828066/how-to-decipher-a-boost-asio-ssl-error-code









[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539351#comment-16539351
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

sohami commented on a change in pull request #1366: [DRILL-6581] C++ Client SSL 
Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201519782
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.hpp
 ##
 @@ -215,6 +287,52 @@ class UserProperties;
 static ChannelContext_t* getChannelContext(channelType_t t, 
DrillUserProperties* props);
 };
 
+/// @brief Hostname verification callback wrapper.
+class DrillSSLHostnameVerifier{
+public:
+/// @brief The constructor.
+/// 
+/// @param in_channel  The Channel.
+DrillSSLHostnameVerifier(Channel* in_channel) : 
m_channel(in_channel){
+DRILL_LOG(LOG_INFO)
+<< "DrillSSLHostnameVerifier::DrillSSLHostnameVerifier: 
+ Enter +" 
+<< std::endl;
+}
+
+/// @brief Perform certificate verification.
+/// 
+/// @param in_preverified   Pre-verified indicator.
+/// @param in_ctx   Verify context.
+bool operator()(
+bool in_preverified,
+boost::asio::ssl::verify_context& in_ctx){
+DRILL_LOG(LOG_INFO) << "DrillSSLHostnameVerifier::operator(): 
+ Enter +" << std::endl;
+
+// Gets the channel context.
+SSLChannelContext_t* context = 
(SSLChannelContext_t*)(m_channel->getChannelContext());
+
+// Retrieve the host before we perform Host name verification.
+// This is because host with ZK mode is selected after the 
connect() function is called.
+boost::asio::ssl::rfc2818_verification 
verifier(m_channel->getEndpoint()->getHost().c_str());
+
+// Perform verification.
+bool verified = verifier(in_preverified, in_ctx);
+
+DRILL_LOG(LOG_DEBUG) 
+<< "DrillSSLHostnameVerifier::operator(): Verification 
Result: " 
+<< verified 
+<< std::endl;
+
+// Sets the result back to the context.
+context->SetCertHostnameVerificationStatus(verified);
+return verified && in_preverified;
 
 Review comment:
   I think we should just return the `verified` status here, not `(verified && 
in_preverified)`.









[jira] [Commented] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539352#comment-16539352
 ] 

ASF GitHub Bot commented on DRILL-6581:
---

sohami commented on a change in pull request #1366: [DRILL-6581] C++ Client SSL 
Implementation Fixes/Improvements
URL: https://github.com/apache/drill/pull/1366#discussion_r201435888
 
 

 ##
 File path: contrib/native/client/src/clientlib/channel.cpp
 ##
 @@ -211,6 +211,21 @@ ChannelContext* 
ChannelFactory::getChannelContext(channelType_t t, DrillUserProp
 }
 
 pChannelContext = new SSLChannelContext(props, tlsVersion, 
verifyMode);
+
+if (props->isPropSet(USERPROP_CUSTOM_SSLCTXOPTIONS)){
 
 Review comment:
   No need to check `isPropSet` since you are already checking for 
`!sslOptions.empty()` below.
   
   Also, I have a question regarding these custom SSL context options. Based on 
the documentation 
[here](https://www.openssl.org/docs/man1.0.2/ssl/SSL_CTX_set_options.html), they 
provide workarounds for the listed bugs, but whether a given workaround applies 
depends on the internal SSL implementation. If the handling is available, it 
will be used anyway, since these options are set during `m_SSLContext` creation 
(see 
[here](https://github.com/apache/drill/pull/1366/commits/093b4cadb4653b2e51fa13e7baadef3d0d6b8c91#diff-4649cdc0895f6abaeb47ff3f6a10eec4R104)). 
So it doesn't look like we need this separate custom option setter?









[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539343#comment-16539343
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201523513
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoDataType.java
 ##
 @@ -48,6 +49,11 @@ public PojoDataType(Class<?> pojoClass) {
   Class<?> type = f.getType();
   names.add(f.getName());
 
+  //Look up @Nullability for nullable property
+  Nullability nullability = f.getDeclaredAnnotation(Nullability.class);
+  nullables.add(nullability == null ? //Absence of annotation => 
(isNullable=true)
 
 Review comment:
   `f.isAnnotationPresent(Nullability.class)` returns a `boolean` result, which may 
simplify the code here.
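The suggested simplification can be sketched with plain JDK reflection. The annotation and POJO below are hypothetical stand-ins for Drill's Nullability annotation and system-table POJOs; only the `isAnnotationPresent` pattern is the point.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: derive per-field nullability from a (hypothetical) annotation.
public class NullabilitySketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Nullability { boolean nullable() default true; }

    public static class Row {
        @Nullability(nullable = false) public String name; // explicitly non-nullable
        public Long num_val;                               // no annotation => nullable
    }

    public static Map<String, Boolean> nullabilityOf(Class<?> pojoClass) {
        Map<String, Boolean> nullables = new LinkedHashMap<>();
        for (Field f : pojoClass.getDeclaredFields()) {
            // isAnnotationPresent returns boolean directly, so the
            // annotation-or-null ternary collapses into one expression.
            boolean nullable = !f.isAnnotationPresent(Nullability.class)
                    || f.getAnnotation(Nullability.class).nullable();
            nullables.put(f.getName(), nullable);
        }
        return nullables;
    }
}
```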




> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values, but they are incorrectly marked 
> as non-nullable, as shown in the example table below:
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several columns are null:
> {noformat}
> | name | kind | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> | drill.exec.options.exec.udf.enable_dynamic_support | BOOLEAN | BOOT | BOOT | BOOT | null | null | true | null |{noformat}
>  
> Because of the not-null metadata, predicates on these tables such as 
> `WHERE  IS NULL` evaluate to FALSE, which is incorrect. 
>  





[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539344#comment-16539344
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201523626
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoDataType.java
 ##
 @@ -84,4 +90,8 @@ public PojoDataType(Class<?> pojoClass) {
 return names;
   }
 
+  @Override
+  public List<Boolean> getFieldNullability() {
 
 Review comment:
   Same comment about the awkwardness of correlated lists...









[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539340#comment-16539340
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201522572
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordDataType.java
 ##
 @@ -49,15 +55,25 @@
   public final RelDataType getRowType(RelDataTypeFactory factory) {
 final List<SqlTypeName> types = getFieldSqlTypeNames();
 final List<String> names = getFieldNames();
+final List<Boolean> nullables = getFieldNullability();
 final List<RelDataType> fields = Lists.newArrayList();
-for (final SqlTypeName typeName : types) {
+Iterator<SqlTypeName> typesIter = types.listIterator();
+Iterator<Boolean> nullabilityIter = nullables.listIterator();
 
 Review comment:
   We now have four correlated lists. This is often difficult to reason about and 
maintain.
   
   Is it possible to have a single list of, say, `FieldDefn` objects, each of 
which has a name, type, nullable and `RelDataType`?
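The single-list alternative suggested above can be sketched like this. `FieldDefn` is a hypothetical name, and the SQL type is a plain `String` so the example stays self-contained (Drill would use `SqlTypeName` / `RelDataType`).

```java
import java.util.Arrays;
import java.util.List;

// Sketch: one list of field-definition objects replaces several correlated
// lists, so a field's name, type, and nullability cannot drift out of sync
// by index.
public class FieldDefnSketch {
    public static final class FieldDefn {
        public final String name;
        public final String sqlType;
        public final boolean nullable;

        public FieldDefn(String name, String sqlType, boolean nullable) {
            this.name = name;
            this.sqlType = sqlType;
            this.nullable = nullable;
        }
    }

    // Builds a small sample schema as a single list of definitions.
    public static List<FieldDefn> sampleFields() {
        return Arrays.asList(
                new FieldDefn("name", "VARCHAR", true),
                new FieldDefn("num_val", "BIGINT", true));
    }
}
```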




> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable as shown in example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several columns contain nulls: 
> {noformat}
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> |                        name                         |  kind   | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> | drill.exec.options.exec.udf.enable_dynamic_support  | BOOLEAN | BOOT             | BOOT        | BOOT   | null    | null       | true     | null      |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+{noformat}
>  
> Because of the not-null metadata, the predicates on these tables such as 
> `WHERE <column> IS NULL` evaluate to FALSE, which is incorrect. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539342#comment-16539342
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201523179
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoDataType.java
 ##
 @@ -36,6 +36,7 @@
 
  private final List<SqlTypeName> types = Lists.newArrayList();
  private final List<String> names = Lists.newArrayList();
+  private final List<Boolean> nullables = Lists.newArrayList();
 
 Review comment:
  Again, a single list of objects rather than three correlated lists of 
primitives?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable, as shown in the example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several columns contain nulls: 
> {noformat}
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> |                        name                         |  kind   | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> | drill.exec.options.exec.udf.enable_dynamic_support  | BOOLEAN | BOOT             | BOOT        | BOOT   | null    | null       | true     | null      |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+{noformat}
>  
> Because of the not-null metadata, the predicates on these tables such as 
> `WHERE <column> IS NULL` evaluate to FALSE, which is incorrect. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539341#comment-16539341
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

paul-rogers commented on a change in pull request #1371: DRILL-6588: Make Sys 
tables of nullable datatypes
URL: https://github.com/apache/drill/pull/1371#discussion_r201523061
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/Nullability.java
 ##
 @@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pojo;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Indicates Nullability
+ */
+@Retention(RetentionPolicy.RUNTIME)
+@Target(ElementType.FIELD)
+public @interface Nullability {
 
 Review comment:
   Turns out that Java has existing `@Nonnull` and `@Nullable` annotations in 
`javax.annotation` (JSR-305). Can we use those?
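As a rough illustration of the annotation-based approach (the annotation is defined locally so the sketch compiles without the JSR-305 jar; in Drill the existing `javax.annotation.Nullable` could be used instead, and the `OptionRow` field names are illustrative):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Local stand-in for javax.annotation.Nullable (JSR-305), so this sketch
// compiles without extra jars.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Nullable {}

// Illustrative POJO: un-annotated fields are treated as non-nullable.
class OptionRow {
    String name;
    @Nullable String string_val;
}

public class NullableScan {
    public static void main(String[] args) {
        // Derive per-field nullability by reflection, as a PojoDataType-style
        // scanner could.
        for (Field f : OptionRow.class.getDeclaredFields()) {
            System.out.println(f.getName() + " nullable=" + f.isAnnotationPresent(Nullable.class));
        }
    }
}
```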


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable, as shown in the example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +-------------------+--------------------+--------------+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +-------------------+--------------------+--------------+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +-------------------+--------------------+--------------+{noformat}
>  
> Note that several columns contain nulls: 
> {noformat}
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> |                        name                         |  kind   | accessibleScopes | optionScope | status | num_val | string_val | bool_val | float_val |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+
> | drill.exec.options.exec.udf.enable_dynamic_support  | BOOLEAN | BOOT             | BOOT        | BOOT   | null    | null       | true     | null      |
> +-----------------------------------------------------+---------+------------------+-------------+--------+---------+------------+----------+-----------+{noformat}
>  
> Because of the not-null metadata, the predicates on these tables such as 
> `WHERE <column> IS NULL` evaluate to FALSE, which is incorrect. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539325#comment-16539325
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai opened a new pull request #1372: DRILL-6589: Push transitive closure 
predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372
 
 
   @amansinha100 / @vdiravka please review the PR. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Push transitive closure generated predicates past aggregates/projects
> -
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> Here the transitive predicate a2 = 5 would be pushed past the aggregate due 
> to this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5797) Use more often the new parquet reader

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539310#comment-16539310
 ] 

ASF GitHub Bot commented on DRILL-5797:
---

okalinin commented on a change in pull request #1370: DRILL-5797: Use Parquet 
new reader in all non-complex column queries
URL: https://github.com/apache/drill/pull/1370#discussion_r201517913
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetReaderDecision.java
 ##
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.hadoop.metadata.ParquetMetadata;
+import org.apache.parquet.hadoop.ParquetFileReader;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.List;
+import java.util.ArrayList;
+
+import java.io.IOException;
+import java.nio.file.Paths;
+
+/**
+ * DRILL-5797 introduces more granularity in the new reader use cases. This test
+ * checks the correctness of the function used to decide when the new reader is used.
+ */
+public class TestParquetReaderDecision {
+
+  private static final String path = 
"src/test/resources/store/parquet/complex/complex.parquet";
+  private static Configuration conf;
+  private static ParquetMetadata footer;
+
+  @BeforeClass
+  public static void setUpBeforeClass() throws Exception {
+conf = new Configuration();
+
+try {
+  footer = ParquetFileReader.readFooter(conf, new Path(path));
+} catch (IOException ioe) {
+  fail("Could not read Parquet file '" + path + "', error message: " + 
ioe.getMessage()
+  + " cwd: " + Paths.get(".").toAbsolutePath().normalize().toString());
+  throw(ioe);
+}
+  }
+
+  @Test
+  public void testParquetReaderDecision() {
 
 Review comment:
   I will fix this and all the other test issues you highlighted. While fixing them 
I realised that the existing `complex.parquet` test file may not provide the 
desired coverage for the changes in this PR. The bulk of the changes are aimed at 
making the utility functions work with schemas like 
   `a`
   `b`.`a`
   while the schema in `complex.parquet` doesn't contain such corner cases.
   Does it make sense to add a new resource to cover such cases and base the tests 
on that file?
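For context, the simple-versus-nested distinction those corner cases exercise can be sketched like this (a hypothetical helper, not the PR's actual utility functions; dotted strings stand for nested columns such as `b`.`a`):

```java
import java.util.Arrays;
import java.util.List;

class ProjectionCheck {
    // A column reference is "simple" when it has no nested part,
    // i.e. `a` is simple while `b`.`a` is not.
    static boolean isSimpleColumn(String path) {
        return !path.contains(".");
    }

    // The fast reader would only be eligible when every projected
    // column is simple.
    static boolean allSimple(List<String> projected) {
        return projected.stream().allMatch(ProjectionCheck::isSimpleColumn);
    }

    public static void main(String[] args) {
        System.out.println(allSimple(Arrays.asList("a", "c")));    // true
        System.out.println(allSimple(Arrays.asList("a", "b.a")));  // false
    }
}
```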


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use more often the new parquet reader
> -
>
> Key: DRILL-5797
> URL: https://issues.apache.org/jira/browse/DRILL-5797
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Oleksandr Kalinin
>Priority: Major
> Fix For: 1.15.0
>
>
> The choice of using the regular parquet reader of the optimized one is based 
> of what type of columns is in the file. But the columns that are read by the 
> query doesn't matter. We can increase a little bit the cases where the 
> optimized reader is used by checking is the projected column are simple or 
> not.
> This is an optimization waiting for the fast parquet reader to handle complex 
> structure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539284#comment-16539284
 ] 

Boaz Ben-Zvi edited comment on DRILL-6517 at 7/10/18 10:14 PM:
---

  From running instrumented code (on latest master), it looks like the error 
that triggered the cancellation was a disk full while spilling:

{code}
2018-07-10 13:58:41,318 [24bb00cd-1a38-f06b-19a7-cc44d83bde59:frag:4:4] INFO  
o.a.d.e.p.impl.common.HashPartition - User Error Occurred: Hash Join failed to 
write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
 (null)
org.apache.drill.common.exceptions.UserException: DATA_WRITE ERROR: Hash Join 
failed to write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
{code}

Then operators started returning batches with the STOP outcome, where the 
vector container was not initialized.
The original code went on to check those records (ignoring the outcome), 
invoking the *batch* method `getRecordCount()` (which normally just returns the 
internal field in the batch). However, for `RemovingRecordBatch` the 
implementation invokes the container's `getRecordCount()`, which failed as it 
was not initialized.
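The failure path described above can be sketched with minimal stand-in classes (not the actual Drill classes): the container throws when its count was never set, and a batch that delegates instead of returning a cached field inherits that failure.

```java
// Minimal stand-ins illustrating the failure path; not Drill code.
class VectorContainerSketch {
    private int recordCount = -1;  // -1 means "not set"

    int getRecordCount() {
        if (recordCount < 0) {
            // Mirrors the Preconditions.checkState(...) in VectorContainer.
            throw new IllegalStateException("Record count not set for this vector container");
        }
        return recordCount;
    }

    // Set by operators on the success path; never called after a STOP.
    void setRecordCount(int n) { recordCount = n; }
}

class RemovingBatchSketch {
    private final VectorContainerSketch container = new VectorContainerSketch();

    // Delegates to the container, so it fails when an upstream STOP left the
    // container uninitialized; batches that cache the count do not.
    int getRecordCount() { return container.getRecordCount(); }
}

public class RecordCountDemo {
    public static void main(String[] args) {
        RemovingBatchSketch batch = new RemovingBatchSketch();
        try {
            batch.getRecordCount();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```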



was (Author: ben-zvi):
  From running instrumented code (on latest master), it looks like the error 
that triggered the cancellation was a disk full while spilling:

{code}
2018-07-10 13:58:41,318 [24bb00cd-1a38-f06b-19a7-cc44d83bde59:frag:4:4] INFO  
o.a.d.e.p.impl.common.HashPartition - User Error Occurred: Hash Join failed to 
write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
 (null)
org.apache.drill.common.exceptions.UserException: DATA_WRITE ERROR: Hash Join 
failed to write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
{code}

Then operators started returning batches with the STOP outcome, where the 
vector container was not initialized.
The original code went on to check those records (ignoring the outcome), 
invoking the *batch* method `getRecordCount()` (which normally just returns the 
internal field in the batch. However for `RemovingRecordBatch` -- the 
implementation invokes the container's `getRecordCount()` , which failed as it 
was not initialized.


> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539284#comment-16539284
 ] 

Boaz Ben-Zvi commented on DRILL-6517:
-

  From running instrumented code (on latest master), it looks like the error 
that triggered the cancellation was a disk full while spilling:

{code}
2018-07-10 13:58:41,318 [24bb00cd-1a38-f06b-19a7-cc44d83bde59:frag:4:4] INFO  
o.a.d.e.p.impl.common.HashPartition - User Error Occurred: Hash Join failed to 
write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
 (null)
org.apache.drill.common.exceptions.UserException: DATA_WRITE ERROR: Hash Join 
failed to write to output file: 
/tmp/drill/spill/24bb00cd-1a38-f06b-19a7-cc44d83bde59_HashJoin_4-22-4/spill6_outer
{code}

Then operators started returning batches with the STOP outcome, where the 
vector container was not initialized.
The original code went on to check those records (ignoring the outcome), 
invoking the *batch* method `getRecordCount()` (which normally just returns the 
internal field in the batch). However, for `RemovingRecordBatch` the 
implementation invokes the container's `getRecordCount()`, which failed as it 
was not initialized.


> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539281#comment-16539281
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

priteshm commented on issue #1358:  DRILL-6516: EMIT support in streaming agg
URL: https://github.com/apache/drill/pull/1358#issuecomment-403981936
 
 
   @Ben-Zvi any more comments on this one?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in streaming agg
> -
>
> Key: DRILL-6516
> URL: https://issues.apache.org/jira/browse/DRILL-6516
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.14.0
>
>
> Update the streaming aggregator to recognize the EMIT outcome



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread Gautam Kumar Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6589:
--
Description: 
Here is a sample query that may benefit from this optimization:

SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 

Here the transitive predicate a2 = 5 would be pushed past the aggregate due to 
this optimization.

> Push transitive closure generated predicates past aggregates/projects
> -
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> Here the transitive predicate a2 = 5 would be pushed past the aggregate due 
> to this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6589:
-

 Summary: Push transitive closure generated predicates past 
aggregates/projects
 Key: DRILL-6589
 URL: https://issues.apache.org/jira/browse/DRILL-6589
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.14.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi reassigned DRILL-6517:
---

Assignee: Boaz Ben-Zvi  (was: salim achouche)

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  

[jira] [Updated] (DRILL-6583) Add space between pagination links in Profiles (WebUI) list

2018-07-10 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6583:
-
Summary: Add space between pagination links in Profiles (WebUI) list  (was: 
UI usability issue)

> Add space between pagination links in Profiles (WebUI) list
> ---
>
> Key: DRILL-6583
> URL: https://issues.apache.org/jira/browse/DRILL-6583
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: UI_usability_issue_AD_1_14_0.png
>
>
> When a query is under execution, on the web UI we see this text which is 
> actually a set of different links that help navigate to different pages on 
> the UI, below that query's profile.
> Apache Drill 1.14.0
> git.commit.id.abbrev=f481a7c
> They all appear on a single line with no spacing, and there is a typo; the 
> formatting of the text for those links needs to be changed / improved.
> Attached is a screenshot for the issue, look for "FirstPrevious1NextLast" on 
> the bottom left of the screenshot.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539214#comment-16539214
 ] 

Robert Hou edited comment on DRILL-6566 at 7/10/18 8:43 PM:


This is what we get from running the latest commit:
1.14.0-SNAPSHOT  e79db14d81f8ee15d27cc6026bc0ee409e0c0a3c  DRILL-6529: 
Project Batch Sizing causes two LargeFileCompilation tests to timeout
09.07.2018 @ 04:08:59 PDT  Unknown  09.07.2018 @ 15:34:37 PDT

>> Query: select * from sys.options where status = 'CHANGED';
name                                  kind     accessibleScopes  optionScope  status   num_val  string_val  bool_val  float_val
drill.exec.hashagg.fallback.enabled   BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null
drill.exec.hashjoin.fallback.enabled  BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null

The output batch size should be 16MB.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
ship_carriers,
year1,
Sum(jan_sales) AS jan_sales,
Sum(feb_sales) AS feb_sales,
Sum(mar_sales) AS mar_sales,
Sum(apr_sales) AS apr_sales,
Sum(may_sales) AS may_sales,
Sum(jun_sales) AS jun_sales,
Sum(jul_sales) AS jul_sales,
Sum(aug_sales) AS aug_sales,
Sum(sep_sales) AS sep_sales,
Sum(oct_sales) AS oct_sales,
Sum(nov_sales) AS nov_sales,
Sum(dec_sales) AS dec_sales,
Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
Sum(jan_net)   AS jan_net,
Sum(feb_net)   AS feb_net,
Sum(mar_net)   AS mar_net,
Sum(apr_net)   AS apr_net,
Sum(may_net)   AS may_net,
Sum(jun_net)   AS jun_net,
Sum(jul_net)   AS jul_net,
Sum(aug_net)   AS aug_net,
Sum(sep_net)   AS sep_net,
Sum(oct_net)   AS oct_net,
Sum(nov_net)   AS nov_net,
Sum(dec_net)   AS dec_net
FROM   (SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
'ZOUROS'
|| ','
|| 'ZHOU' AS ship_carriers,
d_year AS year1,
Sum(CASE
WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jan_sales,
Sum(CASE
WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS feb_sales,
Sum(CASE
WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS mar_sales,
Sum(CASE
WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS apr_sales,
Sum(CASE
WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS may_sales,
Sum(CASE
WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jun_sales,
Sum(CASE
WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jul_sales,
Sum(CASE
WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS aug_sales,
Sum(CASE
WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS sep_sales,
Sum(CASE
WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS oct_sales,
Sum(CASE
WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS nov_sales,
Sum(CASE
WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS dec_sales,
Sum(CASE
WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jan_net,
Sum(CASE
WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS feb_net,
Sum(CASE
WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS mar_net,
Sum(CASE
WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS apr_net,
Sum(CASE
WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS may_net,
Sum(CASE
WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jun_net,
Sum(CASE
WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jul_net,
Sum(CASE
WHEN d_moy = 8 THEN 

[jira] [Commented] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539214#comment-16539214
 ] 

Robert Hou commented on DRILL-6566:
---

This is what we get from running the latest commit:
1.14.0-SNAPSHOT  e79db14d81f8ee15d27cc6026bc0ee409e0c0a3c  DRILL-6529: 
Project Batch Sizing causes two LargeFileCompilation tests to timeout
09.07.2018 @ 04:08:59 PDT  Unknown  09.07.2018 @ 15:34:37 PDT

>> Query: select * from sys.options where status = 'CHANGED';
name                                  kind     accessibleScopes  optionScope  status   num_val  string_val  bool_val  float_val
drill.exec.hashagg.fallback.enabled   BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null
drill.exec.hashjoin.fallback.enabled  BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null

The output batch size should be 16MB.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1dot2_hiveplugin/query66.sql
SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
ship_carriers,
year1,
Sum(jan_sales) AS jan_sales,
Sum(feb_sales) AS feb_sales,
Sum(mar_sales) AS mar_sales,
Sum(apr_sales) AS apr_sales,
Sum(may_sales) AS may_sales,
Sum(jun_sales) AS jun_sales,
Sum(jul_sales) AS jul_sales,
Sum(aug_sales) AS aug_sales,
Sum(sep_sales) AS sep_sales,
Sum(oct_sales) AS oct_sales,
Sum(nov_sales) AS nov_sales,
Sum(dec_sales) AS dec_sales,
Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
Sum(jan_net)   AS jan_net,
Sum(feb_net)   AS feb_net,
Sum(mar_net)   AS mar_net,
Sum(apr_net)   AS apr_net,
Sum(may_net)   AS may_net,
Sum(jun_net)   AS jun_net,
Sum(jul_net)   AS jul_net,
Sum(aug_net)   AS aug_net,
Sum(sep_net)   AS sep_net,
Sum(oct_net)   AS oct_net,
Sum(nov_net)   AS nov_net,
Sum(dec_net)   AS dec_net
FROM   (SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
'ZOUROS'
|| ','
|| 'ZHOU' AS ship_carriers,
d_year AS year1,
Sum(CASE
WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jan_sales,
Sum(CASE
WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS feb_sales,
Sum(CASE
WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS mar_sales,
Sum(CASE
WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS apr_sales,
Sum(CASE
WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS may_sales,
Sum(CASE
WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jun_sales,
Sum(CASE
WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jul_sales,
Sum(CASE
WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS aug_sales,
Sum(CASE
WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS sep_sales,
Sum(CASE
WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS oct_sales,
Sum(CASE
WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS nov_sales,
Sum(CASE
WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS dec_sales,
Sum(CASE
WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jan_net,
Sum(CASE
WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS feb_net,
Sum(CASE
WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS mar_net,
Sum(CASE
WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS apr_net,
Sum(CASE
WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS may_net,
Sum(CASE
WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jun_net,
Sum(CASE
WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jul_net,
Sum(CASE
WHEN d_moy = 8 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS aug_net,

[jira] [Comment Edited] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539214#comment-16539214
 ] 

Robert Hou edited comment on DRILL-6566 at 7/10/18 8:40 PM:


This is what we get from running the latest commit:
1.14.0-SNAPSHOT  e79db14d81f8ee15d27cc6026bc0ee409e0c0a3c  DRILL-6529: 
Project Batch Sizing causes two LargeFileCompilation tests to timeout
09.07.2018 @ 04:08:59 PDT  Unknown  09.07.2018 @ 15:34:37 PDT

>> Query: select * from sys.options where status = 'CHANGED';
name                                  kind     accessibleScopes  optionScope  status   num_val  string_val  bool_val  float_val
drill.exec.hashagg.fallback.enabled   BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null
drill.exec.hashjoin.fallback.enabled  BOOLEAN  ALL               SYSTEM       CHANGED  null     null        true      null

The output batch size should be 16MB.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1dot2_hiveplugin/query66.sql
SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
ship_carriers,
year1,
Sum(jan_sales) AS jan_sales,
Sum(feb_sales) AS feb_sales,
Sum(mar_sales) AS mar_sales,
Sum(apr_sales) AS apr_sales,
Sum(may_sales) AS may_sales,
Sum(jun_sales) AS jun_sales,
Sum(jul_sales) AS jul_sales,
Sum(aug_sales) AS aug_sales,
Sum(sep_sales) AS sep_sales,
Sum(oct_sales) AS oct_sales,
Sum(nov_sales) AS nov_sales,
Sum(dec_sales) AS dec_sales,
Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
Sum(jan_net)   AS jan_net,
Sum(feb_net)   AS feb_net,
Sum(mar_net)   AS mar_net,
Sum(apr_net)   AS apr_net,
Sum(may_net)   AS may_net,
Sum(jun_net)   AS jun_net,
Sum(jul_net)   AS jul_net,
Sum(aug_net)   AS aug_net,
Sum(sep_net)   AS sep_net,
Sum(oct_net)   AS oct_net,
Sum(nov_net)   AS nov_net,
Sum(dec_net)   AS dec_net
FROM   (SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
'ZOUROS'
|| ','
|| 'ZHOU' AS ship_carriers,
d_year AS year1,
Sum(CASE
WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jan_sales,
Sum(CASE
WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS feb_sales,
Sum(CASE
WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS mar_sales,
Sum(CASE
WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS apr_sales,
Sum(CASE
WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS may_sales,
Sum(CASE
WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jun_sales,
Sum(CASE
WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jul_sales,
Sum(CASE
WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS aug_sales,
Sum(CASE
WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS sep_sales,
Sum(CASE
WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS oct_sales,
Sum(CASE
WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS nov_sales,
Sum(CASE
WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS dec_sales,
Sum(CASE
WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jan_net,
Sum(CASE
WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS feb_net,
Sum(CASE
WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS mar_net,
Sum(CASE
WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS apr_net,
Sum(CASE
WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS may_net,
Sum(CASE
WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jun_net,
Sum(CASE
WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jul_net,
Sum(CASE
WHEN d_moy = 8 THEN 

[jira] [Commented] (DRILL-6578) Ensure the Flat Parquet Reader can handle query cancellation

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539212#comment-16539212
 ] 

ASF GitHub Bot commented on DRILL-6578:
---

sachouche commented on issue #1360: DRILL-6578: Handle query cancellation in 
Parquet reader
URL: https://github.com/apache/drill/pull/1360#issuecomment-403957132
 
 
   @vrozov,
   can you please review this PR? I have implemented the changes as agreed 
(last week). 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ensure the Flat Parquet Reader can handle query cancellation
> 
>
> Key: DRILL-6578
> URL: https://issues.apache.org/jira/browse/DRILL-6578
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> * The optimized Parquet reader uses an iterator style to load column data 
>  * We need to ensure the code can properly handle query cancellation even in 
> the presence of bugs within the hasNext() .. next() calls
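The cancellation requirement above can be sketched as an iterator that re-checks the worker thread's interrupt flag inside `hasNext()`; this is a hypothetical illustration of the pattern, not Drill's actual reader API:

```java
// Hypothetical sketch: an iterator-style column loader that re-checks the
// thread's interrupt flag in hasNext(), so a cancelled query leaves the read
// loop even if a bug would otherwise keep the iterator producing entries.
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CancellableColumnIterator implements Iterator<int[]> {
  private final int totalBatches;
  private int produced;

  public CancellableColumnIterator(int totalBatches) {
    this.totalBatches = totalBatches;
  }

  @Override
  public boolean hasNext() {
    // Cancellation check: cancelling a fragment interrupts its worker thread.
    if (Thread.currentThread().isInterrupted()) {
      return false;
    }
    return produced < totalBatches;
  }

  @Override
  public int[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException("exhausted or cancelled");
    }
    produced++;
    return new int[64]; // placeholder for a loaded column batch
  }

  public static int drain(Iterator<int[]> it) {
    int n = 0;
    while (it.hasNext()) { it.next(); n++; }
    return n;
  }

  public static void main(String[] args) {
    System.out.println(drain(new CancellableColumnIterator(3))); // prints 3
  }
}
```

Putting the check in `hasNext()` rather than in callers means every read loop over the iterator observes cancellation, whatever shape the loop takes.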



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539209#comment-16539209
 ] 

Robert Hou commented on DRILL-6566:
---

DC has indicated this query now works for him.  This is an email from June 29.

{noformat}
Hi Salim,
Just verified your PR, looks like the issue is fixed: SF1 TPCDS queries 10, 35, 
66, and 69 all completed:
http://10.10.106.202:8047/profiles/24c9011d-66f0-62c5-db66-3521609683c6
{noformat}

The PR is 1354.

> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is TPCDS Query 66.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> || ','
> || 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS dec_sales,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS jan_net,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity

[jira] [Commented] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539208#comment-16539208
 ] 

ASF GitHub Bot commented on DRILL-6579:
---

sachouche commented on issue #1361: DRILL-6579: Added sanity checks to the 
Parquet reader to avoid infini…
URL: https://github.com/apache/drill/pull/1361#issuecomment-403956690
 
 
   @vrozov 
Can you please approve this PR, as there is a deadline for Drill 1.14?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.
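A minimal sketch of the kind of guard this improvement describes, assuming the goal is to bound an iterative read loop (names and the bound are illustrative, not Drill's code):

```java
// Illustrative loop guard: cap the number of iterations so a reader bug that
// stops making progress fails fast instead of spinning forever.
public class BoundedReadLoop {
  public static int readAll(int totalValues, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batch size must be positive");
    }
    // Upper bound on iterations: one pass beyond what correct progress needs.
    final int maxIterations = totalValues / batchSize + 2;
    int read = 0;
    int iterations = 0;
    while (read < totalValues) {
      if (++iterations > maxIterations) {
        throw new IllegalStateException("reader made no progress; aborting");
      }
      read += Math.min(batchSize, totalValues - read);
    }
    return read;
  }

  public static void main(String[] args) {
    System.out.println(readAll(1000, 64)); // prints 1000
  }
}
```

Turning a hang into an `IllegalStateException` makes the failure visible in the query error instead of requiring a thread dump to diagnose.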



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539205#comment-16539205
 ] 

ASF GitHub Bot commented on DRILL-6579:
---

sachouche commented on a change in pull request #1361: DRILL-6579: Added sanity 
checks to the Parquet reader to avoid infini…
URL: https://github.com/apache/drill/pull/1361#discussion_r201484528
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenNullableFixedEntryReader.java
 ##
 @@ -38,14 +39,16 @@
   /** {@inheritDoc} */
   @Override
   final VarLenColumnBulkEntry getEntry(int valuesToRead) {
-assert columnPrecInfo.precision >= 0 : "Fixed length precision cannot be lower than zero";
+Preconditions.checkArgument(columnPrecInfo.precision >= 0, "Fixed length precision cannot be lower than zero");
 
 Review comment:
   You are correct; pushed the check to the CTOR.
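The suggestion being accepted here, validating once at construction instead of on every `getEntry()` call, can be sketched as follows; the class and field names are simplified stand-ins for Drill's `VarLen*EntryReader` types, and a plain throw is used in place of Guava's `Preconditions.checkArgument` to keep the sketch dependency-free:

```java
// Illustrative sketch: the precision is immutable, so validate it once in the
// constructor rather than re-checking it on every hot-path call.
public class FixedEntryReader {
  private final int precision;

  public FixedEntryReader(int precision) {
    // One-time sanity check (Preconditions.checkArgument would do the same).
    if (precision < 0) {
      throw new IllegalArgumentException(
          "Fixed length precision cannot be lower than zero: " + precision);
    }
    this.precision = precision;
  }

  // The per-call path no longer re-validates state that cannot change.
  public int getEntry(int valuesToRead) {
    return precision * valuesToRead;
  }

  public static void main(String[] args) {
    System.out.println(new FixedEntryReader(4).getEntry(10)); // prints 40
  }
}
```

Unlike `assert`, which is disabled unless the JVM runs with `-ea`, a constructor-time check always fires, and it fires exactly once per reader instead of once per batch.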


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6579) Add sanity checks to Parquet Reader

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539203#comment-16539203
 ] 

ASF GitHub Bot commented on DRILL-6579:
---

sachouche commented on a change in pull request #1361: DRILL-6579: Added sanity 
checks to the Parquet reader to avoid infini…
URL: https://github.com/apache/drill/pull/1361#discussion_r201484409
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenFixedEntryReader.java
 ##
 @@ -37,14 +38,15 @@
   /** {@inheritDoc} */
   @Override
   final VarLenColumnBulkEntry getEntry(int valuesToRead) {
-assert columnPrecInfo.precision >= 0 : "Fixed length precision cannot be lower than zero";
+Preconditions.checkArgument(columnPrecInfo.precision >= 0, "Fixed length precision cannot be lower than zero");
 
 Review comment:
   Pushed the check to the CTOR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add sanity checks to Parquet Reader 
> 
>
> Key: DRILL-6579
> URL: https://issues.apache.org/jira/browse/DRILL-6579
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Add sanity checks to the Parquet reader to avoid infinite loops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou reopened DRILL-6566:
---

> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is TPCDS Query 66.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> || ','
> || 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS dec_sales,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS jan_net,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS feb_net,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS mar_net,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS apr_net,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END)  AS may_net,
> Sum(CASE
> WHEN d_moy = 6 

[jira] [Comment Edited] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-07-10 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528397#comment-16528397
 ] 

Boaz Ben-Zvi edited comment on DRILL-6566 at 7/10/18 8:30 PM:
--

Note: "Estimated batch size: 31260672." That's about 31M !!  And "Memory limit: 
2302755" - that's only 2M.
 [~rhou] - Was the Hash-Agg *fallback* option enabled ?
 By default (the 2nd phase) should have failed and given an error asking for 
more memory.

Side comment: Padma's PR is ready to commit ( #1324 - DRILL-6310 ) - it would 
lower the batch size for Hash-Agg (but not enough to fix this example).
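As a quick check on the numbers in this comment (assuming both logged values are byte counts; the "31M" reading uses decimal megabytes):

```java
// Quick arithmetic check of the sizes quoted above; the two constants are
// copied from the log message, nothing here is Drill code.
public class BatchSizeCheck {
  static double toMiB(long bytes) {
    return bytes / (1024.0 * 1024.0);
  }

  public static void main(String[] args) {
    long estimatedBatch = 31_260_672L; // "Estimated batch size"
    long memoryLimit = 2_302_755L;     // "Memory limit"
    // ~31.3 decimal MB (~29.8 MiB) estimated, versus a ~2.2 MiB limit:
    System.out.printf("estimate %.1f MiB, limit %.1f MiB, ratio %.0fx%n",
        toMiB(estimatedBatch), toMiB(memoryLimit),
        (double) estimatedBatch / memoryLimit);
  }
}
```

The estimate exceeds the limit by roughly 14x, which is why an OOM rather than a graceful spill is plausible here.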


was (Author: ben-zvi):
Note: "Estimated batch size: 31260672." That's about 31M !!  And "Memory limit: 
2302755" - that's only 23M.
[~rhou] - Was the Hash-Agg *fallback* option enabled ?
By default (the 2nd phase) should have failed and given an error asking for 
more memory. 

Side comment:  Padma's PR is ready to commit ( #1324 - DRILL-6310 ) - it would 
lower the batch size for Hash-Agg (but not enough to fix this example).


> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is TPCDS Query 66.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> || ','
> || 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> 

[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539133#comment-16539133
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201167369
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -704,5 +704,8 @@ public static String bootDefaultFor(String name) {
   public static final String STATS_LOGGING_FG_BATCH_SIZE_OPTION = 
"drill.exec.stats.logging.fine_grained.batch_size";
   public static final BooleanValidator STATS_LOGGING_BATCH_FG_SIZE_VALIDATOR = 
new BooleanValidator(STATS_LOGGING_FG_BATCH_SIZE_OPTION);
 
+  /** Controls the list of operators for which batch sizing stats should be 
enabled */
 
 Review comment:
   Can you please explain the motivation for the naming hierarchy that you have 
chosen for this option? I would suggest 
"drill.exec.stats.logging.batch_size.enabled_operators".


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539135#comment-16539135
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201441390
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
   }
   return "NA";
 }
+
+private boolean isBatchStatsEnabledForOperator(FragmentContext context, 
OperatorContext oContext) {
+  // The configuration can select what operators should log batch 
statistics
+  final String statsLoggingOperator = 
context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+  final String allOperatorsStr = "ALL";
+
+  // All operators are allowed to log batch statistics
+  if (allOperatorsStr.equals(statsLoggingOperator)) {
+return true;
+  }
+
+  // No, only a select few are allowed; syntax: 
operator-id-1,operator-id-2,..
 
 Review comment:
   Are the operator-ids in statsLoggingOperator supposed to have "_"s in them? 
Otherwise, it looks like they will not match something like 
"3:[PARQUET_ROW_GROUP_SCAN]".




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics





[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539137#comment-16539137
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201459812
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
   }
   return "NA";
 }
+
+private boolean isBatchStatsEnabledForOperator(FragmentContext context, 
OperatorContext oContext) {
+  // The configuration can select what operators should log batch 
statistics
+  final String statsLoggingOperator = 
context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+  final String allOperatorsStr = "ALL";
+
+  // All operators are allowed to log batch statistics
+  if (allOperatorsStr.equals(statsLoggingOperator)) {
+return true;
+  }
+
+  // No, only a select few are allowed; syntax: 
operator-id-1,operator-id-2,..
+  final String[] operators = statsLoggingOperator.split(",");
+  final String operatorId = oContext.getStats().getId().toUpperCase();
+
+  for (int idx = 0; idx < operators.length; idx++) {
+// We use "contains" because the operator identifier is a composite 
string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+if (operatorId.contains(operators[idx])) {
+  return true;
+}
+  }
+
+  return false;
+}
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, 
RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+RecordBatchStatsContext batchStatsContext) {
+
+logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens 
only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
+RecordBatch recordBatch,
+RecordBatchStatsContext batchStatsContext) {
+
+if (!batchStatsContext.isEnableBatchSzLogging()) {
+  return; // NOOP
+}
+
+final String statsId = batchStatsContext.getContextOperatorId();
+final boolean verbose = batchStatsContext.isEnableFgBatchSzLogging();
+final String msg = printRecordBatchStats(statsId, sourceId, recordBatch, 
verbose);
+
+logBatchStatsMsg(batchStatsContext, msg, false);
+  }
+
+  /**
+   * Logs a generic batch statistics message
+   *
+   * @param message log message
+   * @param batchStatsLogging
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String message,
+RecordBatchStatsContext batchStatsContext) {
+
+if (!batchStatsContext.isEnableBatchSzLogging()) {
+  return; // NOOP
+}
+
+logBatchStatsMsg(batchStatsContext, message, true);
+  }
+
+  /**
+   * Prints a materialized field type
+   * @param field materialized field
+   * @param msg string builder where to append the field type
+   */
+  /*
+  public static void printFieldType(MaterializedField field, StringBuilder 
msg) {
 
 Review comment:
   Commented-out function; please remove it.




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics





[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539132#comment-16539132
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201186631
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenBinaryReader.java
 ##
 @@ -34,11 +34,12 @@
 import 
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowState;
 import 
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowStateContainer;
 import 
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.VarLenColumnBatchStats;
+import org.apache.drill.exec.util.record.RecordBatchStats;
 import org.apache.drill.exec.vector.ValueVector;
 
 /** Class which handles reading a batch of rows from a set of variable columns 
*/
 public class VarLenBinaryReader {
-  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class);
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class);
 
 Review comment:
   Please remove the commented-out line.




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics





[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539134#comment-16539134
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201465077
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
   }
   return "NA";
 }
+
+private boolean isBatchStatsEnabledForOperator(FragmentContext context, 
OperatorContext oContext) {
+  // The configuration can select what operators should log batch 
statistics
+  final String statsLoggingOperator = 
context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+  final String allOperatorsStr = "ALL";
+
+  // All operators are allowed to log batch statistics
+  if (allOperatorsStr.equals(statsLoggingOperator)) {
+return true;
+  }
+
+  // No, only a select few are allowed; syntax: 
operator-id-1,operator-id-2,..
+  final String[] operators = statsLoggingOperator.split(",");
+  final String operatorId = oContext.getStats().getId().toUpperCase();
+
+  for (int idx = 0; idx < operators.length; idx++) {
+// We use "contains" because the operator identifier is a composite 
string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+if (operatorId.contains(operators[idx])) {
+  return true;
+}
+  }
+
+  return false;
+}
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, 
RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+RecordBatchStatsContext batchStatsContext) {
+
+logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens 
only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
 
 Review comment:
   Can 'sourceId' be renamed to 'scanSourceId'?
   
   If an operator wants to print both incoming and outgoing batches, how can 
that be done in this logging framework in a way that distinguishes between 
the two?
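One possible answer, sketched below with hypothetical names (this is not Drill's API, just an illustration under the assumption that `sourceId` is a free-form string): an operator could encode the batch direction into the identifier it passes to the logging call, so incoming and outgoing batches remain distinguishable in the log output.

```java
// Hypothetical sketch: encode batch direction into the free-form source
// identifier handed to a logRecordBatchStats-style call, so one operator
// can log both incoming and outgoing batches distinguishably.
public class BatchLogDirection {

    enum Direction { INCOMING, OUTGOING }

    // Builds the identifier string; 'scanSource' is optional extra context
    static String sourceIdFor(Direction dir, String scanSource) {
        return scanSource == null ? dir.name() : dir.name() + ":" + scanSource;
    }

    public static void main(String[] args) {
        System.out.println(sourceIdFor(Direction.INCOMING, null));        // INCOMING
        System.out.println(sourceIdFor(Direction.OUTGOING, "left-side")); // OUTGOING:left-side
    }
}
```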
   




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics





[jira] [Commented] (DRILL-6560) Allow options for controlling the batch size per operator

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539136#comment-16539136
 ] 

ASF GitHub Bot commented on DRILL-6560:
---

bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201458543
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
   }
   return "NA";
 }
+
+private boolean isBatchStatsEnabledForOperator(FragmentContext context, 
OperatorContext oContext) {
+  // The configuration can select what operators should log batch 
statistics
+  final String statsLoggingOperator = 
context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+  final String allOperatorsStr = "ALL";
+
+  // All operators are allowed to log batch statistics
+  if (allOperatorsStr.equals(statsLoggingOperator)) {
+return true;
+  }
+
+  // No, only a select few are allowed; syntax: 
operator-id-1,operator-id-2,..
+  final String[] operators = statsLoggingOperator.split(",");
+  final String operatorId = oContext.getStats().getId().toUpperCase();
+
+  for (int idx = 0; idx < operators.length; idx++) {
+// We use "contains" because the operator identifier is a composite 
string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+if (operatorId.contains(operators[idx])) {
+  return true;
+}
+  }
+
+  return false;
+}
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, 
RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
 
 Review comment:
   There seem to be no callers of this function. Is this meant for operators 
which don't have a sourceId?




> Allow options for controlling the batch size per operator
> -
>
> Key: DRILL-6560
> URL: https://issues.apache.org/jira/browse/DRILL-6560
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This Jira is for internal Drill DEV use; the following capabilities are 
> needed for automating the batch sizing functionality testing:
>  * Control the enablement of batch sizing statistics at session (per query) 
> and server level (all queries)
>  * Control the granularity of batch sizing statistics (summary or verbose)
>  * Control the set of operators that should log batch statistics





[jira] [Commented] (DRILL-6475) Unnest: Null fieldId Pointer

2018-07-10 Thread Hanumath Rao Maduri (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539120#comment-16539120
 ] 

Hanumath Rao Maduri commented on DRILL-6475:


[~priteshm] I think I should be able to open a PR by Thursday for this JIRA.

> Unnest: Null fieldId Pointer 
> -
>
> Key: DRILL-6475
> URL: https://issues.apache.org/jira/browse/DRILL-6475
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
>  Executing the following (in TestE2EUnnestAndLateral.java) causes an NPE as 
> `fieldId` is null in `schemaChanged()`: 
> {code}
> @Test
> public void testMultipleBatchesLateral_twoUnnests() throws Exception {
>  String sql = "SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t, 
> LATERAL " +
>  "(SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs), LATERAL " +
>  "(SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5";
>  test(sql);
> }
> {code}
>  
> And the error is:
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 25f42765-8f68-418e-840a-ffe65788e1e2 on 10.254.130.25:31020]
> (java.lang.NullPointerException) null
>  
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged():381
>  org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext():199
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  
> org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides():241
>  org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema():264
>  org.apache.drill.exec.record.AbstractRecordBatch.next():152
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1657
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>  java.lang.Thread.run():745 (state=,code=0)
> {code} 
>  





[jira] [Updated] (DRILL-6583) UI usability issue

2018-07-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6583:
-
Labels: ready-to-commit  (was: )

> UI usability issue
> --
>
> Key: DRILL-6583
> URL: https://issues.apache.org/jira/browse/DRILL-6583
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: UI_usability_issue_AD_1_14_0.png
>
>
> When a query is under execution, the web UI shows, below that query's 
> profile, text that is actually a set of links for navigating to different 
> pages of the UI.
> Apache Drill 1.14.0
> git.commit.id.abbrev=f481a7c
> They all appear on a single line, with no spacing and a typo; the formatting 
> of the text for those links needs to be improved.
> Attached is a screenshot for the issue, look for "FirstPrevious1NextLast" on 
> the bottom left of the screenshot.
>  





[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-07-10 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539048#comment-16539048
 ] 

salim achouche commented on DRILL-6569:
---

Robert,

According to the original comment:
 * Using the DFS command is successful; this invokes the Parquet reader
 * Running the complex query (without the explicit DFS clause) fails; the stack 
trace indicates the Hive reader was invoked
 ** org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
 ** org.apache.drill.exec.physical.impl.ScanBatch.next():172

 

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> 

[jira] [Updated] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6588:

Reviewer: Aman Sinha

> System table columns incorrectly marked as non-nullable 
> 
>
> Key: DRILL-6588
> URL: https://issues.apache.org/jira/browse/DRILL-6588
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Aman Sinha
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> System table columns can contain null values but they are incorrectly marked 
> as non-nullable as shown in example table below:  
> {noformat}
> 0: jdbc:drill:drillbit=10.10.10.191> describe sys.boot;
> +---++--+
> |    COLUMN_NAME    |     DATA_TYPE      | IS_NULLABLE  |
> +---++--+
> | name              | CHARACTER VARYING  | NO           |
> | kind              | CHARACTER VARYING  | NO           |
> | accessibleScopes  | CHARACTER VARYING  | NO           |
> | optionScope       | CHARACTER VARYING  | NO           |
> | status            | CHARACTER VARYING  | NO           |
> | num_val           | BIGINT             | NO           |
> | string_val        | CHARACTER VARYING  | NO           |
> | bool_val          | BOOLEAN            | NO           |
> | float_val         | DOUBLE             | NO           |
> +---++--+{noformat}
>  
> Note that several columns are nulls: 
> {noformat}
> +---+--+--+-++-++--+---+
> |                       name                        |   kind   | 
> accessibleScopes | optionScope | status | num_val | string_val | bool_val | 
> float_val |
> +---+--+--+-++-++--+---+
> drill.exec.options.exec.udf.enable_dynamic_support | BOOLEAN | BOOT | BOOT | 
> BOOT | null | null | true | null |{noformat}
>  
> Because of the not-null metadata, the predicates on these tables such as 
> `WHERE  IS NULL` evaluate to FALSE which is incorrect. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539040#comment-16539040
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua opened a new pull request #1371: DRILL-6588: Make Sys tables of 
nullable datatypes
URL: https://github.com/apache/drill/pull/1371
 
 
   This addresses the issue of columns in the System tables being marked as 
non-nullable. While these tables are immutable, they can carry nullable values 
as well. (e.g. `num_val` in `sys.options`)
   This commit introduces an annotation for the `PojoReader` that applies the 
nullable property if explicitly defined for a column (i.e. explicitly defined 
`isNullable` for a member of the POJO instance).
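A minimal sketch of such an annotation-driven nullability check (the annotation name `NullableColumn` and the `OptionRow` POJO below are hypothetical stand-ins for illustration, not Drill's actual API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class PojoNullabilityDemo {

  // Hypothetical marker annotation: a POJO field carrying it is exposed
  // as a nullable column; all other fields stay non-nullable.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface NullableColumn {}

  // Simplified stand-in for a sys-table row POJO such as sys.options.
  static class OptionRow {
    String name;                       // required -> IS_NULLABLE = NO
    @NullableColumn Long numVal;       // optional -> IS_NULLABLE = YES
    @NullableColumn String stringVal;  // optional -> IS_NULLABLE = YES
  }

  // Mirrors what a POJO-based reader could do when building column metadata.
  static boolean isNullable(Class<?> pojo, String fieldName) {
    try {
      return pojo.getDeclaredField(fieldName).isAnnotationPresent(NullableColumn.class);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("unknown column: " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println("name nullable: " + isNullable(OptionRow.class, "name"));
    System.out.println("numVal nullable: " + isNullable(OptionRow.class, "numVal"));
  }
}
```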
   ```
   apache drill 1.14.0-SNAPSHOT 
   "just drill it"
   0: jdbc:drill:schema=sys> select count(*) from sys.options where num_val is 
null;
   +-+
   | EXPR$0  |
   +-+
   | 108 |
   +-+
   1 row selected (2.703 seconds)
   0: jdbc:drill:schema=sys> select count(*) from sys.options where 
isnull(num_val);
   +-+
   | EXPR$0  |
   +-+
   | 108 |
   +-+
   1 row selected (0.3 seconds)
   0: jdbc:drill:schema=sys> select distinct is_nullable, count(*) from 
INFORMATION_SCHEMA.`COLUMNS` where table_schema = 'sys' group by is_nullable;
   +--+-+
   | is_nullable  | EXPR$1  |
   +--+-+
   | NO   | 36  |
   | YES  | 50  |
   +--+-+
   2 rows selected (0.69 seconds)
   0: jdbc:drill:schema=sys> describe options;
   +---++--+
   |COLUMN_NAME| DATA_TYPE  | IS_NULLABLE  |
   +---++--+
   | name  | CHARACTER VARYING  | NO   |
   | kind  | CHARACTER VARYING  | NO   |
   | accessibleScopes  | CHARACTER VARYING  | NO   |
   | optionScope   | CHARACTER VARYING  | NO   |
   | status| CHARACTER VARYING  | NO   |
   | num_val   | BIGINT | YES  |
   | string_val| CHARACTER VARYING  | YES  |
   | bool_val  | BOOLEAN| YES  |
   | float_val | DOUBLE | YES  |
   +---++--+
   9 rows selected (0.221 seconds)
   0: jdbc:drill:schema=sys> describe internal_options;
   +---++--+
   |COLUMN_NAME| DATA_TYPE  | IS_NULLABLE  |
   +---++--+
   | name  | CHARACTER VARYING  | NO   |
   | kind  | CHARACTER VARYING  | NO   |
   | accessibleScopes  | CHARACTER VARYING  | NO   |
   | optionScope   | CHARACTER VARYING  | NO   |
   | status| CHARACTER VARYING  | NO   |
   | num_val   | BIGINT | YES  |
   | string_val| CHARACTER VARYING  | YES  |
   | bool_val  | BOOLEAN| YES  |
   | float_val | DOUBLE | YES  |
   +---++--+
   9 rows selected (0.185 seconds)
   0: jdbc:drill:schema=sys> describe options_val;
   +---++--+
   |COLUMN_NAME| DATA_TYPE  | IS_NULLABLE  |
   +---++--+
   | name  | CHARACTER VARYING  | NO   |
   | kind  | CHARACTER VARYING  | NO   |
   | accessibleScopes  | CHARACTER VARYING  | NO   |
   | val   | CHARACTER VARYING  | YES  |
   | optionScope   | CHARACTER VARYING  | NO   |
   +---++--+
   5 rows selected (0.183 seconds)
   0: jdbc:drill:schema=sys> describe profiles_json;
   +--++--+
   | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
   +--++--+
   | queryId  | CHARACTER VARYING  | NO   |
   | json | CHARACTER VARYING  | YES  |
   +--++--+
   2 rows selected (0.727 seconds)
   0: jdbc:drill:schema=sys> describe profiles;
   +--++--+
   | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
   +--++--+
   | queryId  | CHARACTER VARYING  | NO   |
   | startTime| TIMESTAMP  | YES  |
   | foreman  | CHARACTER VARYING  | NO   |
   | fragments| BIGINT | YES  |
   | user | CHARACTER VARYING  | YES  |
   | queue| CHARACTER VARYING  | YES  |
   | planTime | BIGINT 

[jira] [Commented] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539042#comment-16539042
 ] 

ASF GitHub Bot commented on DRILL-6588:
---

kkhatua commented on issue #1371: DRILL-6588: Make Sys tables of nullable 
datatypes
URL: https://github.com/apache/drill/pull/1371#issuecomment-403914728
 
 
   @amansinha100 can you review this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6588) System table columns incorrectly marked as non-nullable

2018-07-10 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6588:

Fix Version/s: 1.14.0

>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6475) Unnest: Null fieldId Pointer

2018-07-10 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539017#comment-16539017
 ] 

Pritesh Maker commented on DRILL-6475:
--

[~hanu.ncr] will you be ready with this fix by Thu? [~ben-zvi] sent an email 
for this.

> Unnest: Null fieldId Pointer 
> -
>
> Key: DRILL-6475
> URL: https://issues.apache.org/jira/browse/DRILL-6475
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
>  Executing the following (in TestE2EUnnestAndLateral.java) causes an NPE as 
> `fieldId` is null in `schemaChanged()`: 
> {code}
> @Test
> public void testMultipleBatchesLateral_twoUnnests() throws Exception {
>  String sql = "SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t, 
> LATERAL " +
>  "(SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs), LATERAL " +
>  "(SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5";
>  test(sql);
> }
> {code}
>  
> And the error is:
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 25f42765-8f68-418e-840a-ffe65788e1e2 on 10.254.130.25:31020]
> (java.lang.NullPointerException) null
>  
> org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged():381
>  org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext():199
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  
> org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides():241
>  org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema():264
>  org.apache.drill.exec.record.AbstractRecordBatch.next():152
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.record.AbstractRecordBatch.next():119
>  org.apache.drill.exec.record.AbstractRecordBatch.next():109
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>  org.apache.drill.exec.record.AbstractRecordBatch.next():172
>  
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1657
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>  java.lang.Thread.run():745 (state=,code=0)
> {code} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-07-10 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538994#comment-16538994
 ] 

Robert Hou commented on DRILL-6569:
---

We are accessing the parquet file using the dfs storage plugin.  Should we be 
using the Hive parquet reader?  When do we use the Drill parquet reader?

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
>select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 

[jira] [Updated] (DRILL-6581) Improve C++ Client SSL Implementation

2018-07-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6581:
-
Fix Version/s: 1.14.0

> Improve C++ Client SSL Implementation
> -
>
> Key: DRILL-6581
> URL: https://issues.apache.org/jira/browse/DRILL-6581
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.12.0
>Reporter: Rob Wu
>Assignee: Rob Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> # Fix: Hostname verification doesn't function as expected: host and port in 
> the SSL hostname verification callback are always empty.
>  # Fix: Certificate load verification exceptions are swallowed and not 
> propagated.
>  # Improvement: SSL v3 is not disabled.
>  # Improvement: The hostname verification failure exception is the same as other 
> certificate verification failures; we should separate them.
>  # Improvement: Create individual error messages so that the application using 
> the client can handle errors, following the standard of the rest of the 
> errors.
>  # Improvement: Add SSL hostname verification support with the ZooKeeper 
> connection mode.
>  # Added support for custom SSL CTX options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5796:
-
Reviewer: Vlad Rozov  (was: Arina Ielchiieva)

> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to that partition. 
> With Parquet, pruning could instead be applied at the rowgroup level, making 
> the unit of work the rowgroup rather than the file.
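The rowgroup-level pruning idea quoted above can be illustrated with a simplified sketch (hypothetical types, not Drill's implementation): a rowgroup whose min/max statistics exclude the predicate value can be skipped without reading its pages.

```java
public class RowGroupPruningDemo {

  // Minimal per-rowgroup statistics: min/max of one column.
  static class RowGroupStats {
    final long min, max;
    RowGroupStats(long min, long max) { this.min = min; this.max = max; }
  }

  // A rowgroup can be dropped for the predicate `col = value` when the
  // value falls outside the rowgroup's [min, max] range.
  static boolean canDrop(RowGroupStats stats, long value) {
    return value < stats.min || value > stats.max;
  }

  public static void main(String[] args) {
    RowGroupStats rg1 = new RowGroupStats(1, 100);
    RowGroupStats rg2 = new RowGroupStats(101, 200);
    // For `col = 150`, rowgroup 1 is pruned while rowgroup 2 must be scanned,
    // even if both rowgroups live in the same file.
    System.out.println(canDrop(rg1, 150));
    System.out.println(canDrop(rg2, 150));
  }
}
```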



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6517) IllegalStateException: Record count not set for this vector container

2018-07-10 Thread salim achouche (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538974#comment-16538974
 ] 

salim achouche commented on DRILL-6517:
---

[~khfaraaz], when queries are cancelled, we anticipate exceptions to be thrown 
(e.g., an interrupted thread will receive an exception on a blocking call). The 
questions I am trying to figure out are:
 * Is the IllegalStateException thrown only on query cancellation?
 * Is there a more fundamental bug causing the foreman to cancel the query?

So I'll use your real cluster to debug this issue.

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> TPC-DS query is Canceled after 2 hrs and 47 mins and we see an 
> IllegalStateException: Record count not set for this vector container, in 
> drillbit.log
> Steps to reproduce the problem, query profile 
> (24d7b377-7589-7928-f34f-57d02061acef) is attached here.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u 
> "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=" -f 
> /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
>  [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this 
> vector container
>  at com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:690)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.RecordBatchSizer.(RecordBatchSizer.java:662)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242)
>  ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>  at 
> 

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538960#comment-16538960
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201422339
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,70 +62,72 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains null values, 
because they can change the filter result.
+   * If it contains null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that some
+   * values (the null ones) may still have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
   private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics)exprStat).getMax()) {
+return RowsMatch.NONE;
+  }
+  return ((BooleanStatistics)exprStat).getMin() && 
((BooleanStatistics)exprStat).getMax() ? checkNull(exprStat) : RowsMatch.SOME;
+});
   }
 
   /**
* IS FALSE predicate.
*/
   private static LogicalExpression createIsFalsePredicate(LogicalExpression 
expr) {
 return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if min value is not false or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+  exprStat.hasNonNullValue() && ((BooleanStatistics)exprStat).getMin() || 
isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.NONE : 
checkNull(exprStat)
 
 Review comment:
   Is it necessary to 

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538959#comment-16538959
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201420744
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,70 +62,72 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains null values, 
because they can change the filter result.
+   * If it contains null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that some
+   * values (the null ones) may still have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
   private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics)exprStat).getMax()) {
+return RowsMatch.NONE;
+  }
+  return ((BooleanStatistics)exprStat).getMin() && 
((BooleanStatistics)exprStat).getMax() ? checkNull(exprStat) : RowsMatch.SOME;
 
 Review comment:
   It is the same pattern: if min is True, max must be True. It is necessary to 
check for min only.
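The review point above (if the boolean min is true, the max must be true under a correct comparator, so the ALL case can be decided from min alone) can be sketched as follows. `BoolStats` and `isTrueMatch` are illustrative stand-ins, not Drill's actual classes:

```java
// Hypothetical sketch of the min/max reasoning for boolean column statistics.
public class BoolStatsSketch {
  // Simplified stand-in for Parquet's BooleanStatistics: min <= max under
  // Boolean.compare, i.e. false < true.
  static final class BoolStats {
    final boolean min, max;
    BoolStats(boolean min, boolean max) {
      // With a properly ordered comparator, min=true and max=false is corrupt.
      if (Boolean.compare(min, max) > 0) {
        throw new IllegalArgumentException("corrupted statistics: min > max");
      }
      this.min = min;
      this.max = max;
    }
  }

  // For IS TRUE: max=false means no value is true; min=true forces max=true,
  // so every value is true and checking min alone suffices for the ALL case.
  static String isTrueMatch(BoolStats s) {
    if (!s.max) {
      return "NONE";
    }
    return s.min ? "ALL" : "SOME";
  }

  public static void main(String[] args) {
    System.out.println(isTrueMatch(new BoolStats(true, true)));   // ALL
    System.out.println(isTrueMatch(new BoolStats(false, true)));  // SOME
    System.out.println(isTrueMatch(new BoolStats(false, false))); // NONE
  }
}
```

This ignores null handling (the `checkNull` downgrade to SOME), which the real predicate applies on top of the min/max decision.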


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: 

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538952#comment-16538952
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on issue #1298: DRILL-5796: Filter pruning for multi 
rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#issuecomment-403894483
 
 
   @jbimbert I am not asking to change existing code, just the parts you have made 
changes in.




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538922#comment-16538922
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on issue #1298: DRILL-5796: Filter pruning for multi 
rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#issuecomment-403890029
 
 
@arina-ielchiieva  I have been asked to " keep the original format. Note 
that the new format makes it harder to review on the github." and when I 
applied checkstyle, I had to "revert back all formatting only changes." (see 
previous comment)




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538907#comment-16538907
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201412121
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,90 +62,90 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the rowgroup statistics, if the result is RowsMatch.ALL,
+   * we still must know whether the rowgroup contains null values, because they can change the filter result.
+   * If it does, we change RowsMatch.ALL into RowsMatch.SOME, which says that some values
+   * (the null ones) may have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
-  private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+  private static > LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && (!((BooleanStatistics)exprStat).getMin() && 
!((BooleanStatistics)exprStat).getMax())) {
 
 Review comment:
   OK. Suppressed getMin. Done




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538899#comment-16538899
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201410502
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,90 +62,90 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the rowgroup statistics, if the result is RowsMatch.ALL,
+   * we still must know whether the rowgroup contains null values, because they can change the filter result.
+   * If it does, we change RowsMatch.ALL into RowsMatch.SOME, which says that some values
+   * (the null ones) may have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
-  private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+  private static > LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
 
 Review comment:
   C dropped. Done




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning use the file name as the partitioning key. This means 
> you can 

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538886#comment-16538886
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on issue #1298: DRILL-5796: Filter pruning for multi 
rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#issuecomment-403882818
 
 
   I don't think that would be hard to fix, @jbimbert please make sure you 
correct checkstyle. 




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538882#comment-16538882
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201406425
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,90 +62,90 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the rowgroup statistics, if the result is RowsMatch.ALL,
+   * we still must know whether the rowgroup contains null values, because they can change the filter result.
+   * If it does, we change RowsMatch.ALL into RowsMatch.SOME, which says that some values
+   * (the null ones) may have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
-  private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+  private static > LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
 
 Review comment:
   It is not a question of preference, but validity. What will it mean to use 
`ParquetIsPredicate.createIsTruePredicate()`?
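vrozov's question concerns the type parameter on the new `createIsTruePredicate` signature: a bound like `C extends Comparable<C>` that never appears in the argument list conveys nothing to callers. A minimal standalone sketch of why such a free type parameter is vacuous (the class and method names here are hypothetical, not Drill's API):

```java
// Illustrates the review point: a type parameter that appears nowhere in the
// parameter list is free-floating, so any instantiation "works" identically.
public class FreeTypeParamDemo {
  // C is declared but never used in the signature or body.
  static <C extends Comparable<C>> String create(String expr) {
    return "predicate(" + expr + ")";
  }

  public static void main(String[] args) {
    // Both explicit type witnesses compile and behave the same; C carries
    // no information, which is why the bound was questioned on the PR.
    System.out.println(FreeTypeParamDemo.<Integer>create("isTrue(x)"));
    System.out.println(FreeTypeParamDemo.<String>create("isTrue(x)"));
  }
}
```

The PR resolution (dropping the type parameter from the IS TRUE factory, which only ever deals with BooleanStatistics) follows directly from this.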




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538873#comment-16538873
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201405251
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,90 +62,90 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the meta of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the rowgroup statistics, if the result is RowsMatch.ALL,
+   * we still must know whether the rowgroup contains null values, because they can change the filter result.
+   * If it does, we change RowsMatch.ALL into RowsMatch.SOME, which says that some values
+   * (the null ones) may have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
-  private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+  private static > LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && (!((BooleanStatistics)exprStat).getMin() && 
!((BooleanStatistics)exprStat).getMax())) {
 
 Review comment:
   min=True and max=False point to corrupted statistics, please see comparator 
(`Boolean.compare`) for BooleanStatistics. If you have an example, it is 
necessary to see why the statistics is corrupted. Note that if the order for 
Boolean is not properly defined, it is not valid to assume that if min is 
True(False) and max is True(False), the rest values are also True(False).
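The ordering argument above relies on `Boolean.compare`, which defines false < true. A quick standalone demonstration (plain Java, not Drill code):

```java
// Shows the order Boolean.compare imposes on boolean min/max statistics:
// false sorts before true, so min=true with max=false cannot occur in
// well-formed statistics.
public class BooleanOrderDemo {
  public static void main(String[] args) {
    System.out.println(Boolean.compare(false, true) < 0);  // true: false < true
    System.out.println(Boolean.compare(true, true) == 0);  // true: equal values
    System.out.println(Boolean.compare(true, false) > 0);  // true: true > false

    // A row group containing both values must report min=false, max=true;
    // the reversed pair can only come from corrupted statistics.
    boolean min = true, max = false;
    System.out.println(Boolean.compare(min, max) > 0);     // true => corrupted
  }
}
```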



[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538865#comment-16538865
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on issue #1298: DRILL-5796: Filter pruning for multi rowgroup 
parquet file
URL: https://github.com/apache/drill/pull/1298#issuecomment-403878332
 
 
   @arina-ielchiieva There are few formatting issues (like double spaces and 
missing spaces), but I am not paying attention to them as they are not enforced 
by the Drill checkstyle. If you want them to be fixed, please let @jbimbert 
know.




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538844#comment-16538844
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201400269
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -59,7 +59,7 @@ public ParquetMetaStatCollector(ParquetTableMetadataBase 
parquetTableMetadata,
 // Reasons to pass implicit columns and their values:
 // 1. Differentiate implicit columns from regular non-exist columns. 
Implicit columns do not
 //exist in parquet metadata. Without such knowledge, implicit columns 
is treated as non-exist
-//column.  A condition on non-exist column would lead to canDrop = 
true, which is not the
+//column.  A condition on non-exist column would lead to matches = 
true, which is not the
 
 Review comment:
   Done




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538841#comment-16538841
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201399967
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetFooterStatCollector.java
 ##
 @@ -59,7 +59,7 @@ public ParquetFooterStatCollector(ParquetMetadata footer, 
int rowGroupIndex, Map
 // Reasons to pass implicit columns and their values:
 // 1. Differentiate implicit columns from regular non-exist columns. 
Implicit columns do not
 //exist in parquet metadata. Without such knowledge, implicit columns 
is treated as non-exist
-//column.  A condition on non-exist column would lead to canDrop = 
true, which is not the
+//column.  A condition on non-exist column would lead to matches = 
true, which is not the
 
 Review comment:
   ALL. Done




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune at the rowgroup level if the rowgroups partition 
> your dataset, since the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538838#comment-16538838
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201399120
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetMetaStatCollector.java
 ##
 @@ -59,7 +59,7 @@ public ParquetMetaStatCollector(ParquetTableMetadataBase 
parquetTableMetadata,
 // Reasons to pass implicit columns and their values:
 // 1. Differentiate implicit columns from regular non-exist columns. 
Implicit columns do not
 //exist in parquet metadata. Without such knowledge, implicit columns 
is treated as non-exist
-//column.  A condition on non-exist column would lead to canDrop = 
true, which is not the
+//column.  A condition on non-exist column would lead to matches = 
true, which is not the
 
 Review comment:
   The same as above.




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> a partition can be pruned only if the whole file belongs to that partition. 
> With Parquet, the filter could instead be pruned at the row-group level, 
> making the row group, not the file, the unit of work.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538837#comment-16538837
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201398953
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/stat/ParquetFooterStatCollector.java
 ##
 @@ -59,7 +59,7 @@ public ParquetFooterStatCollector(ParquetMetadata footer, 
int rowGroupIndex, Map
 // Reasons to pass implicit columns and their values:
 // 1. Differentiate implicit columns from regular non-exist columns. 
Implicit columns do not
 //exist in parquet metadata. Without such knowledge, implicit columns 
is treated as non-exist
-//column.  A condition on non-exist column would lead to canDrop = 
true, which is not the
+//column.  A condition on non-exist column would lead to matches = 
true, which is not the
 
 Review comment:
   I guess it is either ALL or NONE, but not `true`.
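As a concrete illustration of the ALL/NONE distinction raised here, and of the null handling described in the `checkNull` javadoc quoted in the diff above, here is a hedged sketch in plain Java (the names mirror the patch, but the signature is simplified and illustrative, not Drill's actual API):

```java
// Hedged sketch in plain Java (names mirror the patch, signatures simplified):
// statistics can only ever answer ALL, SOME, or NONE, never a bare boolean,
// and an ALL answer must be weakened to SOME when the row group holds nulls,
// since null rows do not satisfy the predicate.
public class NullDowngradeSketch {

  public enum RowsMatch { NONE, SOME, ALL }

  // Simplified stand-in for the checkNull() helper discussed in the diff.
  public static RowsMatch checkNull(RowsMatch statsResult, long nullCount) {
    if (statsResult == RowsMatch.ALL && nullCount > 0) {
      return RowsMatch.SOME;  // some rows (the nulls) still need to be dropped
    }
    return statsResult;
  }

  public static void main(String[] args) {
    System.out.println(checkNull(RowsMatch.ALL, 0));   // ALL stands: no nulls present
    System.out.println(checkNull(RowsMatch.ALL, 3));   // downgraded to SOME
    System.out.println(checkNull(RowsMatch.NONE, 3));  // NONE is unaffected
  }
}
```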




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> a partition can be pruned only if the whole file belongs to that partition. 
> With Parquet, the filter could instead be pruned at the row-group level, 
> making the row group, not the file, the unit of work.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538818#comment-16538818
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r201395530
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,90 +62,90 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the metadata of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, 
if the result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains null values, 
because they can change the filter result.
+   * If it contains null values, we downgrade RowsMatch.ALL to RowsMatch.SOME, 
which says that some
+   * values (the null ones) may have to be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
   /**
* IS NULL predicate.
*/
   private static <C extends Comparable<C>> LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if(expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static <C extends Comparable<C>> LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
-  private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+  private static <C extends Comparable<C>> LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && (!((BooleanStatistics)exprStat).getMin() && 
!((BooleanStatistics)exprStat).getMax())) {
+return RowsMatch.NONE;
+  }
+  return ((BooleanStatistics)exprStat).getMin() && 
((BooleanStatistics)exprStat).getMax() ? checkNull(exprStat) : RowsMatch.SOME;
+});
   }
 
   /**
* IS FALSE predicate.
*/
-  private static LogicalExpression createIsFalsePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if min value is not false or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+  private static > LogicalExpression 

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538804#comment-16538804
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

parthchandra commented on a change in pull request #1358:  DRILL-6516: EMIT 
support in streaming agg
URL: https://github.com/apache/drill/pull/1358#discussion_r201168413
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
 ##
 @@ -154,83 +188,174 @@ public void buildSchema() throws SchemaChangeException {
   public IterOutcome innerNext() {
 
 // if a special batch has been sent, we have no data in the incoming so 
exit early
-if (specialBatchSent) {
-  return IterOutcome.NONE;
+if ( done || specialBatchSent) {
+  return NONE;
+}
+
+// We sent an OK_NEW_SCHEMA and also encountered the end of a data set. So 
we need to send
+// an EMIT with an empty batch now
+if (sendEmit) {
+  sendEmit = false;
+  firstBatchForDataSet = true;
+  recordCount = 0;
+  return EMIT;
 }
 
 // this is only called on the first batch. Beyond this, the aggregator 
manages batches.
 if (aggregator == null || first) {
-  IterOutcome outcome;
   if (first && incoming.getRecordCount() > 0) {
 first = false;
-outcome = IterOutcome.OK_NEW_SCHEMA;
+lastKnownOutcome = OK_NEW_SCHEMA;
   } else {
-outcome = next(incoming);
+lastKnownOutcome = next(incoming);
   }
-  logger.debug("Next outcome of {}", outcome);
-  switch (outcome) {
-  case NONE:
-if (first && popConfig.getKeys().size() == 0) {
+  logger.debug("Next outcome of {}", lastKnownOutcome);
+  switch (lastKnownOutcome) {
+case NONE:
+  if (firstBatchForDataSet && popConfig.getKeys().size() == 0) {
+// if we have a straight aggregate and empty input batch, we need 
to handle it in a different way
+constructSpecialBatch();
+// set state to indicate the fact that we have sent a special 
batch and input is empty
+specialBatchSent = true;
+// If outcome is NONE then we send the special batch in the first 
iteration and the NONE
+// outcome in the next iteration. If outcome is EMIT, we can send 
the special
+// batch and the EMIT outcome at the same time.
+return getFinalOutcome();
+  }
+  // else fall thru
+case OUT_OF_MEMORY:
+case NOT_YET:
+case STOP:
+  return lastKnownOutcome;
+case OK_NEW_SCHEMA:
+  if (!createAggregator()) {
+done = true;
+return IterOutcome.STOP;
+  }
+  break;
+case EMIT:
+  if (firstBatchForDataSet && popConfig.getKeys().size() == 0) {
+// if we have a straight aggregate and empty input batch, we need 
to handle it in a different way
+constructSpecialBatch();
+// set state to indicate the fact that we have sent a special 
batch and input is empty
+specialBatchSent = true;
+firstBatchForDataSet = true; // reset on the next iteration
+// If outcome is NONE then we send the special batch in the first 
iteration and the NONE
+// outcome in the next iteration. If outcome is EMIT, we can send 
the special
+// batch and the EMIT outcome at the same time.
+return getFinalOutcome();
+  }
+  // else fall thru
+case OK:
+  break;
+default:
+  throw new IllegalStateException(String.format("unknown outcome %s", 
lastKnownOutcome));
+  }
+} else {
+  if ( lastKnownOutcome != NONE && firstBatchForDataSet && 
!aggregator.isDone()) {
+lastKnownOutcome = incoming.next();
+if (!first && firstBatchForDataSet) {
+  //Setup needs to be called again. During setup, generated code saves 
a reference to the vectors
+  // pointed to by the incoming batch so that the dereferencing of the 
vector wrappers to get to
+  // the vectors  does not have to be done at each call to eval. 
However, after an EMIT is seen,
+  // the vectors are replaced and the reference to the old vectors is 
no longer valid
+  try {
+aggregator.setup(oContext, incoming, this);
+  } catch (SchemaChangeException e) {
+UserException.Builder exceptionBuilder = 
UserException.functionError(e)
+.message("A Schema change exception occurred in calling setup() 
in generated code.");
+throw exceptionBuilder.build(logger);
+  }
+}
+  }
+  // We sent an EMIT in the previous iteration, so we must be starting a 
new data set
+  if (firstBatchForDataSet) {
+done = false;
+sendEmit = false;
+specialBatchSent = 

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538812#comment-16538812
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

parthchandra commented on a change in pull request #1358:  DRILL-6516: EMIT 
support in streaming agg
URL: https://github.com/apache/drill/pull/1358#discussion_r201170448
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
 ##
 @@ -189,83 +209,128 @@ else if (isSame( previousIndex, currentIndex )) {
   logger.debug("Received IterOutcome of {}", out);
 }
 switch (out) {
-case NONE:
-  done = true;
-  lastOutcome = out;
-  if (first && addedRecordCount == 0) {
-return setOkAndReturn();
-  } else if (addedRecordCount > 0) {
-outputToBatchPrev(previous, previousIndex, outputCount); // No 
need to check the return value
-// (output container full or not) as we are not going to 
insert any more records.
-if (EXTRA_DEBUG) {
-  logger.debug("Received no more batches, returning.");
+  case NONE:
+done = true;
+lastOutcome = out;
+if (firstBatchForDataSet && addedRecordCount == 0) {
+  return setOkAndReturn(out);
+} else if (addedRecordCount > 0) {
+  outputToBatchPrev(previous, previousIndex, outputCount); // 
No need to check the return value
+  // (output container full or not) as we are not going to 
insert any more records.
+  if (EXTRA_DEBUG) {
+logger.debug("Received no more batches, returning.");
+  }
+  return setOkAndReturn(out);
+} else {
+  // not first batch and record Count == 0
+  outcome = out;
+  return AggOutcome.CLEANUP_AND_RETURN;
 }
-return setOkAndReturn();
-  } else {
-if (first && out == IterOutcome.OK) {
-  out = IterOutcome.OK_NEW_SCHEMA;
+// EMIT is handled like OK, except that we do not loop back to 
process the
+// next incoming batch; we return instead
+  case EMIT:
+if (incoming.getRecordCount() == 0) {
+  if (addedRecordCount > 0) {
+outputToBatchPrev(previous, previousIndex, outputCount);
+  }
+  resetIndex();
+  return setOkAndReturn(out);
+} else {
+  resetIndex();
+  if (previousIndex != -1 && isSamePrev(previousIndex, 
previous, currentIndex)) {
+if (EXTRA_DEBUG) {
+  logger.debug("New value was same as last value of 
previous batch, adding.");
+}
+addRecordInc(currentIndex);
+previousIndex = currentIndex;
+incIndex();
+if (EXTRA_DEBUG) {
+  logger.debug("Continuing outside");
+}
+processRemainingRecordsInBatch();
+// currentIndex has been reset to int_max so use previous 
index.
+outputToBatch(previousIndex);
+resetIndex();
+return setOkAndReturn(out);
+  } else { // not the same
+if (EXTRA_DEBUG) {
+  logger.debug("This is not the same as the previous, add 
record and continue outside.");
+}
+if (addedRecordCount > 0) {
+  if (outputToBatchPrev(previous, previousIndex, 
outputCount)) {
+if (EXTRA_DEBUG) {
+  logger.debug("Output container is full. flushing 
it.");
+}
+return setOkAndReturn(out);
+  }
+}
+previousIndex = -1;
+processRemainingRecordsInBatch();
+outputToBatch(previousIndex); // currentIndex has been 
reset to int_max so use previous index.
+resetIndex();
+return setOkAndReturn(out);
+  }
 }
-outcome = out;
-return AggOutcome.CLEANUP_AND_RETURN;
-  }
-
-case NOT_YET:
-  this.outcome = out;
-  return AggOutcome.RETURN_OUTCOME;
-
-case OK_NEW_SCHEMA:
-  if (EXTRA_DEBUG) {
-logger.debug("Received new schema.  Batch has {} records.", 
incoming.getRecordCount());
-  }
- 

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538810#comment-16538810
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

parthchandra commented on a change in pull request #1358:  DRILL-6516: EMIT 
support in streaming agg
URL: https://github.com/apache/drill/pull/1358#discussion_r201170290
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
 ##
 @@ -189,83 +209,128 @@ else if (isSame( previousIndex, currentIndex )) {
   logger.debug("Received IterOutcome of {}", out);
 }
 switch (out) {
-case NONE:
-  done = true;
-  lastOutcome = out;
-  if (first && addedRecordCount == 0) {
-return setOkAndReturn();
-  } else if (addedRecordCount > 0) {
-outputToBatchPrev(previous, previousIndex, outputCount); // No 
need to check the return value
-// (output container full or not) as we are not going to 
insert any more records.
-if (EXTRA_DEBUG) {
-  logger.debug("Received no more batches, returning.");
+  case NONE:
+done = true;
+lastOutcome = out;
+if (firstBatchForDataSet && addedRecordCount == 0) {
+  return setOkAndReturn(out);
+} else if (addedRecordCount > 0) {
+  outputToBatchPrev(previous, previousIndex, outputCount); // 
No need to check the return value
+  // (output container full or not) as we are not going to 
insert any more records.
+  if (EXTRA_DEBUG) {
+logger.debug("Received no more batches, returning.");
+  }
+  return setOkAndReturn(out);
+} else {
+  // not first batch and record Count == 0
+  outcome = out;
+  return AggOutcome.CLEANUP_AND_RETURN;
 }
-return setOkAndReturn();
-  } else {
-if (first && out == IterOutcome.OK) {
-  out = IterOutcome.OK_NEW_SCHEMA;
+// EMIT is handled like OK, except that we do not loop back to 
process the
+// next incoming batch; we return instead
+  case EMIT:
+if (incoming.getRecordCount() == 0) {
+  if (addedRecordCount > 0) {
+outputToBatchPrev(previous, previousIndex, outputCount);
+  }
+  resetIndex();
+  return setOkAndReturn(out);
+} else {
+  resetIndex();
+  if (previousIndex != -1 && isSamePrev(previousIndex, 
previous, currentIndex)) {
+if (EXTRA_DEBUG) {
+  logger.debug("New value was same as last value of 
previous batch, adding.");
+}
+addRecordInc(currentIndex);
+previousIndex = currentIndex;
+incIndex();
+if (EXTRA_DEBUG) {
+  logger.debug("Continuing outside");
+}
+processRemainingRecordsInBatch();
+// currentIndex has been reset to int_max so use previous 
index.
+outputToBatch(previousIndex);
+resetIndex();
+return setOkAndReturn(out);
+  } else { // not the same
+if (EXTRA_DEBUG) {
+  logger.debug("This is not the same as the previous, add 
record and continue outside.");
+}
+if (addedRecordCount > 0) {
+  if (outputToBatchPrev(previous, previousIndex, 
outputCount)) {
+if (EXTRA_DEBUG) {
+  logger.debug("Output container is full. flushing 
it.");
+}
 
 Review comment:
   Yes, it does appear out of place, but it isn't really. It is tied to 
the fact that we must reset several state variables (this being one of them) 
every time EMIT is processed.
   Also, while debugging I found places where I had missed resetting 
previousIndex, so this is a safe place to do it. 
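The reset-on-EMIT point can be illustrated with a small, self-contained sketch (plain Java with illustrative names, not Drill's StreamingAggTemplate): an aggregator that handles multiple data sets must reset every piece of per-data-set state at each EMIT boundary, or the next data set's result is corrupted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not Drill's StreamingAggTemplate): an aggregator that
// sums values per data set and must reset ALL of its per-data-set state each
// time it emits; forgetting one reset (e.g. previousIndex) corrupts the next
// data set's result.
public class EmitResetSketch {
  private long sum = 0;
  private int previousIndex = -1;
  private final List<Long> results = new ArrayList<>();

  public void onRecord(long value, int index) {
    sum += value;
    previousIndex = index;
  }

  // EMIT boundary: output the aggregate, then reset every per-data-set field.
  public void onEmit() {
    results.add(sum);
    sum = 0;
    previousIndex = -1;  // the reset that is easy to miss
  }

  public List<Long> results() { return results; }

  public static void main(String[] args) {
    EmitResetSketch agg = new EmitResetSketch();
    agg.onRecord(1, 0); agg.onRecord(2, 1); agg.onEmit();  // first data set: 1 + 2
    agg.onRecord(10, 0); agg.onEmit();                     // second data set: 10
    System.out.println(agg.results());                     // [3, 10]
  }
}
```

Concentrating all resets in one place (the EMIT handler) is the design choice being defended in the comment above: it makes the per-data-set lifecycle auditable in a single spot.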




> Support for EMIT outcome in streaming agg
> -
>
> Key: DRILL-6516
> URL: https://issues.apache.org/jira/browse/DRILL-6516
> 

[jira] [Commented] (DRILL-6516) Support for EMIT outcome in streaming agg

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538811#comment-16538811
 ] 

ASF GitHub Bot commented on DRILL-6516:
---

parthchandra commented on a change in pull request #1358:  DRILL-6516: EMIT 
support in streaming agg
URL: https://github.com/apache/drill/pull/1358#discussion_r201170576
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -394,4 +394,47 @@ public void testLateral_HashAgg_with_nulls() throws 
Exception {
   .baselineValues("dd",222L)
   .build().run();
   }
+
+  @Test
+  public void testMultipleBatchesLateral_WithStreamingAgg() throws Exception {
+String sql = "SELECT t2.maxprice FROM (SELECT customer.c_orders AS 
c_orders FROM "
++ "dfs.`lateraljoin/multipleFiles/` customer) t1, LATERAL (SELECT 
CAST(MAX(t.ord.o_totalprice)"
++ " AS int) AS maxprice FROM UNNEST(t1.c_orders) t(ord) GROUP BY 
t.ord.o_orderstatus) t2";
+
+testBuilder()
+.optionSettingQueriesForTestQuery("alter session set `%s` = true",
+PlannerSettings.STREAMAGG.getOptionName())
 
 Review comment:
   Had to do this because the HashAgg tests were disabling the streaming agg. 
Removed this and reset the option in the HashAgg tests.




> Support for EMIT outcome in streaming agg
> -
>
> Key: DRILL-6516
> URL: https://issues.apache.org/jira/browse/DRILL-6516
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.14.0
>
>
> Update the streaming aggregator to recognize the EMIT outcome




