[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-26 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394980#comment-15394980
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

[~amansinha100]] I have updated the pull request 
(https://github.com/apache/drill/pull/552) to account for latest Calcite 
changes. Can you please take a look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390107#comment-15390107
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

I have updated the pull request (https://github.com/apache/drill/pull/552) 
based on your comments [~sudheeshkatkam][~amansinha100]. Can you please take a 
look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390089#comment-15390089
 ] 

ASF GitHub Bot commented on DRILL-3710:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71934130
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
@@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws 
Exception {
 testIncludeFilter(query4, 4, "Filter", 16);
   }
 
+  @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST 
size
+  public void testPartitionFilterWithInSubquery() throws Exception {
+String query = String.format("select * from 
dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 
1994, 1994, 1994)", TEST_RES_PATH);
+/* In list size exceeds threshold - no partition pruning since 
predicate converted to join */
+test("alter session set `planner.in_subquery_threshold` = 2");
--- End diff --

A bug could cause us to not obey the option at all i.e. we always do 
partition pruning regardless of the option setting. This unit test checks we do 
obey the option.


> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389992#comment-15389992
 ] 

ASF GitHub Bot commented on DRILL-3710:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71923747
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
@@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws 
Exception {
 testIncludeFilter(query4, 4, "Filter", 16);
   }
 
+  @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST 
size
+  public void testPartitionFilterWithInSubquery() throws Exception {
+String query = String.format("select * from 
dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 
1994, 1994, 1994)", TEST_RES_PATH);
+/* In list size exceeds threshold - no partition pruning since 
predicate converted to join */
+test("alter session set `planner.in_subquery_threshold` = 2");
--- End diff --

Not sure if it is necessary to check the no-partition-pruning case.  
Basically, the goal of the test is to see if partition pruning works with large 
IN lists. 


> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389127#comment-15389127
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

I have created the pull request (https://github.com/apache/drill/pull/552). 
[~jni] [~amansinha100] can you please take a look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If Drill has more than 20 in-lists , Drill can do an optimization to convert 
> that in-lists into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of the query which has many in-lists.
> Could we make "20" configurable? So that we do not need to add duplicate/junk 
> in-list to make it more than 20.
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2015-08-25 Thread Hao Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712244#comment-14712244
 ] 

Hao Zhu commented on DRILL-3710:


a. No optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02Project($f0=[1])
00-03  SelectionVectorRemover
00-04Filter(condition=[OR(=(CAST($0):INTEGER, 1), 
=(CAST($0):INTEGER, 2), =(CAST($0):INTEGER, 3), =(CAST($0):INTEGER, 4), 
=(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER, 7), 
=(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), 
=(CAST($0):INTEGER, 11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), 
=(CAST($0):INTEGER, 14), =(CAST($0):INTEGER, 15), =(CAST($0):INTEGER, 16), 
=(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER, 19))])
00-05  Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}
b. With optimization
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in 
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02Project($f0=[1])
00-03  Project(f6=[$1], ROW_VALUE=[$0])
00-04MergeJoin(condition=[=($1, $0)], joinType=[inner])
00-06  SelectionVectorRemover
00-08Sort(sort0=[$0], dir0=[ASC])
00-10  HashAgg(group=[{0}])
00-12Values
00-05  SelectionVectorRemover
00-07Sort(sort0=[$0], dir0=[ASC])
00-09  Project(f6=[CAST($0):INTEGER])
00-11Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:h1_passwords), 
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], 
columns=[`col2`], partitions= null]])
{code}

 Make the 20 in-list optimization configurable
 -

 Key: DRILL-3710
 URL: https://issues.apache.org/jira/browse/DRILL-3710
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning  Optimization
Affects Versions: 1.1.0
Reporter: Hao Zhu
Assignee: Jinfeng Ni

 If Drill has more than 20 in-lists , Drill can do an optimization to convert 
 that in-lists into a small hash table in memory, and then do a table join 
 instead.
 This can improve the performance of the query which has many in-lists.
 Could we make 20 configurable? So that we do not need to add duplicate/junk 
 in-list to make it more than 20.
 Sample query is :
 select count(*) from table where col in 
 (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)