[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394980#comment-15394980 ] Gautam Kumar Parai commented on DRILL-3710: --- [~amansinha100]] I have updated the pull request (https://github.com/apache/drill/pull/552) to account for latest Calcite changes. Can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390107#comment-15390107 ] Gautam Kumar Parai commented on DRILL-3710: --- I have updated the pull request (https://github.com/apache/drill/pull/552) based on your comments [~sudheeshkatkam][~amansinha100]. Can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390089#comment-15390089 ] ASF GitHub Bot commented on DRILL-3710: --- Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/552#discussion_r71934130 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws Exception { testIncludeFilter(query4, 4, "Filter", 16); } + @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST size + public void testPartitionFilterWithInSubquery() throws Exception { +String query = String.format("select * from dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 1994, 1994, 1994)", TEST_RES_PATH); +/* In list size exceeds threshold - no partition pruning since predicate converted to join */ +test("alter session set `planner.in_subquery_threshold` = 2"); --- End diff -- A bug could cause us to not obey the option at all i.e. we always do partition pruning regardless of the option setting. This unit test checks we do obey the option. > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389992#comment-15389992 ] ASF GitHub Bot commented on DRILL-3710: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/552#discussion_r71923747 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws Exception { testIncludeFilter(query4, 4, "Filter", 16); } + @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST size + public void testPartitionFilterWithInSubquery() throws Exception { +String query = String.format("select * from dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 1994, 1994, 1994)", TEST_RES_PATH); +/* In list size exceeds threshold - no partition pruning since predicate converted to join */ +test("alter session set `planner.in_subquery_threshold` = 2"); --- End diff -- Not sure if it is necessary to check the no-partition-pruning case. Basically, the goal of the test is to see if partition pruning works with large IN lists. > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389127#comment-15389127 ] Gautam Kumar Parai commented on DRILL-3710: --- I have created the pull request (https://github.com/apache/drill/pull/552). [~jni] [~amansinha100] can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712244#comment-14712244 ] Hao Zhu commented on DRILL-3710: a. No optimization {code} explain plan for select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-02Project($f0=[1]) 00-03 SelectionVectorRemover 00-04Filter(condition=[OR(=(CAST($0):INTEGER, 1), =(CAST($0):INTEGER, 2), =(CAST($0):INTEGER, 3), =(CAST($0):INTEGER, 4), =(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER, 7), =(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), =(CAST($0):INTEGER, 11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), =(CAST($0):INTEGER, 14), =(CAST($0):INTEGER, 15), =(CAST($0):INTEGER, 16), =(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER, 19))]) 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords), inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`], partitions= null]]) {code} b. With optimization {code} explain plan for select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-02Project($f0=[1]) 00-03 Project(f6=[$1], ROW_VALUE=[$0]) 00-04MergeJoin(condition=[=($1, $0)], joinType=[inner]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 HashAgg(group=[{0}]) 00-12Values 00-05 SelectionVectorRemover 00-07Sort(sort0=[$0], dir0=[ASC]) 00-09 Project(f6=[CAST($0):INTEGER]) 00-11Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords), inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`], partitions= null]]) {code} Make the 20 in-list optimization configurable - Key: DRILL-3710 URL: https://issues.apache.org/jira/browse/DRILL-3710 Project: Apache Drill Issue Type: Improvement Components: Query Planning Optimization Affects Versions: 1.1.0 Reporter: Hao Zhu Assignee: Jinfeng Ni If Drill has more than 20 in-lists , Drill can do an optimization to convert that in-lists into a small hash table in memory, and then do a table join instead. This can improve the performance of the query which has many in-lists. Could we make 20 configurable? So that we do not need to add duplicate/junk in-list to make it more than 20. Sample query is : select count(*) from table where col in (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)