[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407034#comment-16407034 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1152

> Filter push down doesn't work with more than one nested subqueries
> ------------------------------------------------------------------
>
>                 Key: DRILL-6199
>                 URL: https://issues.apache.org/jira/browse/DRILL-6199
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Anton Gozhiy
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>         Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2,
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2]
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv`
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2,
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2]
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv`
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2,
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2]
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv`
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
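
The expected and actual results above differ only in the `numFiles` and `numRowGroups` counters reported in the plan. As a minimal sketch of how that comparison could be automated, the following assumes the plan text is already available as a string; the class `ScanStats` and its methods are hypothetical helpers for illustration and are not part of Drill. The input strings are just the fragments quoted in the issue, not full plan output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Drill class: reads the scan counters
// from a plan fragment such as "numFiles=2, numRowGroups=2".
class ScanStats {
  private static final Pattern STATS =
      Pattern.compile("numFiles=(\\d+), numRowGroups=(\\d+)");

  /** Returns {numFiles, numRowGroups}, or null if the fragment is absent. */
  static int[] parse(String planText) {
    Matcher m = STATS.matcher(planText);
    if (!m.find()) {
      return null;
    }
    return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
  }

  /** True when the plan reports exactly the expected counters. */
  static boolean isPruned(String planText, int expectedFiles, int expectedRowGroups) {
    int[] stats = parse(planText);
    return stats != null && stats[0] == expectedFiles && stats[1] == expectedRowGroups;
  }

  public static void main(String[] args) {
    // Expected result from the issue: only d1 and d2 are scanned.
    System.out.println(isPruned("numFiles=2, numRowGroups=2", 2, 2));
    // Actual result from the issue: all three folders are scanned.
    System.out.println(isPruned("numFiles=3, numRowGroups=3", 2, 2));
  }
}
```

Checking for exact equality, rather than "at most", keeps the sketch strict: a plan that scans fewer files than expected would also be flagged for inspection.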
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406945#comment-16406945 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/1152

    +1
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405227#comment-16405227 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1152

    Thanks, @chunhui-shi - marked it as ready-to-commit since the original feature was already merged to 1.13. The batch committer this week can take another look as well.
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405210#comment-16405210 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user chunhui-shi commented on the issue:

    https://github.com/apache/drill/pull/1152

    +1, good to me.
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402106#comment-16402106 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1152

    @HanumathRao thanks for the review. Applied code review comment.
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402103#comment-16402103 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r175137249

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestPushDownAndPruningWithItemStar.java ---
    @@ -180,4 +248,38 @@ public void testFilterPushDownMultipleConditions() throws Exception {
             .build();
       }
     
    +  @Test
    +  public void testFilterPushDownWithSeveralNestedStarSubQueries() throws Exception {
    +    String subQuery = String.format("select * from `%s`.`%s`", DFS_TMP_SCHEMA, TABLE_NAME);
    +    String query = String.format("select * from (select * from (select * from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
    +
    +    String[] expectedPlan = {"numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]"};
    +    String[] excludedPlan = {};
    +
    +    PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, excludedPlan);
    +
    +    testBuilder()
    +        .sqlQuery(query)
    +        .unOrdered()
    +        .sqlBaselineQuery("select * from `%s`.`%s` where o_orderdate = date '1992-01-01'", DFS_TMP_SCHEMA, TABLE_NAME)
    +        .build();
    +  }
    +
    +  @Test
    +  public void testFilterPushDownWithSeveralNestedStarSubQueriesWithAdditionalColumns() throws Exception {
    +    String subQuery = String.format("select * from `%s`.`%s`", DFS_TMP_SCHEMA, TABLE_NAME);
    +    String query = String.format("select * from (select * from (select *, o_orderdate from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
    --- End diff --

    Done.
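
The `expectedPlan` entry in the test above escapes `[`, `]` and `*` because the string is used as a pattern rather than as literal text. The sketch below illustrates why, under the assumption that each expected pattern is applied as a regular-expression search over the plan text; `PlanPatternSketch`, `planContains` and `literal` are hypothetical names, and the sample plan string is a hand-written stand-in, not real Drill output.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill's test framework.
class PlanPatternSketch {

  /** True if the regular expression is found anywhere in the plan text. */
  static boolean planContains(String planText, String regex) {
    return Pattern.compile(regex).matcher(planText).find();
  }

  /** Turns literal text into a pattern that matches exactly that text. */
  static String literal(String text) {
    return Pattern.quote(text);
  }

  public static void main(String[] args) {
    // Hand-written stand-in for the relevant part of a plan.
    String plan = "numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `o_orderdate`]";

    // Escaped form, as written in the test.
    String escaped = "numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]";
    System.out.println(planContains(plan, escaped));

    // Unescaped form: "[...]" is read as a character class, so it does not match.
    System.out.println(planContains(plan, "columns=[`**`, `o_orderdate`]"));

    // Pattern.quote avoids manual escaping for purely literal fragments.
    System.out.println(planContains(plan, literal("columns=[`**`, `o_orderdate`]")));
  }
}
```

Manual escaping keeps the pattern readable next to real plan output, while `Pattern.quote` is less error-prone when the fragment contains many special characters.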
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402104#comment-16402104 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r175120182

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java ---
    @@ -54,83 +44,189 @@
     import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
     
     /**
    - * Rule will transform filter -> project -> scan call with item star fields in filter
    - * into project -> filter -> project -> scan where item star fields are pushed into scan
    - * and replaced with actual field references.
    + * Rule will transform item star fields in filter and replaced with actual field references.
      *
      * This will help partition pruning and push down rules to detect fields that can be pruned or push downed.
      * Item star operator appears when sub-select or cte with star are used as source.
      */
    -public class DrillFilterItemStarReWriterRule extends RelOptRule {
    +public class DrillFilterItemStarReWriterRule {
     
    -  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
    -      RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any( TableScan.class))),
    -      "DrillFilterItemStarReWriterRule");
    +  public static final DrillFilterItemStarReWriterRule.ProjectOnScan PROJECT_ON_SCAN = new ProjectOnScan(
    +      RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.ProjectOnScan");
     
    -  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
    -    super(operand, id);
    -  }
    +  public static final DrillFilterItemStarReWriterRule.FilterOnScan FILTER_ON_SCAN = new FilterOnScan(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.FilterOnScan");
     
    -  @Override
    -  public void onMatch(RelOptRuleCall call) {
    -    Filter filterRel = call.rel(0);
    -    Project projectRel = call.rel(1);
    -    TableScan scanRel = call.rel(2);
    +  public static final DrillFilterItemStarReWriterRule.FilterOnProject FILTER_ON_PROJECT = new FilterOnProject(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
    +      "DrillFilterItemStarReWriterRule.FilterOnProject");
     
    -    ItemStarFieldsVisitor itemStarFieldsVisitor = new ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
    -    filterRel.getCondition().accept(itemStarFieldsVisitor);
    -    // there are no item fields, no need to proceed further
    -    if (!itemStarFieldsVisitor.hasItemStarFields()) {
    -      return;
    +  private static class ProjectOnScan extends RelOptRule {
    +
    +    ProjectOnScan(RelOptRuleOperand operand, String id) {
    +      super(operand, id);
         }
     
    -    Map itemStarFields = itemStarFieldsVisitor.getItemStarFields();
    +    @Override
    +    public boolean matches(RelOptRuleCall call) {
    +      DrillScanRel scan = call.rel(1);
    +      return scan.getGroupScan() instanceof ParquetGroupScan && super.matches(call);
    +    }
     
    -    // create new scan
    -    RelNode newScan = constructNewScan(scanRel, itemStarFields.keySet());
    +    @Override
    +    public void onMatch(RelOptRuleCall call) {
    +      DrillProjectRel projectRel = call.rel(0);
    +      DrillScanRel scanRel = call.rel(1);
    +
    +      ItemStarFieldsVisitor itemStarFieldsVisitor = new ItemStarFieldsVisitor(scanRel.getRowType().getFieldNames());
    +      List projects = projectRel.getProjects();
    +      for (RexNode project : projects) {
    +        project.accept(itemStarFieldsVisitor);
    +      }
     
    -    // combine original and new projects
    -    List newProjects = new ArrayList<>(projectRel.getProjects());
    +      Map itemStarFields = itemStarFieldsVisitor.getItemStarFields();
     
    -    // prepare node mapper to replace item star calls with new input field references
    -    Map fieldMapper = new HashMap<>();
    +      // if there are no item fields, no need to proceed further
    +      if (itemStarFieldsVisitor.hasNoItemStarFields()) {
    --- End diff --

    Sure, moved.
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402102#comment-16402102 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r175136589

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java ---
    @@ -54,83 +44,189 @@
     import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
     
     /**
    - * Rule will transform filter -> project -> scan call with item star fields in filter
    - * into project -> filter -> project -> scan where item star fields are pushed into scan
    - * and replaced with actual field references.
    + * Rule will transform item star fields in filter and replaced with actual field references.
      *
      * This will help partition pruning and push down rules to detect fields that can be pruned or push downed.
      * Item star operator appears when sub-select or cte with star are used as source.
      */
    -public class DrillFilterItemStarReWriterRule extends RelOptRule {
    +public class DrillFilterItemStarReWriterRule {
     
    -  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
    -      RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any( TableScan.class))),
    -      "DrillFilterItemStarReWriterRule");
    +  public static final DrillFilterItemStarReWriterRule.ProjectOnScan PROJECT_ON_SCAN = new ProjectOnScan(
    +      RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.ProjectOnScan");
     
    -  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
    -    super(operand, id);
    -  }
    +  public static final DrillFilterItemStarReWriterRule.FilterOnScan FILTER_ON_SCAN = new FilterOnScan(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.FilterOnScan");
     
    -  @Override
    -  public void onMatch(RelOptRuleCall call) {
    -    Filter filterRel = call.rel(0);
    -    Project projectRel = call.rel(1);
    -    TableScan scanRel = call.rel(2);
    +  public static final DrillFilterItemStarReWriterRule.FilterOnProject FILTER_ON_PROJECT = new FilterOnProject(
    --- End diff --

    Done.
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399078#comment-16399078 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r174568699

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestPushDownAndPruningWithItemStar.java ---
    @@ -180,4 +248,38 @@ public void testFilterPushDownMultipleConditions() throws Exception {
             .build();
       }
     
    +  @Test
    +  public void testFilterPushDownWithSeveralNestedStarSubQueries() throws Exception {
    +    String subQuery = String.format("select * from `%s`.`%s`", DFS_TMP_SCHEMA, TABLE_NAME);
    +    String query = String.format("select * from (select * from (select * from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
    +
    +    String[] expectedPlan = {"numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]"};
    +    String[] excludedPlan = {};
    +
    +    PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, excludedPlan);
    +
    +    testBuilder()
    +        .sqlQuery(query)
    +        .unOrdered()
    +        .sqlBaselineQuery("select * from `%s`.`%s` where o_orderdate = date '1992-01-01'", DFS_TMP_SCHEMA, TABLE_NAME)
    +        .build();
    +  }
    +
    +  @Test
    +  public void testFilterPushDownWithSeveralNestedStarSubQueriesWithAdditionalColumns() throws Exception {
    +    String subQuery = String.format("select * from `%s`.`%s`", DFS_TMP_SCHEMA, TABLE_NAME);
    +    String query = String.format("select * from (select * from (select *, o_orderdate from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
    --- End diff --

    Would it be better to use a column other than o_orderdate in the inner subquery?
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399011#comment-16399011 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r174558288

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java ---
    @@ -54,83 +44,189 @@
     import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
     
     /**
    - * Rule will transform filter -> project -> scan call with item star fields in filter
    - * into project -> filter -> project -> scan where item star fields are pushed into scan
    - * and replaced with actual field references.
    + * Rule will transform item star fields in filter and replaced with actual field references.
      *
      * This will help partition pruning and push down rules to detect fields that can be pruned or push downed.
      * Item star operator appears when sub-select or cte with star are used as source.
      */
    -public class DrillFilterItemStarReWriterRule extends RelOptRule {
    +public class DrillFilterItemStarReWriterRule {
     
    -  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
    -      RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any( TableScan.class))),
    -      "DrillFilterItemStarReWriterRule");
    +  public static final DrillFilterItemStarReWriterRule.ProjectOnScan PROJECT_ON_SCAN = new ProjectOnScan(
    +      RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.ProjectOnScan");
     
    -  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
    -    super(operand, id);
    -  }
    +  public static final DrillFilterItemStarReWriterRule.FilterOnScan FILTER_ON_SCAN = new FilterOnScan(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.FilterOnScan");
     
    -  @Override
    -  public void onMatch(RelOptRuleCall call) {
    -    Filter filterRel = call.rel(0);
    -    Project projectRel = call.rel(1);
    -    TableScan scanRel = call.rel(2);
    +  public static final DrillFilterItemStarReWriterRule.FilterOnProject FILTER_ON_PROJECT = new FilterOnProject(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
    +      "DrillFilterItemStarReWriterRule.FilterOnProject");
     
    -    ItemStarFieldsVisitor itemStarFieldsVisitor = new ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
    -    filterRel.getCondition().accept(itemStarFieldsVisitor);
    -    // there are no item fields, no need to proceed further
    -    if (!itemStarFieldsVisitor.hasItemStarFields()) {
    -      return;
    +  private static class ProjectOnScan extends RelOptRule {
    +
    +    ProjectOnScan(RelOptRuleOperand operand, String id) {
    +      super(operand, id);
         }
     
    -    Map itemStarFields = itemStarFieldsVisitor.getItemStarFields();
    +    @Override
    +    public boolean matches(RelOptRuleCall call) {
    +      DrillScanRel scan = call.rel(1);
    +      return scan.getGroupScan() instanceof ParquetGroupScan && super.matches(call);
    +    }
     
    -    // create new scan
    -    RelNode newScan = constructNewScan(scanRel, itemStarFields.keySet());
    +    @Override
    +    public void onMatch(RelOptRuleCall call) {
    +      DrillProjectRel projectRel = call.rel(0);
    +      DrillScanRel scanRel = call.rel(1);
    +
    +      ItemStarFieldsVisitor itemStarFieldsVisitor = new ItemStarFieldsVisitor(scanRel.getRowType().getFieldNames());
    +      List projects = projectRel.getProjects();
    +      for (RexNode project : projects) {
    +        project.accept(itemStarFieldsVisitor);
    +      }
     
    -    // combine original and new projects
    -    List newProjects = new ArrayList<>(projectRel.getProjects());
    +      Map itemStarFields = itemStarFieldsVisitor.getItemStarFields();
     
    -    // prepare node mapper to replace item star calls with new input field references
    -    Map fieldMapper = new HashMap<>();
    +      // if there are no item fields, no need to proceed further
    +      if (itemStarFieldsVisitor.hasNoItemStarFields()) {
    --- End diff --

    Can this check be moved before the getItemStarFields call?
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399008#comment-16399008 ]

ASF GitHub Bot commented on DRILL-6199:
---------------------------------------

Github user HanumathRao commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1152#discussion_r174558063

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java ---
    @@ -54,83 +44,189 @@
     import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
     
     /**
    - * Rule will transform filter -> project -> scan call with item star fields in filter
    - * into project -> filter -> project -> scan where item star fields are pushed into scan
    - * and replaced with actual field references.
    + * Rule will transform item star fields in filter and replaced with actual field references.
      *
      * This will help partition pruning and push down rules to detect fields that can be pruned or push downed.
      * Item star operator appears when sub-select or cte with star are used as source.
      */
    -public class DrillFilterItemStarReWriterRule extends RelOptRule {
    +public class DrillFilterItemStarReWriterRule {
     
    -  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
    -      RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any( TableScan.class))),
    -      "DrillFilterItemStarReWriterRule");
    +  public static final DrillFilterItemStarReWriterRule.ProjectOnScan PROJECT_ON_SCAN = new ProjectOnScan(
    +      RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.ProjectOnScan");
     
    -  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
    -    super(operand, id);
    -  }
    +  public static final DrillFilterItemStarReWriterRule.FilterOnScan FILTER_ON_SCAN = new FilterOnScan(
    +      RelOptHelper.some(DrillFilterRel.class, RelOptHelper.any(DrillScanRel.class)),
    +      "DrillFilterItemStarReWriterRule.FilterOnScan");
     
    -  @Override
    -  public void onMatch(RelOptRuleCall call) {
    -    Filter filterRel = call.rel(0);
    -    Project projectRel = call.rel(1);
    -    TableScan scanRel = call.rel(2);
    +  public static final DrillFilterItemStarReWriterRule.FilterOnProject FILTER_ON_PROJECT = new FilterOnProject(
    --- End diff --

    Would it be good to rename this as FILTER_PROJECT_SCAN?
> Filter push down doesn't work with more than one nested subqueries > -- > > Key: DRILL-6199 > URL: https://issues.apache.org/jira/browse/DRILL-6199 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Anton Gozhiy >Assignee: Arina Ielchiieva >Priority: Major > Fix For: 1.14.0 > > Attachments: DRILL_6118_data_source.csv > > > *Data set:* > The data is generated used the attached file: *DRILL_6118_data_source.csv* > Data gen commands: > {code:sql} > create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, > c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] > c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` > where columns[0] in (1, 3); > create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, > c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] > c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` > where columns[0]=2; > create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, > c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] > c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` > where columns[0]>3; > {code} > *Steps:* > # Execute the following query: > {code:sql} > explain plan for select * from (select * from (select * from > dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3 > {code} > *Expected result:* > numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be > scanned. > *Actual result:* > Filter push down doesn't work: > numFiles=3, numRowGroups=3, scanning from all files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398421#comment-16398421 ]

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1152

    @chunhui-shi yes, you are correct, we are trying to find item star columns and push them into the scan.
    In this case, if F is optional, then we don't have a filter and there will be no filter push down; only project push down can happen. Such a case is covered in `testProjectIntoScanWithSeveralNestedStarSubQueries`.
    Regarding additional columns in the outermost 'select', could you please give an example?

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391769#comment-16391769 ]

ASF GitHub Bot commented on DRILL-6199:
---

Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1152

    @chunhui-shi or @HanumathRao can you please review this?

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388128#comment-16388128 ]

ASF GitHub Bot commented on DRILL-6199:
---

GitHub user arina-ielchiieva opened a pull request:

    https://github.com/apache/drill/pull/1152

    DRILL-6199: Add support for filter push down and partition pruning with several nested star sub-queries

    Rewrote the original solution to apply the rule at later stages, when we work with Drill rels rather than with Calcite rels.
    With several nested sub-queries we end up with several projects, one for each sub-query: Filter - Project - Scan.
    When applying the rule with Drill rels, other rules will take care of such intermediate projects, and we end up checking only three cases:
    Project - Scan, Filter - Scan, Filter - Project - Scan.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arina-ielchiieva/drill DRILL-6199

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1152

commit ae64b3f5afbfe779d297e568c5228a443737b449
Author: Arina Ielchiieva
Date:   2018-03-04T20:12:06Z

    DRILL-6199: Add support for filter push down and partition pruning with several nested star sub-queries

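The three cases named in the pull request description can be illustrated with a small stand-alone sketch. This is not Drill's implementation: the operator chain is modelled as a plain list of strings (top-most operator first), the `mergeProjects` helper is a hypothetical stand-in for whatever planner rules collapse the intermediate projects, and only the shape names are borrowed from the constants shown in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative only: a toy model of the plan shapes, not Drill or Calcite code.
class RuleShapeSketch {

  // Names mirror the constants in the diff; the matching logic is an assumption.
  enum Shape { PROJECT_ON_SCAN, FILTER_ON_SCAN, FILTER_ON_PROJECT }

  // Hypothetical stand-in for the "other rules" that take care of intermediate
  // projects: collapses every run of adjacent "Project" operators into one.
  static List<String> mergeProjects(List<String> chain) {
    List<String> merged = new ArrayList<>();
    for (String op : chain) {
      boolean repeatedProject = op.equals("Project")
          && !merged.isEmpty()
          && merged.get(merged.size() - 1).equals("Project");
      if (!repeatedProject) {
        merged.add(op);
      }
    }
    return merged;
  }

  // After merging, only three shapes remain to be checked.
  static Optional<Shape> classify(List<String> chain) {
    List<String> ops = mergeProjects(chain);
    if (ops.equals(Arrays.asList("Project", "Scan"))) {
      return Optional.of(Shape.PROJECT_ON_SCAN);
    }
    if (ops.equals(Arrays.asList("Filter", "Scan"))) {
      return Optional.of(Shape.FILTER_ON_SCAN);
    }
    if (ops.equals(Arrays.asList("Filter", "Project", "Scan"))) {
      return Optional.of(Shape.FILTER_ON_PROJECT);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // A filter above three nested star sub-queries.
    List<String> nested = Arrays.asList("Filter", "Project", "Project", "Project", "Scan");
    System.out.println(classify(nested));
    // No filter at all: only project push down is possible.
    System.out.println(classify(Arrays.asList("Project", "Project", "Scan")));
  }
}
```

The point of the sketch is that the number of nested sub-queries stops mattering once adjacent projects are collapsed, so the rule does not need a separate pattern per nesting depth.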
[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388129#comment-16388129 ]

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1152

    @chunhui-shi please review.

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries
[ https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382002#comment-16382002 ]

Anton Gozhiy commented on DRILL-6199:
-

Additional cases where this issue is reproduced:

*Partition pruning:*
- *Data:*
{code:sql}
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_files` (c1, c2, c3, c4, c5) partition by (c1) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv`;
{code}
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from dfs.tmp.`DRILL_6118_parquet_partitioned_by_files`)) where c1 between 2 and 4
{code}
- *Expected result:*
numFiles=3, numRowGroups=3 (scanning 3 partitions)
- *Actual result:*
numFiles=1, numRowGroups=5 (scanning all partitions)

*Directory pruning:*
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where dir0='d2'
{code}
- *Expected result:*
numFiles=1, numRowGroups=1
- *Actual result:*
numFiles=3, numRowGroups=3

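The expected counts in the two additional cases can be checked with a short sketch. It only models the arithmetic of pruning (keep the partitions whose key satisfies the filter) and says nothing about how Drill's planner does it. The `prune` helper is hypothetical, and the assumption that `c1` takes the five values 1 to 5, one partition per value, is inferred from the reported numRowGroups=5 rather than stated in the report.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: models which partitions survive a filter, not Drill's planner.
class PruningSketch {

  // Keeps only the partition keys that satisfy the filter; the rest are never scanned.
  static <K> List<K> prune(List<K> partitionKeys, Predicate<K> filter) {
    return partitionKeys.stream().filter(filter).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumption: c1 takes the values 1..5, one partition per value.
    List<Integer> c1Partitions = Arrays.asList(1, 2, 3, 4, 5);
    // Folder names come from the data generation commands in the issue.
    List<String> folders = Arrays.asList("d1", "d2", "d3");

    // Partition pruning: where c1 between 2 and 4
    System.out.println(prune(c1Partitions, c1 -> c1 >= 2 && c1 <= 4).size()); // 3

    // Directory pruning: where dir0 = 'd2'
    System.out.println(prune(folders, dir0 -> "d2".equals(dir0)).size()); // 1
  }
}
```

When the filter is not pushed past the nested star sub-queries, the predicate is never applied at this level, which corresponds to the "scanning all partitions" results reported above.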