[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407034#comment-16407034
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1152


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406945#comment-16406945
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1152
  
+1


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405227#comment-16405227
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1152
  
Thanks, @chunhui-shi - marked it as ready-to-commit since the original 
feature was already merged to 1.13. The batch committer this week can take 
another look as well.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405210#comment-16405210
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1152
  
+1, good to me.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402106#comment-16402106
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1152
  
@HanumathRao thanks for the review. Applied code review comment.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402103#comment-16402103
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175137249
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestPushDownAndPruningWithItemStar.java
 ---
@@ -180,4 +248,38 @@ public void testFilterPushDownMultipleConditions() 
throws Exception {
 .build();
   }
 
+  @Test
+  public void testFilterPushDownWithSeveralNestedStarSubQueries() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select * 
from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
+
+String[] expectedPlan = {"numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]"};
+String[] excludedPlan = {};
+
+PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, 
excludedPlan);
+
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.sqlBaselineQuery("select * from `%s`.`%s` where o_orderdate = 
date '1992-01-01'", DFS_TMP_SCHEMA, TABLE_NAME)
+.build();
+  }
+
+  @Test
+  public void 
testFilterPushDownWithSeveralNestedStarSubQueriesWithAdditionalColumns() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select *, 
o_orderdate from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
--- End diff --

Done.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402104#comment-16402104
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175120182
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star 
fields in filter
- * into project -> filter -> project -> scan where item star fields are 
pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual 
field references.
  *
  * This will help partition pruning and push down rules to detect fields 
that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as 
source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan 
PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan 
FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject 
FILTER_ON_PROJECT = new FilterOnProject(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
+  "DrillFilterItemStarReWriterRule.FilterOnProject");
 
-ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
-filterRel.getCondition().accept(itemStarFieldsVisitor);
 
-// there are no item fields, no need to proceed further
-if (!itemStarFieldsVisitor.hasItemStarFields()) {
-  return;
+  private static class ProjectOnScan extends RelOptRule {
+
+ProjectOnScan(RelOptRuleOperand operand, String id) {
+  super(operand, id);
 }
 
-Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scan = call.rel(1);
+  return scan.getGroupScan() instanceof ParquetGroupScan && 
super.matches(call);
+}
 
-// create new scan
-RelNode newScan = constructNewScan(scanRel, itemStarFields.keySet());
+@Override
+public void onMatch(RelOptRuleCall call) {
+  DrillProjectRel projectRel = call.rel(0);
+  DrillScanRel scanRel = call.rel(1);
+
+  ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(scanRel.getRowType().getFieldNames());
+  List projects = projectRel.getProjects();
+  for (RexNode project : projects) {
+project.accept(itemStarFieldsVisitor);
+  }
 
-// combine original and new projects
-List newProjects = new ArrayList<>(projectRel.getProjects());
+  Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
 
-// prepare node mapper to replace item star calls with new input field 
references
-Map fieldMapper = new HashMap<>();
+  // if there are no item fields, no need to proceed further
+  if (itemStarFieldsVisitor.hasNoItemStarFields()) {
--- End diff --

Sure, moved.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
>  

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402102#comment-16402102
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175136589
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star 
fields in filter
- * into project -> filter -> project -> scan where item star fields are 
pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual 
field references.
  *
  * This will help partition pruning and push down rules to detect fields 
that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as 
source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan 
PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan 
FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject 
FILTER_ON_PROJECT = new FilterOnProject(
--- End diff --

Done.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399078#comment-16399078
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user HanumathRao commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r174568699
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestPushDownAndPruningWithItemStar.java
 ---
@@ -180,4 +248,38 @@ public void testFilterPushDownMultipleConditions() 
throws Exception {
 .build();
   }
 
+  @Test
+  public void testFilterPushDownWithSeveralNestedStarSubQueries() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select * 
from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
+
+String[] expectedPlan = {"numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]"};
+String[] excludedPlan = {};
+
+PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, 
excludedPlan);
+
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.sqlBaselineQuery("select * from `%s`.`%s` where o_orderdate = 
date '1992-01-01'", DFS_TMP_SCHEMA, TABLE_NAME)
+.build();
+  }
+
+  @Test
+  public void 
testFilterPushDownWithSeveralNestedStarSubQueriesWithAdditionalColumns() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select *, 
o_orderdate from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
--- End diff --

Is it better to use other column than o_orderdate in the inside subquery?


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399011#comment-16399011
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user HanumathRao commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r174558288
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star 
fields in filter
- * into project -> filter -> project -> scan where item star fields are 
pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual 
field references.
  *
  * This will help partition pruning and push down rules to detect fields 
that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as 
source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan 
PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan 
FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject 
FILTER_ON_PROJECT = new FilterOnProject(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
+  "DrillFilterItemStarReWriterRule.FilterOnProject");
 
-ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
-filterRel.getCondition().accept(itemStarFieldsVisitor);
 
-// there are no item fields, no need to proceed further
-if (!itemStarFieldsVisitor.hasItemStarFields()) {
-  return;
+  private static class ProjectOnScan extends RelOptRule {
+
+ProjectOnScan(RelOptRuleOperand operand, String id) {
+  super(operand, id);
 }
 
-Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scan = call.rel(1);
+  return scan.getGroupScan() instanceof ParquetGroupScan && 
super.matches(call);
+}
 
-// create new scan
-RelNode newScan = constructNewScan(scanRel, itemStarFields.keySet());
+@Override
+public void onMatch(RelOptRuleCall call) {
+  DrillProjectRel projectRel = call.rel(0);
+  DrillScanRel scanRel = call.rel(1);
+
+  ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(scanRel.getRowType().getFieldNames());
+  List projects = projectRel.getProjects();
+  for (RexNode project : projects) {
+project.accept(itemStarFieldsVisitor);
+  }
 
-// combine original and new projects
-List newProjects = new ArrayList<>(projectRel.getProjects());
+  Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
 
-// prepare node mapper to replace item star calls with new input field 
references
-Map fieldMapper = new HashMap<>();
+  // if there are no item fields, no need to proceed further
+  if (itemStarFieldsVisitor.hasNoItemStarFields()) {
--- End diff --

Can this be moved to before getItemStarFields call?.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: 

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399008#comment-16399008
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user HanumathRao commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r174558063
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star 
fields in filter
- * into project -> filter -> project -> scan where item star fields are 
pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual 
field references.
  *
  * This will help partition pruning and push down rules to detect fields 
that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as 
source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan 
PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan 
FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject 
FILTER_ON_PROJECT = new FilterOnProject(
--- End diff --

Would it be good to rename this as FILTER_PROJECT_SCAN?


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398421#comment-16398421
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1152
  
@chunhui-shi yes, you are correct, we are trying to find item star columns 
and push them into scan 
This case if F is optional then we don't have filter and there will no 
filter push down, only project push down can happen in this case. Such case is 
covered in `testProjectIntoScanWithSeveralNestedStarSubQueries`.
Regarding additional columns in the outermost 'select' , could you please 
give an example?


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391769#comment-16391769
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1152
  
@chunhui-shi or @HanumathRao can you please review this?


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388128#comment-16388128
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1152

DRILL-6199: Add support for filter push down and partition pruning wi…

…th several nested star sub-queries

Re-written original solution to apply rule on later stages when we work 
with Drill rels rather then with Calcite rels. With several nested sub-queries 
we end up with several projects each for sub-query: Filter - Project  - 
Scan. When applying rule with Drill rels, other rules will take care of such 
intermediate projects and we end up checking only three cases: Project - Scan, 
Filter - Scan, Filter - Project - Scan.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6199

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1152


commit ae64b3f5afbfe779d297e568c5228a443737b449
Author: Arina Ielchiieva 
Date:   2018-03-04T20:12:06Z

DRILL-6199: Add support for filter push down and partition pruning with 
several nested star sub-queries




> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388129#comment-16388129
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1152
  
@chunhui-shi please review.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-01 Thread Anton Gozhiy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382002#comment-16382002
 ] 

Anton Gozhiy commented on DRILL-6199:
-

Additional cases where this issue is reproduced:
*Partition pruning:*
- *Data:*
{code:sql}
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_files` (c1, c2, c3, c4, 
c5) partition by (c1) as select cast(columns[0] as int) c1, columns[1] c2, 
columns[2] c3, columns[3] c4, columns[4] c5 from 
dfs.tmp.`DRILL_6118_data_source.csv`;
{code}
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from 
dfs.tmp.`DRILL_6118_parquet_partitioned_by_files`)) where c1 between 2 and 4
{code}
- *Expected result:*
numFiles=3, numRowGroups=3 (scanning 3 partitions)

- *Actual result:*
numFiles=1, numRowGroups=5 (scanning all partitions)

*Directory pruning:*
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from 
dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where dir0='d2'
{code}

- *Expected result:*
numFiles=1, numRowGroups=1

- *Actual result:*
numFiles=3, numRowGroups=3

> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated used the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)