[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147610#comment-15147610 ]

Victoria Markman commented on DRILL-4201:

I verified that the partial filter is getting pushed down; however, it does not always happen. It depends on the costing, and the heuristic there is a bit tricky.

In the case below, the filter is not pushed past the project, because the file vicky.json contains only 2 rows:
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . > *
. . . . . . . . . . . . > from
. . . . . . . . . . . . > hive.lineitem_text_hive l
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . > ( select
. . . . . . . . . . . . > flatten(test) as test,
. . . . . . . . . . . . > o_orderkey as orderkey
. . . . . . . . . . . . > from
. . . . . . . . . . . . > dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o
. . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey )
. . . . . . . . . . . . > where test = 1 and o.orderkey = 22;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-02        Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-03          HashJoin(condition=[=($0, $17)], joinType=[inner])
00-05            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= null, inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]])
00-04            SelectionVectorRemover
00-06              Filter(condition=[AND(=($0, 1), =($1, 22))])
00-07                Flatten(flattenField=[$0])
00-08                  Project(test=[$1], orderkey=[$0])
00-09                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, numFiles=1, columns=[`test`, `o_orderkey`], files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]])
{code}

It is not pushed past the project even if I add 40 columns to the projection (JSON file with 2 rows):
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . > *
. . . . . . . . . . . . > from
. . . . . . . . . . . . > hive.lineitem_text_hive l
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . > ( select
. . . . . . . . . . . . > flatten(test) as test,
. . . . . . . . . . . . > o_orderkey as orderkey,
. . . . . . . . . . . . > o_orderkey + 1 as o1,
. . . . . . . . . . . . > o_orderkey + 2 as o2,
. . . . . . . . . . . . > o_orderkey + 3 as o3,
. . . . . . . . . . . . > o_orderkey + 4 as o4,
. . . . . . . . . . . . > o_orderkey + 5 as o5,
. . . . . . . . . . . . > o_orderkey + 6 as o6,
. . . . . . . . . . . . > o_orderkey + 7 as o7,
. . . . . . . . . . . . > o_orderkey + 8 as o8,
. . . . . . . . . . . . > o_orderkey + 9 as o9,
. . . . . . . . . . . . > o_orderkey + 10 as o10,
. . . . . . . . . . . . > o_orderkey + 11 as o11,
. . . . . . . . . . . . > o_orderkey + 12 as o12,
. . . . . . . . . . . . > o_orderkey + 13 as o13,
. . . . . . . . . . . . > o_orderkey + 14 as o14,
. . . . . . . . . . . . > o_orderkey + 15 as o15,
. . . . . . . . . . . . > o_orderkey + 16 as o16,
. . . . . . . . . . . . > o_orderkey + 17 as o17,
. . . . . . . . . . . . > o_orderkey + 18 as o18,
. . . . . . . . . . . . > o_orderkey + 19 as o19,
. . . . . . . . . . . . > o_orderkey + 20 as o20,
. . . . . . . . . . . . > o_orderkey + 21 as o21,
. . . . . . . . . . . . > o_orderkey + 22 as o22,
. . . . . . . . . . . . > o_orderkey + 23 as o23,
. . . . . . . . . . . . > o_orderkey + 24 as
[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068717#comment-15068717 ]

Jinfeng Ni commented on DRILL-4201:

Fixed in commit: 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8

> DrillPushFilterPastProject should allow partial filter pushdown.
>
>                 Key: DRILL-4201
>                 URL: https://issues.apache.org/jira/browse/DRILL-4201
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
> Currently, DrillPushFilterPastProjectRule stops pushing the filter down
> if the filter itself contains an ITEM or FLATTEN function, or if its
> input reference refers to an ITEM or FLATTEN function. However, when the
> filter is a conjunction of multiple sub-filters, some of which refer to
> ITEM or FLATTEN and others of which do not, we should allow the partial
> filter to be pushed down. For instance,
>    WHERE partition_col > 10 and flatten_output_col = 'ABC'.
> The "flatten_output_col" comes from the output of the FLATTEN operator,
> and therefore flatten_output_col = 'ABC' should not be pushed past the
> project. But partition_col > 10 should be pushed down, so that we can
> trigger the pruning rule to apply partition pruning.
> This would improve Drill query performance when the partially pushed
> filter leads to partition pruning, or when it results in early filtering
> in an upstream operator.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
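The split described in the issue above can be sketched roughly as follows. This is only a minimal illustration, not the code from the actual commit: the class `PartialPushdownSketch`, its nested `Conjunct` and `Split` types, and the idea of representing each sub-filter as a string plus the set of project output columns it reads are all assumptions made for the example. The real rule works on the planner's expression trees rather than on strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of partial filter pushdown.
// None of these names come from the Drill code base.
public class PartialPushdownSketch {

    /** One sub-filter of an AND condition, plus the project output columns it reads. */
    public static final class Conjunct {
        final String text;
        final Set<String> columns;

        public Conjunct(String text, Set<String> columns) {
            this.text = text;
            this.columns = columns;
        }
    }

    /** Result of the split: sub-filters that may move below the project, and the rest. */
    public static final class Split {
        final List<String> pushed = new ArrayList<>();
        final List<String> kept = new ArrayList<>();
    }

    /**
     * Splits a conjunction. A sub-filter stays above the project if it reads any
     * column produced by ITEM or FLATTEN (the "blocked" columns); otherwise it is
     * a candidate for pushdown.
     */
    public static Split split(List<Conjunct> conjuncts, Set<String> blockedColumns) {
        Split result = new Split();
        for (Conjunct c : conjuncts) {
            if (Collections.disjoint(c.columns, blockedColumns)) {
                result.pushed.add(c.text);
            } else {
                result.kept.add(c.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // WHERE partition_col > 10 and flatten_output_col = 'ABC'
        List<Conjunct> where = Arrays.asList(
            new Conjunct("partition_col > 10", Collections.singleton("partition_col")),
            new Conjunct("flatten_output_col = 'ABC'", Collections.singleton("flatten_output_col")));

        Split s = split(where, Collections.singleton("flatten_output_col"));
        System.out.println("pushed below project: " + s.pushed); // [partition_col > 10]
        System.out.println("kept above project:   " + s.kept);   // [flatten_output_col = 'ABC']
    }
}
```

In this model, only `partition_col > 10` becomes eligible to move below the project, where a pruning rule could then see it. Whether the planner finally chooses the pushed-down plan is a separate, cost-based decision.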
[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068716#comment-15068716 ]

ASF GitHub Bot commented on DRILL-4201:

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/305
[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062688#comment-15062688 ]

ASF GitHub Bot commented on DRILL-4201:

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/305

    DRILL-4201 : Allow partial filter to be pushed down project for bette…

    …r performance.

    Partial filter pushdown has performance benefits because:
    1) enable partition pruning, if the pushed down involves partitioning columns,
    2) allow the filter to be applied in upper stream.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-4201

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/305.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #305

commit 68203ad035f65d1708ce228df432d5b23f4af3ba
Author: Jinfeng Ni
Date:   2015-12-12T00:00:13Z

    DRILL-4201 : Allow partial filter to be pushed down project for better performance.

    Partial filter pushdown has performance benefits because:
    1) enable partition pruning, if the pushed down involves partitioning columns,
    2) allow the filter to be applied in upper stream.
[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062690#comment-15062690 ]

ASF GitHub Bot commented on DRILL-4201:

Github user jinfengni commented on the pull request:

    https://github.com/apache/drill/pull/305#issuecomment-165567570

    @amansinha100 , could you please review the patch for DRILL-4201? Thanks!
[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.
[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063266#comment-15063266 ]

ASF GitHub Bot commented on DRILL-4201:

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/305#issuecomment-165635050

    +1 LGTM.

> DrillPushFilterPastProject should allow partial filter pushdown.
>
>                 Key: DRILL-4201
>                 URL: https://issues.apache.org/jira/browse/DRILL-4201
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Aman Sinha
>             Fix For: 1.5.0