[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2016-02-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147610#comment-15147610
 ] 

Victoria Markman commented on DRILL-4201:
-

I verified that the partial filter is pushed down; however, this does not 
always happen. It depends on costing, and the heuristic there is a bit tricky.

In the case below, the filter is not pushed past the project, because the file 
vicky.json contains only 2 rows:
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . > *
. . . . . . . . . . . . > from
. . . . . . . . . . . . > hive.lineitem_text_hive l
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . > ( select
. . . . . . . . . . . . > flatten(test)   as test,
. . . . . . . . . . . . > o_orderkey  as orderkey
. . . . . . . . . . . . > from
. . . . . . . . . . . . > dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o
. . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey )
. . . . . . . . . . . . > where test = 1 and o.orderkey = 22;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-02        Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-03          HashJoin(condition=[=($0, $17)], joinType=[inner])
00-05            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= null, inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]])
00-04            SelectionVectorRemover
00-06              Filter(condition=[AND(=($0, 1), =($1, 22))])
00-07                Flatten(flattenField=[$0])
00-08                  Project(test=[$1], orderkey=[$0])
00-09                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, numFiles=1, columns=[`test`, `o_orderkey`], files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]])
{code}

It is not pushed past the project even if I add 40 columns to the projection 
(JSON file with 2 rows):
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . > *
. . . . . . . . . . . . > from
. . . . . . . . . . . . > hive.lineitem_text_hive l
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . > ( select
. . . . . . . . . . . . > flatten(test)   as test,
. . . . . . . . . . . . > o_orderkey  as orderkey,
. . . . . . . . . . . . > o_orderkey + 1  as o1,
. . . . . . . . . . . . > o_orderkey + 2  as o2,
. . . . . . . . . . . . > o_orderkey + 3  as o3,
. . . . . . . . . . . . > o_orderkey + 4  as o4,
. . . . . . . . . . . . > o_orderkey + 5  as o5,
. . . . . . . . . . . . > o_orderkey + 6  as o6,
. . . . . . . . . . . . > o_orderkey + 7  as o7,
. . . . . . . . . . . . > o_orderkey + 8  as o8,
. . . . . . . . . . . . > o_orderkey + 9  as o9,
. . . . . . . . . . . . > o_orderkey + 10 as o10,
. . . . . . . . . . . . > o_orderkey + 11 as o11,
. . . . . . . . . . . . > o_orderkey + 12 as o12,
. . . . . . . . . . . . > o_orderkey + 13 as o13,
. . . . . . . . . . . . > o_orderkey + 14 as o14,
. . . . . . . . . . . . > o_orderkey + 15 as o15,
. . . . . . . . . . . . > o_orderkey + 16 as o16,
. . . . . . . . . . . . > o_orderkey + 17 as o17,
. . . . . . . . . . . . > o_orderkey + 18 as o18,
. . . . . . . . . . . . > o_orderkey + 19 as o19,
. . . . . . . . . . . . > o_orderkey + 20 as o20,
. . . . . . . . . . . . > o_orderkey + 21 as o21,
. . . . . . . . . . . . > o_orderkey + 22 as o22,
. . . . . . . . . . . . > o_orderkey + 23 as o23,
. . . . . . . . . . . . > o_orderkey + 24 as 

[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-22 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068717#comment-15068717
 ] 

Jinfeng Ni commented on DRILL-4201:
---

Fixed in commit:  1ea3d6c3f144614caf460648c1c27c6d0f5b06b8


> DrillPushFilterPastProject should allow partial filter pushdown. 
> -
>
> Key: DRILL-4201
> URL: https://issues.apache.org/jira/browse/DRILL-4201
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.5.0
>
>
> Currently, DrillPushFilterPastProjectRule stops pushing the filter down if 
> the filter itself contains an ITEM or FLATTEN function, or if one of its 
> input references refers to an ITEM or FLATTEN function. However, when the 
> filter is a conjunction of multiple sub-filters, only some of which refer to 
> ITEM or FLATTEN, we should allow the remaining sub-filters to be pushed down. 
> For instance,
> WHERE partition_col > 10 and flatten_output_col = 'ABC'
> The "flatten_output_col" comes from the output of the FLATTEN operator, and 
> therefore flatten_output_col = 'ABC' should not be pushed past the project. 
> But partition_col > 10 should be pushed down, so that the pruning rule can 
> apply partition pruning.
> This would improve Drill query performance when the partially pushed filter 
> leads to partition pruning or results in early filtering in an upstream 
> operator.
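
As an illustration of the idea described above, here is a minimal Python sketch
of splitting a conjunction into a pushable part and a retained part. It is not
Drill's implementation (the actual rule is Java code in the planner); the
Predicate class, the split_conjunction function, and the blocked_columns set
are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Predicate:
    """One sub-filter of a conjunction, e.g. partition_col > 10 (hypothetical model)."""
    column: str
    op: str
    value: object


def split_conjunction(
    conjuncts: List[Predicate], blocked_columns: Set[str]
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split AND-ed sub-filters into (pushable, retained).

    blocked_columns holds the project output columns that are produced by
    ITEM or FLATTEN; sub-filters on those columns must stay above the project.
    """
    pushable = [p for p in conjuncts if p.column not in blocked_columns]
    retained = [p for p in conjuncts if p.column in blocked_columns]
    return pushable, retained


# WHERE partition_col > 10 and flatten_output_col = 'ABC'
filters = [
    Predicate("partition_col", ">", 10),
    Predicate("flatten_output_col", "=", "ABC"),
]
pushed, kept = split_conjunction(filters, {"flatten_output_col"})
print("pushed below project:", pushed)
print("kept above project:  ", kept)
```

In this toy model only partition_col > 10 ends up in the pushed list, which is
the part that could then be considered for partition pruning.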



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068716#comment-15068716
 ] 

ASF GitHub Bot commented on DRILL-4201:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/305




[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062688#comment-15062688
 ] 

ASF GitHub Bot commented on DRILL-4201:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/305

DRILL-4201 : Allow partial filter to be pushed down project for better performance.

Partial filter pushdown has performance benefits because it:
1) enables partition pruning, if the pushed-down filter involves partitioning 
columns,
2) allows the filter to be applied further upstream.
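
To make the first benefit concrete, the following toy Python model compares a
scan with and without the partition predicate pushed below the flatten step.
It is only a sketch under simplifying assumptions: the in-memory partitions
dictionary, the scan and flatten helpers, and the column names are made up for
illustration and do not correspond to Drill APIs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


def scan(
    partitions: Dict[int, List[Row]],
    partition_filter: Optional[Callable[[int], bool]] = None,
) -> Tuple[List[Row], int]:
    """Read rows, skipping whole partitions rejected by partition_filter."""
    rows: List[Row] = []
    partitions_read = 0
    for key, part_rows in partitions.items():
        if partition_filter is not None and not partition_filter(key):
            continue  # pruned: this partition is never read
        partitions_read += 1
        rows.extend(part_rows)
    return rows, partitions_read


def flatten(rows: List[Row]) -> List[Row]:
    """Emit one output row per element of the repeated 'tags' field."""
    return [
        {"partition_col": row["partition_col"], "tag": tag}
        for row in rows
        for tag in row["tags"]
    ]


partitions = {
    5: [{"partition_col": 5, "tags": ["ABC", "X"]}],
    15: [{"partition_col": 15, "tags": ["ABC"]}],
    25: [{"partition_col": 25, "tags": ["Y"]}],
}

# No pushdown: read everything, flatten, then apply both sub-filters.
rows, read_full = scan(partitions)
result_full = [
    r for r in flatten(rows) if r["partition_col"] > 10 and r["tag"] == "ABC"
]

# Partial pushdown: partition_col > 10 is applied at the scan,
# tag = 'ABC' stays above the flatten.
rows, read_pruned = scan(partitions, lambda key: key > 10)
result_pruned = [r for r in flatten(rows) if r["tag"] == "ABC"]

print(result_full == result_pruned, read_full, read_pruned)
```

Both variants return the same rows, but the second one reads fewer partitions
in this model.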

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill DRILL-4201

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/305.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #305


commit 68203ad035f65d1708ce228df432d5b23f4af3ba
Author: Jinfeng Ni 
Date:   2015-12-12T00:00:13Z

DRILL-4201 : Allow partial filter to be pushed down project for better 
performance.

Partial filter pushdown has performance benefits because it:
1) enables partition pruning, if the pushed-down filter involves partitioning 
columns,
2) allows the filter to be applied further upstream.






[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062690#comment-15062690
 ] 

ASF GitHub Bot commented on DRILL-4201:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/305#issuecomment-165567570
  
@amansinha100 , could you please review the patch for DRILL-4201? Thanks!





[jira] [Commented] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063266#comment-15063266
 ] 

ASF GitHub Bot commented on DRILL-4201:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/305#issuecomment-165635050
  
+1 LGTM. 


> DrillPushFilterPastProject should allow partial filter pushdown. 
> -
>
> Key: DRILL-4201
> URL: https://issues.apache.org/jira/browse/DRILL-4201
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Aman Sinha
> Fix For: 1.5.0
>