[ 
https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637340#comment-16637340
 ] 

Vitalii Diravka edited comment on DRILL-6465 at 10/3/18 6:23 PM:
-----------------------------------------------------------------

So the part of this Jira is solved.

The other part is a new Calcite issue:
 For local join conditions the logical plan can involve predicates in the 
Project operator. But this case isn't considered in the 
_RelMdPredicates#getPredicates - \{RexCall} ">=($1, 1988)"._ Or if Project 
operator shouldn't be used for such kind of predicates the planner should 
replace Project with a Filter. I will follow with creating a new Calcite Jira 
ticket for it.

_Example:_
{code:java}
SELECT * FROM first t1 JOIN second t2 ON t1.`year` = t2.`year` and t2.`year` 
BETWEEN 1988 AND 1991
{code}
{code:java}
LogicalProject(**=[$0], **0=[$2])
  LogicalProject(**=[$0], year=[$1], **0=[$2], year0=[$3])
    LogicalJoin(condition=[AND(=($1, $3), $4, $5)], joinType=[inner])
      EnumerableTableScan(table=[[dfs, 
parquetFilterPush/transitiveClosure/first]])
      LogicalProject(**=[$0], year=[$1], $f2=[>=($1, 1988)], $f3=[<=($1, 1991)])
        EnumerableTableScan(table=[[dfs, 
parquetFilterPush/transitiveClosure/second]])
{code}


was (Author: vitalii):
So the part of this Jira is solved.

The other part is a new Calcite issue:
For local join conditions the logical plan can involve predicates in the 
Project operator. But this case isn't considered in the 
_RelMdPredicates#getPredicates - \{RexCall} ">=($1, 1988)"._ Or if Project 
operator shouldn't be used for such kind of predicates the planner should 
replace Project with a Filter. I will follow up with creating a new Calcite 
Jira ticket for it.

_Example:_
{code}
SELECT * FROM first t1 JOIN second t2 ON t1.`year` = t2.`year` and t2.`year` 
BETWEEN 1988 AND 1991
{code}
{code}
LogicalProject(**=[$0], **0=[$2])
  LogicalProject(**=[$0], year=[$1], **0=[$2], year0=[$3])
    LogicalJoin(condition=[AND(=($1, $3), $4, $5)], joinType=[inner])
      EnumerableTableScan(table=[[dfs, 
parquetFilterPush/transitiveClosure/first]])
      LogicalProject(**=[$0], year=[$1], $f2=[>=($1, 1988)], $f3=[<=($1, 1991)])
        EnumerableTableScan(table=[[dfs, 
parquetFilterPush/transitiveClosure/second]])
{code}

> Transitive closure is not working in Drill for Join with multiple local 
> conditions
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-6465
>                 URL: https://issues.apache.org/jira/browse/DRILL-6465
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Denys Ordynskiy
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 1.15.0
>
>         Attachments: drill.zip
>
>
> For several SQL operators Transitive closure is not working during Partition 
> Pruning and Filter Pushdown for the left table in Join.
>  If I use several local conditions, then Drill scans full left table in Join.
>  But if we move additional conditions to the WHERE statement, then Transitive 
> closure works fine for all joined tables
> *Query BETWEEN:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2])]{code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, 
> 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), 
> Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), 
> Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), 
> Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), 
> Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), 
> Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), 
> Partition(values:[1991, 3, 4])
> ]
> {code}
> *There is the same Transitive closure behavior for this logical operators:*
>  * NOT IN
>  * LIKE
>  * NOT LIKE
> Also Transitive closure is not working during Partition Pruning and Filter 
> Pushdown for this comparison operators:
> *Query <*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> AND t2.y < 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2])]{code}
> *Actual result:*
> {code:java}
> 00-00 Screen
> 00-01 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], 
> category0=[$5], y0=[$6], m0=[$7])
> 00-02 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], 
> category0=[$5], y0=[$6], m0=[$7])
> 00-03 HashJoin(condition=[=($1, $6)], joinType=[inner])
> 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:h_tab1), columns=[`**`], numPartitions=16, partitions= 
> [Partition(values:[1987, 5, 1]), Partition(values:[1987, 5, 2]), 
> Partition(values:[1987, 7, 1]), Partition(values:[1987, 7, 2]), 
> Partition(values:[1988, 11, 1]), Partition(values:[1988, 11, 2]), 
> Partition(values:[1988, 12, 1]), Partition(values:[1988, 12, 2]), 
> Partition(values:[1990, 4, 1]), Partition(values:[1990, 4, 2]), 
> Partition(values:[1990, 5, 1]), Partition(values:[1990, 5, 2]), 
> Partition(values:[1991, 3, 1]), Partition(values:[1991, 3, 2]), 
> Partition(values:[1991, 3, 3]), Partition(values:[1991, 3, 4])], 
> inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/1, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/2, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/3, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/4, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/5, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/6, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/7, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/8, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/9, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/10, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/11, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/12, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/13, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/14, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/15, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/16]]])
> 00-04 Project(itm0=[$0], category0=[$1], y0=[$2], m0=[$3], $f4=[$4])
> 00-06 Project(itm=[$0], category=[$1], y=[$2], m=[$3], $f4=[<($2, 1988)])
> 00-07 Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:h_tab2), columns=[`itm`, `category`, `y`, `m`], numPartitions=2, 
> partitions= [Partition(values:[1987, 5]), Partition(values:[1987, 7])], 
> inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab2/1, 
> maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab2/2]]])
> {code}
> *There is the same Transitive closure behavior for this comparison 
> operators:***
> >, <=, >=, <>
>  But if I use the WHERE clause instead of local Join condition, Transitive 
> closure works correct for all joined tables.
> *Query:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM hive.`h_tab1` t1
> JOIN hive.`h_tab2` t2
> ON t1.y=t2.y
> WHERE t2.y < 1988;
> {code}
> *Result:*
> {code:java}
> Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), 
> columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), 
> Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), 
> Partition(values:[1987, 7, 2])]
> {code}
>  
> It was examples for Drill Partition Pruning queries with hive. The same 
> issues are in the Drill Filter Pushdown queries.
> *Filter Pushdown 'BETWEEN' query example:*
> {code:java}
> EXPLAIN PLAN FOR
> SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1
> JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2
> ON t1.y=t2.y
> AND t2.y BETWEEN 1987 AND 1988;
> {code}
> *Expected result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet], 
> ReadEntryWithPath 
> ...
> ReadEntryWithPath 
> [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]], 
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
> numFiles=8, numRowGroups=8
> {code}
> *Actual result:*
> {code:java}
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]], 
> selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, 
> numFiles=1, numRowGroups=16
> {code}
>  
> Issues can be reproduced using parquet files in the attachment "drill.zip"
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to