Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16346 )

Change subject: IMPALA-10064: Support constant propagation for eligible range predicates
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16346/8/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java:

http://gerrit.cloudera.org:8080/#/c/16346/8/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@78
PS8, Line 78: isRangeOp && !(constant instanceof DateLiteral ||
            :             constant instanceof TimestampLiteral))
> I am not aware of other products doing the same :-).
Yes, even without the CAST, the simple cases should be fine, e.g. a1 = b1 AND
b1 > 10 where these are numeric columns. But it won't be OK for something like
'a1 = 1/b1 AND b1 > 10', because 1/b1 decreases as b1 grows, so we would need
to do additional checks on the expression.
I haven't seen a good use case for range propagation on types other than
date and timestamp.
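
To make the concern concrete, here is a minimal Java sketch. It is not Impala code: the class RangePropagationSketch and the method naiveRewriteHolds are hypothetical names used only for illustration. It checks on a few sample values whether "a1 = f(b1) AND b1 > bound" really implies "a1 > f(bound)", which is what substituting the bound into the expression would derive.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration only; this is not part of ConstantPredicateHandler. */
final class RangePropagationSketch {

  /**
   * Returns true if, for every sample b1 > bound, the value a1 = f(b1) satisfies
   * a1 > f(bound). A false result means the naive rewrite would drop valid rows.
   */
  static boolean naiveRewriteHolds(DoubleUnaryOperator f, double bound, double[] samples) {
    double derivedBound = f.applyAsDouble(bound);
    for (double b1 : samples) {
      if (b1 > bound) {
        double a1 = f.applyAsDouble(b1);
        if (!(a1 > derivedBound)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    double[] samples = {11, 20, 100};
    // a1 = b1 AND b1 > 10: deriving a1 > 10 is safe.
    System.out.println(naiveRewriteHolds(x -> x, 10, samples));        // prints true
    // a1 = 1/b1 AND b1 > 10: deriving a1 > 0.1 is wrong (a1 is below 0.1).
    System.out.println(naiveRewriteHolds(x -> 1.0 / x, 10, samples));  // prints false
  }
}
```

A sample-based check like this only demonstrates the problem; an actual implementation would presumably need a structural test on the expression rather than sampling.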


http://gerrit.cloudera.org:8080/#/c/16346/8/fe/src/main/java/org/apache/impala/analysis/ConstantPredicateHandler.java@93
PS8, Line 93:  Propagate equality constant predicates to other conjuncts. Propagate
            :    * range constant predicates to conjuncts involving date and timestamp
            :    * columns.
> Thanks for the background info. Appreciate it. The Trafodion has similar co
I tried an example with a join, but given the order in which the
optimizations are applied (join transitivity first, then constant
propagation), the derived predicate is not propagated to the other side of the
join. See below. I don't think this is straightforward to address. Note,
however, that because of Impala's runtime filter propagation, the runtime
filter eventually gets applied on the other side of the join.

 explain select t1.* from functional_parquet.alltypes_date_partition t1, 
functional_parquet.alltypes_date_partition t2  where t1.date_col = t2.date_col 
and t2.date_col = cast(t2.timestamp_col as date) and t2.timestamp_col <= 
'2010-10-01';

+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=3.02MB Threads=5                         |
| Per-Host Resource Estimates: Memory=152MB                                          |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| functional_parquet.alltypes_date_partition                                         |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 04:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 02:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: t1.date_col = t2.date_col                                      |
| |  runtime filters: RF000 <- t2.date_col                                           |
| |  row-size=84B cardinality=180.66K                                                |
| |                                                                                  |
| |--03:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  01:SCAN HDFS [functional_parquet.alltypes_date_partition t2]                    |
| |     partition predicates: t2.date_col <= DATE '2010-10-01'                       |
| |     HDFS partitions=639/730 files=639 size=1.94MB                                |
| |     predicates: t2.timestamp_col <= TIMESTAMP '2010-10-01 00:00:00'              |
| |     row-size=20B cardinality=15.81K                                              |
| |                                                                                  |
| 00:SCAN HDFS [functional_parquet.alltypes_date_partition t1]                       |
|    HDFS partitions=730/730 files=730 size=2.22MB                                   |
|    runtime filters: RF000 -> t1.date_col                                           |
|    row-size=64B cardinality=180.66K                                                |
+------------------------------------------------------------------------------------+


Also, just to be clear: if you have a range predicate on one side of the join,
e.g. t1.a1 = t2.a2 AND t2.a2 > 10, it will already be propagated to the other
side.
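
As a rough picture of that existing behavior, the following Java sketch mirrors a range predicate across equi-join conjuncts. It is an assumption-laden toy, not how the Impala planner represents predicates: JoinTransitivitySketch and mirrorAcrossJoin are hypothetical names, and predicates are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; predicates are modeled as plain strings. */
final class JoinTransitivitySketch {

  /**
   * For every equi-join pair {left, right} that mentions `column`, emits the
   * same "op constant" predicate on the column from the other side of the join.
   */
  static List<String> mirrorAcrossJoin(
      String[][] equiJoins, String column, String op, String constant) {
    List<String> derived = new ArrayList<>();
    for (String[] eq : equiJoins) {
      if (eq[0].equals(column)) {
        derived.add(eq[1] + " " + op + " " + constant);
      } else if (eq[1].equals(column)) {
        derived.add(eq[0] + " " + op + " " + constant);
      }
    }
    return derived;
  }

  public static void main(String[] args) {
    String[][] joins = {{"t1.a1", "t2.a2"}};
    // t1.a1 = t2.a2 AND t2.a2 > 10 yields the extra predicate on t1.
    System.out.println(mirrorAcrossJoin(joins, "t2.a2", ">", "10")); // prints [t1.a1 > 10]
  }
}
```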


http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

http://gerrit.cloudera.org:8080/#/c/16346/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test@461
PS7, Line 461: timestamp_col <= '2010-12-01';
> It should work but will add a test for it.
Added a test for the swapped order.



--
To view, visit http://gerrit.cloudera.org:8080/16346
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Gerrit-Change-Number: 16346
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 28 Aug 2020 22:15:58 +0000
Gerrit-HasComments: Yes
