liupc opened a new pull request #25020: [SPARK-28220]Fix foldable join 
condition not pushed down when parent filter is wholly pushed down
URL: https://github.com/apache/spark/pull/25020
 
 
   ## What changes were proposed in this pull request?
   
   The optimizer rule `PushPredicateThroughJoin` tries to push a parent filter 
down through the join. However, when the parent filter is wholly pushed down 
through the join, the join becomes the top node, and the `transform` method 
then skips the join instead of applying the rule to it. 
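
The skipping behaviour described above can be sketched with a toy tree model. This is a minimal illustration, not Catalyst code: `Plan`, `Table`, `Filter`, `Join`, `ToyTransform`, `pushDown` and `joinCaseHits` are hypothetical names, and conditions are plain strings. It assumes a pre-order traversal that applies the rule to a node once and then descends into the children of the result.

```scala
// Toy model only: hypothetical stand-ins, not Catalyst's LogicalPlan classes.
sealed trait Plan
case class Table(name: String) extends Plan
case class Filter(cond: String, child: Plan) extends Plan
case class Join(left: Plan, right: Plan, cond: Option[String]) extends Plan

object ToyTransform {
  // Pre-order traversal: apply the rule to the node, then recurse into the
  // children of whatever the rule returned. The returned node itself is not
  // offered to the rule a second time within this pass.
  def transformDown(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val afterRule = rule.applyOrElse(plan, identity[Plan])
    afterRule match {
      case Filter(c, child) => Filter(c, transformDown(child)(rule))
      case Join(l, r, c)    => Join(transformDown(l)(rule), transformDown(r)(rule), c)
      case t: Table         => t
    }
  }

  // Counts how often the rule's Join case fires.
  var joinCaseHits = 0

  val pushDown: PartialFunction[Plan, Plan] = {
    // The parent filter is wholly pushed to the left side, so the Join
    // becomes the top node of the rewritten subtree.
    case Filter(cond, Join(l, r, jc)) => Join(Filter(cond, l), r, jc)
    // The case that would push the join's own condition down (left as a no-op here).
    case j: Join => joinCaseHits += 1; j
  }
}
```

Running `ToyTransform.transformDown` on `Filter("b = 2", Join(Table("table1"), Table("table2"), Some("a = d and r = 'w2'")))` with `pushDown` returns a `Join` at the top, and `joinCaseHits` stays at 0: the `Join` case never sees the newly created join in that pass.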
   
   Suppose we have two tables: table1 and table2:
   
   ```
   table1: (a: string, b: string, c: string)
   
   table2: (d: string)
   ```
   
   Given the following SQL:
   
   `select * from table1 left join (select d, 'w1' as r from table2) on a = d 
and r = 'w2' where b = 2`
    
   
   Let's focus on the following optimizer rules:
   
   ```
   PushPredicateThroughJoin
   
   FoldablePropagation
   
   BooleanSimplification
   
   PruneFilters
   ```
   
    
   
   In the above case, on the first iteration of these rules:
   
   PushPredicateThroughJoin ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and r = 'w2'`
   
   FoldablePropagation ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and 'w1' = 'w2'`
   
   BooleanSimplification ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on false`
   
   PruneFilters -> no effect
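
How the join condition collapses to `false` in the steps above can be sketched on a tiny expression tree. This is a hedged illustration only: `Expr`, `Col`, `Lit`, `Eq`, `And`, `Bool` and `ToyFolding` are hypothetical names rather than Catalyst classes, and the toy `simplify` merges literal folding and conjunction collapsing into a single step.

```scala
// Toy expression model; hypothetical names, not Catalyst's Expression classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: String) extends Expr
case class Eq(l: Expr, r: Expr) extends Expr
case class And(l: Expr, r: Expr) extends Expr
case class Bool(value: Boolean) extends Expr

object ToyFolding {
  // Replace columns that are aliases of literals (here: r -> 'w1').
  def propagate(e: Expr, aliases: Map[String, Lit]): Expr = e match {
    case Col(n) if aliases.contains(n) => aliases(n)
    case Eq(l, r)  => Eq(propagate(l, aliases), propagate(r, aliases))
    case And(l, r) => And(propagate(l, aliases), propagate(r, aliases))
    case other     => other
  }

  // Fold literal-to-literal comparisons, then collapse conjunctions.
  def simplify(e: Expr): Expr = e match {
    case Eq(Lit(a), Lit(b)) => Bool(a == b)
    case And(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Bool(false), _) | (_, Bool(false)) => Bool(false)
        case (Bool(true), x)                     => x
        case (x, Bool(true))                     => x
        case (x, y)                              => And(x, y)
      }
    case other => other
  }
}
```

Applied to `And(Eq(Col("a"), Col("d")), Eq(Col("r"), Lit("w2")))` with the alias map `Map("r" -> Lit("w1"))`, `propagate` followed by `simplify` yields `Bool(false)`. At that point the right-side predicate `r = 'w2'` no longer exists as a separate conjunct.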
   
    
   
   Even after several iterations of these rules, the join condition is never 
pushed down to the right side of the left join. Thus, in some cases (e.g. a 
large right table), the `BroadcastNestedLoopJoin` may be slow or run out of 
memory.
   
   This PR will fix this problem!
   
   ## How was this patch tested?
   
   Existing unit tests.
   
