agubichev commented on PR #48145:
URL: https://github.com/apache/spark/pull/48145#issuecomment-2361679216

   > Do all the existing optimizer rules work fine with this single join? I 
understand that we need to implement the single-match check in all the physical 
join nodes, but semantic wise, is there anything we need to take care?
   
   @cloud-fan 
   
   I've traced all the usages of LeftOuter in the catalyst rules (see the full 
list below). In general, the rules work on an allow-list basis: if a rule does 
not explicitly match a join type, it is not applied to joins of that type. As 
LeftOuter is a close relative of LeftSingle (in fact, at HEAD we are using 
LeftOuter in place of LeftSingle), it is enough to check the rules that already 
reference LeftOuter explicitly. Since LeftOuter joins are already very 
restrictive as to what kind of optimizations can be applied to them (and 
frequently restrict optimizations in the plan nodes around them too), I am not 
aware of many join-type-agnostic rules. The ones that I do know of, like 
ReplaceNullWithFalseInPredicate, apply to both LeftOuter and LeftSingle 
without change. 
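   
   To illustrate the allow-list point, here is a minimal sketch using a toy 
model rather than the real catalyst classes. The names `AllowListSketch`, 
`Join`, `exampleRule` and the `note` field are hypothetical and exist only for 
this example; the shape of the pattern match is what matters.

```scala
// Toy model for illustration only; these are NOT Spark's catalyst classes.
object AllowListSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object LeftSingle extends JoinType

  // A stand-in for a join plan node; `note` records whether a rule fired.
  final case class Join(joinType: JoinType, note: String = "")

  // An allow-list style rule: only the join types named in the pattern are
  // rewritten. LeftSingle is not named, so it reaches the default case and
  // is returned unchanged.
  def exampleRule(join: Join): Join = join match {
    case j @ Join(Inner | LeftOuter, _) => j.copy(note = "rewritten")
    case other                          => other
  }
}
```

   With a rule written this way, adding a new join type cannot silently change 
the rule's behavior; the rule has to be edited to opt the new type in.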
   
   These rules have been updated for LeftSingle join:
   
   - EliminateOuterJoin -- should not apply to LeftSingle, updated
   - PushPredicateThroughJoin -- not all cases should apply to LeftSingle, 
updated
   - FoldablePropagation
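   
   As a rough illustration of why a rewrite that is valid for LeftOuter may 
not be valid for LeftSingle, here is a sketch over plain Scala collections. It 
assumes LeftSingle behaves like LeftOuter plus the single-match check mentioned 
in the question, failing when a left row has more than one match. The object 
and function names are hypothetical, and rows are reduced to bare integer keys.

```scala
// Sketch under stated assumptions; this is not how Spark implements joins.
object SingleJoinSketch {
  // Left outer join on key equality: unmatched left rows are padded with None.
  def leftOuter(left: Seq[Int], right: Seq[Int]): Seq[(Int, Option[Int])] =
    left.flatMap { l =>
      val matches = right.filter(_ == l)
      if (matches.isEmpty) Seq((l, None)) else matches.map(r => (l, Some(r)))
    }

  // Assumed LeftSingle semantics: same as leftOuter, except that more than
  // one match for a left row is treated as an error.
  def leftSingle(left: Seq[Int], right: Seq[Int]): Seq[(Int, Option[Int])] =
    left.map { l =>
      right.filter(_ == l) match {
        case Seq()  => (l, None)
        case Seq(r) => (l, Some(r))
        case _ =>
          throw new IllegalStateException(s"more than one match for key $l")
      }
    }

  // Inner join: unmatched left rows are dropped, duplicate matches are kept.
  def inner(left: Seq[Int], right: Seq[Int]): Seq[(Int, Int)] =
    for (l <- left; r <- right if l == r) yield (l, r)
}
```

   Under these assumptions, turning a LeftSingle join into an inner join would 
return duplicate matches instead of failing, so the single-match check would be 
lost. That is one plausible reading of why a rule like EliminateOuterJoin 
should not fire for LeftSingle.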
   
   The following rules match only LeftOuter joins for now and therefore skip 
LeftSingle joins, with no change needed. Semantics-wise, it is safe to skip 
every one of these rules for LeftSingle joins. Further analysis is needed on 
whether we can or should enable them for LeftSingle joins:
   
   - InferFiltersFromConstraints
   - LimitPushDown
   - PropagateEmptyRelation
   - PushLeftSemiLeftAntiThroughJoin
   - PushExtraPredicateThroughJoin
   
   There are a couple of rules that apply to LeftOuter but do not make sense 
for LeftSingle. In both cases LeftOuter is explicitly matched, so they will 
skip LeftSingle as they should:
   
   - CheckCartesianProducts
   - RewriteAsOfJoin
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

