cloud-fan commented on a change in pull request #24253:
[SPARK-19712][SQL][FOLLOW-UP] Don't do partial pushdown when pushing down LeftAnti joins below Aggregate or Window operators.
URL: https://github.com/apache/spark/pull/24253#discussion_r270740695
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala
##########
@@ -105,16 +114,25 @@ object PushDownLeftSemiAntiJoin extends Rule[LogicalPlan] with PredicateHelper {
}
// Check if the remaining predicates do not contain columns from the right
- // hand side of the join. Since the remaining predicates will be kept
- // as a filter over window, this check is necessary after the left semi
- // or left anti join is moved below window. The reason is, for this kind
- // of join, we only output from the left leg of the join.
+ // hand side of the join. In case of LeftSemi join, since remaining predicates
+ // will be kept as a filter over aggregate, this check is necessary after the left semi join
+ // is moved below aggregate. The reason is, for this kind of join, we only output from the
+ // left leg of the join.
val rightOpColumns = AttributeSet(stayUp.toSet).intersect(rightOp.outputSet)
if (pushDown.nonEmpty && rightOpColumns.isEmpty) {
val predicate = pushDown.reduce(And)
val newPlan = w.copy(child = Join(w.child, rightOp, joinType, Option(predicate), hint))
- if (stayUp.isEmpty) newPlan else Filter(stayUp.reduce(And), newPlan)
+ if (stayUp.isEmpty) {
+ newPlan
+ } else {
+ // In case of left anti join, the join is pushed down when the entire join condition
+ // is eligible to be pushed down to preserve the semantics of left anti join.
+ joinType match {
+ case LeftSemi => Filter(stayUp.reduce(And), newPlan)
+ case _ => join
Review comment:
How about:
```
joinType match {
  // In case of left-semi join, ...
  case LeftSemi => Filter(stayUp.reduce(And), newPlan)
  // In case of left-anti join, ...
  case _ => join
}
```
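The reason the two cases differ can be checked on a toy model. Split the join condition into a pushable part p(l, r) and a remaining part q(l) that references only left columns (which is what the `rightOpColumns.isEmpty` check guarantees). A left-semi join keeps a row when some r satisfies p(l, r) AND q(l), which equals q(l) AND (some r satisfies p), so a `Filter(q)` above a semi join on p is equivalent. A left-anti join keeps a row when NOT (q(l) AND some r satisfies p), i.e. NOT q(l) OR no r satisfies p; a `Filter(q)` above an anti join on p instead keeps only q(l) AND no r satisfies p, silently dropping rows for which q is false. The sketch below assumes a hypothetical row model built from plain Scala collections, not Catalyst's `Join`/`Filter` operators; `SemiAntiPushdownDemo`, `L`, `R`, `pushDown` and `stayUp` are illustrative names only.

```scala
// Hypothetical toy model of row-level semi/anti join semantics.
// This is NOT Spark's Catalyst API; it only illustrates why partial pushdown
// is safe for LeftSemi but not for LeftAnti.
object SemiAntiPushdownDemo {
  final case class L(id: Int, flag: Boolean) // left row; `flag` is a left-only column
  final case class R(id: Int)                // right row

  // Left-semi join: keep a left row if SOME right row satisfies `cond`.
  def leftSemi(left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Left-anti join: keep a left row if NO right row satisfies `cond`.
  def leftAnti(left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filterNot(l => right.exists(r => cond(l, r)))

  // The join condition split into a pushable part (references the right side)
  // and a "stay up" part that, by its type, can only see left columns.
  val pushDown: (L, R) => Boolean = (l, r) => l.id == r.id
  val stayUp: L => Boolean = l => l.flag
  val full: (L, R) => Boolean = (l, r) => pushDown(l, r) && stayUp(l)

  val left  = Seq(L(1, flag = true), L(2, flag = false), L(3, flag = true))
  val right = Seq(R(1), R(2))

  // Reference results: each join evaluated with the entire condition.
  val semiExact = leftSemi(left, right)(full)
  val antiExact = leftAnti(left, right)(full)

  // Partial pushdown: join on `pushDown` only, then apply Filter(stayUp) on top.
  val semiPartial = leftSemi(left, right)(pushDown).filter(stayUp)
  val antiPartial = leftAnti(left, right)(pushDown).filter(stayUp)
}
```

In this model `semiPartial` and `semiExact` both contain only `L(1, true)`. For the anti join, `antiExact` contains `L(2, false)` and `L(3, true)`, but `antiPartial` contains only `L(3, true)`: the row whose `flag` is false is lost, which is why the left-anti case falls back to `join` unless the whole condition can be pushed down.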
----------------------------------------------------------------
This is an automated message from the Apache Git Service.