dbtsai commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r706562712
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +456,25 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case Not(IsNull(e)) => IsNotNull(e)
case Not(IsNotNull(e)) => IsNull(e)
+
+ // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+ // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+ // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+ // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+ // that avoids the situation two conditions are returning to each other.
+ case EqualTo(Not(a), b) if canSimplifyNot(b) => EqualTo(a, Not(b))
+ case EqualTo(a, Not(b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
+ case EqualNullSafe(Not(a), b) if canSimplifyNot(b) => EqualNullSafe(a, Not(b))
+ case EqualNullSafe(a, Not(b)) if canSimplifyNot(a) => EqualNullSafe(Not(a), b)
+
+ // Push `Not` to one side of `EqualTo`/`EqualNullSafe` if it's beneficial.
+ // E.g. Not(EqualTo(x, false)) => EqualTo(x, true)
+ case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))
+ case Not(EqualTo(a, b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
Review comment:
Suppose both `a` and `b` can be simplified. Does that mean we always try to
simplify `Not(b)` first? The result then depends on the order of the cases,
so how do we know which side is better to simplify first?
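
To make the question concrete, here is a minimal sketch on a toy expression tree. These are not Catalyst's classes, and the `canSimplifyNot` below is an assumed definition (true for an existing `Not` or a boolean literal), which may differ from the helper in this PR. It only illustrates that Scala takes the first matching `case`, so when both guards hold, the right-hand side always wins:

```scala
// Toy expression tree for illustration only; NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Boolean) extends Expr
case class Not(child: Expr) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

object OrderingSketch {
  // Assumption: negating `e` is "cheap" when it is already a `Not`
  // or a boolean literal. The PR's real helper may be defined differently.
  def canSimplifyNot(e: Expr): Boolean = e match {
    case Not(_) | Lit(_) => true
    case _ => false
  }

  // Same case order as in the diff: the guard on `b` is tried first,
  // so the guard on `a` is only reached when `b` cannot be simplified.
  def pushNot(e: Expr): Expr = e match {
    case Not(EqualTo(a, b)) if canSimplifyNot(b) => EqualTo(a, Not(b))
    case Not(EqualTo(a, b)) if canSimplifyNot(a) => EqualTo(Not(a), b)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Both sides qualify here: left is a `Not`, right is a literal.
    val both = Not(EqualTo(Not(Attr("x")), Lit(false)))
    // The first case fires, so `Not` lands on the right-hand literal.
    println(pushNot(both))
  }
}
```

With this input the `Not` is pushed onto `Lit(false)` rather than cancelling the existing `Not` on the left. Whether one choice is better than the other is exactly what the case order leaves undecided.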
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]