sunchao commented on a change in pull request #33930:
URL: https://github.com/apache/spark/pull/33930#discussion_r707916583



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&

Review comment:
       I wonder if we should check whether `e` is deterministic - are we 
allowed to skip evaluating it if it is not?
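
To make the question concrete, here is a minimal sketch of what a determinism guard could look like. It uses a toy expression model, not Catalyst's classes: `DeterminismSketch`, `NullIntolerantUnary`, and `selfDeterministic` are hypothetical names for illustration, and the sketch assumes that a non-deterministic expression must not be dropped from the plan.

```scala
// Toy model for illustration only; these are NOT Spark Catalyst classes.
object DeterminismSketch {
  sealed trait Expr { def deterministic: Boolean }

  final case class Attr(name: String) extends Expr {
    val deterministic: Boolean = true
  }

  // Hypothetical null-intolerant unary expression: null in, null out.
  final case class NullIntolerantUnary(
      name: String,
      child: Expr,
      selfDeterministic: Boolean = true) extends Expr {
    def deterministic: Boolean = selfDeterministic && child.deterministic
  }

  final case class IsNull(child: Expr) extends Expr {
    def deterministic: Boolean = child.deterministic
  }

  // Unwrap IsNull(f(x)) to IsNull(x) only when f(x) is deterministic,
  // so that a non-deterministic f is never silently skipped.
  def unwrap(e: Expr): Expr = e match {
    case IsNull(u: NullIntolerantUnary) if u.deterministic => IsNull(u.child)
    case other => other
  }
}
```

With this guard, `IsNull(f(a))` is rewritten to `IsNull(a)` for a deterministic `f` and is left untouched otherwise.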

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -797,6 +834,19 @@ object NullPropagation extends Rule[LogicalPlan] {
       // a null literal.
       case e: NullIntolerant if e.children.exists(isNullLiteral) =>
         Literal.create(null, e.dataType)
+
+      // [SPARK-36665] Unwrap inside of IsNull/IsNotNull if the inside is NullIntolerant
+      // E.g. IsNull(Not(null)) == IsNull(null)
+      // Cannot apply to `ExtractValue` as the query planner uses the trait to resolve the columns.
+      // E.g. the planner may resolve column `a` to `a#123`, then IsNull(a#123) cannot be optimized
+      // UnaryExpression only for now as applying this optimization to other expressions is too
+      // disruptive for some tests (e.g. [SPARK-32290].) TODO remove e.isInstanceOf[UnaryExpression]
+      case IsNull(e: NullIntolerant) if e.isInstanceOf[UnaryExpression] &&
+        !e.isInstanceOf[ExtractValue] && e.children.nonEmpty =>
+        e.children.map(IsNull(_): Expression).reduceLeft(Or)

Review comment:
       Can we simplify this? Shouldn't a `UnaryExpression` have only one child?
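
A small sketch of why the two forms coincide for a single-child expression. The types below are a toy model, not Catalyst's classes, and `SingleChildSketch`, `general`, and `unaryOnly` are hypothetical names used only for illustration.

```scala
// Toy model for illustration only; these are NOT Spark Catalyst classes.
object SingleChildSketch {
  sealed trait Expr { def children: Seq[Expr] }

  final case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
  }
  final case class Not(child: Expr) extends Expr {
    val children: Seq[Expr] = Seq(child)
  }
  final case class IsNull(child: Expr) extends Expr {
    val children: Seq[Expr] = Seq(child)
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // General form, mirroring the shape of the code under review.
  // reduceLeft throws on an empty Seq, hence the nonEmpty guard in the diff.
  def general(e: Expr): Expr = {
    val wrapped: Seq[Expr] = e.children.map(c => IsNull(c))
    wrapped.reduceLeft[Expr]((l, r) => Or(l, r))
  }

  // Form specialised to a unary expression: exactly one child, no Or needed.
  def unaryOnly(e: Not): Expr = IsNull(e.child)
}
```

For a one-element `children`, `reduceLeft` returns that single element, so both functions produce the same `IsNull` over the child.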

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -441,6 +457,27 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
 
       case Not(IsNull(e)) => IsNotNull(e)
       case Not(IsNotNull(e)) => IsNull(e)
+
+      // Move `Not` from one side of `EqualTo`/`EqualNullSafe` to the other side if it's beneficial.
+      // E.g. `EqualTo(Not(a), b)` where `b = Not(c)`, it will become
+      // `EqualTo(a, Not(b))` => `EqualTo(a, Not(Not(c)))` => `EqualTo(a, c)`
+      // In addition, `if canSimplifyNot(b)` checks if the optimization can converge
+      // that avoids the situation two conditions are returning to each other.
+      case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) => EqualTo(a, Not(b))

Review comment:
       I wonder how much extra cost this will incur. Since this now transforms
the original expression to `EqualTo(a, Not(b))`, we'll need to apply
`BooleanSimplification` again to simplify the `Not(b)` (since it is a child of
`EqualTo` and the rule runs bottom-up). Does this mean we'd need to run more
iterations of the `operatorOptimizationBatch` (see `Optimizer`) until it
reaches a fixed point?
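
The concern can be illustrated with a toy bottom-up rewriter that counts passes until nothing changes. This is a sketch under stated assumptions, not Spark's `Optimizer`: `FixedPointSketch`, `transformUp`, and `passesUntilFixedPoint` are hypothetical, and `canSimplifyNot` here is a stand-in that only recognises a double negation. The pass counts apply to this toy only.

```scala
// Toy model for illustration only; these are NOT Spark Catalyst classes.
object FixedPointSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Not(child: Expr) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  // Stand-in for the PR's canSimplifyNot: Not(e) simplifies only if e is a Not.
  def canSimplifyNot(e: Expr): Boolean = e match {
    case Not(_) => true
    case _      => false
  }

  val rule: PartialFunction[Expr, Expr] = {
    case Not(Not(e)) => e
    case EqualTo(Not(a), b) if !canSimplifyNot(a) && canSimplifyNot(b) =>
      EqualTo(a, Not(b))
  }

  // One bottom-up pass: rewrite children first, then the node itself.
  // The result of a rewrite is not revisited within the same pass.
  def transformUp(e: Expr): Expr = {
    val withNewChildren: Expr = e match {
      case Not(c)        => Not(transformUp(c))
      case EqualTo(l, r) => EqualTo(transformUp(l), transformUp(r))
      case leaf          => leaf
    }
    rule.applyOrElse(withNewChildren, (x: Expr) => x)
  }

  // Run passes until a pass changes nothing; return the result and the
  // number of passes, including the final pass that detects the fixed point.
  def passesUntilFixedPoint(start: Expr): (Expr, Int) = {
    @annotation.tailrec
    def loop(cur: Expr, passes: Int): (Expr, Int) = {
      val next = transformUp(cur)
      if (next == cur) (cur, passes + 1) else loop(next, passes + 1)
    }
    loop(start, 0)
  }
}
```

In this toy, `EqualTo(Not(a), Not(c))` becomes `EqualTo(a, Not(Not(c)))` in the first pass, and the newly created `Not(Not(c))` is only simplified in the second pass, which is the extra iteration the comment asks about.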




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
