davidvrba commented on a change in pull request #27231: [SPARK-28478][SQL] 
Remove redundant null checks
URL: https://github.com/apache/spark/pull/27231#discussion_r376953374
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ##########
 @@ -434,6 +434,27 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
     case _ => false
   }
 
+  /**
+   * Condition for redundant null check based on intolerant expressions.
+   * @param ifNullExpr expression that takes place if checkedExpr is null
+   * @param ifNotNullExpr expression that takes place if checkedExpr is not 
null
+   * @param checkedExpr expression that is checked for null value
+   */
+  private def isRedundantNullCheck(
+      ifNullExpr: Expression,
+      ifNotNullExpr: Expression,
+      checkedExpr: Expression): Boolean = {
+    val isNullIntolerant = ifNotNullExpr.find { x =>
+      !x.isInstanceOf[NullIntolerant] && x.find(e => 
e.semanticEquals(checkedExpr)).nonEmpty
 
 Review comment:
   Well, the think is that if we use the simple version with 
`FilterExec.isNullIntolerant(ifNutNullExpr)` we will loose (because of the 
recursive check) all expressions that contain literals (because literals are 
null-tolerant), so for example expressions like this `substring(title#5, 0, 3)` 
will not be included in the optimization (which the jira was targeted for in 
the first place). So I suggest one of these 2 options:
   
   1. Use the complex version of the code and thus include more expressions in 
the optimization
   2. Have the code more simple and use the original version before the 
generalization step, i.e. 
   ```
   private def isRedundantNullCheck(
       ifNullExpr: Expression,
       ifNotNullExpr: Expression,
       checkedExpr: Expression): Boolean = {
     ifNotNullExpr.isInstanceOf[NullIntolerant] && {
       (ifNullExpr == checkedExpr || ifNullExpr == Literal.create(null, 
checkedExpr.dataType)) &&
         ifNotNullExpr.children.contains(checkedExpr)
     }
   }
   ```
   where `checkedExpr` must be direct child and thus we don't have to check the 
whole subtree for null-intolerance (so expressions that have Literals in the 
subtree are still included).
   I am fine with either of these 2 options. What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to