davidvrba commented on a change in pull request #27231: [SPARK-28478][SQL]
Remove redundant null checks
URL: https://github.com/apache/spark/pull/27231#discussion_r376953374
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -434,6 +434,27 @@ object SimplifyConditionals extends Rule[LogicalPlan]
with PredicateHelper {
case _ => false
}
+ /**
+ * Condition for redundant null check based on intolerant expressions.
+ * @param ifNullExpr expression that takes place if checkedExpr is null
+ * @param ifNotNullExpr expression that takes place if checkedExpr is not
null
+ * @param checkedExpr expression that is checked for null value
+ */
+ private def isRedundantNullCheck(
+ ifNullExpr: Expression,
+ ifNotNullExpr: Expression,
+ checkedExpr: Expression): Boolean = {
+ val isNullIntolerant = ifNotNullExpr.find { x =>
+ !x.isInstanceOf[NullIntolerant] && x.find(e =>
e.semanticEquals(checkedExpr)).nonEmpty
Review comment:
Well, the think is that if we use the simple version with
`FilterExec.isNullIntolerant(ifNutNullExpr)` we will loose (because of the
recursive check) all expressions that contain literals (because literals are
null-tolerant), so for example expressions like this `substring(title#5, 0, 3)`
will not be included in the optimization (which the jira was targeted for in
the first place). So I suggest one of these 2 options:
1. Use the complex version of the code and thus include more expressions in
the optimization
2. Have the code more simple and use the original version before the
generalization step, i.e.
```
private def isRedundantNullCheck(
ifNullExpr: Expression,
ifNotNullExpr: Expression,
checkedExpr: Expression): Boolean = {
ifNotNullExpr.isInstanceOf[NullIntolerant] && {
(ifNullExpr == checkedExpr || ifNullExpr == Literal.create(null,
checkedExpr.dataType)) &&
ifNotNullExpr.children.contains(checkedExpr)
}
}
```
where `checkedExpr` must be direct child and thus we don't have to check the
whole subtree for null-intolerance (so expressions that have Literals in the
subtree are still included).
I am fine with either of these 2 options. What do you think?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]