Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16055
The above fix does not cover all cases. I have found the root cause.
The `constraints` of an operator are the expressions that evaluate to `true`
for all the rows it produces. That is, the result of each constraint is neither
`false` nor `unknown` (NULL). Thus, we can infer `IsNotNull` for every
constraint, whether it is generated by the operator's own predicates or
propagated from its children. A constraint can be a complex expression. To make
better use of these constraints, we try to push `IsNotNull` down to the
lowest-level expressions. `IsNotNull` can be pushed through an expression when
that expression is null-intolerant. (When any input is NULL, a null-intolerant
expression always evaluates to NULL.)
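To make the null-intolerance argument concrete, here is a minimal sketch that models NULL as `None`. The object and function names are hypothetical and are not part of Catalyst; they only illustrate why a constraint that is `true` implies non-null inputs when every expression on the path is null-intolerant.

```Scala
// Toy model (hypothetical names, not Spark code): NULL is represented by None.
object NullIntolerantDemo {
  // Null-intolerant addition: the result is NULL if either input is NULL.
  def add(a: Option[Int], b: Option[Int]): Option[Int] =
    for (x <- a; y <- b) yield x + y

  // Null-intolerant comparison: the result is NULL if either input is NULL.
  def greaterThan(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x > y

  // A row satisfies a predicate only when it evaluates to exactly true,
  // so both false and NULL reject the row.
  def passes(predicate: Option[Boolean]): Boolean =
    predicate.contains(true)
}
```

In this model, `passes(greaterThan(add(a, b), Some(5)))` can only hold when both `a` and `b` are defined, which is the reasoning that lets `IsNotNull` be pushed down to the attributes.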
Below is the code we have for `IsNotNull` pushdown.
```Scala
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] =
  expr match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
      expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }
```
`IsNotNull` itself is not null-intolerant: it maps a NULL input to `false`
rather than to NULL. As long as no `Not`-like expression sits above it, pushing
through it happens to work; otherwise, the inferred constraints can produce a
wrong result. The above function needs to be corrected to:
```Scala
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] =
  expr match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }
```
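The difference between the two versions can be reproduced with a small standalone sketch. Everything below is a toy model with hypothetical names, not Catalyst's classes: it assumes `Add` and `Not` are marked null-intolerant and that `IsNotNull` only wraps integer expressions. `scanOld` mirrors the original function and `scanNew` the corrected one.

```Scala
// Toy expression tree (hypothetical, not Spark's Catalyst classes).
sealed trait Expr
// Marker: the expression returns NULL whenever any input is NULL.
sealed trait NullIntolerant extends Expr

case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends NullIntolerant
case class Not(child: Expr) extends NullIntolerant // assumption of this model
case class IsNotNull(child: Expr) extends Expr     // never returns NULL

object NullScan {
  type Row = Map[String, Option[Int]] // None models NULL

  def children(e: Expr): Seq[Expr] = e match {
    case Add(l, r)    => Seq(l, r)
    case Not(c)       => Seq(c)
    case IsNotNull(c) => Seq(c)
    case _: Attr      => Seq.empty
  }

  // Original rule: also descends through IsNotNull(<null-intolerant child>).
  def scanOld(e: Expr): Seq[Attr] = e match {
    case a: Attr => Seq(a)
    case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
      children(e).flatMap(scanOld)
    case _ => Seq.empty
  }

  // Corrected rule: stops at anything that is not null-intolerant.
  def scanNew(e: Expr): Seq[Attr] = e match {
    case a: Attr           => Seq(a)
    case _: NullIntolerant => children(e).flatMap(scanNew)
    case _                 => Seq.empty
  }

  // Integer-valued expressions; NULL propagates through Add.
  def evalInt(e: Expr, row: Row): Option[Int] = e match {
    case Attr(n)   => row(n)
    case Add(l, r) => for (x <- evalInt(l, row); y <- evalInt(r, row)) yield x + y
    case _         => sys.error("not an integer expression in this toy model")
  }

  // Boolean-valued expressions; IsNotNull turns NULL into false.
  def evalBool(e: Expr, row: Row): Option[Boolean] = e match {
    case Not(c)       => evalBool(c, row).map(!_)
    case IsNotNull(c) => Some(evalInt(c, row).isDefined)
    case _            => sys.error("not a boolean expression in this toy model")
  }
}
```

For the constraint `Not(IsNotNull(Add(Attr("a"), Attr("b"))))` and a row where `a` is NULL and `b` is 1, `evalBool` yields `Some(true)`, so the row legitimately satisfies the constraint. `scanOld` nevertheless returns both `a` and `b`, and inferring `IsNotNull(a)` from that would wrongly filter the row out. `scanNew` returns an empty sequence for the same constraint.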
This fixes the problem, but we need a smarter fix to avoid regressions.
I am now working on a better fix.