Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/11809#issuecomment-199413031
This fix seems okay, but I feel like we are just adding one-offs instead of
taking a step back and thinking about how to generally infer null-intolerance
from an expression. For example, even after this PR we still don't do well in
this case:
```scala
scala> val df = Seq((1,2,3)).toDF("a", "b", "c")
scala> df.where("a + b = c").queryExecution.analyzed.constraints
res2: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set(((a#4 + b#5) = c#6), isnotnull((a#4 + b#5)), isnotnull(c#6))
```
Given that it seems most useful to infer `IsNotNull` for `Attribute`s, it
seems we should just fix `constructIsNotNullConstraints` to recurse through any
expression that is null-intolerant (i.e., any null input results in null
output). It could then just return a `Seq[Attribute]`.
We could even consider making `NullIntolerant` a trait so we could
easily mark all the valid expressions as such, which would make the logic in
`QueryPlan` simpler.
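Roughly like this (again a toy sketch with hypothetical class names, not the real Catalyst types): with a marker trait, the traversal needs no per-expression special cases.

```scala
sealed trait Expr { def children: Seq[Expr] }
// Marker trait: any null input produces a null output.
trait NullIntolerant

case class Attr(name: String) extends Expr { def children = Nil }
case class Plus(l: Expr, r: Expr) extends Expr with NullIntolerant {
  def children = Seq(l, r)
}
case class IfNull(cs: Seq[Expr]) extends Expr { // not null-intolerant
  def children = cs
}

// The traversal collapses to one generic match on the marker trait
// instead of enumerating every null-intolerant expression by hand.
def scanNullIntolerant(e: Expr): Seq[Attr] = e match {
  case a: Attr                     => Seq(a)
  case ni: Expr with NullIntolerant => ni.children.flatMap(scanNullIntolerant)
  case _                           => Seq.empty
}

val simple  = scanNullIntolerant(Plus(Attr("a"), Attr("b")))
// simple == Seq(Attr("a"), Attr("b"))
val stopped = scanNullIntolerant(IfNull(Seq(Attr("x"))))
// stopped == Seq.empty: recursion stops at non-null-intolerant nodes
```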