Github user maryannxue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22259#discussion_r224295469
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -47,7 +48,8 @@ case class ScalaUDF(
inputTypes: Seq[DataType] = Nil,
udfName: Option[String] = None,
nullable: Boolean = true,
- udfDeterministic: Boolean = true)
+ udfDeterministic: Boolean = true,
+ nullableTypes: Seq[Boolean] = Nil)
--- End diff ---
Yes, the test should not pass after removing the `isInstanceOf[KnownNotNull]` condition
from the `needsNullCheck` check
(https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2160).
The idea was to add a `KnownNotNull` node on top of the original node to mark
it as null-checked, so that the rule won't add redundant null checks even if it is
accidentally applied again. I'm not sure of the exact reason why you removed the
`isInstanceOf[KnownNotNull]` condition in this PR, but I think it should be
kept there alongside the new nullable type check.
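For what it's worth, here is a minimal, self-contained sketch of that idempotence
argument (toy expression classes, not Spark's actual Catalyst API): with a
`KnownNotNull`-style marker, applying the null-check rule a second time is a no-op.
```
// Toy expression tree (NOT Spark's Catalyst classes) sketching why a
// KnownNotNull-style marker keeps the null-check rule idempotent.
sealed trait Expr
case class Attr(name: String) extends Expr
case class KnownNotNull(child: Expr) extends Expr  // marks an input as already null-checked

// The "rule": wrap every UDF input that is not already marked as null-checked.
def handleNullInputs(inputs: Seq[Expr]): Seq[Expr] = inputs.map {
  case marked: KnownNotNull => marked       // already checked, leave it alone
  case other                => KnownNotNull(other)
}

val once  = handleNullInputs(Seq(Attr("a"), Attr("b")))
val twice = handleNullInputs(once)

// Without the `case marked: KnownNotNull` guard, the second application would
// wrap the inputs again, so plan.analyze and plan.analyze.analyze would differ,
// which is exactly what the reproduction test below checks.
assert(once == twice)
```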
After adding the `nullableTypes` parameter in the test, the issue can be
reproduced:
```
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  val a = testRelation.output(0)
  val func = (x: Int, y: Int) => x + y
  val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
  val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil, nullableTypes = false :: false :: Nil)
  val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
  comparePlans(plan.analyze, plan.analyze.analyze)
}
```
BTW, I'm just curious: it looks like `nullableTypes` indicates the opposite of
"nullable" as used in a schema. I would assume that when `nullableTypes` is
`Seq(false)`, the type is not nullable and we don't need to add the null
check, and vice versa. Did I miss something here?
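To make the reading I assumed concrete, here's a tiny hypothetical sketch (just
restating my assumption, not necessarily what the PR implements):
```
// Hypothetical sketch of the reading I assumed (not necessarily the PR's semantics):
// nullableTypes(i) == false -> the i-th input type is not nullable -> no null check needed
// nullableTypes(i) == true  -> the i-th input type is nullable     -> null check needed
def assumedNeedsNullCheck(nullableTypes: Seq[Boolean]): Seq[Boolean] =
  nullableTypes // under this reading, needsNullCheck(i) simply equals nullableTypes(i)
```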
---