Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/9770#discussion_r45166023
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1063,6 +1065,34 @@ class Analyzer(
        Project(p.output, newPlan.withNewChildren(newChild :: Nil))
    }
  }
+
+  /**
+   * Correctly handle null primitive inputs for UDF by adding extra [[If]] expression to do the
+   * null check. When user defines a UDF with primitive parameters, there is no way to tell if the
+   * primitive parameter is null or not, so here we assume the primitive input is null-propagatable
+   * and we should return null if the input is null.
+   */
+  object HandleNullInputsForUDF extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case p if !p.resolved => p // Skip unresolved nodes.
+
+      case plan => plan transformExpressionsUp {
+
+        case udf @ ScalaUDF(func, _, inputs, _) =>
+          val parameterTypes = ScalaReflection.getParameterTypes(func)
+          assert(parameterTypes.length == inputs.length)
+
+          val inputsNullCheck = parameterTypes.zip(inputs)
+            // TODO: skip null handling for not-nullable primitive inputs after we can completely
--- End diff ---
Given that most of the common code paths do not use `nullable` (for example, generated expressions and joins), there could be corner cases where `nullable` is not generated correctly (for some data sources), so I think it's risky for 1.6.
I'd vote to do that in the next release (and consider `nullable` in most places).
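
For context, here is a minimal sketch of the user-visible behavior described by the rule's doc comment. It is not part of the patch; the session setup, the `value` column, and the `plusOne` UDF are illustrative names. A Scala UDF with a primitive `Int` parameter cannot observe a SQL NULL itself, so the analyzer wraps the UDF call in a null check and the whole expression evaluates to NULL when the input is NULL.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object NullPrimitiveUdfExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("udf-null-check"))
    val sqlContext = new SQLContext(sc)

    // A nullable integer column that contains a NULL value.
    val schema = StructType(StructField("value", IntegerType, nullable = true) :: Nil)
    val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row(1), Row(null), Row(3))), schema)

    // The UDF takes a primitive Int, so its body can never see a SQL NULL;
    // the null check added by the analyzer returns NULL for the NULL input instead.
    val plusOne = udf((x: Int) => x + 1)
    df.select(col("value"), plusOne(col("value")).as("plus_one")).show()

    sc.stop()
  }
}
```

With the rule applied, the expected output is that the row where `value` is NULL shows NULL in `plus_one`, rather than the function being invoked on a default value such as 0.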