Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/9770#discussion_r45238084
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1063,6 +1065,34 @@ class Analyzer(
Project(p.output, newPlan.withNewChildren(newChild :: Nil))
}
}
+
+ /**
+ * Correctly handle null primitive inputs for UDF by adding extra [[If]] expression to do the
+ * null check. When user defines a UDF with primitive parameters, there is no way to tell if the
+ * primitive parameter is null or not, so here we assume the primitive input is null-propagatable
+ * and we should return null if the input is null.
+ */
+ object HandleNullInputsForUDF extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case p if !p.resolved => p // Skip unresolved nodes.
+
+ case plan => plan transformExpressionsUp {
+
+ case udf @ ScalaUDF(func, _, inputs, _) =>
+ val parameterTypes = ScalaReflection.getParameterTypes(func)
+ assert(parameterTypes.length == inputs.length)
+
+ val inputsNullCheck = parameterTypes.zip(inputs)
+ // TODO: skip null handling for not-nullable primitive inputs after we can completely
--- End diff --
To play devil's advocate: when the nullability info is wrong, I think it is
usually wrong in the conservative direction (it allows nulls when there are
none). Also, I'm not really sure what is going to change between now and 1.7
(i.e., if there are bugs, we need to find them eventually).
That said, I'm fine waiting, but we should use this info eventually given
the amount of effort we spend passing it around.
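
For illustration, here is a rough sketch of what using the nullability info could look like. It uses a toy expression model rather than Catalyst's classes, so every name below (`NullCheckSketch`, `Input`, `Udf`, `handleNullInputs`, `useNullability`) is hypothetical and not part of this PR. The assumption is that each input carries a `nullable` flag and that only primitive-typed parameters need a guard.

```scala
// Toy model, NOT Catalyst: all names here are hypothetical stand-ins.
object NullCheckSketch {
  sealed trait Expr { def nullable: Boolean }
  case class Input(name: String, nullable: Boolean) extends Expr
  case class IsNull(child: Expr) extends Expr { val nullable = false }
  case class Or(left: Expr, right: Expr) extends Expr { val nullable = false }
  case object NullLiteral extends Expr { val nullable = true }
  case class If(cond: Expr, thenExpr: Expr, elseExpr: Expr) extends Expr {
    val nullable = thenExpr.nullable || elseExpr.nullable
  }
  // primitiveParams(i) is true when the i-th UDF parameter is a primitive type.
  case class Udf(primitiveParams: Seq[Boolean], inputs: Seq[Expr]) extends Expr {
    val nullable = true
  }

  // Wrap the UDF in If(anyPrimitiveInputIsNull, null, udf).
  // With useNullability = true, inputs marked non-nullable get no check.
  def handleNullInputs(udf: Udf, useNullability: Boolean): Expr = {
    require(udf.primitiveParams.length == udf.inputs.length)
    val checks: Seq[Expr] = udf.primitiveParams.zip(udf.inputs).collect {
      case (true, in) if !useNullability || in.nullable => IsNull(in)
    }
    checks.reduceOption[Expr]((a, b) => Or(a, b)) match {
      case Some(anyNull) => If(anyNull, NullLiteral, udf)
      case None          => udf // nothing to guard, leave the UDF as-is
    }
  }
}
```

In this sketch, an input that is conservatively marked nullable when it never is only costs a redundant check; a wrong result would require an input marked non-nullable that actually produces nulls.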