koertkuipers commented on a change in pull request #27937:
URL: https://github.com/apache/spark/pull/27937#discussion_r422682853
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
##########
@@ -93,7 +93,7 @@ sealed abstract class UserDefinedFunction {
private[spark] case class SparkUserDefinedFunction(
f: AnyRef,
dataType: DataType,
- inputSchemas: Seq[Option[ScalaReflection.Schema]],
Review comment:
while it's nice to see case classes supported, i was surprised to find
that another feature of Encoder is not supported: using Option to indicate that a
udf argument could be null. for example, one could define a udf for
`(Option[String], Int) => String` to explicitly handle the case where the first
input argument is null (or should i say, it's now None).
a better example is `(String, Option[Int]) => String`, which works around the
udf behavior that if a primitive argument is null, the udf is not called and the
output is also null.
this becomes really important when designing generic systems that use udfs.
say you write something that applies a transform `(X, Y) => X` for
generic types X and Y (with TypeTags). now say Y could be null (for example, the
udf could be called after a left join in which Y is joined in). the
behavior would then depend on the concrete type of Y: for Strings, nulls
would get passed in to the udf, while for Ints the udf would be skipped. i don't
think anyone would want to deal with that. so instead you want to write a udf
for `(X, Option[Y]) => X` in this case.
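
to make the difference concrete, here is a minimal plain-Scala sketch of the two calling conventions. it does not use Spark at all: `NullableUdfSketch`, `callSkippingNulls` and `callWithOption` are hypothetical names invented for illustration, and the first helper only models the skip-on-null behavior described above for primitive arguments rather than reproducing Spark's actual implementation.

```scala
// Plain-Scala model of the two calling conventions; no Spark dependency.
// All names here are hypothetical and exist only for this illustration.
object NullableUdfSketch {

  // Models the behavior described above for a primitive second argument:
  // a null input short-circuits to a null result and f is never invoked.
  def callSkippingNulls[X >: Null, Y](f: (X, Y) => X)(x: X, y: Any): X =
    if (y == null) null else f(x, y.asInstanceOf[Y])

  // Models the proposed alternative: a null input is wrapped as None, so f
  // always runs and decides for itself what a missing value means,
  // whatever the concrete type of Y is.
  def callWithOption[X, Y](f: (X, Option[Y]) => X)(x: X, y: Any): X =
    f(x, Option(y).map(_.asInstanceOf[Y]))
}
```

with this model, `callWithOption[String, Int]((s, n) => s + n.getOrElse(0))("a", null)` still invokes the function (with `None`), while `callSkippingNulls[String, Int]((s, n) => s + n)("a", null)` returns null without calling it. the same `callWithOption` wrapper behaves identically for `Y = String` and `Y = Int`, which is the uniformity the generic `(X, Option[Y]) => X` signature is meant to provide.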