[GitHub] [spark] koertkuipers commented on a change in pull request #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

GitBox Sun, 10 May 2020 12:17:17 -0700


koertkuipers commented on a change in pull request #27937:
URL: https://github.com/apache/spark/pull/27937#discussion_r422682853




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
##########
@@ -93,7 +93,7 @@ sealed abstract class UserDefinedFunction {
 private[spark] case class SparkUserDefinedFunction(
     f: AnyRef,
     dataType: DataType,
-    inputSchemas: Seq[Option[ScalaReflection.Schema]],

Review comment:
       While it's nice to see case classes supported, I was surprised to find 
that another feature of Encoder is not supported: using Option to indicate that 
a udf argument could be null. For example, one could define a udf for 
`(Option[String], Int) => String` to explicitly handle the case where the first 
input argument is null (or should I say it's now None).
   A better example is `(String, Option[Int]) => String`, which works around the 
udf behavior that if a primitive argument is null the udf is not called and the 
output is also null.
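
A minimal sketch of the primitive-null behavior described above, assuming spark-sql is on the classpath and a local session. The object name, column names, and the boxed `java.lang.Integer` variant are illustrative assumptions, not part of this pull request. The boxed variant is one existing way to see the null inside the function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical demo object; names are for illustration only.
object UdfPrimitiveNullDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-primitive-null-demo")
      .getOrCreate()
    import spark.implicits._

    // Second row has a null in the Int column.
    val df = Seq[(String, Option[Int])](("a", Some(1)), ("b", None)).toDF("s", "i")

    // Primitive parameter: per the comment, the function is skipped when `i`
    // is null and the result column is null for that row.
    val primitive = udf((s: String, i: Int) => s"$s:$i")

    // Boxed parameter (assumed workaround): the function is invoked and can
    // inspect the null itself.
    val boxed = udf { (s: String, i: java.lang.Integer) =>
      val shown = Option(i).map(_.toString).getOrElse("missing")
      s"$s:$shown"
    }

    df.select(
      $"s",
      $"i",
      primitive($"s", $"i").as("primitive"),
      boxed($"s", $"i").as("boxed")
    ).show()

    spark.stop()
  }
}
```

With an `Option[Int]` parameter, as requested in this comment, the function body would pattern match on `Some`/`None` instead of wrapping a boxed value by hand.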
   
   Especially when designing generic systems that use udfs this becomes really 
important. Say you write something that does a transform `(X, Y) => X` for 
generic types X and Y (with TypeTags). Now say Y could be null (the udf could 
be called after a left join, for example, where Y is the joined-in side). The 
behavior would now change based on the concrete type of Y: for Strings, nulls 
would get passed in to the udf, while for Ints the udf would be skipped. I don't 
think anyone would want to deal with that, so instead you want to write a udf 
for `(X, Option[Y]) => X` in this case.
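
To make the generic case concrete, here is a hedged sketch assuming spark-sql on the classpath. `keepLeft`, the object name, and the column names are hypothetical. It builds the same `(X, Y) => X` udf for two choices of Y so the two result columns can be compared on a row where Y is null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.reflect.runtime.universe.TypeTag

// Hypothetical demo object; names are for illustration only.
object GenericUdfNullDemo {
  // Generic (X, Y) => X udf; the TypeTags let Spark derive the input schemas.
  def keepLeft[X: TypeTag, Y: TypeTag]: UserDefinedFunction =
    udf((x: X, y: Y) => x)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("generic-udf-null-demo")
      .getOrCreate()
    import spark.implicits._

    // Second row has nulls in both candidate Y columns.
    val df = Seq[(String, Option[Int], String)](
      ("a", Some(1), "p"),
      ("b", None, null)
    ).toDF("x", "yInt", "yStr")

    val withIntY = keepLeft[String, Int]       // Y is a primitive type
    val withStringY = keepLeft[String, String] // Y is a reference type

    // Per the comment, the null row is skipped (null result) when Y is Int,
    // while the function is still called when Y is String.
    df.select(
      $"x",
      withIntY($"x", $"yInt").as("y_is_int"),
      withStringY($"x", $"yStr").as("y_is_string")
    ).show()

    spark.stop()
  }
}
```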
   




