Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r225724505

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --

Looks like the only place where we'd get a not-specified `inputSchemas` is when `ScalaReflection.schemaFor` doesn't recognize a type and throws an exception (https://github.com/apache/spark/blob/1fd59c129a7aa16f9960b109128b166952992f32/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L786). The caller seems to be doing a bad job by calling it this way, for example:
```
val inputSchemas = Try(ScalaReflection.schemaFor(typeTag[A1]) :: ScalaReflection.schemaFor(typeTag[A2]) :: Nil).toOption
val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas)
```
Because a single `Try` wraps the whole list, if the type of even one parameter is unrecognizable to `ScalaReflection`, we'd end up with the entire `Seq` as `None`. I think it's fine not to check null for user-defined types that we don't know, because they can't be primitive types anyway, but I do think we should make the type inference of each parameter independent so that we still handle the nulls that need to be taken care of.
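To illustrate the difference, here is a minimal standalone sketch, not Spark code. `Schema`, `schemaFor`, and the helper names below are hypothetical stand-ins (type names are plain strings rather than `TypeTag`s) that only mimic the "throws on an unrecognized type" behavior described above. It contrasts one `Try` around the whole list with one `Try` per parameter:

```scala
import scala.util.Try

object PerParameterInference {
  // Hypothetical stand-in for the schema returned by ScalaReflection.schemaFor.
  final case class Schema(dataType: String, nullable: Boolean)

  // Hypothetical stand-in for ScalaReflection.schemaFor: throws on unknown types.
  def schemaFor(typeName: String): Schema = typeName match {
    case "Int" | "Long" | "Double" | "Boolean" => Schema(typeName, nullable = false)
    case "String" => Schema(typeName, nullable = true)
    case other =>
      throw new UnsupportedOperationException(s"Schema for type $other is not supported")
  }

  // Pattern from the snippet above: a single Try around the whole list,
  // so one unrecognized type discards the schemas of all parameters.
  def inferAllOrNothing(types: List[String]): Option[List[Schema]] =
    Try(types.map(schemaFor)).toOption

  // Suggested alternative: one Try per parameter, so a failure stays local.
  def inferPerParameter(types: List[String]): List[Option[Schema]] =
    types.map(t => Try(schemaFor(t)).toOption)

  // A null check is needed only where the type is known and non-nullable (primitive).
  // Unknown types map to false, since they cannot be primitives.
  def needsNullCheck(schemas: List[Option[Schema]]): List[Boolean] =
    schemas.map(_.exists(s => !s.nullable))
}
```

With parameter types `List("Int", "MyUdt", "String")`, where `MyUdt` is an unrecognized type, `inferAllOrNothing` returns `None` and the `Int` parameter loses its null check, whereas `needsNullCheck(inferPerParameter(...))` still flags the `Int` parameter and skips the other two.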