Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22732#discussion_r225724505
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
           f,
           dataType,
           exprs.map(_.expr),
    +      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
    --- End diff ---
    
    Looks like the only place where we'd get an unspecified `inputSchemas` is when `ScalaReflection.schemaFor` doesn't recognize a type and throws an exception (https://github.com/apache/spark/blob/1fd59c129a7aa16f9960b109128b166952992f32/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L786). The problem is in how the callers invoke it, wrapping all parameters in a single `Try`, for example:
    ```
        val inputSchemas = Try(ScalaReflection.schemaFor(typeTag[A1]) :: ScalaReflection.schemaFor(typeTag[A2]) :: Nil).toOption
        val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas)
    ```
    It would mean that if the type of even one parameter is unrecognizable by `ScalaReflection`, we'd end up with the entire `Seq` as `None`. I think it's fine not to do the null check for user-defined types we don't know about, because they can't be primitive types anyway, but I do think we should make the type inference of each parameter independent, so that the nulls that do need handling are still taken care of (see the sketch below).
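    A minimal sketch of what I mean, assuming we keep `schemaFor`'s throwing behavior and just wrap each call in its own `Try` (`schemaOption` and `Unrecognized` are made-up names, purely for illustration):
    ```
    import scala.reflect.runtime.universe.{typeTag, TypeTag}
    import scala.util.Try

    import org.apache.spark.sql.catalyst.ScalaReflection

    // Infer each parameter's schema on its own, so one unsupported type
    // no longer discards the nullability info of all the other parameters.
    def schemaOption[T: TypeTag]: Option[ScalaReflection.Schema] =
      Try(ScalaReflection.schemaFor(typeTag[T])).toOption

    class Unrecognized // a plain class that `ScalaReflection` can't handle

    // The current all-in-one Try would give None for (Int, Unrecognized),
    // losing the fact that the Int argument is a non-nullable primitive.
    // With per-parameter inference we keep the partial information:
    val inputSchemas: Seq[Option[ScalaReflection.Schema]] =
      schemaOption[Int] :: schemaOption[Unrecognized] :: Nil
    // => Seq(Some(Schema(IntegerType, nullable = false)), None)

    // Null handling per parameter: check only known non-nullable (primitive)
    // types; unknown types can't be primitives, so skipping them is safe.
    val handleNullForInputs: Seq[Boolean] =
      inputSchemas.map(_.exists(schema => !schema.nullable))
    // => Seq(true, false)
    ```
    That way `UserDefinedFunction` could still derive the null-handling flags per parameter instead of falling back to all-`false` whenever any single type is unsupported.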

