GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212354787
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -375,8 +375,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
         import org.apache.spark.sql.functions.{rand, udf}
         val c = Column(col)
         val r = rand(seed)
    -    val f = udf { (stratum: Any, x: Double) =>
    -      x < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
    +    // Hack to get around the fact that type T is Any and we can't use a UDF whose arg
    +    // is Any. Convert everything to a string rep.
    --- End diff --
    
    I feel a UDF using `Any` as the input type is rare but a valid use case.
    Sometimes users just want to accept any input type.
    
    How about we create a few `udfInternal` methods that take `Any` as inputs? e.g.
    ```
    def udfInternal[R: TypeTag](f: Function1[Any, R]): UserDefinedFunction = {
      // Derive the Catalyst return type from R; pass None for inputTypes so
      // no input type checking is applied (the input is Any).
      val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[R]
      val udf = UserDefinedFunction(f, dataType, None)
      if (nullable) udf else udf.asNonNullable()
    }
    ```
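    
    For `sampleBy`, a two-argument overload in the same spirit would remove the
    string-conversion hack entirely. A minimal sketch, assuming a hypothetical
    `Function2` overload of `udfInternal` (not in the current API; `fractions`
    and `T` come from `sampleBy`'s enclosing scope):
    ```
    // Sketch only: hypothetical Function2 overload, following the same pattern.
    def udfInternal[R: TypeTag](f: Function2[Any, Any, R]): UserDefinedFunction = {
      val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[R]
      val udf = UserDefinedFunction(f, dataType, None)
      if (nullable) udf else udf.asNonNullable()
    }
    
    // sampleBy could then keep the stratum column in its native type:
    val f = udfInternal { (stratum: Any, x: Any) =>
      x.asInstanceOf[Double] < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
    }
    df.filter(f(c, r))
    ```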

