Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22063#discussion_r212393002
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -375,8 +375,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
import org.apache.spark.sql.functions.{rand, udf}
val c = Column(col)
val r = rand(seed)
- val f = udf { (stratum: Any, x: Double) =>
- x < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
+ // Hack to get around the fact that type T is Any and we can't use a UDF whose arg
+ // is Any. Convert everything to a string rep.
--- End diff ---
You are right that the change I made here is not bulletproof. Unfortunately, there are several more problems like this: anywhere there's a UDF on `Row`, it now fails, and the workarounds are ugly.
I like your idea; let me work on that, because the alternative I've been working on is driving me nuts.
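
For context, here is a rough standalone sketch of the "convert to string rep" workaround from the diff above. This is not the code in the PR; `sampleByStringKey` is just an invented helper name to show the shape of the idea (re-key the fractions map by each stratum's string form so the UDF can take a concrete `String` argument instead of `Any`):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, rand, udf}

// Rough sketch only -- not the actual PR change. Null stratum keys are not handled.
def sampleByStringKey[T](df: DataFrame, colName: String,
                         fractions: Map[T, Double], seed: Long): DataFrame = {
  // Key the fractions by string rep so the UDF argument can be typed as String.
  val byString: Map[String, Double] = fractions.map { case (k, v) => (k.toString, v) }
  val r = rand(seed)
  val accept = udf { (stratum: String, x: Double) =>
    x < byString.getOrElse(stratum, 0.0)
  }
  df.filter(accept(col(colName).cast("string"), r))
}
```

It works for simple stratum types, but stringifying every key is exactly the kind of ugliness I'd rather avoid, which is why I want to try your suggestion instead.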