HyukjinKwon commented on a change in pull request #25130:
[SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
URL: https://github.com/apache/spark/pull/25130#discussion_r303278200
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala
##########
@@ -198,11 +203,22 @@ object IntegratedUDFTestUtils extends SQLHelper {
}
/**
- * A Python UDF that takes one column and returns a string column.
- * Equivalent to `udf(lambda x: str(x), "string")`
+ * A Python UDF that takes one column, casts into string, executes the Python native function,
+ * and casts back to the type of input column.
+ *
+ * Virtually equivalent to:
+ *
+ * {{{
+ * from pyspark.sql.functions import udf
+ *
+ * df = spark.range(3).toDF("col")
+ * python_udf = udf(lambda x: str(x), "string")
+ * casted_col = python_udf(df.col.cast("string"))
+ * casted_col.cast(df.schema["col"].dataType)
+ * }}}
*/
case class TestPythonUDF(name: String) extends TestUDF {
- private[IntegratedUDFTestUtils] lazy val udf = UserDefinedPythonFunction(
+  private[IntegratedUDFTestUtils] lazy val udf = new UserDefinedPythonFunction(
Review comment:
BTW, I think case class to case class inheritance is forbidden, but a case class extending a regular class is fine. I don't think this is good practice, but at least the affected scope is only tests, and it's a minimized change. So I guess it's fine.
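To illustrate the point above, here is a minimal sketch (my own example, not code from the PR): the Scala compiler rejects a case class that extends another case class, while extending a plain class is allowed.

```scala
// A regular (non-case) class; a case class may extend it.
class Plain(val value: Int)

// Allowed: case class extending a regular class.
case class FromPlain(v: Int) extends Plain(v)

// Not allowed: case-to-case inheritance is rejected by the compiler.
// case class Other(v: Int)
// case class Bad(v: Int) extends Other(v)  // error: case class Bad has case ancestor Other

object Demo extends App {
  val f = FromPlain(42)
  println(f.value)  // inherited from Plain
}
```

This mirrors the pattern in the diff: `TestPythonUDF` stays a case class while `UserDefinedPythonFunction` is instantiated with `new`, sidestepping case-to-case inheritance entirely.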
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]