HyukjinKwon commented on a change in pull request #25130:
[SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
URL: https://github.com/apache/spark/pull/25130#discussion_r303278200
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala
 ##########
 @@ -198,11 +203,22 @@ object IntegratedUDFTestUtils extends SQLHelper {
   }
 
   /**
-   * A Python UDF that takes one column and returns a string column.
-   * Equivalent to `udf(lambda x: str(x), "string")`
 +   * A Python UDF that takes one column, casts into string, executes the Python native function,
 +   * and casts back to the type of input column.
+   *
+   * Virtually equivalent to:
+   *
+   * {{{
+   *   from pyspark.sql.functions import udf
+   *
+   *   df = spark.range(3).toDF("col")
+   *   python_udf = udf(lambda x: str(x), "string")
+   *   casted_col = python_udf(df.col.cast("string"))
+   *   casted_col.cast(df.schema["col"].dataType)
+   * }}}
    */
   case class TestPythonUDF(name: String) extends TestUDF {
-    private[IntegratedUDFTestUtils] lazy val udf = UserDefinedPythonFunction(
 +    private[IntegratedUDFTestUtils] lazy val udf = new UserDefinedPythonFunction(
 
 Review comment:
   BTW, I think case class to case class inheritance is forbidden, but a case class extending a regular class is fine. I don't think this is good practice, but at least the affected scope is only tests and the change is minimized. So I guess it's fine.
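As a minimal sketch of the point above (not code from the PR; the class names here are hypothetical): Scala permits a case class to extend a regular class, while extending one case class from another is a compile error.

```scala
// A regular (non-case) class: a case class may extend it.
class Base(val tag: String)

// Legal: case class inheriting from a regular class.
case class Child(name: String) extends Base("base")

// Illegal: case class to case class inheritance does not compile.
// case class Other(x: Int)
// case class Bad(y: Int) extends Other(1)  // error: case class Bad has case ancestor Other

object Demo {
  def main(args: Array[String]): Unit = {
    val c = Child("udf")
    println(c.tag)   // inherited from Base
    println(c.name)  // case class field
  }
}
```

This is why the quoted change (`new UserDefinedPythonFunction(...)` against a regular class) is workable even though the pattern is not generally recommended.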

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
