[GitHub] [spark] dtenedor commented on a diff in pull request #42617: [SPARK-44918][SQL][PYTHON] Support named arguments in scalar Python/Pandas UDFs

via GitHub Tue, 22 Aug 2023 17:35:16 -0700


dtenedor commented on code in PR #42617:
URL: https://github.com/apache/spark/pull/42617#discussion_r1302322467



##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1467,6 +1467,96 @@ def udf(x):
         finally:
             shutil.rmtree(path)
 
+    def test_named_arguments(self):
+        @pandas_udf("int")
+        def test_udf(a, b):
+            return a + 10 * b
+
+        self.spark.udf.register("test_udf", test_udf)
+
+        for i, df in enumerate(
+            [
+                self.spark.range(2).select(test_udf(a=col("id"), b=col("id") * 
10)),
+                self.spark.range(2).select(test_udf(b=col("id") * 10, 
a=col("id"))),
+                self.spark.sql("SELECT test_udf(a => id, b => id * 10) FROM 
range(2)"),

Review Comment:
   can we also have a test case with a positional argument first and then a 
named argument after? We can have a positive version of this test case where 
the positional argument maps to the first argument of the UDF and the named 
argument refers to the second, and another negative version where the named 
argument uses the same name as the first (positional) argument? Same for 
`test_udf.py`?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##########
@@ -50,6 +50,19 @@ case class UserDefinedPythonFunction(
     udfDeterministic: Boolean) {
 
   def builder(e: Seq[Expression]): Expression = {
+    if (pythonEvalType == PythonEvalType.SQL_BATCHED_UDF
+        || pythonEvalType ==PythonEvalType.SQL_ARROW_BATCHED_UDF
+        || pythonEvalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF) {
+      /*
+       * Check if the named arguments:
+       * - don't have duplicated names
+       * - don't contain positional arguments

Review Comment:
   ```suggestion
          * - don't contain positional arguments after named arguments
          * - all map to valid argument names from the function declaration
   ```



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala:
##########
@@ -146,4 +157,30 @@ object PythonUDFRunner {
       }
     }
   }
+
+  def writeUDFs(
+    dataOut: DataOutputStream,

Review Comment:
   please indent each of these function arguments by 2 more spaces (for a total 
of +4 spaces each) per the style guide?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] dtenedor commented on a diff in pull request #42617: [SPARK-44918][SQL][PYTHON] Support named arguments in scalar Python/Pandas UDFs

Reply via email to