[GitHub] [spark] ueshin commented on a diff in pull request #42422: [SPARK-44749][SQL][PYTHON] Support named arguments in Python UDTF

via GitHub Wed, 09 Aug 2023 17:55:24 -0700


ueshin commented on code in PR #42422:
URL: https://github.com/apache/spark/pull/42422#discussion_r1289405029



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -1795,6 +1796,65 @@ def terminate(self):
                     assertSchemaEqual(df.schema, StructType().add("col1", 
IntegerType()))
                     assertDataFrameEqual(df, [Row(col1=10), Row(col1=100)])
 
+    def test_udtf_with_named_arguments(self):
+        @udtf(returnType="a: int")
+        class TestUDTF:
+            def eval(self, a, b):
+                yield a,
+
+        self.spark.udtf.register("test_udtf", TestUDTF)
+
+        for i, df in enumerate(
+            [
+                self.spark.sql("SELECT * FROM test_udtf(a=>10, b=>'x')"),
+                self.spark.sql("SELECT * FROM test_udtf(b=>'x', a=>10)"),
+            ]
+        ):
+            with self.subTest(query_no=i):
+                assertDataFrameEqual(df, [Row(a=10)])
+
+    def test_udtf_with_kwargs(self):
+        @udtf(returnType="a: int, b: string")
+        class TestUDTF:
+            def eval(self, **kwargs):
+                yield kwargs["a"], kwargs["b"]
+
+        self.spark.udtf.register("test_udtf", TestUDTF)
+
+        for i, df in enumerate(
+            [
+                self.spark.sql("SELECT * FROM test_udtf(a=>10, b=>'x')"),
+                self.spark.sql("SELECT * FROM test_udtf(b=>'x', a=>10)"),

Review Comment:
   That's a good point. So far just rely on the Python's error.
   @dtenedor What's the error message like when applying name arguments with 
the above cases to other functions? Are there anything we can follow here?



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -1795,6 +1796,65 @@ def terminate(self):
                     assertSchemaEqual(df.schema, StructType().add("col1", 
IntegerType()))
                     assertDataFrameEqual(df, [Row(col1=10), Row(col1=100)])
 
+    def test_udtf_with_named_arguments(self):
+        @udtf(returnType="a: int")
+        class TestUDTF:
+            def eval(self, a, b):
+                yield a,
+
+        self.spark.udtf.register("test_udtf", TestUDTF)
+
+        for i, df in enumerate(
+            [
+                self.spark.sql("SELECT * FROM test_udtf(a=>10, b=>'x')"),
+                self.spark.sql("SELECT * FROM test_udtf(b=>'x', a=>10)"),
+            ]
+        ):
+            with self.subTest(query_no=i):
+                assertDataFrameEqual(df, [Row(a=10)])
+
+    def test_udtf_with_kwargs(self):
+        @udtf(returnType="a: int, b: string")
+        class TestUDTF:
+            def eval(self, **kwargs):
+                yield kwargs["a"], kwargs["b"]
+
+        self.spark.udtf.register("test_udtf", TestUDTF)
+
+        for i, df in enumerate(
+            [
+                self.spark.sql("SELECT * FROM test_udtf(a=>10, b=>'x')"),
+                self.spark.sql("SELECT * FROM test_udtf(b=>'x', a=>10)"),

Review Comment:
   That's a good point. So far just rely on the Python's error.
   
   @dtenedor What's the error message like when applying name arguments with 
the above cases to other functions? Are there anything we can follow here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] ueshin commented on a diff in pull request #42422: [SPARK-44749][SQL][PYTHON] Support named arguments in Python UDTF

Reply via email to