[GitHub] [spark] ueshin commented on a diff in pull request #42157: [SPARK-43968][PYTHON] Improve error messages for Python UDTFs with wrong number of outputs

via GitHub Thu, 27 Jul 2023 13:02:32 -0700


ueshin commented on code in PR #42157:
URL: https://github.com/apache/spark/pull/42157#discussion_r1276759271



##########
python/pyspark/worker.py:
##########
@@ -654,6 +657,19 @@ def wrap_udtf(f, return_type):
             assert return_type.needConversion()
             toInternal = return_type.toInternal
 
+            def verify_and_convert_result(result):
+                # TODO(SPARK-44005): support returning non-tuple values
+                if result is not None and hasattr(result, "__len__"):
+                    if len(result) != len(return_type):

Review Comment:
   nit: ditto.



##########
python/pyspark/worker.py:
##########
@@ -604,8 +604,11 @@ def verify_result(result):
                         },
                     )
 
-                # Check when the dataframe has both rows and columns.
-                if not result.empty or len(result.columns) != 0:
+                # Validate the output schema when the result dataframe has either output
+                # rows or columns. Note that we avoid using `df.empty` here because the
+                # result dataframe may contain an empty row. For example, when a UDTF is
+                # defined as follows: def eval(self): yield tuple().
+                if len(result) > 0 or len(result.columns) > 0:
                     if len(result.columns) != len(return_type):

Review Comment:
   nit: We might want to have a variable for `len(return_type)` outside of `verify_result` to avoid any potential overhead.
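
   A minimal sketch of what hoisting the length could look like. This is not the actual `worker.py` code: `make_verify_result`, `expected_len`, and the `ValueError` are hypothetical names chosen for illustration, and `return_type` is assumed to be anything that supports `len()`.

```python
from typing import Any, Callable, Sized


def make_verify_result(return_type: Sized) -> Callable[[Any], Any]:
    # Hypothetical helper for illustration only; not the real worker.py code.
    # The expected column count is computed once here, outside the per-result
    # function, instead of calling len(return_type) for every result.
    expected_len = len(return_type)

    def verify_result(result: Any) -> Any:
        # Only sized results can be checked against the expected length.
        if result is not None and hasattr(result, "__len__"):
            if len(result) != expected_len:
                raise ValueError(
                    f"expected {expected_len} columns, got {len(result)}"
                )
        return result

    return verify_result


# The closure reuses the precomputed length on every call.
verify = make_verify_result(["a", "b"])
checked = verify((1, 2))
```

   The saving per call is small, but the closure runs once per result, so computing the value once keeps the hot path free of repeated work.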



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

