allisonwang-db commented on code in PR #43611:
URL: https://github.com/apache/spark/pull/43611#discussion_r1399758278


##########
python/pyspark/worker.py:
##########
@@ -709,24 +708,28 @@ def read_udtf(pickleSer, infile, eval_type):
     # with one argument containing the previous AnalyzeResult. If that fails, 
then try a constructor
     # with no arguments. In this way each UDTF class instance can decide if it 
wants to inspect the
     # AnalyzeResult.
+    udtf_init_args = inspect.getfullargspec(handler)
+
+    udtf_name = handler.__name__

Review Comment:
   The tricky thing here is that users can register the UDTF using a different name from the handler name.
   ```
   class MyUDTF:
       ...
       
   spark.udtf.register("foo", MyUDTF)
   
   spark.sql("SELECT * FROM foo(...)")
   ```
   Then should the error message here use `foo` instead of `MyUDTF`?
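   A quick standalone check (the class and catalog names here are illustrative only) showing that `__name__` always reports the Python class name, never the name the UDTF was registered under:

```python
import inspect

# Hypothetical UDTF handler class; the catalog name ("foo") never
# touches the class object itself.
class MyUDTF:
    def __init__(self):
        pass

    def eval(self, x):
        yield (x,)

handler = MyUDTF

# __name__ reflects the class, not the name passed to
# spark.udtf.register("foo", MyUDTF).
print(handler.__name__)  # MyUDTF

# getfullargspec on a class inspects its __init__ (and keeps 'self').
print(inspect.getfullargspec(handler).args)  # ['self']
```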



##########
python/pyspark/sql/worker/analyze_udtf.py:
##########
@@ -116,12 +118,89 @@ def main(infile: IO, outfile: IO) -> None:
         handler = read_udtf(infile)
         args, kwargs = read_arguments(infile)
 
+        error_prefix = f"Failed to evaluate the user-defined table function '{handler.__name__}'"
+
+        def format_error(msg: str) -> str:
+            return dedent(msg).replace("\n", " ")
+
+        # Check that the arguments provided to the UDTF call match the expected parameters defined
+        # in the static 'analyze' method signature.
+        try:
+            inspect.signature(handler.analyze).bind(*args, **kwargs)  # type: ignore[attr-defined]
+        except TypeError as e:
+            # The UDTF call's arguments did not match the expected signature.
+            raise PySparkValueError(
+                format_error(
+                    f"""
+                    {error_prefix} because the function arguments did not match the expected
+                    signature of the static 'analyze' method ({e}). Please update the query so that
+                    this table function call provides arguments matching the expected signature, or
+                    else update the table function so that its static 'analyze' method accepts the
+                    provided arguments, and then try the query again."""
+                )
+            )
+
+        # Invoke the UDTF's 'analyze' method.
         result = handler.analyze(*args, **kwargs)  # type: ignore[attr-defined]
 
+        # Check invariants about the 'analyze' method after running it.
         if not isinstance(result, AnalyzeResult):
             raise PySparkValueError(
-                "Output of `analyze` static method of Python UDTFs expects "
-                f"a pyspark.sql.udtf.AnalyzeResult but got: {type(result)}"
+                format_error(
+                    f"""
+                    {error_prefix} because the static 'analyze' method expects a result of type
+                    pyspark.sql.udtf.AnalyzeResult, but instead this method returned a value of
+                    type: {type(result)}"""
+                )

Review Comment:
   Shall we consider using error classes for them?
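   For comparison, a minimal standalone sketch of what the error-class pattern buys us (the error class name, template, and exception type below are hypothetical stand-ins, not PySpark's real registry):

```python
# Minimal sketch of the error-class pattern: message templates live in a
# central registry keyed by a stable error class, and call sites supply
# only the parameters. All names here are hypothetical, not PySpark's.
ERROR_CLASSES = {
    "UDTF_INVALID_ANALYZE_RESULT": (
        "Output of the 'analyze' static method of a Python UDTF must be "
        "a pyspark.sql.udtf.AnalyzeResult, but got: <return_type>"
    ),
}


class UdtfAnalyzeError(Exception):
    def __init__(self, error_class: str, message_parameters: dict):
        # Fill the registered template with the provided parameters.
        message = ERROR_CLASSES[error_class]
        for key, value in message_parameters.items():
            message = message.replace(f"<{key}>", value)
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class


try:
    raise UdtfAnalyzeError(
        "UDTF_INVALID_ANALYZE_RESULT", {"return_type": "tuple"}
    )
except UdtfAnalyzeError as e:
    caught = e

print(caught)
```

   Keying errors by a stable class makes them testable and documentable without string-matching full messages.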



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

