allisonwang-db commented on code in PR #43611:
URL: https://github.com/apache/spark/pull/43611#discussion_r1399758278
##########
python/pyspark/worker.py:
##########
@@ -709,24 +708,28 @@ def read_udtf(pickleSer, infile, eval_type):
# with one argument containing the previous AnalyzeResult. If that fails, then try a constructor
# with no arguments. In this way each UDTF class instance can decide if it wants to inspect the
# AnalyzeResult.
+ udtf_init_args = inspect.getfullargspec(handler)
+
+ udtf_name = handler.__name__
Review Comment:
The tricky thing here is that users can register the UDTF using a different
name from the handler name.
```
class MyUDTF:
...
spark.udtf.register("foo", MyUDTF)
spark.sql("SELECT * FROM foo(...)")
```
Then should the error message here use `foo` instead of `MyUDTF`?
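To illustrate the mismatch this comment describes, here is a minimal sketch (outside Spark): a handler's `__name__` is fixed at class definition time, so it cannot reflect the name the user registered the UDTF under. The name `foo` is a hypothetical registered name, and `register` below is a stand-in for `spark.udtf.register`, not the real API.

```python
# Sketch: the handler class name vs. the SQL-registered name.
class MyUDTF:
    @staticmethod
    def analyze(*args):
        ...


# Stand-in for spark.udtf.register("foo", MyUDTF): the registry maps the
# SQL name to the handler, but the handler itself is unchanged.
registry = {}


def register(name, handler):
    registry[name] = handler


register("foo", MyUDTF)
registered_name = "foo"
handler = registry[registered_name]

# The error message built from handler.__name__ would say "MyUDTF",
# even though the user invoked the function as "foo" in SQL.
print(handler.__name__)  # MyUDTF
```

This is why an error message built from `handler.__name__` alone may confuse users who only ever see the registered name in their queries.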
##########
python/pyspark/sql/worker/analyze_udtf.py:
##########
@@ -116,12 +118,89 @@ def main(infile: IO, outfile: IO) -> None:
handler = read_udtf(infile)
args, kwargs = read_arguments(infile)
+ error_prefix = f"Failed to evaluate the user-defined table function '{handler.__name__}'"
+
+ def format_error(msg: str) -> str:
+ return dedent(msg).replace("\n", " ")
+
+ # Check that the arguments provided to the UDTF call match the expected parameters defined
+ # in the static 'analyze' method signature.
+ try:
+ inspect.signature(handler.analyze).bind(*args, **kwargs)  # type: ignore[attr-defined]
+ except TypeError as e:
+ # The UDTF call's arguments did not match the expected signature.
+ raise PySparkValueError(
+ format_error(
+ f"""
+ {error_prefix} because the function arguments did not match the expected
+ signature of the static 'analyze' method ({e}). Please update the query so that
+ this table function call provides arguments matching the expected signature, or
+ else update the table function so that its static 'analyze' method accepts the
+ provided arguments, and then try the query again."""
+ )
+ )
+
+ # Invoke the UDTF's 'analyze' method.
result = handler.analyze(*args, **kwargs) # type: ignore[attr-defined]
+ # Check invariants about the 'analyze' method after running it.
if not isinstance(result, AnalyzeResult):
raise PySparkValueError(
- "Output of `analyze` static method of Python UDTFs expects "
- f"a pyspark.sql.udtf.AnalyzeResult but got: {type(result)}"
+ format_error(
+ f"""
+ {error_prefix} because the static 'analyze' method expects a result of type
+ pyspark.sql.udtf.AnalyzeResult, but instead this method returned a value of
+ type: {type(result)}"""
+ )
Review Comment:
Shall we consider using error classes for them?
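To make the suggestion concrete, here is a minimal sketch of the error-class pattern (an error identifier plus message parameters resolved against a template map), mimicking the shape of PySpark's errors framework without depending on `pyspark`. The class name `UDTF_ANALYZE_SIGNATURE_MISMATCH` and the `SketchPySparkValueError` type are hypothetical, not existing PySpark identifiers.

```python
# Hypothetical error-class template map; keys and text are illustrative only.
ERROR_CLASSES = {
    "UDTF_ANALYZE_SIGNATURE_MISMATCH": (
        "Failed to evaluate the user-defined table function '<name>' because "
        "the function arguments did not match the expected signature of the "
        "static 'analyze' method (<reason>)."
    ),
}


class SketchPySparkValueError(ValueError):
    """Sketch of an error carrying a machine-readable error class."""

    def __init__(self, error_class: str, message_parameters: dict):
        # Resolve <placeholders> in the template against the parameters.
        message = ERROR_CLASSES[error_class]
        for key, value in message_parameters.items():
            message = message.replace(f"<{key}>", value)
        super().__init__(message)
        self.error_class = error_class


try:
    raise SketchPySparkValueError(
        error_class="UDTF_ANALYZE_SIGNATURE_MISMATCH",
        message_parameters={
            "name": "MyUDTF",
            "reason": "too many positional arguments",
        },
    )
except SketchPySparkValueError as e:
    print(e.error_class)  # UDTF_ANALYZE_SIGNATURE_MISMATCH
```

Centralizing the templates this way keeps error text consistent and lets callers match on the error class rather than parsing the message string.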
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]