xinrong-meng opened a new pull request, #49299:
URL: https://github.com/apache/spark/pull/49299
### What changes were proposed in this pull request?
Fix check for ‘terminate’ method existence in UDTF evaluation
### Why are the changes needed?
To ensure that UDTFs without a terminate method can still be used with
partitioning without causing an AttributeError.
Previously, udtf with partitioning will raise an AttributeError if the
terminate method is not defined, as shown below
```
>>> from pyspark.sql.functions import udtf
>>> from pyspark.sql import Row
>>>
>>> @udtf(returnType="a: int")
... class TestUDTF:
... def eval(self, row: Row):
... if row[0] > 5:
... yield row[0],
...
>>> df = spark.range(8)
>>> TestUDTF(df.asTable().partitionBy("id").orderBy("id")).show()
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
...
File "...pyspark/worker.py", line 1052, in eval
if self._udtf.terminate is not None:
AttributeError: 'TestUDTF' object has no attribute 'terminate'
```
However, the terminate method is not required in such cases.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]