itholic opened a new pull request, #48074:
URL: https://github.com/apache/spark/pull/48074
### What changes were proposed in this pull request?
This PR proposes supporting non-Column arguments in UDTFs.
### Why are the changes needed?
Simplifies UDTF usage by removing the need for explicit column conversions.
### Does this PR introduce _any_ user-facing change?
It does not break the existing behavior, but users can now also pass
non-Column arguments directly to UDTFs.
For example, let's say we have a simple UDTF `RepeatItem` as below:
```python
from pyspark.sql.functions import udtf, lit

@udtf(returnType="item: int")
class RepeatItem:
    """
    A UDTF that repeats `item` for the number of rows given by `num_rows`.
    """
    def eval(self, item: int, num_rows: int):
        # Emit one single-column row per repetition.
        for _ in range(num_rows):
            yield (item,)
```
With this change, the UDTF accepts plain Python values directly, without wrapping them in `lit()`:
**Before**
```python
RepeatItem(lit(2024), lit(6)).show()
```
**After** (NOTE: the **Before** form still works)
```python
RepeatItem(2024, 6).show()
```
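For reference, the rows that either call is expected to produce can be checked without a Spark session by driving the `eval` generator directly. This is only a minimal plain-Python sketch: `RepeatItemPlain` is a hypothetical, undecorated copy of the class above and is not part of this PR.

```python
class RepeatItemPlain:
    """Hypothetical undecorated copy of RepeatItem, for illustration only."""

    def eval(self, item: int, num_rows: int):
        # Yield one single-column row per requested repetition.
        for _ in range(num_rows):
            yield (item,)


# Drive the generator directly, as if called with item=2024 and num_rows=6.
rows = list(RepeatItemPlain().eval(2024, 6))
print(rows)
```

Each yielded tuple corresponds to one output row with the single `item` column, so the list should hold six identical one-element tuples.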
### How was this patch tested?
Added UTs to the existing test suites.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]