[PR] [SPARK-54722][PYTHON] Register Pandas Grouped Iter Aggregate UDF for SQL usage [spark]

via GitHub Tue, 16 Dec 2025 16:29:13 -0800


Yicong-Huang opened a new pull request, #53493:
URL: https://github.com/apache/spark/pull/53493


   ### What changes were proposed in this pull request?
   
   This PR adds `SQL_GROUPED_AGG_PANDAS_ITER_UDF` to the list of eval types
   supported by the `UDFRegistration.register()` method, allowing users to
   register Pandas Grouped Iter Aggregate UDFs for SQL usage.
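
As a rough illustration of the kind of check this change touches, here is a minimal, self-contained sketch of an eval-type allow-list. It is not the PySpark source: `EvalType`, `SUPPORTED_EVAL_TYPES`, `check_registrable`, and the `UNSUPPORTED_EXAMPLE` member are hypothetical names, the enum values are placeholders, and the sketch simply assumes that registration rejects any eval type missing from the list.

```python
from enum import Enum, auto


class EvalType(Enum):
    # Hypothetical stand-ins for PySpark eval types; the values are placeholders.
    SQL_GROUPED_AGG_ARROW_ITER_UDF = auto()
    SQL_GROUPED_AGG_PANDAS_ITER_UDF = auto()
    UNSUPPORTED_EXAMPLE = auto()  # made-up member, used only to show a rejection


# Hypothetical allow-list of eval types accepted for SQL registration.
SUPPORTED_EVAL_TYPES = [
    EvalType.SQL_GROUPED_AGG_ARROW_ITER_UDF,
    EvalType.SQL_GROUPED_AGG_PANDAS_ITER_UDF,  # the kind of entry this PR adds
]


def check_registrable(eval_type: EvalType) -> None:
    """Raise TypeError if eval_type is not in the allow-list."""
    if eval_type not in SUPPORTED_EVAL_TYPES:
        raise TypeError(f"{eval_type.name} cannot be registered for SQL usage")
```

Under these assumptions, `check_registrable(EvalType.SQL_GROUPED_AGG_PANDAS_ITER_UDF)` passes silently, while an eval type outside the list raises `TypeError`.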
   
   ### Why are the changes needed?
   
   Currently, the iterator API for grouped aggregate Pandas UDFs cannot be
   registered for SQL usage via `spark.udf.register()`. This is inconsistent
   with other UDF types such as `SQL_GROUPED_AGG_ARROW_ITER_UDF`, which is
   already supported.
   
   With this change, users can now register iterator-based grouped aggregate 
UDFs and use them in SQL queries:
   
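   ```python
   from typing import Iterator

   import pandas as pd
   from pyspark.sql.functions import pandas_udf

   # Grouped aggregate UDF using the iterator API.
   @pandas_udf("double")
   def sum_iter_udf(it: Iterator[pd.Series]) -> float:
       total = 0.0
       for series in it:
           total += series.sum()
       return total

   # Register the UDF by name, then call it from SQL.
   # Assumes an active SparkSession `spark` and a table with columns `id` and `v`.
   spark.udf.register("sum_iter_udf", sum_iter_udf)
   spark.sql("SELECT sum_iter_udf(v) FROM table GROUP BY id")
   ```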
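
For intuition, the aggregation logic can be exercised without Spark by feeding it batches by hand. This is only a sketch: `sum_batches` is a hypothetical, undecorated copy of the UDF body, and the batch boundaries below are made up. How Spark actually splits a group into batches is not described in this PR.

```python
from typing import Iterator

import pandas as pd


def sum_batches(it: Iterator[pd.Series]) -> float:
    # Same logic as the UDF body: accumulate a running total batch by batch.
    total = 0.0
    for series in it:
        total += series.sum()
    return total


# One group's values, split by hand into three batches (illustrative only).
batches = [pd.Series([1.0, 2.0]), pd.Series([3.0]), pd.Series([4.0, 5.0])]

# Aggregating batch by batch gives the same answer as summing the whole group.
result = sum_batches(iter(batches))
whole = pd.concat(batches).sum()
```

The point of the iterator form is that the running total is updated one batch at a time, so the function never needs the whole group as a single `pd.Series`.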
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Users can now register Pandas Grouped Iter Aggregate UDFs 
(`Iterator[pd.Series] -> scalar`) for SQL usage.
   
   ### How was this patch tested?
   
   Added a new test case `test_register_grouped_agg_iter_udf` in 
`python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


