[PR] [SPARK-54589][PYTHON] Consolidate ArrowStreamAggPandasIterUDFSerializer into ArrowStreamAggPandasUDFSerializer [spark]

via GitHub Thu, 11 Dec 2025 17:15:34 -0800


Yicong-Huang opened a new pull request, #53449:
URL: https://github.com/apache/spark/pull/53449


   ### What changes were proposed in this pull request?
   
   This PR consolidates `ArrowStreamAggPandasIterUDFSerializer` into `ArrowStreamAggPandasUDFSerializer` for the `SQL_GROUPED_AGG_PANDAS_*` eval types.
   
   Changes:
   1. **Removed `ArrowStreamAggPandasIterUDFSerializer`** - The class was nearly identical to `ArrowStreamAggPandasUDFSerializer`.
   2. **Unified serializer** - `ArrowStreamAggPandasUDFSerializer` now serves `SQL_GROUPED_AGG_PANDAS_UDF`, `SQL_GROUPED_AGG_PANDAS_ITER_UDF`, and `SQL_WINDOW_AGG_PANDAS_UDF`.
   3. **Added mapper for non-iter UDFs** - A new mapper in `worker.py` handles batch concatenation for `SQL_GROUPED_AGG_PANDAS_UDF` and `SQL_WINDOW_AGG_PANDAS_UDF`.
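The data flow described in items 2 and 3 can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code in `worker.py`: it assumes a "batch" is a list of columns (each a list of values) instead of an Arrow record batch or `pandas.Series`, and the helpers `load_stream`, `concat_batches`, and `run_agg` are illustrative stand-ins. Window-frame handling for `SQL_WINDOW_AGG_PANDAS_UDF` is omitted.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-ins (assumptions, not PySpark types): a "batch" is a list
# of columns, each column a list of values; a "group" is the list of batches
# that arrived for one grouping key.
Batch = List[List[float]]
Group = List[Batch]


def load_stream(groups: Iterable[Group]) -> Iterator[Iterator[Batch]]:
    """Unified serializer output: one lazy iterator of batches per group."""
    for group in groups:
        yield iter(group)


def concat_batches(batches: Iterator[Batch]) -> Batch:
    """Hypothetical mapper for non-iter UDFs: merge a group's batches column-wise."""
    columns: Batch = []
    for batch in batches:
        if not columns:
            # Copy the first batch so the caller's data is never mutated.
            columns = [list(col) for col in batch]
        else:
            for acc, col in zip(columns, batch):
                acc.extend(col)
    return columns


def run_agg(groups: Iterable[Group], udf: Callable, is_iter: bool) -> List[float]:
    """Apply one aggregate UDF per group, choosing the input shape by UDF kind."""
    results: List[float] = []
    for batches in load_stream(groups):
        if is_iter:
            # Iter-style UDF (cf. SQL_GROUPED_AGG_PANDAS_ITER_UDF): consume lazily.
            results.append(udf(batches))
        else:
            # Non-iter UDF (cf. SQL_GROUPED_AGG_PANDAS_UDF, SQL_WINDOW_AGG_PANDAS_UDF):
            # the UDF expects whole columns, so concatenate first.
            results.append(udf(concat_batches(batches)))
    return results


def mean_first_column(columns: Batch) -> float:
    col = columns[0]
    return sum(col) / len(col)


def mean_first_column_iter(batches: Iterator[Batch]) -> float:
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch[0])
        count += len(batch[0])
    return total / count


groups: List[Group] = [
    [[[1.0, 2.0], [10.0, 20.0]], [[3.0], [30.0]]],  # group A: two batches
    [[[4.0, 6.0], [40.0, 60.0]]],                   # group B: one batch
]

print(run_agg(groups, mean_first_column, is_iter=False))      # [2.0, 5.0]
print(run_agg(groups, mean_first_column_iter, is_iter=True))  # [2.0, 5.0]
```

In this sketch the serializer stays agnostic to the UDF kind and always yields per-group iterators; only the non-iter path pays the cost of materializing a whole group, which mirrors the split between the unified serializer and the new mapper.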
   
   ### Why are the changes needed?
   
   Similar to SPARK-54316, the two serializer classes had nearly identical implementations:
   - Identical `__init__` methods
   - Same base class (`ArrowStreamPandasUDFSerializer`)
   - Only `load_stream` differed slightly in output format
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. It's an internal refactor.
   
   ### How was this patch tested?
   
   Existing unit tests:
   - `python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py`
   - `python/pyspark/sql/tests/pandas/test_pandas_udf_window.py`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

