Yicong-Huang opened a new pull request, #53449: URL: https://github.com/apache/spark/pull/53449
### What changes were proposed in this pull request?

This PR consolidates `ArrowStreamAggPandasIterUDFSerializer` into `ArrowStreamAggPandasUDFSerializer` for `SQL_GROUPED_AGG_PANDAS`.

Changes:

1. **Removed `ArrowStreamAggPandasIterUDFSerializer`** - The class was nearly identical to `ArrowStreamAggPandasUDFSerializer`
2. **Unified serializer** - `ArrowStreamAggPandasUDFSerializer` now serves `SQL_GROUPED_AGG_PANDAS_UDF`, `SQL_GROUPED_AGG_PANDAS_ITER_UDF`, and `SQL_WINDOW_AGG_PANDAS_UDF`
3. **Added mapper for non-iter UDFs** - A new mapper in `worker.py` handles batch concatenation for `SQL_GROUPED_AGG_PANDAS_UDF` and `SQL_WINDOW_AGG_PANDAS_UDF`

### Why are the changes needed?

Similar to SPARK-54316, the two serializer classes had nearly identical implementations:

- Identical `__init__` methods
- Same base class (`ArrowStreamPandasUDFSerializer`)
- Only `load_stream` differed slightly in output format

### Does this PR introduce _any_ user-facing change?

No. It's an internal refactor.

### How was this patch tested?

Existing unit tests:

- `python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py`
- `python/pyspark/sql/tests/pandas/test_pandas_udf_window.py`

### Was this patch authored or co-authored using generative AI tooling?

No.
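
### Illustrative sketch (not the PR's code)

To make change 3 concrete, here is a minimal, self-contained sketch of the idea: the serializer hands each group over as an iterator of batches, an iterator-style UDF consumes that iterator directly, and a mapper concatenates the batches before calling a non-iter UDF. This is not the code in `worker.py`. The names `concat_batches`, `make_non_iter_mapper`, and `iter_mean` are hypothetical, and plain Python lists stand in for the pandas Series and Arrow batches that the real implementation works with.

```python
from itertools import chain
from statistics import mean
from typing import Callable, Iterable, List

# Assumption: a "batch" stands in for one Arrow record batch of a group.
# It is a list of columns, and each column is a plain list of values.
Batch = List[List[float]]


def concat_batches(batches: Iterable[Batch]) -> List[List[float]]:
    """Merge all batches of one group into a single list of full columns."""
    batches = list(batches)
    if not batches:
        return []
    num_cols = len(batches[0])
    return [
        list(chain.from_iterable(batch[i] for batch in batches))
        for i in range(num_cols)
    ]


def make_non_iter_mapper(udf: Callable[..., float]) -> Callable[[Iterable[Batch]], float]:
    """Hypothetical mapper: adapts a whole-column UDF to a batch iterator."""

    def mapper(batches: Iterable[Batch]) -> float:
        columns = concat_batches(batches)  # concatenate first ...
        return udf(*columns)               # ... then call the UDF once

    return mapper


def iter_mean(batches: Iterable[Batch]) -> float:
    """Iterator-style UDF: consumes batches one at a time, no concatenation."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch[0])
        count += len(batch[0])
    return total / count


# One group delivered as three batches with a single column each.
group = [[[1.0, 2.0]], [[3.0]], [[4.0, 6.0]]]
whole_mean = make_non_iter_mapper(lambda col: mean(col))
```

Calling `whole_mean(iter(group))` and `iter_mean(iter(group))` on the same batches gives the same aggregate. That is why one serializer output format can plausibly serve both kinds of UDF, with the difference pushed into the mapper.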
