Yicong-Huang opened a new pull request, #53946:
URL: https://github.com/apache/spark/pull/53946
### What changes were proposed in this pull request?

This PR extracts the struct flattening/wrapping logic from `ArrowStreamUDFSerializer` into reusable transformer classes in a new `transformers.py` module:

- `FlattenStructTransformer`: flattens a single struct column into a RecordBatch with one column per struct field
- `WrapStructTransformer`: wraps a RecordBatch's columns into a single struct column

`ArrowStreamUDFSerializer` now composes these transformers instead of containing the transformation logic inline.

### Why are the changes needed?

This is part of [SPARK-55159](https://issues.apache.org/jira/browse/SPARK-55159), which aims to make Arrow serializers more composable by separating data transformation from serialization. Benefits:

- Clear separation of concerns (serialization vs. transformation)
- Transformers are reusable and can be tested in isolation
- Data flow is easier to follow as a pipeline

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests for both transformers in `test_transformers.py`.

### Was this patch authored or co-authored using generative AI tooling?

No.
