[PR] [SPARK-55162][PYTHON] Extract transformers from ArrowStreamUDFSerializer [spark]

via GitHub Fri, 23 Jan 2026 17:43:05 -0800


Yicong-Huang opened a new pull request, #53946:
URL: https://github.com/apache/spark/pull/53946


   ### What changes were proposed in this pull request?
   
   This PR extracts the struct flattening/wrapping logic from 
`ArrowStreamUDFSerializer` into reusable transformer classes in a new 
`transformers.py` module:
   
   - `FlattenStructTransformer`: Flattens a batch's single struct column into a RecordBatch with one top-level column per struct field
   - `WrapStructTransformer`: Wraps all of a RecordBatch's columns into a single struct column
   
   `ArrowStreamUDFSerializer` now composes these transformers instead of 
containing inline transformation logic.
   
   ### Why are the changes needed?
   
   This is part of [SPARK-55159](https://issues.apache.org/jira/browse/SPARK-55159), which aims to improve the composability of the Arrow serializers by separating data transformation from serialization.
   
   Benefits:
   - Clear separation of concerns (serialization vs. transformation)
   - Transformers are reusable and can be tested in isolation
   - The data flow is easier to follow as a pipeline of transformations
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added unit tests for both transformers in `test_transformers.py`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
