[PR] [SPARK-55221][PYTHON] Add `to_arrow` transformer and remove `_create_struct_array` [spark]

via GitHub Mon, 26 Jan 2026 14:38:59 -0800


Yicong-Huang opened a new pull request, #53989:
URL: https://github.com/apache/spark/pull/53989


   ### What changes were proposed in this pull request?
   
   Add a `PandasBatchTransformer.to_arrow` transformer method that converts a 
pandas DataFrame to an Arrow RecordBatch.
   
   Replace `_create_struct_array` with transformer composition:
   ```python
   ArrowBatchTransformer.wrap_struct(
       PandasBatchTransformer.to_arrow(df, schema, ...)
   ).column(0)
   ```
   
   ### Why are the changes needed?
   
   Part of [SPARK-55159](https://issues.apache.org/jira/browse/SPARK-55159).
   
   `_create_struct_array` mixed two transformations: pandas-to-Arrow conversion 
and wrapping the result into a struct. Extracting `to_arrow` as a standalone 
transformer lets us compose it with the existing 
`ArrowBatchTransformer.wrap_struct`, following the single-responsibility principle.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   - Unit tests for `to_arrow` transformer
   - Existing pandas UDF tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


