Yicong-Huang opened a new pull request, #53992:
URL: https://github.com/apache/spark/pull/53992

   ### What changes were proposed in this pull request?
   
   Refactor `_create_batch` and `_create_array` in PySpark's Pandas serializers 
to use Spark's `DataType` as the single source of truth, deriving Arrow types 
internally when needed.
   
   **Before**: Callers in `worker.py` pre-computed `arrow_return_type = 
to_arrow_type(return_type, ...)` and passed both `arrow_type` and `spark_type` 
through the serialization pipeline.
   
   **After**: Callers pass only `spark_type` (Spark DataType). The serializers 
derive `arrow_type` internally via `to_arrow_type()`.
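
   The before/after calling convention can be sketched roughly as follows. This is a toy model using only the standard library: `SparkType`, `ArrowType`, the mapping table, and both `create_array_*` functions are hypothetical stand-ins, not the actual PySpark or PyArrow classes or the real serializer code.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


# Toy stand-ins (assumptions for illustration, NOT the PySpark/PyArrow classes).
@dataclass(frozen=True)
class SparkType:
    name: str  # e.g. "long", "string"


@dataclass(frozen=True)
class ArrowType:
    name: str  # e.g. "int64", "utf8"


_SPARK_TO_ARROW = {"long": "int64", "double": "float64", "string": "utf8"}


def to_arrow_type(spark_type: SparkType) -> ArrowType:
    """Toy analogue of to_arrow_type(): derive the Arrow type from the Spark type."""
    return ArrowType(_SPARK_TO_ARROW[spark_type.name])


def create_array_before(
    values: List[Any], arrow_type: ArrowType, spark_type: Optional[SparkType] = None
) -> Tuple[ArrowType, List[Any]]:
    # Old shape: the caller pre-computes arrow_type and passes both types along.
    return (arrow_type, list(values))


def create_array_after(
    values: List[Any], spark_type: SparkType
) -> Tuple[ArrowType, List[Any]]:
    # New shape: only the Spark type is passed; the Arrow type is derived here.
    arrow_type = to_arrow_type(spark_type)
    return (arrow_type, list(values))


return_type = SparkType("long")

# Before: the caller derives the Arrow type itself.
before = create_array_before([1, 2, 3], to_arrow_type(return_type), return_type)

# After: the caller passes only the Spark type.
after = create_array_after([1, 2, 3], return_type)
```

   In this toy model both calls produce the same result; the difference is that the second caller cannot pass an Arrow type that disagrees with the Spark type.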
   
   Key changes:
   - `_create_array`: signature changed from `(series, arrow_type, 
spark_type=None, ...)` to `(series, spark_type, *, ...)`
   - `_create_struct_array`: signature changed from `(df, arrow_struct_type, 
spark_type=None)` to `(df, spark_type, *, ...)`
   - `_create_batch`: input format changed from `(data, arrow_type, 
spark_type)` to `(data, spark_type)`
   - ~15 Pandas-based wrapper functions in `worker.py` updated to yield 
`return_type` instead of `arrow_return_type`
   - Arrow UDF functions (which use `ArrowStreamArrowUDFSerializer`) are 
unchanged; they still pass `arrow_type` directly
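
   As a rough illustration of the new signature shape, the sketch below shows a bare `*` making every later parameter keyword-only, and a batch builder that consumes `(data, spark_type)` pairs. It assumes each batch input item is a two-element tuple; the string type names, the mapping table, and `hypothetical_flag` are placeholders invented for this example, since the PR description elides the real keyword parameters.

```python
from typing import Any, Iterable, List, Tuple

# Placeholder Spark-name -> Arrow-name table (an assumption for illustration).
_SPARK_TO_ARROW = {"long": "int64", "string": "utf8"}


def create_array(
    series: List[Any], spark_type: str, *, hypothetical_flag: bool = False
) -> Tuple[str, List[Any]]:
    # The bare `*` forces everything after spark_type to be passed by keyword.
    arrow_type = _SPARK_TO_ARROW[spark_type]  # derived internally
    return (arrow_type, list(series))


def create_batch(
    items: Iterable[Tuple[List[Any], str]]
) -> List[Tuple[str, List[Any]]]:
    # Each item is (data, spark_type); no Arrow type is supplied by the caller.
    return [create_array(data, spark_type) for data, spark_type in items]


batch = create_batch([([1, 2], "long"), (["a", "b"], "string")])

# Passing a third positional argument is rejected by the keyword-only marker.
try:
    create_array([1, 2], "long", True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

   Keyword-only options keep a stray positional argument from being silently bound to the wrong parameter when a signature changes, which is one plausible reason for the `*` in the new signatures.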
   
   ### Why are the changes needed?
   
   1. **Single source of truth**: `spark_type` is the canonical type 
representation defined by users
   2. **Simplified API**: Callers no longer need to pre-compute `arrow_type`
   3. **Consistency**: Both `_create_batch` and `_create_array` now follow the 
same pattern
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This is an internal refactoring with no user-facing API changes.
   
   ### How was this patch tested?
   
   Existing tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

