[PR] [SPARK-55506][PYTHON] Pass explicit input schema to `to_pandas` in `CogroupPandasUDFSerializer` [spark]

via GitHub Thu, 12 Feb 2026 16:37:47 -0800


Yicong-Huang opened a new pull request, #54293:
URL: https://github.com/apache/spark/pull/54293


   ### What changes were proposed in this pull request?
   
   Pass an explicit Spark schema (derived from each Arrow table's schema via `from_arrow_schema`) to `ArrowBatchTransformer.to_pandas()` in `CogroupPandasUDFSerializer.load_stream()`, instead of passing `None` (the inherited `_input_type`).
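
The shape of this change can be illustrated with a minimal, dependency-free sketch. It is not the PySpark implementation: `schema_of`, `to_rows`, and `load_cogroup` are hypothetical stand-ins for `from_arrow_schema`, `to_pandas`, and `load_stream`, and plain dicts stand in for Arrow tables. The point is only that each side of the cogroup is converted with a schema derived from its own table, rather than both sides sharing one `None`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins, not PySpark API: a "table" maps column names to
# values, and a "schema" is a list of (column name, type name) pairs.
Table = Dict[str, list]
Schema = List[Tuple[str, str]]


def schema_of(table: Table) -> Schema:
    """Derive a schema from one table (plays the role of from_arrow_schema)."""
    return [
        (name, type(values[0]).__name__ if values else "null")
        for name, values in table.items()
    ]


def to_rows(table: Table, schema: Optional[Schema] = None) -> Tuple[Schema, List[tuple]]:
    """Convert a table to rows, inferring the schema only when none is given."""
    if schema is None:
        schema = schema_of(table)  # implicit inference (the previous code path)
    names = [name for name, _ in schema]
    return schema, list(zip(*(table[name] for name in names)))


def load_cogroup(left: Table, right: Table):
    """Explicit path: each side is converted with its own derived schema."""
    return to_rows(left, schema_of(left)), to_rows(right, schema_of(right))


# The two sides of a cogroup may have entirely different columns.
left = {"id": [1, 2], "v": [1.0, 2.0]}
right = {"id": [1], "name": ["a"]}
(left_schema, left_rows), (right_schema, right_rows) = load_cogroup(left, right)
```

In this sketch the result is the same whether the schema is inferred or passed in, which mirrors the PR's statement that there is no user-facing change; the difference is that the caller now states which schema belongs to which side.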
   
   ### Why are the changes needed?
   
   `CogroupPandasUDFSerializer` is constructed without `input_type`, so 
`_input_type` defaults to `None`. When `to_pandas()` receives `schema=None`, it 
infers the Spark schema from the Arrow batch internally via 
`from_arrow_type()`. This works, but:
   
   1. The same `None` is used for both left and right DataFrames, which is 
conceptually wrong since they can have different schemas.
   2. The schema inference is implicit rather than explicit.
   3. Other serializers like `ArrowBatchUDFSerializer` receive and pass 
explicit schemas.
   
   This was raised in 
https://github.com/apache/spark/pull/53963#discussion_r2167770076.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests in `test_pandas_cogrouped_map.py`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

