[PR] [SPARK-55390][PYTHON] Consolidate SQL_SCALAR_ARROW_UDF wrapper, mapper, and serializer logic [spark]

via GitHub Thu, 12 Feb 2026 17:25:20 -0800


Yicong-Huang opened a new pull request, #54296:
URL: https://github.com/apache/spark/pull/54296


   ### What changes were proposed in this pull request?
   
   This PR consolidates the `SQL_SCALAR_ARROW_UDF` execution path by:
   
   1. Extracting `verify_scalar_result()` as a reusable helper to replace inline `verify_result_type` and `verify_result_length` closures in `wrap_scalar_arrow_udf`
   2. Removing the dedicated `wrap_scalar_arrow_udf` wrapper and replacing it with the general `ArrowStreamGroupSerializer`-based path
   3. Adding `ArrowBatchTransformer.enforce_schema()` to handle schema enforcement (column reordering and type coercion) in a centralized way
   4. Unifying the mapper logic so `SQL_SCALAR_ARROW_UDF` follows the same pattern as `SQL_MAP_ARROW_ITER_UDF`
   
   This is a follow-up to SPARK-55389, which consolidated `SQL_MAP_ARROW_ITER_UDF`.
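
For readers unfamiliar with the checks mentioned in item 1, here is a minimal sketch of what a combined type-and-length verification helper might look like. It is not the code from this PR: the signature, the exception types, and the `StubArray` class are all assumptions made for illustration. The helper is duck-typed (it only needs `.type` and `len()`), so the same logic would apply to a `pyarrow.Array` compared against a `pyarrow.DataType`; the stub exists only so the sketch runs without pyarrow installed.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class StubArray:
    """Hypothetical stand-in for an Arrow array (a type tag plus values)."""

    type: str
    values: Sequence[Any]

    def __len__(self) -> int:
        return len(self.values)


def verify_scalar_result(result: Any, expected_type: Any, expected_length: int) -> Any:
    """Check a scalar UDF result's type and length in one place.

    Assumed signature and error types; the real helper may differ.
    """
    # The result must look like an array: it needs a type and a length.
    if not hasattr(result, "type") or not hasattr(result, "__len__"):
        raise TypeError(f"expected an array-like result, got {type(result).__name__}")
    # Type check (replaces a separate verify_result_type-style closure).
    if result.type != expected_type:
        raise TypeError(
            f"result type {result.type} does not match expected type {expected_type}"
        )
    # Length check (replaces a separate verify_result_length-style closure).
    if len(result) != expected_length:
        raise ValueError(
            f"result length {len(result)} does not match input length {expected_length}"
        )
    return result


# Usage: a scalar UDF must return one output value per input row.
out = verify_scalar_result(StubArray("int64", [2, 4, 6]), "int64", 3)
```

Keeping both checks in one function means every caller gets the same validation and error messages, which is the kind of duplication the consolidation aims to remove.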
   
   ### Why are the changes needed?
   
   The scalar Arrow UDF path had its own dedicated wrapper (`wrap_scalar_arrow_udf`), mapper, and serializer logic that duplicated patterns already available in the consolidated `ArrowStreamGroupSerializer` infrastructure. This refactoring reduces code duplication and makes the UDF execution paths more consistent.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing tests for scalar Arrow UDFs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


