Yicong-Huang opened a new pull request, #54296:

URL: https://github.com/apache/spark/pull/54296
### What changes were proposed in this pull request?

This PR consolidates the `SQL_SCALAR_ARROW_UDF` execution path by:

1. Extracting `verify_scalar_result()` as a reusable helper that replaces the inline `verify_result_type` and `verify_result_length` closures in `wrap_scalar_arrow_udf`.
2. Removing the dedicated `wrap_scalar_arrow_udf` wrapper and replacing it with the general `ArrowStreamGroupSerializer`-based path.
3. Adding `ArrowBatchTransformer.enforce_schema()` to handle schema enforcement (column reordering and type coercion) in one central place.
4. Unifying the mapper logic so that `SQL_SCALAR_ARROW_UDF` follows the same pattern as `SQL_MAP_ARROW_ITER_UDF`.

This is a follow-up to SPARK-55389, which consolidated `SQL_MAP_ARROW_ITER_UDF`.

### Why are the changes needed?

The scalar Arrow UDF path had its own dedicated wrapper (`wrap_scalar_arrow_udf`), mapper, and serializer logic, duplicating patterns already available in the consolidated `ArrowStreamGroupSerializer` infrastructure. This refactoring removes that duplication and makes the UDF execution paths more consistent.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests for scalar Arrow UDFs.

### Was this patch authored or co-authored using generative AI tooling?

No.
