ilicmarkodb opened a new pull request, #54324: URL: https://github.com/apache/spark/pull/54324
### What changes were proposed in this pull request?

This PR adds default collation support for SQL user-defined functions, enabling UDFs to inherit schema-level collations and specify explicit default collations via the `DEFAULT COLLATION` clause.

### Why are the changes needed?

Currently, SQL UDFs in Spark don't support collation specifications. This PR enables:
- UDFs to specify a `DEFAULT COLLATION` clause in `CREATE FUNCTION` statements
- UDFs to automatically inherit the schema's default collation when not explicitly specified
- Proper handling of explicit `UTF8_BINARY` collations without override
- Collation support for table function return columns

### Does this PR introduce any user-facing change?

Yes. Users can now:
- Use `DEFAULT COLLATION <collation_name>` in `CREATE FUNCTION` statements
- Have UDFs automatically inherit the schema's default collation

Example:
```sql
CREATE FUNCTION my_func(p1 STRING)
RETURNS STRING
DEFAULT COLLATION UTF8_LCASE
RETURN SELECT upper(p1);
```

### How was this patch tested?

Added comprehensive tests in `DefaultCollationTestSuite` covering:
- UDF with explicit `UTF8_BINARY` collation for params/return type
- UDF applies default collation to params
- Table UDF applies default collation
- UDF with multiple collation contexts

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Sonnet 4.5
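
For reviewers, the resolution order this PR describes (an explicit collation on a parameter or return type, then the function's `DEFAULT COLLATION`, then the schema's default collation) can be modeled roughly as below. This is only an illustrative sketch, not the code in the patch: `resolve_collation` and its parameter names are hypothetical, and the final fallback to `UTF8_BINARY` assumes Spark's usual default applies when nothing else is set.

```python
from typing import Optional

# Spark's default collation for STRING when no other collation applies.
UTF8_BINARY = "UTF8_BINARY"


def resolve_collation(
    explicit: Optional[str],
    function_default: Optional[str],
    schema_default: Optional[str],
) -> str:
    """Hypothetical model of the precedence described in the PR text.

    explicit          collation written on the param/return type, if any
    function_default  collation from the UDF's DEFAULT COLLATION clause, if any
    schema_default    default collation of the schema that owns the UDF, if any
    """
    # An explicit collation always wins, including an explicit UTF8_BINARY,
    # which must not be overridden by any default.
    if explicit is not None:
        return explicit
    # Otherwise the function-level DEFAULT COLLATION applies.
    if function_default is not None:
        return function_default
    # Otherwise the UDF inherits the schema's default collation.
    if schema_default is not None:
        return schema_default
    # Assumed fallback when no collation is specified anywhere.
    return UTF8_BINARY
```

Under this model, a parameter declared with an explicit `UTF8_BINARY` collation keeps it even when the function declares `DEFAULT COLLATION UTF8_LCASE`, which matches the "without override" behavior listed above.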
