[PR] [SPARK-47566][SQL] Support SUBSTRING_INDEX function to work with collated strings [spark]

via GitHub Tue, 26 Mar 2024 08:10:15 -0700


miland-db opened a new pull request, #45725:
URL: https://github.com/apache/spark/pull/45725


   ### What changes were proposed in this pull request?
   Extend the built-in string function `substring_index` to support 
non-binary, non-lowercase collations.
   
   ### Why are the changes needed?
   Update collation support for built-in string functions in Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Users can now apply COLLATE to the arguments of the built-in string 
function SUBSTRING_INDEX in Spark SQL queries, including non-binary 
collations such as UNICODE_CI.
   
   ### How was this patch tested?
   Unit tests for queries using SubstringIndex 
(`CollationStringExpressionsSuite.scala`).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   
   ### Info:
   This PR does not check that the collations of the string and the delimiter 
match. That check will be introduced with Implicit Casting.
   
   We can remove the original `public UTF8String subStringIndex(UTF8String 
delim, int count)` method, and get the existing behavior using 
`subStringIndex(delim, count, 0)`.
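
To illustrate the behaviour being extended, here is a minimal sketch of `substring_index` semantics with an optional case-insensitive match. It is not the code from this patch: it works on `java.lang.String` rather than `UTF8String`, the class and helper names (`SubstringIndexSketch`, `find`, `rfind`) are hypothetical, and a boolean `ignoreCase` flag stands in for the third argument of `subStringIndex(delim, count, 0)`, which is presumably a collation identifier. Simple case folding via `regionMatches` is only a rough approximation of a collation such as UNICODE_CI.

```java
final class SubstringIndexSketch {

    // Hypothetical helper: index of the first match of delim at or after `from`, or -1.
    static int find(String str, String delim, int from, boolean ignoreCase) {
        for (int i = Math.max(from, 0); i + delim.length() <= str.length(); i++) {
            if (str.regionMatches(ignoreCase, i, delim, 0, delim.length())) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: index of the last match of delim starting at or before `from`, or -1.
    static int rfind(String str, String delim, int from, boolean ignoreCase) {
        for (int i = Math.min(from, str.length() - delim.length()); i >= 0; i--) {
            if (str.regionMatches(ignoreCase, i, delim, 0, delim.length())) {
                return i;
            }
        }
        return -1;
    }

    // count > 0: text before the count-th delimiter, counting from the left.
    // count < 0: text after the |count|-th delimiter, counting from the right.
    // Returns the whole string when there are fewer matches than |count|.
    static String substringIndex(String str, String delim, int count, boolean ignoreCase) {
        if (delim.isEmpty() || count == 0) {
            return "";
        }
        if (count > 0) {
            int idx = -1;
            for (int remaining = count; remaining > 0; remaining--) {
                idx = find(str, delim, idx + 1, ignoreCase);
                if (idx < 0) {
                    return str;
                }
            }
            return str.substring(0, idx);
        }
        int idx = str.length();
        for (int remaining = -count; remaining > 0; remaining--) {
            idx = rfind(str, delim, idx - 1, ignoreCase);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        System.out.println(substringIndex("www.apache.org", ".", 2, false)); // www.apache
        System.out.println(substringIndex("www.apache.org", ".", -1, false)); // org
        System.out.println(substringIndex("aXbxc", "x", 2, true));  // aXb
        System.out.println(substringIndex("aXbxc", "x", 2, false)); // aXbxc
    }
}
```

The last two calls show the difference a case-insensitive comparison makes: with `ignoreCase` set, both `X` and `x` count as delimiters, whereas the binary comparison finds only one match and returns the input unchanged.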


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

