Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

via GitHub Wed, 17 Apr 2024 02:33:30 -0700


uros-db commented on code in PR #46097:
URL: https://github.com/apache/spark/pull/46097#discussion_r1568525373



##########
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##########
@@ -101,6 +101,9 @@ public void testContains() throws SparkException {
     assertContains("ab世De", "AB世dE", "UNICODE_CI", true);
     assertContains("äbćδe", "ÄbćδE", "UNICODE_CI", true);
     assertContains("äbćδe", "ÄBcΔÉ", "UNICODE_CI", false);
+    // Case-variable character length

Review Comment:
   [ICU StringSearch](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html) with UNICODE_CI doesn't recognize this case-variable character length match.
   
   As per these docs, I'd expect this to work only with a "GERMAN_CI" collation (which we don't yet have in Spark):
   ```
   StringSearch ensures that language eccentricity can be handled, e.g. for the German collator, characters ß and SS will be matched if case is chosen to be ignored. See the ["ICU Collation Design Document"](https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/main/design/collation/ICU_collation_design.htm) for more information.
   ```
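   To illustrate what "case-variable character length" means for a prefix check, here is a minimal, hypothetical sketch using only the plain JDK (no ICU, and not Spark's `CollationSupport` code; the class and method names are made up for illustration). `ß` is a single char, but full upper-casing maps it to the two-char `"SS"`, so a char-by-char comparison over `prefix.length()` chars can never line up with a target that starts with `"SS"`.

```java
import java.util.Locale;

public class CaseVariableLengthDemo {

    // Hypothetical naive check: compares the prefix char-by-char against the same
    // number of chars at the start of the target. Per-char case mapping cannot
    // expand 'ß' into "SS", so length-changing case mappings are missed.
    static boolean naiveStartsWithIgnoreCase(String target, String prefix) {
        return target.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // Hypothetical alternative: apply full, string-level upper-casing first, which
    // may change the string length (e.g. "ß" -> "SS"). This is NOT collation-aware;
    // it only demonstrates the length mismatch the naive check runs into.
    static boolean upperCasedStartsWith(String target, String prefix) {
        return target.toUpperCase(Locale.ROOT).startsWith(prefix.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF"; // ß
        // One char before upper-casing, two chars after.
        System.out.println(sharpS.length() + " -> " + sharpS.toUpperCase(Locale.ROOT).length());
        System.out.println(naiveStartsWithIgnoreCase("SSe", sharpS)); // false
        System.out.println(upperCasedStartsWith("SSe", sharpS));      // true
    }
}
```

   The upper-casing variant only shows why the lengths diverge; it is not how ICU StringSearch or Spark's UNICODE_CI match strings.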
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


