vladimirg-db commented on code in PR #46097:
URL: https://github.com/apache/spark/pull/46097#discussion_r1568379883
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##########
@@ -157,18 +164,6 @@ public static boolean execICU(final UTF8String l, final UTF8String r,
private static class CollationAwareUTF8String {
Review Comment:
Do we still need `CollationAwareUTF8String`? It looks like dead code.
##########
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##########
@@ -101,6 +101,9 @@ public void testContains() throws SparkException {
assertContains("ab世De", "AB世dE", "UNICODE_CI", true);
assertContains("äbćδe", "ÄbćδE", "UNICODE_CI", true);
assertContains("äbćδe", "ÄBcΔÉ", "UNICODE_CI", false);
+ // Case-variable character length
Review Comment:
Actually, would you like to add a test case for `'ß'.upper() == 'SS'` as well?
It seems relevant here because this is a CI collation, not a LOWERCASE collation.
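
For reference, the case mapping behind this suggestion can be checked with plain JDK string APIs. This is only a minimal sketch: the class and helper names (`SharpSUpperDemo`, `upper`) are made up for illustration, it does not touch `CollationSupport`, and it says nothing about how `UNICODE_CI` itself compares these strings.

```java
import java.util.Locale;

public class SharpSUpperDemo {
    // Hypothetical helper: locale-independent uppercasing via the JDK.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF"; // 'ß'
        String up = upper(sharpS);

        // The uppercase form of a single 'ß' is the two-character string "SS".
        System.out.println(up);
        System.out.println(sharpS.length() + " char -> " + up.length() + " chars");
    }
}
```

A collation test built on this would pair `"ß"` with `"SS"` in both argument positions, with the expected result taken from what the collation is actually meant to return.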
##########
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##########
@@ -101,6 +101,9 @@ public void testContains() throws SparkException {
assertContains("ab世De", "AB世dE", "UNICODE_CI", true);
assertContains("äbćδe", "ÄbćδE", "UNICODE_CI", true);
assertContains("äbćδe", "ÄBcΔÉ", "UNICODE_CI", false);
+ // Case-variable character length
Review Comment:
I would be more precise and say "Longer binary lowercase representation", or
something similar. In the future we may add test cases for "Longer binary
uppercase representation", and comments worded this way would make the
difference clear to the reader.
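
To make the "longer binary representation" wording concrete, here is a rough sketch that compares UTF-8 byte lengths before and after case conversion. It assumes plain JDK case mapping with `Locale.ROOT`. The names `CaseByteLengthDemo`, `utf8Len`, `lowerDelta` and `upperDelta` are hypothetical and not part of this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CaseByteLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Positive result: the lowercase form needs more UTF-8 bytes than the input.
    static int lowerDelta(String s) {
        return utf8Len(s.toLowerCase(Locale.ROOT)) - utf8Len(s);
    }

    // Positive result: the uppercase form needs more UTF-8 bytes than the input.
    static int upperDelta(String s) {
        return utf8Len(s.toUpperCase(Locale.ROOT)) - utf8Len(s);
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // 'İ' lowercases to 'i' + U+0307 (combining dot above)
        String sharpS = "\u00DF";  // 'ß' uppercases to "SS"

        System.out.println("lowerDelta(İ) = " + lowerDelta(dottedI));
        System.out.println("upperDelta(ß) = " + upperDelta(sharpS));
    }
}
```

Note that `'ß'` grows in character count when uppercased but not in UTF-8 byte count (two bytes either way), so it would not by itself be an example of a longer binary uppercase representation.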
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.