uros-b commented on code in PR #56567:
URL: https://github.com/apache/spark/pull/56567#discussion_r3429927047


##########
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##########
@@ -3647,6 +3647,15 @@ public void testStringTrimRight() throws SparkException {
     assertStringTrimRight(UTF8_LCASE, "𝔸", "a", "𝔸");
     assertStringTrimRight(UNICODE, "𝔸", "a", "𝔸");
     assertStringTrimRight(UNICODE_CI, "𝔸", "a", "");
+    // RTRIM-modifier collations (ICU path): trailing spaces are ignored while matching but must
+    // be re-appended afterwards. When the number of trailing spaces equals the number of
+    // supplementary code points, a Java-char-index vs code-point-count comparison previously
+    // dropped the preserved spaces.
+    assertStringTrimRight("UNICODE_RTRIM", "x ", "x", " ");
+    assertStringTrimRight("UNICODE_RTRIM", "   ", "x", "   ");
+    assertStringTrimRight("UNICODE_RTRIM", "𝔸 ", "𝔸", " ");
+    assertStringTrimRight("UNICODE_RTRIM", "𝔸  ", "𝔸", "  ");
+    assertStringTrimRight("UNICODE_RTRIM", "𝔸𝔸  ", "𝔸", "  ");
   }

Review Comment:
   While we're here, could we please add a few test cases for the other 
RTRIM collations (e.g. UTF8_BINARY_RTRIM and UTF8_LCASE_RTRIM), just to lock 
down the behaviour across the board?
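
For context on the code comment in the hunk above, here is a minimal standalone sketch of why a Java char index and a code-point count can coincide by accident. It does not reproduce Spark's implementation; the class and method names (`CodePointVsCharIndex`, `trailingSpaces`, `countsCoincide`) are hypothetical and exist only for illustration. Each supplementary code point such as U+1D538 occupies two Java chars, so the char length exceeds the code-point count by the number of supplementary code points. When that number equals the number of trailing spaces, the char length with the spaces stripped matches the code-point count of the whole string.

```java
public class CodePointVsCharIndex {
    // Counts trailing ASCII spaces. Each space is exactly one Java char.
    static int trailingSpaces(String s) {
        int n = 0;
        while (n < s.length() && s.charAt(s.length() - 1 - n) == ' ') {
            n++;
        }
        return n;
    }

    // Hypothetical check: does the char length without the trailing spaces
    // equal the code-point count of the whole string? Mixing the two units
    // like this is the kind of comparison that can give a misleading match.
    static boolean countsCoincide(String s) {
        int charLengthWithoutSpaces = s.length() - trailingSpaces(s);
        int codePoints = s.codePointCount(0, s.length());
        return charLengthWithoutSpaces == codePoints;
    }

    public static void main(String[] args) {
        // U+1D538 is a supplementary code point (two Java chars).
        String a = new String(Character.toChars(0x1D538));
        String[] inputs = { "x ", a + " ", a + "  ", a + a + "  " };
        for (String s : inputs) {
            System.out.println(
                "chars=" + s.length()
                + " codePoints=" + s.codePointCount(0, s.length())
                + " trailingSpaces=" + trailingSpaces(s)
                + " coincide=" + countsCoincide(s));
        }
    }
}
```

The inputs mirror some of the strings in the new test cases: one supplementary code point with one trailing space, and two with two, are the shapes where the counts line up, which matches the condition described in the code comment.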



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

