uros-b commented on code in PR #56567:
URL: https://github.com/apache/spark/pull/56567#discussion_r3429927047
##########
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##########
@@ -3647,6 +3647,15 @@ public void testStringTrimRight() throws SparkException {
assertStringTrimRight(UTF8_LCASE, "𝔸", "a", "𝔸");
assertStringTrimRight(UNICODE, "𝔸", "a", "𝔸");
assertStringTrimRight(UNICODE_CI, "𝔸", "a", "");
+ // RTRIM-modifier collations (ICU path): trailing spaces are ignored while matching but must
+ // be re-appended afterwards. When the number of trailing spaces equals the number of
+ // supplementary code points, a Java-char-index vs code-point-count comparison previously
+ // dropped the preserved spaces.
+ assertStringTrimRight("UNICODE_RTRIM", "x ", "x", " ");
+ assertStringTrimRight("UNICODE_RTRIM", " ", "x", " ");
+ assertStringTrimRight("UNICODE_RTRIM", "𝔸 ", "𝔸", " ");
+ assertStringTrimRight("UNICODE_RTRIM", "𝔸 ", "𝔸", " ");
+ assertStringTrimRight("UNICODE_RTRIM", "𝔸𝔸 ", "𝔸", " ");
}
Review Comment:
While we're already here, could we please add a few test cases for the other
RTRIM collations (e.g. UTF8_BINARY_RTRIM and UTF8_LCASE_RTRIM), just to lock
down the behaviour across the board.
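The diff comment attributes the bug to comparing a Java char index against a code-point count. A character such as 𝔸 (U+1D538) is a single code point but occupies two Java chars (a surrogate pair), so the two units drift apart by one for every supplementary character. Below is a minimal sketch of the unit-consistent approach, assuming plain code-point equality instead of ICU collation matching. The class `RtrimSketch` and method `trimRightRtrim` are hypothetical and are not Spark's `CollationSupport` code.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified sketch (not Spark's implementation): right-trim under an
 * RTRIM-style rule where trailing ASCII spaces are set aside before matching and
 * re-appended afterwards. Every position is a Java char index, so code-point counts
 * are never compared against char offsets.
 */
public final class RtrimSketch {

  static String trimRightRtrim(String src, String trim) {
    // Code points to trim; exact code-point equality stands in for collation matching.
    Set<Integer> trimSet = new HashSet<>();
    trim.codePoints().forEach(cp -> trimSet.add(cp));

    // 1. Set aside trailing spaces; `end` is a Java char index into src.
    int end = src.length();
    while (end > 0 && src.charAt(end - 1) == ' ') {
      end--;
    }
    String trailingSpaces = src.substring(end);

    // 2. Trim matching code points from the right, stepping over whole surrogate pairs.
    while (end > 0) {
      int cp = src.codePointBefore(end);
      if (!trimSet.contains(cp)) {
        break;
      }
      end -= Character.charCount(cp);
    }

    // 3. Re-append the preserved spaces, measured in the same (char) units.
    return src.substring(0, end) + trailingSpaces;
  }

  public static void main(String[] args) {
    String a = "\uD835\uDD38"; // U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A
    // prints "2 chars, 1 code point"
    System.out.println(a.length() + " chars, " + a.codePointCount(0, a.length()) + " code point");
    // each of the following prints "[ ]": the single trailing space survives
    System.out.println("[" + trimRightRtrim("x ", "x") + "]");
    System.out.println("[" + trimRightRtrim(a + " ", a) + "]");
    System.out.println("[" + trimRightRtrim(a + a + " ", a) + "]");
  }
}
```

Keeping every offset in Java chars, and converting with `Character.charCount` only when stepping over a code point, is one way to avoid mixing units; working in code points throughout would be equally valid as long as it is done consistently.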
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]