LuciferYang opened a new pull request, #56567: URL: https://github.com/apache/spark/pull/56567
### What changes were proposed in this pull request?

`CollationAwareUTF8String.trimRight` (the ICU path used by RTRIM-modifier collations) compared a Java-String (UTF-16) index against a Unicode code-point count: `lastNonSpacePosition == srcString.numChars()`. `lastNonSpacePosition` is a UTF-16 index into `src = srcString.toValidString()` (it is initialized to `src.length()` and decremented via `src.charAt`), so the sentinel must be compared against `src.length()`, matching the `charIndex == src.length()` check immediately above it. This PR changes that one comparison to `src.length()`.

### Why are the changes needed?

For RTRIM-style collations, trailing spaces are ignored while matching the trim characters but must be re-appended to the result. With the code-point count, the "no trailing spaces were skipped" sentinel fired spuriously whenever the number of supplementary characters equaled the number of trailing spaces, so the preserved trailing spaces were dropped. For example, under `UNICODE_RTRIM`, right-trimming a supplementary character (such as U+1D538) followed by a single space returned an empty string instead of a single space.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes incorrect results. Under RTRIM-modifier collations, right-trim on a string containing supplementary characters followed by trailing spaces (when the count of supplementary characters equals the count of trailing spaces) no longer drops the preserved trailing spaces. Only previously-incorrect results change.

### How was this patch tested?

Added cases to `CollationSupportSuite#testStringTrimRight` for RTRIM-modifier collations with supplementary characters and trailing spaces:

- the two coincidence cases that previously returned the wrong value,
- a BMP control,
- a non-coincidence control,
- the all-spaces early-return path.

The coincidence cases fail on the old code and pass with the fix.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)
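
For reviewers, the index mismatch described under "Why are the changes needed?" can be reproduced outside Spark with plain `java.lang.String`. The following is a minimal sketch, not the code in `CollationAwareUTF8String`: the class `TrimSentinelSketch` and its helper methods are hypothetical names, and `codePointCount` is used here only as an assumed stand-in for `UTF8String.numChars()`.

```java
public class TrimSentinelSketch {

    // Hypothetical helper: UTF-16 index just past the last non-space char.
    // Starts at src.length() and walks back with charAt, as the PR describes.
    static int lastNonSpacePosition(String src) {
        int pos = src.length();
        while (pos > 0 && src.charAt(pos - 1) == ' ') {
            pos--;
        }
        return pos;
    }

    // Old-style check: compares a UTF-16 index against a code-point count.
    static boolean noTrailingSpacesOld(String src) {
        int numCodePoints = src.codePointCount(0, src.length());
        return lastNonSpacePosition(src) == numCodePoints;
    }

    // Fixed-style check: both sides are UTF-16 indices.
    static boolean noTrailingSpacesFixed(String src) {
        return lastNonSpacePosition(src) == src.length();
    }

    public static void main(String[] args) {
        // U+1D538 (one code point, two UTF-16 chars) followed by one space.
        String src = new String(Character.toChars(0x1D538)) + " ";

        System.out.println(src.length());                         // 3
        System.out.println(src.codePointCount(0, src.length()));  // 2
        System.out.println(lastNonSpacePosition(src));            // 2
        System.out.println(noTrailingSpacesOld(src));             // true (spurious)
        System.out.println(noTrailingSpacesFixed(src));           // false
    }
}
```

Since `src.length()` equals the code-point count plus the number of supplementary characters, the old comparison holds exactly when the supplementary-character count equals the trailing-space count, which is the coincidence the new test cases target.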
