LuciferYang opened a new pull request, #56567:
URL: https://github.com/apache/spark/pull/56567

   ### What changes were proposed in this pull request?
   
   `CollationAwareUTF8String.trimRight` (the ICU path used by RTRIM-modifier 
collations) compared a Java `String` (UTF-16) index against a Unicode code-point 
count: `lastNonSpacePosition == srcString.numChars()`. `lastNonSpacePosition` 
is a UTF-16 index into `src = srcString.toValidString()` (it is initialized to 
`src.length()` and decremented via `src.charAt`), so the sentinel must be 
compared against `src.length()`, matching the `charIndex == src.length()` check 
immediately above it. This PR changes that one comparison to use `src.length()`.
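
   The unit mismatch can be illustrated with plain `java.lang.String`, 
independent of Spark. This is a minimal sketch, not the Spark code: the class 
name `Utf16VsCodePoints` and the helper `lastNonSpaceIndex` are hypothetical, 
and the backward scan only models an index that starts at `src.length()` and 
is decremented via `src.charAt`.

```java
public class Utf16VsCodePoints {
    // Hypothetical helper: walks back over trailing ASCII spaces and
    // returns a UTF-16 index (code units, not code points).
    static int lastNonSpaceIndex(String src) {
        int pos = src.length();
        while (pos > 0 && src.charAt(pos - 1) == ' ') {
            pos--;
        }
        return pos;
    }

    public static void main(String[] args) {
        // U+1D538 is a supplementary character: one code point, two UTF-16 code units.
        String src = new String(Character.toChars(0x1D538)) + " ";

        System.out.println("UTF-16 length:     " + src.length());
        System.out.println("code-point count:  " + src.codePointCount(0, src.length()));
        System.out.println("last non-space at: " + lastNonSpaceIndex(src));
    }
}
```

Here the UTF-16 index after skipping the single trailing space happens to equal 
the code-point count, even though it differs from `src.length()`.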
   
   ### Why are the changes needed?
   
   For RTRIM-style collations, trailing spaces are ignored while matching the 
trim characters but must be re-appended to the result. Each supplementary 
character occupies two UTF-16 code units but counts as one code point, so for 
well-formed input `src.length()` exceeds `srcString.numChars()` by the number 
of supplementary characters. With the code-point count, the "no trailing 
spaces were skipped" sentinel therefore fired spuriously whenever the number 
of supplementary characters equaled the number of trailing spaces, and the 
preserved trailing spaces were dropped. For example, under `UNICODE_RTRIM`, 
right-trimming a supplementary character (such as U+1D538) followed by a 
single space returned an empty string instead of a single space.
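
   The coincidence condition can be checked with a simplified model of the two 
comparisons. This is a sketch under stated assumptions, not the Spark 
implementation: `SentinelCoincidence`, `oldSentinel`, and `fixedSentinel` are 
hypothetical names, and `codePointCount` stands in for `srcString.numChars()`.

```java
public class SentinelCoincidence {
    // Hypothetical model of the backward scan: a UTF-16 index just past
    // the last non-space character.
    static int lastNonSpaceIndex(String src) {
        int pos = src.length();
        while (pos > 0 && src.charAt(pos - 1) == ' ') {
            pos--;
        }
        return pos;
    }

    // Models the old comparison: UTF-16 index vs. code-point count.
    static boolean oldSentinel(String src) {
        return lastNonSpaceIndex(src) == src.codePointCount(0, src.length());
    }

    // Models the fixed comparison: UTF-16 index vs. UTF-16 length.
    static boolean fixedSentinel(String src) {
        return lastNonSpaceIndex(src) == src.length();
    }

    public static void main(String[] args) {
        String supp = new String(Character.toChars(0x1D538));
        String[] cases = {
            supp + " ",          // 1 supplementary, 1 space: coincidence
            supp + supp + "  ",  // 2 supplementary, 2 spaces: coincidence
            "a ",                // BMP control
            supp + "  ",         // 1 supplementary, 2 spaces: no coincidence
        };
        for (String c : cases) {
            System.out.println("old=" + oldSentinel(c) + " fixed=" + fixedSentinel(c));
        }
    }
}
```

In this model, every case has trailing spaces, so the "no trailing spaces were 
skipped" check should never fire; the old comparison fires for the two 
coincidence cases, while the fixed comparison fires for none.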
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it fixes incorrect results. Under RTRIM-modifier collations, right-trim 
on a string containing supplementary characters followed by trailing spaces 
(when the count of supplementary characters equals the count of trailing 
spaces) no longer drops the preserved trailing spaces. Only 
previously-incorrect results change.
   
   ### How was this patch tested?
   
   Added cases to `CollationSupportSuite#testStringTrimRight` for 
RTRIM-modifier collations with supplementary characters and trailing spaces: 
the two coincidence cases that previously returned the wrong value, plus a BMP 
control, a non-coincidence control, and the all-spaces early-return path. The 
coincidence cases fail on the old code and pass with the fix.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

