miland-db commented on code in PR #45704:
URL: https://github.com/apache/spark/pull/45704#discussion_r1538955710
##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1136,6 +1136,104 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
return buf.build();
}
+ public UTF8String replace(UTF8String search, UTF8String replace, int collationId) {
+ if (CollationFactory.fetchCollation(collationId).isBinaryCollation) {
+ return this.replace(search, replace);
+ }
+ if (collationId == CollationFactory.LOWERCASE_COLLATION_ID) {
+ return lowercaseReplace(search, replace);
+ }
+ return collatedReplace(search, replace, collationId);
+ }
+
+ public UTF8String lowercaseReplace(UTF8String search, UTF8String replace) {
Review Comment:
It is **not** faster. It is a separate method because
`StringSearch stringSearch = CollationFactory.getStringSearch(this, search, collationId);`
fails for `collationId == 1`: the _collator_ for that ID is _null_ (see
`CollationFactory`). For that reason, I had to do the comparisons the standard
way we handle `UTF8_BINARY_LCASE`: convert everything to lowercase and compare
the lowercase values.
So the idea is to do the comparisons on the lowercase strings but build the
final result from the original string, so that uppercase and lowercase
characters are kept exactly as they appear in the input.
With other types of collations, we can do all comparisons directly on the
original input values.
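A minimal sketch of the "compare lowercase, build from the original" idea described above. It assumes plain `java.lang.String` instead of Spark's `UTF8String`, and the class and method names (`LowercaseReplaceSketch`, `lowercaseReplace`) are hypothetical, not the code in this PR:

```java
import java.util.Locale;

// Hypothetical sketch: works on java.lang.String, not Spark's UTF8String.
public class LowercaseReplaceSketch {

  // Replaces every case-insensitive occurrence of `search` in `src` with `replace`.
  // Matching is done on lowercased text; unmatched characters are copied from the
  // original `src`, so their case is preserved in the result.
  public static String lowercaseReplace(String src, String search, String replace) {
    if (src.isEmpty() || search.isEmpty()) {
      return src; // nothing to search for; return the input unchanged
    }
    String lowerSearch = search.toLowerCase(Locale.ROOT);
    int window = search.length();
    StringBuilder out = new StringBuilder(src.length());
    int i = 0;
    while (i < src.length()) {
      // Lowercase a window of the ORIGINAL input and compare it to the lowercased pattern.
      if (i + window <= src.length()
          && src.substring(i, i + window).toLowerCase(Locale.ROOT).equals(lowerSearch)) {
        out.append(replace);
        i += window; // skip past the matched region of the original input
      } else {
        out.append(src.charAt(i)); // copy the original character with its case intact
        i++;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Both "Spark" and "spark" match "SPARK"; the rest of the input keeps its case.
    System.out.println(lowercaseReplace("Spark SQL and spark streaming", "SPARK", "Flink"));
  }
}
```

Comparing a fixed-width window of the original input, rather than indexing into a fully lowercased copy, keeps positions aligned with the original string. One caveat with this sketch: lowercasing can change a string's length for some characters (for example, `"\u0130".toLowerCase(Locale.ROOT)` yields two chars), so such matches would be missed here.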