miland-db commented on code in PR #45704:
URL: https://github.com/apache/spark/pull/45704#discussion_r1538955710


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1136,6 +1136,104 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
     return buf.build();
   }
 
+  public UTF8String replace(UTF8String search, UTF8String replace, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).isBinaryCollation) {
+      return this.replace(search, replace);
+    }
+    if (collationId == CollationFactory.LOWERCASE_COLLATION_ID) {
+      return lowercaseReplace(search, replace);
+    }
+    return collatedReplace(search, replace, collationId);
+  }
+
+  public UTF8String lowercaseReplace(UTF8String search, UTF8String replace) {

Review Comment:
   It is **not** faster. It is a separate method because 
   `StringSearch stringSearch = CollationFactory.getStringSearch(this, search, collationId);` 
   fails for `collationId == 1`: the _collator_ for that value is _null_ (see 
   `CollationFactory`). For that reason I had to do the comparisons the standard 
   way we do them for `UTF8_BINARY_LCASE`: convert everything to lowercase and 
   compare the lowercase values. 
   
   So the idea is to do the comparisons on the lowercase strings but build the 
   final result from the original string, so that everything that is not 
   replaced keeps the uppercase and lowercase characters of the input.
   
   With other types of collations we can do all comparisons on the original 
   input values.
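
To illustrate the idea, here is a minimal sketch on plain `java.lang.String`, not the `UTF8String` code from this PR. The class name `LowercaseReplaceSketch` and the helper signature are hypothetical, and the sketch assumes that lowercasing does not change the string length, so that match positions in the lowercase copy line up with the original.

```java
import java.util.Locale;

public class LowercaseReplaceSketch {

    // Hypothetical helper: finds matches on lowercase copies,
    // but copies the unmatched parts from the original input.
    static String lowercaseReplace(String input, String search, String replacement) {
        if (search.isEmpty()) {
            return input;
        }
        String lowerInput = input.toLowerCase(Locale.ROOT);
        String lowerSearch = search.toLowerCase(Locale.ROOT);
        // Assumption of this sketch: indices in lowerInput are valid in input.
        if (lowerInput.length() != input.length()) {
            throw new IllegalArgumentException("lowercasing changed the string length");
        }

        StringBuilder out = new StringBuilder();
        int start = 0;
        int match = lowerInput.indexOf(lowerSearch, start);
        while (match >= 0) {
            // Unmatched text comes from the original, so its casing is preserved.
            out.append(input, start, match);
            out.append(replacement);
            start = match + lowerSearch.length();
            match = lowerInput.indexOf(lowerSearch, start);
        }
        out.append(input, start, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Bye Bye Bye World"
        System.out.println(lowercaseReplace("Hello HELLO hello World", "hello", "Bye"));
    }
}
```

The length check is only a guard for the sketch. Real input can contain characters whose lowercase form has a different length, and handling those needs an explicit mapping between positions in the lowercase copy and positions in the original.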



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

