uros-db commented on code in PR #45704:
URL: https://github.com/apache/spark/pull/45704#discussion_r1546263160


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1136,6 +1136,104 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
     return buf.build();
   }
 
+  public UTF8String replace(UTF8String search, UTF8String replace, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).isBinaryCollation) {
+      return this.replace(search, replace);
+    }
+    if (collationId == CollationFactory.LOWERCASE_COLLATION_ID) {
+      return lowercaseReplace(search, replace);
+    }
+    return collatedReplace(search, replace, collationId);
+  }
+
+  public UTF8String lowercaseReplace(UTF8String search, UTF8String replace) {

Review Comment:
   I don't think a `com.ibm.icu.text.Collator` instance exists for 
**UTF8_BINARY_LCASE**. UTF8String is a Spark concept, and while we could 
introduce some kind of wrapper collator for UTF8_BINARY_LCASE, that doesn't 
seem like a good idea to me.
   
   Instead, I think there should be an easy way to get a 
`com.ibm.icu.text.StringSearch` instance without passing a collationId. 
CollationFactory currently doesn't support this, but @miland-db can modify it 
so that collationId is optional: `new StringSearch(pattern, target)`.
   
   After these changes, I believe `lowercaseReplace` and `collatedReplace` can 
essentially be combined into a single function. For **UTF8_BINARY_LCASE**, it 
should be enough to pass the `.toLowerCase()` results for both **this** and 
**search** (with no collationId).
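   
   To illustrate the idea, here is a minimal sketch of a replace that searches 
on lowercased copies and applies the matches to the original text. It is not 
the PR's code: `LowercaseReplaceSketch` is a hypothetical name, plain 
`java.lang.String` stands in for UTF8String, and `String.indexOf` stands in for 
`StringSearch`. It also assumes that lowercasing does not change the string 
length, which does not hold for every Unicode input.

```java
import java.util.Locale;

// Hypothetical stand-in: java.lang.String and String.indexOf are used here
// in place of UTF8String and com.ibm.icu.text.StringSearch.
final class LowercaseReplaceSketch {

  // Replaces every case-insensitive occurrence of `search` in `target`.
  static String replace(String target, String search, String replacement) {
    if (search.isEmpty()) {
      // Choice made for this sketch: an empty pattern leaves the input as-is.
      return target;
    }
    // Search on lowercased copies of both the target and the pattern.
    String lowerTarget = target.toLowerCase(Locale.ROOT);
    String lowerSearch = search.toLowerCase(Locale.ROOT);
    // Assumption: lowercasing preserved the length, so match offsets in
    // lowerTarget are also valid offsets in the original target.
    if (lowerTarget.length() != target.length()) {
      throw new IllegalArgumentException("lowercasing changed the length");
    }
    StringBuilder out = new StringBuilder();
    int start = 0;
    int match = lowerTarget.indexOf(lowerSearch, start);
    while (match >= 0) {
      // Copy the untouched original text, then the replacement.
      out.append(target, start, match).append(replacement);
      start = match + lowerSearch.length();
      match = lowerTarget.indexOf(lowerSearch, start);
    }
    // Copy whatever follows the last match.
    out.append(target, start, target.length());
    return out.toString();
  }
}
```

   With this shape, the lowercase path differs from a collated path only in 
how the matcher is built, which is why merging the two functions looks feasible.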



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

