srowen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve UTF8String.replace() / StringReplace performance
URL: https://github.com/apache/spark/pull/24707#discussion_r287610550
 
 

 ##########
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
 ##########
 @@ -976,9 +977,28 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
     if (EMPTY_UTF8.equals(search)) {
       return this;
     }
-    String replaced = toString().replace(
-      search.toString(), replace.toString());
-    return fromString(replaced);
+    return replace(search.toString(), replace.toString());
+  }
+
+  public UTF8String replace(String search, String replace) {
+    String before = toString();
+    String after;
+    if (search.length() == 1 && replace.length() == 1) {
+      // Use single-character-replacement fast path
+      after = before.replace(search.charAt(0), replace.charAt(0));
+    } else {
+      // In Java 8, String.replace() is implemented using a regex and is therefore
+      // somewhat inefficient (see https://bugs.openjdk.java.net/browse/JDK-8058779).
+      // This is fixed in Java 9, but in Java 8 we can use Commons StringUtils instead:
+      after = StringUtils.replace(before, search, replace);
+    }
+    // Use reference equality to cheaply detect whether the replacement had no effect,
+    // in which case we can simply return the original UTF8String and save some copying.
+    if (before == after) {
 
 Review comment:
   I see, OK. `String.equals` also checks reference equality first, before
comparing contents. Would it be worth the tiny overhead of the method call to
also catch the cases where the implementations don't happen to return the same
object but the contents are still equal? I don't know.
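
   To make the trade-off concrete, here is a minimal sketch of the two checks side by side. The class and method names are hypothetical and not part of `UTF8String`, and `new String(...)` only stands in for a replacement routine that returns a fresh but equal string; whether a given `replace` implementation actually does that is not something this sketch establishes.

```java
// Hypothetical helper names for illustration only; not part of UTF8String.
final class ReplaceNoOpCheckSketch {

  // Identity check, as in the patch: true only when `after` is the very same object.
  static boolean unchangedByReference(String before, String after) {
    return before == after;
  }

  // Alternative raised in the review: equals() returns true right away for the
  // same object and otherwise falls back to comparing the contents.
  static boolean unchangedByEquals(String before, String after) {
    return before.equals(after);
  }

  public static void main(String[] args) {
    String before = "spark";
    // Assumption: simulates an implementation that copies even when nothing changed.
    String equalCopy = new String(before);

    System.out.println(unchangedByReference(before, before));    // true
    System.out.println(unchangedByReference(before, equalCopy)); // false
    System.out.println(unchangedByEquals(before, equalCopy));    // true
  }
}
```

   With `==`, the equal-but-distinct copy is treated as a change and gets converted back to a `UTF8String`; with `equals`, it would be treated as a no-op at the cost of a content comparison whenever the objects differ.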

