srowen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve
UTF8String.replace() / StringReplace performance
URL: https://github.com/apache/spark/pull/24707#discussion_r287609735
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
##########
@@ -976,9 +977,28 @@ public UTF8String replace(UTF8String search, UTF8String
replace) {
if (EMPTY_UTF8.equals(search)) {
return this;
}
- String replaced = toString().replace(
- search.toString(), replace.toString());
- return fromString(replaced);
+ return replace(search.toString(), replace.toString());
+ }
+
+ public UTF8String replace(String search, String replace) {
+ String before = toString();
+ String after;
+ if (search.length() == 1 && replace.length() == 1) {
+ // Use single-character-replacement fast path
+ after = before.replace(search.charAt(0), replace.charAt(0));
+ } else {
+ // In Java 8, String.replace() is implemented using a regex and is
therefore
+ // somewhat inefficient (see
https://bugs.openjdk.java.net/browse/JDK-8058779).
+ // This is fixed in Java 9, but in Java 8 we can use Commons StringUtils
instead:
+ after = StringUtils.replace(before, search, replace);
+ }
+ // Use reference equality to cheaply detect whether the replacement had no
effect,
+ // in which case we can simply return the original UTF8String and save
some copying.
+ if (before == after) {
Review comment:
You need `.equals()` here, no? the objects almost certainly won't be the
same String?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]