yaooqinn commented on a change in pull request #26626: [SPARK-29986][SQL]
Introduce java like string trim to UTF8String
URL: https://github.com/apache/spark/pull/26626#discussion_r349422225
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
##########
@@ -553,6 +553,29 @@ public UTF8String trim() {
return copyUTF8String(s, e);
}
+ /**
+ * Trims whitespaces (<= ASCII 32) from both ends of this string.
+ *
+ * @return this string with no control characters and spaces at the start or end
+ */
+ public UTF8String trimAll() {
+ int s = 0;
+ // skip all of the whitespaces (<=0x20) in the left side
+ while (s < this.numBytes && getByte(s) <= 0x20) s++;
+ if (s == this.numBytes) {
+ // Everything trimmed
+ return EMPTY_UTF8;
+ }
+ // skip all of the whitespaces (<=0x20) in the right side
+ int e = this.numBytes - 1;
+ while (e > s && getByte(e) <= 0x20) e--;
+ if (s == 0 && e == numBytes - 1) {
+ // Nothing trimmed
+ return this;
+ }
+ return copyUTF8String(s, e);
Review comment:
Do we also skip the copy when the result is `EMPTY_UTF8`? It seems a bit odd
if we handle the two cases differently.
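
To make the three return paths in the diff easier to compare, here is a minimal standalone sketch that mirrors the same logic on a plain `byte[]`. It is not the Spark implementation: the class `TrimAllSketch`, the `EMPTY` constant, and the `isTrimmable` helper are hypothetical names, and `Arrays.copyOfRange` stands in for `copyUTF8String`. The sketch also masks each byte with `0xFF` before comparing, on the assumption that bytes are read as signed Java `byte` values; without the mask, UTF-8 lead and continuation bytes (which are negative as signed bytes) would compare as `<= 0x20`.

```java
import java.util.Arrays;

public class TrimAllSketch {
    // Hypothetical stand-in for EMPTY_UTF8: one shared empty instance.
    public static final byte[] EMPTY = new byte[0];

    // Treat the byte as unsigned so that only 0x00..0x20 count as trimmable.
    // (Assumption of this sketch; the quoted diff compares getByte(...) directly.)
    private static boolean isTrimmable(byte b) {
        return (b & 0xFF) <= 0x20;
    }

    public static byte[] trimAll(byte[] bytes) {
        int s = 0;
        // Skip trimmable bytes on the left.
        while (s < bytes.length && isTrimmable(bytes[s])) s++;
        if (s == bytes.length) {
            // Everything trimmed: return the shared constant, no copy.
            return EMPTY;
        }
        // Skip trimmable bytes on the right.
        int e = bytes.length - 1;
        while (e > s && isTrimmable(bytes[e])) e--;
        if (s == 0 && e == bytes.length - 1) {
            // Nothing trimmed: return the same instance, no copy.
            return bytes;
        }
        // Something trimmed: copy the inclusive range [s, e].
        return Arrays.copyOfRange(bytes, s, e + 1);
    }
}
```

In this sketch only the last path allocates. The "everything trimmed" and "nothing trimmed" paths both return an existing instance, which is the consistency the comment above is asking about.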