[GitHub] spark pull request #19355: [SPARK-22130][Core] UTF8String.trim() scans " " t...

srowen Tue, 26 Sep 2017 10:46:18 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19355#discussion_r141131388
  
    --- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
    @@ -501,14 +501,13 @@ public UTF8String trim() {
         int e = this.numBytes - 1;
         // skip all of the space (0x20) in the left side
         while (s < this.numBytes && getByte(s) == 0x20) s++;
    -    // skip all of the space (0x20) in the right side
    -    while (e >= 0 && getByte(e) == 0x20) e--;
    -    if (s > e) {
    +    if (s == this.numBytes) {
           // empty string
           return EMPTY_UTF8;
    -    } else {
    -      return copyUTF8String(s, e);
         }
    +    // skip all of the space (0x20) in the right side
    +    while (e >= 0 && getByte(e) == 0x20) e--;
    --- End diff --
    
    Nit: while you're optimizing this, you can bring the declaration of e down 
here, as it won't be used unless there's a non-space char.
    
    I think this condition can start with `e > s` too. At the end, s and e 
point to the first and last non-space char. When the loop starts, s points to a 
non-space char. So you can stop when e == s; this is the case of one non-space 
char.
    
    Might be worth adding test cases for an empty string, and single-non-char 
string too.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #19355: [SPARK-22130][Core] UTF8String.trim() scans " " t...

Reply via email to