Github user original-brownbear commented on the issue:

    https://github.com/apache/spark/pull/19180
  
    @srowen benchmarked this some more on a quiet workstation to make sure and:
    
    this version still wins all categories and even beats the current version 
in `master` if at least 2 bytes are compared (`len > 1`) as a result of saving 
some non final field lookups and is faster in all other cases (couldn't find 
any case where it's  slower than the current `master`):
    
    ```java
      @Override
      public int compareTo(@Nonnull final UTF8String other) {
        int len = Math.min(numBytes, other.numBytes);
        if (len > 7) {
          return compareLarge(
              this.base, this.offset, numBytes, other.base, other.offset, 
other.numBytes, len
          );
        } else {
          return compareSmall(
              this.base, this.offset, numBytes, other.base, other.offset, 
other.numBytes, len
          );
        }
      }
    
      private static int compareLarge(Object base1, long off1, int len1, Object 
base2, long off2,
          int len2, int len) {
        // Shortcut for (len / 8) * 8
        int wordMax = len & ~7;
        for (int i = 0; i < wordMax; i += 8) {
            long left = getLong(base1, off1 + i);
            long right = getLong(base2, off2 + i);
            if (left != right) {
              if (IS_LITTLE_ENDIAN) {
                return Long.compareUnsigned(Long.reverseBytes(left), 
Long.reverseBytes(right));
              } else {
                return Long.compareUnsigned(left, right);
              }
            }
          }
        return compareSmall(
            base1, off1 + wordMax, len1, base2, off2 + wordMax, len2, len - 
wordMax
        );
      }
    
      private static int compareSmall(Object base1, long off1, int len1, Object 
base2, long off2,
          int len2, int len) {
        for (int i = 0; i < len; i++) {
          // In UTF-8, the byte should be unsigned, so we should compare them 
as unsigned int.
          int res = (Platform.getByte(base1, off1 + i) & 0xFF) -
              (Platform.getByte(base2, off2 + i) & 0xFF);
          if (res != 0) {
            return res;
          }
        }
        return len1 - len2;
      }
    ```
    
    The only category where this solution is slightly worse is the size of the 
method after JIT with `472` instead of `440` bytes.
    
    The reasons this one is faster are:
    
    * In the large branch `for (int i = 0; i < wordMax; i += 8) {` has its 
first check optimized away, since `wordMax` is always ` >= 1` see JITwatch below
    <img width="640" alt="screen shot 2017-09-15 at 11 46 37" 
src="https://user-images.githubusercontent.com/6490959/30476852-99c62eec-9a0b-11e7-91c0-d82ae9fcdb68.png";>
    
    * In the small branch `for (int i = 0; i < len; i++) {` has its first if 
optimized as well, since empty strings are not super likely (in my benchmark 
the 0 only comes up when `len % 8 == 0`).
    <img width="460" alt="screen shot 2017-09-15 at 11 49 29" 
src="https://user-images.githubusercontent.com/6490959/30477038-31e1c1fa-9a0c-11e7-8440-c8b2e2cc84fa.png";>
    
    * We don't have non-final field lookups in the tight loops and only fetch 
each field on `this` and `other` once no matter the path taken
    
    -----------------------------
    
    In my opinion (and benchmarks), the longer version with a `small` and 
`large` branch wins here with the way JDK8 JIT works. Especially since the case 
where only one byte is compared, in a string shorter than 8 bytes isn't all 
that theoretical.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to