Github user original-brownbear commented on the issue: https://github.com/apache/spark/pull/19180 @srowen benchmarked this some more on a quiet workstation to make sure and: this version still wins all categories and even beats the current version in `master` if at least 2 bytes are compared (`len > 1`) as a result of saving some non final field lookups and is faster in all other cases (couldn't find any case where it's slower than the current `master`): ```java @Override public int compareTo(@Nonnull final UTF8String other) { int len = Math.min(numBytes, other.numBytes); if (len > 7) { return compareLarge( this.base, this.offset, numBytes, other.base, other.offset, other.numBytes, len ); } else { return compareSmall( this.base, this.offset, numBytes, other.base, other.offset, other.numBytes, len ); } } private static int compareLarge(Object base1, long off1, int len1, Object base2, long off2, int len2, int len) { // Shortcut for (len / 8) * 8 int wordMax = len & ~7; for (int i = 0; i < wordMax; i += 8) { long left = getLong(base1, off1 + i); long right = getLong(base2, off2 + i); if (left != right) { if (IS_LITTLE_ENDIAN) { return Long.compareUnsigned(Long.reverseBytes(left), Long.reverseBytes(right)); } else { return Long.compareUnsigned(left, right); } } } return compareSmall( base1, off1 + wordMax, len1, base2, off2 + wordMax, len2, len - wordMax ); } private static int compareSmall(Object base1, long off1, int len1, Object base2, long off2, int len2, int len) { for (int i = 0; i < len; i++) { // In UTF-8, the byte should be unsigned, so we should compare them as unsigned int. int res = (Platform.getByte(base1, off1 + i) & 0xFF) - (Platform.getByte(base2, off2 + i) & 0xFF); if (res != 0) { return res; } } return len1 - len2; } ``` The only category where this solution is slightly worse is the size of the method after JIT with `472` instead of `440` bytes. The reasons this one is faster are: * In the large branch `for (int i = 0; i < wordMax; i += 8) {` has its first check optimized away, since `wordMax` is always ` >= 1` see JITwatch below <img width="640" alt="screen shot 2017-09-15 at 11 46 37" src="https://user-images.githubusercontent.com/6490959/30476852-99c62eec-9a0b-11e7-91c0-d82ae9fcdb68.png"> * In the small branch `for (int i = 0; i < len; i++) {` has its first if optimized as well, since empty strings are not super likely (in my benchmark the 0 only comes up when `len % 8 == 0`). <img width="460" alt="screen shot 2017-09-15 at 11 49 29" src="https://user-images.githubusercontent.com/6490959/30477038-31e1c1fa-9a0c-11e7-8440-c8b2e2cc84fa.png"> * We don't have non-final field lookups in the tight loops and only fetch each field on `this` and `other` once no matter the path taken ----------------------------- In my opinion (and benchmarks), the longer version with a `small` and `large` branch wins here with the way JDK8 JIT works. Especially since the case where only one byte is compared, in a string shorter than 8 bytes isn't all that theoretical.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org