Github user original-brownbear commented on the issue:
https://github.com/apache/spark/pull/19180
@srowen benchmarked this some more on a quiet workstation to make sure and:
this version still wins all categories and even beats the current version
in `master` if at least 2 bytes are compared (`len > 1`) as a result of saving
some non final field lookups and is faster in all other cases (couldn't find
any case where it's slower than the current `master`):
```java
@Override
public int compareTo(@Nonnull final UTF8String other) {
int len = Math.min(numBytes, other.numBytes);
if (len > 7) {
return compareLarge(
this.base, this.offset, numBytes, other.base, other.offset,
other.numBytes, len
);
} else {
return compareSmall(
this.base, this.offset, numBytes, other.base, other.offset,
other.numBytes, len
);
}
}
private static int compareLarge(Object base1, long off1, int len1, Object
base2, long off2,
int len2, int len) {
// Shortcut for (len / 8) * 8
int wordMax = len & ~7;
for (int i = 0; i < wordMax; i += 8) {
long left = getLong(base1, off1 + i);
long right = getLong(base2, off2 + i);
if (left != right) {
if (IS_LITTLE_ENDIAN) {
return Long.compareUnsigned(Long.reverseBytes(left),
Long.reverseBytes(right));
} else {
return Long.compareUnsigned(left, right);
}
}
}
return compareSmall(
base1, off1 + wordMax, len1, base2, off2 + wordMax, len2, len -
wordMax
);
}
private static int compareSmall(Object base1, long off1, int len1, Object
base2, long off2,
int len2, int len) {
for (int i = 0; i < len; i++) {
// In UTF-8, the byte should be unsigned, so we should compare them
as unsigned int.
int res = (Platform.getByte(base1, off1 + i) & 0xFF) -
(Platform.getByte(base2, off2 + i) & 0xFF);
if (res != 0) {
return res;
}
}
return len1 - len2;
}
```
The only category where this solution is slightly worse is the size of the
method after JIT with `472` instead of `440` bytes.
The reasons this one is faster are:
* In the large branch `for (int i = 0; i < wordMax; i += 8) {` has its
first check optimized away, since `wordMax` is always ` >= 1` see JITwatch below
<img width="640" alt="screen shot 2017-09-15 at 11 46 37"
src="https://user-images.githubusercontent.com/6490959/30476852-99c62eec-9a0b-11e7-91c0-d82ae9fcdb68.png">
* In the small branch `for (int i = 0; i < len; i++) {` has its first if
optimized as well, since empty strings are not super likely (in my benchmark
the 0 only comes up when `len % 8 == 0`).
<img width="460" alt="screen shot 2017-09-15 at 11 49 29"
src="https://user-images.githubusercontent.com/6490959/30477038-31e1c1fa-9a0c-11e7-8440-c8b2e2cc84fa.png">
* We don't have non-final field lookups in the tight loops and only fetch
each field on `this` and `other` once no matter the path taken
-----------------------------
In my opinion (and benchmarks), the longer version with a `small` and
`large` branch wins here with the way JDK8 JIT works. Especially since the case
where only one byte is compared, in a string shorter than 8 bytes isn't all
that theoretical.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]