Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19180#discussion_r138453195
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -1097,8 +1101,21 @@ public UTF8String copy() {
@Override
public int compareTo(@Nonnull final UTF8String other) {
int len = Math.min(numBytes, other.numBytes);
- // TODO: compare 8 bytes as unsigned long
- for (int i = 0; i < len; i ++) {
+ int words = len / Longs.BYTES;
+ long roffset = other.getBaseOffset();
+ Object rbase = other.getBaseObject();
+ for (int i = 0; i < words * Longs.BYTES; i += Longs.BYTES) {
+ long left = getLong(base, offset + i);
+ long right = getLong(rbase, roffset + i);
+ if (left != right) {
+ if (!IS_LITTLE_ENDIAN) {
--- End diff --
(Nit: flip the condition to avoid the negation)
Is the point that if JIT thinks the value can't change then either way it
can snip out the branch that will never happen? then yes checking the condition
in or outside the loop doesn't matter. If that not always true, then I'm pretty
clear that checking the condition inside this hot loop is slower.
My instinct would be to check it outside the loop just in case. However, I
also note that it only matters once in the execution of the method, because as
soon as `left != right`, the loop terminates. I think it's OK this way,
therefore.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]