Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19180#discussion_r138453195 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -1097,8 +1101,21 @@ public UTF8String copy() { @Override public int compareTo(@Nonnull final UTF8String other) { int len = Math.min(numBytes, other.numBytes); - // TODO: compare 8 bytes as unsigned long - for (int i = 0; i < len; i ++) { + int words = len / Longs.BYTES; + long roffset = other.getBaseOffset(); + Object rbase = other.getBaseObject(); + for (int i = 0; i < words * Longs.BYTES; i += Longs.BYTES) { + long left = getLong(base, offset + i); + long right = getLong(rbase, roffset + i); + if (left != right) { + if (!IS_LITTLE_ENDIAN) { --- End diff -- (Nit: flip the condition to avoid the negation) Is the point that if JIT thinks the value can't change then either way it can snip out the branch that will never happen? then yes checking the condition in or outside the loop doesn't matter. If that not always true, then I'm pretty clear that checking the condition inside this hot loop is slower. My instinct would be to check it outside the loop just in case. However, I also note that it only matters once in the execution of the method, because as soon as `left != right`, the loop terminates. I think it's OK this way, therefore.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org