ulysses-you commented on a change in pull request #34310:
URL: https://github.com/apache/spark/pull/34310#discussion_r731483712
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -75,6 +75,42 @@ static long getPrefix(Object base, long offset, int numBytes) {
return (IS_LITTLE_ENDIAN ? java.lang.Long.reverseBytes(p) : p) & ~mask;
}
+ public static int compareBinary(byte[] leftBase, byte[] rightBase) {
+ return compareBinary(leftBase, Platform.BYTE_ARRAY_OFFSET, leftBase.length,
Review comment:
Thank you @srowen and @JoshRosen for pointing out the difference. I followed
the linked benchmark and added a new "512 byte slow" case in which the first 511
bytes of the compared arrays are identical. The results show no regression after
this PR and a large improvement when the byte arrays share a long common prefix:
the best time for the 512 byte slow case drops from 23593 ms to 3419 ms.
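To make the "512 byte slow" case concrete, here is a minimal sketch of that kind of input and of why a word-at-a-time comparison helps on long shared prefixes. It is not the code in this PR: the class and method names are hypothetical, and it uses `java.nio.ByteBuffer` instead of `Platform` so that it runs standalone.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone sketch; not the ByteArray.compareBinary implementation.
public class ComparePrefixSketch {

  // Baseline: unsigned lexicographic comparison, one byte per step.
  public static int compareByteWise(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = (a[i] & 0xff) - (b[i] & 0xff);
      if (c != 0) return c;
    }
    return a.length - b.length;
  }

  // Sketch: compare 8 bytes per step as big-endian longs, then finish byte-wise.
  // Big-endian order makes unsigned long order agree with byte-wise order.
  public static int compareWordWise(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    ByteBuffer x = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
    ByteBuffer y = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
      long l = x.getLong(i);
      long r = y.getLong(i);
      if (l != r) return Long.compareUnsigned(l, r);
    }
    for (; i < n; i++) {
      int c = (a[i] & 0xff) - (b[i] & 0xff);
      if (c != 0) return c;
    }
    return a.length - b.length;
  }

  // Assumed shape of the slow case: two 512-byte arrays that agree on the
  // first 511 bytes and differ only in the last one.
  public static byte[][] slowCase() {
    byte[] left = new byte[512];
    byte[] right = new byte[512];
    for (int i = 0; i < 511; i++) {
      left[i] = right[i] = (byte) i;
    }
    left[511] = 1;
    right[511] = 2;
    return new byte[][] {left, right};
  }

  public static void main(String[] args) {
    byte[][] in = slowCase();
    // Both comparisons should agree on the sign of the result.
    System.out.println(Integer.signum(compareByteWise(in[0], in[1])));
    System.out.println(Integer.signum(compareWordWise(in[0], in[1])));
  }
}
```

On this input the byte-wise loop runs 512 iterations before it finds the difference, while the word-wise loop runs 64, which is the kind of gap the slow case is meant to expose.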
Before this PR:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_271-b09 on Mac OS X 10.16
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Byte Array compareTo:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                           800            861          70         81.9          12.2       1.0X
8-16 byte                          810            878          59         80.9          12.4       1.0X
16-32 byte                         804            887          40         81.5          12.3       1.0X
512-1024 byte                     1050           1181          43         62.4          16.0       0.8X
512 byte slow                    23593          23698         311          2.8         360.0       0.0X
2-7 byte                           778            784           5         84.2          11.9       1.0X
```
After this PR:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_271-b09 on Mac OS X 10.16
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Byte Array compareTo:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
2-7 byte                           425            471          24        154.2           6.5       1.0X
8-16 byte                          751            814          40         87.2          11.5       0.5X
16-32 byte                         789            842          42         83.1          12.0       0.5X
512-1024 byte                     1038           1175         193         63.1          15.8       0.4X
512 byte slow                     3419           3924         NaN         19.2          52.2       0.1X
2-7 byte                           421            424           2        155.6           6.4       1.0X
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.