ulysses-you commented on a change in pull request #34310:
URL: https://github.com/apache/spark/pull/34310#discussion_r731483712



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -75,6 +75,42 @@ static long getPrefix(Object base, long offset, int numBytes) {
     return (IS_LITTLE_ENDIAN ? java.lang.Long.reverseBytes(p) : p) & ~mask;
   }
 
+  public static int compareBinary(byte[] leftBase, byte[] rightBase) {
+    return compareBinary(leftBase, Platform.BYTE_ARRAY_OFFSET, leftBase.length,

Review comment:
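   Thank you @srowen and @JoshRosen for pointing out the difference. I followed the linked benchmark and added a new "512 byte slow" case, in which the two 512-byte arrays being compared are identical in their first 511 bytes. The results show no regression from this PR and a large benefit when the byte arrays share a long common prefix: the "512 byte slow" best time drops from 23593 ms to 3419 ms (about 6.9x faster), and the 2-7 byte case drops from 800 ms to 425 ms.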
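   To illustrate why a long common prefix benefits so much, here is a minimal, self-contained sketch of a word-at-a-time comparison, one standard way to speed up this kind of comparison. The class `WordCompareSketch` is hypothetical: it is not Spark's `ByteArray` and not necessarily what this PR does. The sketch assumes unsigned lexicographic ordering, with a shorter array sorting first when it is a prefix of the longer one. On "512 byte slow"-style input, the byte-at-a-time loop runs 512 iterations before it reaches the differing byte, while the word-at-a-time loop needs only 64 eight-byte reads.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.util.Arrays;

   // Hypothetical helper for illustration only; not Spark's ByteArray.
   class WordCompareSketch {

     // Baseline: compare one unsigned byte per iteration.
     static int compareBytewise(byte[] a, byte[] b) {
       int n = Math.min(a.length, b.length);
       for (int i = 0; i < n; i++) {
         int res = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
         if (res != 0) {
           return res;
         }
       }
       // Equal common prefix: the shorter array sorts first.
       return a.length - b.length;
     }

     // Word-at-a-time: compare 8 bytes per iteration. Reading the words as
     // big-endian longs and comparing them unsigned gives the same order as
     // comparing the bytes one by one as unsigned values.
     static int compareWordwise(byte[] a, byte[] b) {
       ByteBuffer left = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
       ByteBuffer right = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);
       int n = Math.min(a.length, b.length);
       int i = 0;
       for (; i + 8 <= n; i += 8) {
         long l = left.getLong(i);
         long r = right.getLong(i);
         if (l != r) {
           return Long.compareUnsigned(l, r);
         }
       }
       // Compare the remaining 0-7 bytes one at a time.
       for (; i < n; i++) {
         int res = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
         if (res != 0) {
           return res;
         }
       }
       return a.length - b.length;
     }

     public static void main(String[] args) {
       // "512 byte slow"-style input: identical except for the last byte.
       byte[] x = new byte[512];
       byte[] y = new byte[512];
       Arrays.fill(x, (byte) 7);
       Arrays.fill(y, (byte) 7);
       y[511] = (byte) 0x80; // 128 when read as unsigned, so x < y
       System.out.println(Integer.signum(compareBytewise(x, y)));
       System.out.println(Integer.signum(compareWordwise(x, y)));
     }
   }
   ```

   Reading the words big-endian is what lets `Long.compareUnsigned` agree with byte-by-byte unsigned order. A version built on Spark's `Platform` memory API could instead read words with `Platform.getLong` and byte-swap on little-endian hosts, similar to what the `getPrefix` context line above does with `Long.reverseBytes`.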
   
   Before this PR:
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_271-b09 on Mac OS X 10.16
   Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
   Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   2-7 byte                                            800            861          70         81.9          12.2       1.0X
   8-16 byte                                           810            878          59         80.9          12.4       1.0X
   16-32 byte                                          804            887          40         81.5          12.3       1.0X
   512-1024 byte                                      1050           1181          43         62.4          16.0       0.8X
   512 byte slow                                     23593          23698         311          2.8         360.0       0.0X
   2-7 byte                                            778            784           5         84.2          11.9       1.0X
   ```
   
   After this PR:
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_271-b09 on Mac OS X 10.16
   Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
   Byte Array compareTo:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   2-7 byte                                            425            471          24        154.2           6.5       1.0X
   8-16 byte                                           751            814          40         87.2          11.5       0.5X
   16-32 byte                                          789            842          42         83.1          12.0       0.5X
   512-1024 byte                                      1038           1175         193         63.1          15.8       0.4X
   512 byte slow                                      3419           3924         NaN         19.2          52.2       0.1X
   2-7 byte                                            421            424           2        155.6           6.4       1.0X
   ```
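   The `Relative` column compares each case with the first case (2-7 byte) within the same run, so it is not a before/after comparison. For example, the 0.5X for `8-16 byte` after this PR reflects the faster 2-7 byte baseline, not a slowdown (751 ms vs. 810 ms before). A small sketch that derives per-case before/after speedups from the `Best Time(ms)` column of the two tables; the class `SpeedupFromTables` is hypothetical, and the repeated 2-7 byte row is left out:

   ```java
   import java.util.Locale;

   // Hypothetical helper for illustration: before/after ratios of "Best Time(ms)".
   class SpeedupFromTables {
     static final String[] CASES = {
         "2-7 byte", "8-16 byte", "16-32 byte", "512-1024 byte", "512 byte slow"};
     // Best Time(ms) from the "Before this PR" table.
     static final double[] BEFORE_MS = {800, 810, 804, 1050, 23593};
     // Best Time(ms) from the "After this PR" table.
     static final double[] AFTER_MS = {425, 751, 789, 1038, 3419};

     // Values above 1.0 mean the case got faster after the PR.
     static double speedup(double beforeMs, double afterMs) {
       return beforeMs / afterMs;
     }

     public static void main(String[] args) {
       for (int i = 0; i < CASES.length; i++) {
         // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
         System.out.println(String.format(Locale.ROOT, "%-15s %.2fx",
             CASES[i], speedup(BEFORE_MS[i], AFTER_MS[i])));
       }
     }
   }
   ```

   It prints speedups of 1.88x, 1.08x, 1.02x, 1.01x and 6.90x for the five cases in order, so every case is at least as fast as before.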



