Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v4]

2021-01-19 Thread Valerie Peng
On Mon, 18 Jan 2021 13:39:04 GMT, Claes Redestad  wrote:

>> - The MD5 intrinsics added by 
>> [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that 
>> the `int[] x` isn't actually needed. This also applies to the SHA intrinsics 
>> from which the MD5 intrinsic takes inspiration
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to 
>> make it acceptable to use inline and replace the array in MD5 wholesale. 
>> This improves performance both in the presence and the absence of the 
>> intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element 
>> arrays), but allocating the array lazily gets most of the speed-up in the 
>> presence of an intrinsic while being neutral in its absence.
>> 
>> Baseline:
>>   (digesterName)  (length)Cnt Score  
>> Error   Units
>> MessageDigests.digestMD516 15  
>> 2714.307 ±   21.133  ops/ms
>> MessageDigests.digestMD5  1024 15   
>> 318.087 ±0.637  ops/ms
>> MessageDigests.digest  SHA-116 15  
>> 1387.266 ±   40.932  ops/ms
>> MessageDigests.digest  SHA-1  1024 15   
>> 109.273 ±0.149  ops/ms
>> MessageDigests.digestSHA-25616 15   
>> 995.566 ±   21.186  ops/ms
>> MessageDigests.digestSHA-256  1024 15
>> 89.104 ±0.079  ops/ms
>> MessageDigests.digestSHA-51216 15   
>> 803.030 ±   15.722  ops/ms
>> MessageDigests.digestSHA-512  1024 15   
>> 115.611 ±0.234  ops/ms
>> MessageDigests.getAndDigest  MD516 15  
>> 2190.367 ±   97.037  ops/ms
>> MessageDigests.getAndDigest  MD5  1024 15   
>> 302.903 ±1.809  ops/ms
>> MessageDigests.getAndDigestSHA-116 15  
>> 1262.656 ±   43.751  ops/ms
>> MessageDigests.getAndDigestSHA-1  1024 15   
>> 104.889 ±3.554  ops/ms
>> MessageDigests.getAndDigest  SHA-25616 15   
>> 914.541 ±   55.621  ops/ms
>> MessageDigests.getAndDigest  SHA-256  1024 15
>> 85.708 ±1.394  ops/ms
>> MessageDigests.getAndDigest  SHA-51216 15   
>> 737.719 ±   53.671  ops/ms
>> MessageDigests.getAndDigest  SHA-512  1024 15   
>> 112.307 ±1.950  ops/ms
>> 
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  MD516 15   
>> 312.011 ±0.005B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.normSHA-116 15   
>> 584.020 ±0.006B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-25616 15   
>> 544.019 ±0.016B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-51216 15  
>> 1056.037 ±0.003B/op
>> 
>> Target:
>> Benchmark (digesterName)  (length)Cnt
>>  Score  Error   Units
>> MessageDigests.digestMD516 15  
>> 3134.462 ±   43.685  ops/ms
>> MessageDigests.digestMD5  1024 15   
>> 323.667 ±0.633  ops/ms
>> MessageDigests.digest  SHA-116 15  
>> 1418.742 ±   38.223  ops/ms
>> MessageDigests.digest  SHA-1  1024 15   
>> 110.178 ±0.788  ops/ms
>> MessageDigests.digestSHA-25616 15  
>> 1037.949 ±   21.214  ops/ms
>> MessageDigests.digestSHA-256  1024 15
>> 89.671 ±0.228  ops/ms
>> MessageDigests.digestSHA-51216 15   
>> 812.028 ±   39.489  ops/ms
>> MessageDigests.digestSHA-512  1024 15   
>> 116.738 ±0.249  ops/ms
>> MessageDigests.getAndDigest  MD516 15  
>> 2314.379 ±  229.294  ops/ms
>> MessageDigests.getAndDigest  MD5  1024 15   
>> 307.835 ±5.730  ops/ms
>> MessageDigests.getAndDigestSHA-116 15  
>> 1326.887 ±   63.263  ops/ms
>> MessageDigests.getAndDigestSHA-1  1024 15   
>> 106.611 ±2.292  ops/ms
>> MessageDigests.getAndDigest  SHA-25616 15   
>> 961.589 ±   82.052  ops/ms
>> MessageDigests.getAndDigest  SHA-256  1024 15
>> 88.646 ±0.194  ops/ms
>> MessageDigests.getAndDigest  SHA-51216 15   
>> 775.417 ±   56.775  ops/ms
>> MessageDigests.getAndDigest  SHA-512  1024 15   
>> 112.904 ±2.014  ops/ms
>> 

Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v4]

2021-01-18 Thread Claes Redestad
> - The MD5 intrinsics added by 
> [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that 
> the `int[] x` isn't actually needed. This also applies to the SHA intrinsics 
> from which the MD5 intrinsic takes inspiration
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to 
> make it acceptable to use inline and replace the array in MD5 wholesale. This 
> improves performance both in the presence and the absence of the intrinsic 
> optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element 
> arrays), but allocating the array lazily gets most of the speed-up in the 
> presence of an intrinsic while being neutral in its absence.
> 
> Baseline:
>   (digesterName)  (length)Cnt Score  
> Error   Units
> MessageDigests.digestMD516 15  
> 2714.307 ±   21.133  ops/ms
> MessageDigests.digestMD5  1024 15   
> 318.087 ±0.637  ops/ms
> MessageDigests.digest  SHA-116 15  
> 1387.266 ±   40.932  ops/ms
> MessageDigests.digest  SHA-1  1024 15   
> 109.273 ±0.149  ops/ms
> MessageDigests.digestSHA-25616 15   
> 995.566 ±   21.186  ops/ms
> MessageDigests.digestSHA-256  1024 15
> 89.104 ±0.079  ops/ms
> MessageDigests.digestSHA-51216 15   
> 803.030 ±   15.722  ops/ms
> MessageDigests.digestSHA-512  1024 15   
> 115.611 ±0.234  ops/ms
> MessageDigests.getAndDigest  MD516 15  
> 2190.367 ±   97.037  ops/ms
> MessageDigests.getAndDigest  MD5  1024 15   
> 302.903 ±1.809  ops/ms
> MessageDigests.getAndDigestSHA-116 15  
> 1262.656 ±   43.751  ops/ms
> MessageDigests.getAndDigestSHA-1  1024 15   
> 104.889 ±3.554  ops/ms
> MessageDigests.getAndDigest  SHA-25616 15   
> 914.541 ±   55.621  ops/ms
> MessageDigests.getAndDigest  SHA-256  1024 15
> 85.708 ±1.394  ops/ms
> MessageDigests.getAndDigest  SHA-51216 15   
> 737.719 ±   53.671  ops/ms
> MessageDigests.getAndDigest  SHA-512  1024 15   
> 112.307 ±1.950  ops/ms
> 
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  MD516 15   
> 312.011 ±0.005B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.normSHA-116 15   
> 584.020 ±0.006B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-25616 15   
> 544.019 ±0.016B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-51216 15  
> 1056.037 ±0.003B/op
> 
> Target:
> Benchmark (digesterName)  (length)Cnt 
> Score  Error   Units
> MessageDigests.digestMD516 15  
> 3134.462 ±   43.685  ops/ms
> MessageDigests.digestMD5  1024 15   
> 323.667 ±0.633  ops/ms
> MessageDigests.digest  SHA-116 15  
> 1418.742 ±   38.223  ops/ms
> MessageDigests.digest  SHA-1  1024 15   
> 110.178 ±0.788  ops/ms
> MessageDigests.digestSHA-25616 15  
> 1037.949 ±   21.214  ops/ms
> MessageDigests.digestSHA-256  1024 15
> 89.671 ±0.228  ops/ms
> MessageDigests.digestSHA-51216 15   
> 812.028 ±   39.489  ops/ms
> MessageDigests.digestSHA-512  1024 15   
> 116.738 ±0.249  ops/ms
> MessageDigests.getAndDigest  MD516 15  
> 2314.379 ±  229.294  ops/ms
> MessageDigests.getAndDigest  MD5  1024 15   
> 307.835 ±5.730  ops/ms
> MessageDigests.getAndDigestSHA-116 15  
> 1326.887 ±   63.263  ops/ms
> MessageDigests.getAndDigestSHA-1  1024 15   
> 106.611 ±2.292  ops/ms
> MessageDigests.getAndDigest  SHA-25616 15   
> 961.589 ±   82.052  ops/ms
> MessageDigests.getAndDigest  SHA-256  1024 15
> 88.646 ±0.194  ops/ms
> MessageDigests.getAndDigest  SHA-51216 15   
> 775.417 ±   56.775  ops/ms
> MessageDigests.getAndDigest  SHA-512  1024 15   
> 112.904 ±2.014  ops/ms
> 
> GC
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  MD516 15   
> 232.009 ±0.006B/op
>