On Wed, 6 Jan 2021 01:05:35 GMT, Claes Redestad <[email protected]> wrote:

>> By caching default constructors used in 
>> `java.security.Provider::newInstanceUtil` in a `ClassValue`, we can reduce 
>> the overhead of allocating instances in a variety of places, e.g., 
>> `MessageDigest::getInstance`, without compromising thread-safety or security.
>> 
>> On the provided microbenchmark, `MessageDigest.getInstance(digesterName)`
>> improves substantially for any `digesterName`: around -90 ns/op and -120 B/op:
>> Benchmark                                                     (digesterName)  Mode  Cnt     Score     Error  Units
>> GetMessageDigest.getInstance                                             md5  avgt   30   293.929 ±  11.294  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                         md5  avgt   30   424.028 ±   0.003   B/op
>> GetMessageDigest.getInstance                                           SHA-1  avgt   30   322.928 ±  16.503  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                       SHA-1  avgt   30   688.039 ±   0.003   B/op
>> GetMessageDigest.getInstance                                         SHA-256  avgt   30   338.140 ±  13.902  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                     SHA-256  avgt   30   640.037 ±   0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                                 md5  avgt   30   312.066 ±  12.805  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm             md5  avgt   30   424.029 ±   0.003   B/op
>> GetMessageDigest.getInstanceWithProvider                               SHA-1  avgt   30   345.777 ±  16.669  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm           SHA-1  avgt   30   688.040 ±   0.003   B/op
>> GetMessageDigest.getInstanceWithProvider                             SHA-256  avgt   30   371.134 ±  18.485  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm         SHA-256  avgt   30   640.039 ±   0.004   B/op
>> Patch:
>> Benchmark                                                     (digesterName)  Mode  Cnt     Score     Error  Units
>> GetMessageDigest.getInstance                                             md5  avgt   30   210.629 ±   6.598  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                         md5  avgt   30   304.021 ±   0.002   B/op
>> GetMessageDigest.getInstance                                           SHA-1  avgt   30   229.161 ±   8.158  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                       SHA-1  avgt   30   568.030 ±   0.002   B/op
>> GetMessageDigest.getInstance                                         SHA-256  avgt   30   260.013 ±  15.032  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                     SHA-256  avgt   30   520.030 ±   0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                                 md5  avgt   30   231.928 ±  10.455  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm             md5  avgt   30   304.020 ±   0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                               SHA-1  avgt   30   247.178 ±  11.209  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm           SHA-1  avgt   30   568.029 ±   0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                             SHA-256  avgt   30   265.625 ±  10.465  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm         SHA-256  avgt   30   520.030 ±   0.003   B/op
>> 
>> See https://cl4es.github.io/2021/01/04/Investigating-MD5-Overheads.html#reflection-overheads
>> for context.
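>> 
>> A minimal sketch of the constructor-caching idea described above,
>> assuming a public no-arg constructor (class and method names here are
>> illustrative, not the exact patch):
>> 
>>     import java.lang.reflect.Constructor;
>> 
>>     final class CtorCache {
>>         // One Constructor is computed and cached per implementation
>>         // Class; the reflective lookup happens only on first use.
>>         private static final ClassValue<Constructor<?>> DEFAULT_CTORS =
>>                 new ClassValue<>() {
>>                     @Override
>>                     protected Constructor<?> computeValue(Class<?> clazz) {
>>                         try {
>>                             return clazz.getConstructor();
>>                         } catch (NoSuchMethodException e) {
>>                             throw new IllegalStateException(e);
>>                         }
>>                     }
>>                 };
>> 
>>         static Object newInstance(Class<?> clazz) throws Exception {
>>             return DEFAULT_CTORS.get(clazz).newInstance();
>>         }
>>     }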
>
> I refactored and optimized the lookup code further, getting rid of a
> number of bottlenecks:
> 
> - Cache Constructors in Provider.Service instead of via a ClassValue.
> - Also cache the impl Class; wrap the Class and Constructor in a
> WeakReference if they were not loaded by the null (bootstrap) class
> loader (many builtins will be).
> - Cache the EngineDescription in Service, avoiding a lookup on the hot
> path.
> - We were hitting a synchronized method in ProviderConfig.getProvider().
> The provider field is already volatile, so I used the double-checked
> idiom here to avoid synchronizing on the hot path (see the sketch after
> this list).
> - ServiceKey.hashCode used Objects.hash, which allocates; simplified
> and optimized it.
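> 
> A rough sketch of that double-checked read (simplified; doLoadProvider
> is a stand-in for the actual provider-loading logic):
> 
>     private volatile Provider provider;
> 
>     Provider getProvider() {
>         Provider p = provider;          // one unsynchronized volatile read
>         if (p != null) {
>             return p;                   // hot path: no lock once initialized
>         }
>         synchronized (this) {
>             p = provider;               // re-check under the lock
>             if (p == null) {
>                 p = doLoadProvider();   // hypothetical loading step
>                 provider = p;
>             }
>             return p;
>         }
>     }
> 
> Similarly, a hashCode along the lines of Objects.hash(type, algorithm)
> allocates a varargs Object[] on every call; an inlined
> type.hashCode() * 31 + algorithm.hashCode() (or a precomputed field)
> avoids that allocation.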
> 
> Benchmark                                                     (digesterName)  Mode  Cnt     Score     Error  Units
> GetMessageDigest.getInstance                                             MD5  avgt   30   143.803 ±   5.431  ns/op
> GetMessageDigest.getInstance:·gc.alloc.rate.norm                         MD5  avgt   30   280.015 ±   0.001   B/op

Since much of the cost is now the creation of the MessageDigest itself, I added
a microbenchmark to measure this overhead:

Benchmark                                                     (digesterName)  Mode  Cnt     Score     Error  Units
GetMessageDigest.cloneInstance                                           MD5  avgt   30   124.922 ±   5.412  ns/op
GetMessageDigest.cloneInstance:·gc.alloc.rate.norm                       MD5  avgt   30   280.015 ±   0.001   B/op
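
For reference, such a clone benchmark could look roughly like this (JMH;
field names and parameter values are hypothetical, not necessarily the
ones in the patch):

    import java.security.MessageDigest;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class GetMessageDigest {
        @Param({"MD5"})
        String digesterName;

        MessageDigest prototype;

        @Setup
        public void setup() throws Exception {
            prototype = MessageDigest.getInstance(digesterName);
        }

        @Benchmark
        public MessageDigest cloneInstance() throws CloneNotSupportedException {
            // The cost of a clone serves as a floor for getInstance cost
            return (MessageDigest) prototype.clone();
        }
    }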

That means there's no added allocation overhead in calling
`MessageDigest.getInstance(digesterName)` compared to cloning an existing
instance, so we get almost all of the benefit without resorting to tricks
such as caching and cloning an instance at call sites like the one in
`UUID::nameUUIDFromBytes`. The remaining ~20 ns/op difference should be
negligible.
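
For illustration, the call-site trick this makes unnecessary looks
roughly like this (hypothetical helper, not actual JDK code):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    final class Md5Holder {
        // Cache one MessageDigest up front and clone it per use,
        // instead of calling getInstance on every digest
        private static final MessageDigest PROTOTYPE;
        static {
            try {
                PROTOTYPE = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new InternalError("MD5 not supported", e);
            }
        }

        static byte[] md5(byte[] input) {
            try {
                MessageDigest md = (MessageDigest) PROTOTYPE.clone();
                return md.digest(input);
            } catch (CloneNotSupportedException e) {
                // Not every MessageDigest implementation is cloneable
                throw new InternalError(e);
            }
        }
    }

With getInstance now within ~20 ns/op of a plain clone, such call-site
caches are no longer worth their complexity.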

-------------

PR: https://git.openjdk.java.net/jdk/pull/1933
