I forgot to mention one other workaround for this bug, as used in Guava: cloning the instance if clone() is supported; see https://github.com/google/guava/issues/1197.
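To make that concrete, here is a rough sketch of the clone-based approach, assuming the instance in question is a MessageDigest as in the Guava issue (the class below is purely illustrative, not Guava's actual code):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Illustrative sketch only. A prototype digest is created once via
    // getInstance() (which takes the synchronized Provider.getService() hit),
    // and later requests clone it when clone() is supported, so the hot path
    // never touches the provider lookup.
    final class CachedDigest {
        private final MessageDigest prototype;
        private final boolean cloneSupported;

        CachedDigest(String algorithm) throws NoSuchAlgorithmException {
            this.prototype = MessageDigest.getInstance(algorithm);
            this.cloneSupported = supportsClone(prototype);
        }

        private static boolean supportsClone(MessageDigest md) {
            try {
                md.clone();
                return true;
            } catch (CloneNotSupportedException e) {
                return false;
            }
        }

        MessageDigest newDigest() throws NoSuchAlgorithmException {
            if (cloneSupported) {
                try {
                    return (MessageDigest) prototype.clone();
                } catch (CloneNotSupportedException e) {
                    // fall through to the slow path below
                }
            }
            // clone() support depends on the provider, so keep getInstance() as a fallback
            return MessageDigest.getInstance(prototype.getAlgorithm());
        }
    }

Whether clone() is supported is provider-specific, which is why the getInstance() fallback stays in place; it only matters that the common case avoids the contended lookup.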
On Friday, August 7, 2015, David Schlosnagle <[email protected]> wrote:
> // reviving this ghost
>
> Hi Valerie, Sean, and sec-dev,
>
> I am curious if there has been any movement on incorporating a fix for
> https://bugs.openjdk.java.net/browse/JDK-7092821? I have encountered
> several systems where this is a significant contention and
> scale bottleneck. While there are some workarounds such as pooling and
> reusing Provider instances, that seems like a band-aid, and fixing the JDK
> is a better path.
>
> If I were to put a webrev together, would someone be kind enough to sponsor
> it for me? Are there any existing open source load tests to verify these
> changes? Is there a good mechanism in OpenJDK now to run JMH across all of
> the supported platforms (I only have access to a small subset of the
> permutation of platforms).
>
> Thanks!
>
> - Dave
>
> On Thursday, January 12, 2012, Valerie (Yu-Ching) Peng
> <[email protected]> wrote:
>
>> Dave,
>> Thanks for the comments.
>> Let me think about it some more and see how to better address this kind
>> of racing issue.
>>
>> If you have concerns on fixes for 7033170 and 7092825, please let me know.
>>
>> Sean,
>> Can you please ignore the review request for 7092821 for now. I'll send
>> out an updated version later.
>> If you can still review the remaining two, that'd be great.
>>
>> Thanks,
>> Valerie
>>
>> On 01/11/12 21:48, David Schlosnagle wrote:
>>
>>> On Wed, Jan 11, 2012 at 8:04 PM, Valerie (Yu-Ching) Peng
>>> <[email protected]> wrote:
>>>
>>>> 7092821: java.security.Provider.getService() is synchronized and became
>>>> scalability bottleneck.
>>>> jdk7u Webrev: http://cr.openjdk.java.net/~valeriep/7092821_7u/
>>>> jdk8 Webrev: http://cr.openjdk.java.net/~valeriep/7092821/
>>>
>>> Valerie,
>>>
>>> You might already be aware of this, but there is a data race on lines
>>> 685 - 686 of the Provider's getService(String, String) method. If
>>> there are concurrent callers to getService while lookupCache == null,
>>> the lookupCache may be overwritten by a new ConcurrentHashMap after
>>> another thread has just instantiated and populated an entry in the
>>> cache, leading to thrashing on lookupCache. It might be worthwhile to
>>> either use a double checked lock to avoid the race at the expense of
>>> an additional lock and volatile read in the case lookupCache == null,
>>> or add a comment indicating that this is an accepted data race.
>>>
>>> There is also now the possibility that if one thread is executing
>>> getService while another thread invokes one of the methods that sets
>>> lookupCache = null, there could then be a NullPointerException on line
>>> 703 as the volatile read would now see a null and fail. You could
>>> prevent that by either moving the putIfAbsent under the lock (not
>>> ideal for performance if you're already seeing contention), or just
>>> maintaining a local reference to the cache map and using it throughout
>>> the getService method. This would mean that if the lookupCache reference
>>> was concurrently mutated, the putIfAbsent would basically be a write
>>> to the local map reference which is now garbage, but shouldn't really
>>> hurt anything.
>>>
>>> I'd propose something along the lines of the following to address this:
>>>
>>>     public Service getService(String type, String algorithm) {
>>>         ServiceKey key = new ServiceKey(type, algorithm);
>>>         ConcurrentMap<ServiceKey,Service> localLookupCache = getLookupCache();
>>>         Service result = localLookupCache.get(key);
>>>         if (result != null) {
>>>             return (result == NULL_MARK? null : result);
>>>         }
>>>         synchronized (this) {
>>>             checkInitialized();
>>>             if (serviceMap != null) {
>>>                 result = serviceMap.get(key);
>>>             }
>>>             if (result == null) {
>>>                 ensureLegacyParsed();
>>>                 result = (legacyMap != null) ? legacyMap.get(key) : null;
>>>             }
>>>         }
>>>         // under concurrent mutation of lookupCache, this will write to a map
>>>         // that is no longer the active cache
>>>         localLookupCache.putIfAbsent(key, (result == null? NULL_MARK : result));
>>>         return result;
>>>     }
>>>
>>>     private ConcurrentMap<ServiceKey,Service> getLookupCache() {
>>>         if (lookupCache == null) {
>>>             synchronized (this) {
>>>                 // must fall back on double checked lock
>>>                 if (lookupCache == null) {
>>>                     lookupCache = new ConcurrentHashMap<>();
>>>                 }
>>>             }
>>>         }
>>>         return lookupCache;
>>>     }
>>>
>>> - Dave
>>>
>>> For reference, here were the original changes:
>>>
>>>  429         // Cache for service lookups. Discard whenever services are changed.
>>>  430         private transient volatile ConcurrentMap<ServiceKey,Service> lookupCache;
>>>  ...snip...
>>>  682     public Service getService(String type, String algorithm) {
>>>  683         ServiceKey key = new ServiceKey(type, algorithm);
>>>  684         Service result = null;
>>>  685         if (lookupCache == null) {
>>>  686             lookupCache = new ConcurrentHashMap<>();
>>>  687         } else {
>>>  688             result = lookupCache.get(key);
>>>  689             if (result != null) {
>>>  690                 return (result == NULL_MARK? null : result);
>>>  691             }
>>>  692         }
>>>  693         synchronized (this) {
>>>  694             checkInitialized();
>>>  695             if (serviceMap != null) {
>>>  696                 result = serviceMap.get(key);
>>>  697             }
>>>  698             if (result == null) {
>>>  699                 ensureLegacyParsed();
>>>  700                 result = (legacyMap != null) ? legacyMap.get(key) : null;
>>>  701             }
>>>  702         }
>>>  703         lookupCache.putIfAbsent(key, (result == null? NULL_MARK : result));
>>>  704         return result;
>>>  705     }
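As a starting point for the JMH question quoted above, a minimal benchmark sketch along these lines (purely illustrative, assuming JMH is on the classpath; not an existing OpenJDK test) would exercise the synchronized Provider.getService() path under contention:

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Threads;

    // Illustrative JMH sketch: every call to MessageDigest.getInstance() goes
    // through Provider.getService(), so running this benchmark with many threads
    // should surface the synchronization bottleneck.
    public class GetServiceBench {

        @Benchmark
        @Threads(16) // contend on the provider lookup from 16 threads
        public MessageDigest lookupUnderContention() throws NoSuchAlgorithmException {
            // returning the instance keeps the JIT from eliminating the call
            return MessageDigest.getInstance("SHA-256");
        }
    }

Running it against a stock JDK and against a patched Provider should show whether the contention actually moves off getService().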
