Hi Ivan,
On 11/23/2017 01:16 AM, Ivan Gerasimov wrote:
Hi Peter!
Thank you very much for looking into this!
On 11/22/17 1:45 AM, Peter Levart wrote:
Hi Ivan,
Here's my attempt to increase multithreaded scalability of Cache:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.01/
Haven't tested this yet, but I thought that since you already have
relevant performance tests, you might want to try this, so I decided
to release it as is.
I plugged your implementation into the benchmark I had [*] and got
these numbers for the throughput:
Original MemoryCache: 4,116,537 puts/second
Your variant: 2,942,561 puts/second
So, in the tested scenario performance degraded by 28%.
Hm. What kind of benchmark did you have? Can you share the source? Was
this multi-threaded with concurrent puts, or single-threaded? I'm surprised
by the results, since I did some benchmarking myself and got entirely
different results. Here's what I did:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/bench/
You can download the whole directory and just set up the JMH project over
it. The main class is CacheBench. It uses a typical caching scenario:
    @Threads(4)
    @Benchmark
    public Object original() {
        Integer key = ThreadLocalRandom.current().nextInt(keyspan);
        Object val = origCache.get(key);
        if (val == null) {
            origCache.put(key, val = Boolean.TRUE);
        }
        return val;
    }
The cache hit-ratio is varied by adjusting the cache size and/or the range
of distinct key values produced by the RNG.
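For reference, here is roughly how a key range can be derived from a desired
hit ratio; the helper name and the exact formula are mine and need not match
what CacheBench actually does:

    // Hypothetical helper (not necessarily the CacheBench code): with keys drawn
    // uniformly from [0, keyspan) and the cache bounded at `size` entries, the
    // steady-state hit ratio is roughly size / keyspan, so keyspan can be derived
    // from the desired hit ratio given in percent.
    static int keyspanFor(int size, int hitratio) {
        return (hitratio == 0)
                ? Integer.MAX_VALUE         // practically no hits
                : size * 100 / hitratio;    // e.g. size=1000, hitratio=95 -> keyspan=1052
    }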
With this setup I get the following results:
Benchmark            (hitratio)  (size)  (timeout)  Mode  Cnt      Score     Error  Units
CacheBench.original           0    1000      86400  avgt   10  18023,156 ± 434,710  ns/op
CacheBench.original           5    1000      86400  avgt   10  17645,688 ± 767,696  ns/op
CacheBench.original          50    1000      86400  avgt   10  10412,352 ± 216,208  ns/op
CacheBench.original          95    1000      86400  avgt   10   2275,257 ± 449,302  ns/op
CacheBench.original         100    1000      86400  avgt   10    542,321 ±  15,133  ns/op
CacheBench.patched1           0    1000      86400  avgt   10   1685,499 ±  24,257  ns/op
CacheBench.patched1           5    1000      86400  avgt   10   1633,203 ±  15,045  ns/op
CacheBench.patched1          50    1000      86400  avgt   10   1061,596 ±  12,312  ns/op
CacheBench.patched1          95    1000      86400  avgt   10    125,456 ±   3,426  ns/op
CacheBench.patched1         100    1000      86400  avgt   10     47,166 ±   3,140  ns/op
patched1 is mostly what I had in my proposed code, but with a fix in the
put() method, which could otherwise install an entry into the cacheMap after
some other thread had already removed it from the DL-list. The race is
prevented by 1st installing a unique reservation value into the cacheMap,
then adding the entry to the DL-list, and afterwards trying to replace the
reservation value with the entry, cleaning up if not successful.
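To make that concrete, here is a simplified sketch of the put() sequence;
the helper names (newEntry, linkIntoList, unlinkFromList, removeEntry) are
placeholders, and the real webrev code handles expiry, size bounding and
soft references on top of this:

    private final ConcurrentHashMap<K, Object> cacheMap = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        Object reservation = new Object();                   // unique placeholder value
        Object displaced = cacheMap.put(key, reservation);   // 1st: reserve the slot
        if (displaced instanceof CacheEntry) {
            removeEntry((CacheEntry<K, V>) displaced);       // unlink + invalidate old entry
        }
        CacheEntry<K, V> entry = newEntry(key, value);
        linkIntoList(entry);                                 // 2nd: link into the DL-list
        if (!cacheMap.replace(key, reservation, entry)) {
            // 3rd: someone removed or replaced the reservation in the meantime,
            // so clean up - otherwise the entry would linger in the DL-list
            unlinkFromList(entry);
        }
    }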
If this is not enough, I also tried an approach with a lock-free DL-list:
basically a modified ConcurrentLinkedDeque where intermediary nodes
may be removed in constant time (important for fast removal of elements
that become softly reachable). This variant is even more scalable than the
lock-based DL-list:
Benchmark            (hitratio)  (size)  (timeout)  Mode  Cnt      Score     Error  Units
CacheBench.patched2           0    1000      86400  avgt   10    774,356 ±   4,113  ns/op
CacheBench.patched2           5    1000      86400  avgt   10    754,222 ±   2,956  ns/op
CacheBench.patched2          50    1000      86400  avgt   10    434,969 ±   2,359  ns/op
CacheBench.patched2          95    1000      86400  avgt   10     98,352 ±   0,750  ns/op
CacheBench.patched2         100    1000      86400  avgt   10     47,692 ±   0,995  ns/op
Here's what such a MemoryCache looks like in the JDK:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.02/
I took the ConcurrentLinkedDeque by Doug Lea and Martin Buchholz and
removed all public methods, keeping just the delicate internal mechanics.
Node is an interface and can be implemented by different classes;
BasicNode and SoftReferenceNode are provided (and are subclassed in
MemoryCache). The public methods are tailored to the Cache use case and
deal with Node<E>s rather than E(lement)s.
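In outline, the node abstraction looks something like the following; this is
a condensed sketch (java.lang.ref imports assumed), and the actual webrev
code carries the prev/next fields and the lock-free machinery that I left
out here:

    interface Node<E> {
        E element();    // null once the element has been cleared/collected
        // prev/next accessors and CAS helpers used by the deque machinery elided
    }

    // Plain node holding a strong reference to its element.
    static class BasicNode<E> implements Node<E> {
        final E element;
        BasicNode(E element) { this.element = element; }
        public E element() { return element; }
    }

    // Node that is itself a SoftReference, so the element can be reclaimed under
    // memory pressure and the node afterwards unlinked in constant time.
    static class SoftReferenceNode<E> extends SoftReference<E> implements Node<E> {
        SoftReferenceNode(E element, ReferenceQueue<? super E> queue) {
            super(element, queue);
        }
        public E element() { return get(); }
    }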
Still, I think it's important to try to improve the performance of the
MemoryCache and, if possible, remove the points of contention.
So what do you think of the benchmark results?
Rough sketch of changes:
- replaced LinkedHashMap with ConcurrentHashMap
- CacheEntry(s) are additionally linked into a double-linked list.
DL-list management is the only synchronization point (similar to the
Cleaner API) and the rule is: 1st a new entry is linked into the DL-list,
then it is put into the map - published. The same with removing: 1st an
entry is unlinked from the DL-list, then, if successful, it is removed from
the map and invalidated.
- changed the way expiry is handled so that the whole list never needs to
be scanned.
The code speaks for itself.
Your implementation is somewhat similar to what I had tried before
coming up with the proposal of the option to turn the cache off.
I used ConcurrentHashMap + ConcurrentLinkedQueue to maintain the FIFO
order, but the performance was still a few percent less than that of
the original implementation.
Removing an intermediary element from ConcurrentLinkedQueue takes O(n)
time. This is needed in emptyQueue() for all softly-reachable-and-cleared
entries and in remove() when the application removes an entry explicitly
from the cache (for example on SSL session termination). We need a
double-linked list (like ConcurrentLinkedDeque) with constant-time removal
of intermediary node(s).
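For illustration only (hypothetical names, not the Cache code): if each
entry remembers its own list node, explicit removal becomes a constant-time
splice instead of a scan:

    // Minimal doubly-linked node; assumes sentinel head/tail nodes so prev/next
    // are never null, and that the caller holds the list lock. The webrev.02
    // variant achieves the equivalent lock-free inside the modified deque.
    final class DLNode<E> {
        E element;
        DLNode<E> prev, next;

        void unlink() {
            prev.next = next;
            next.prev = prev;
            prev = next = null;
        }
    }

    // Contrast: ConcurrentLinkedQueue.remove(entry) must traverse from the head
    // until it finds an equal element, which is O(n) per removal.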
That's really why I decided to split the issue:
- first, provide an option to turn the cache off (this should be easy to
backport and can provide immediate relief to the customers that
experience the scalability bottleneck);
- second, continue work on the cache implementation improvements.
Maybe with a scalable Cache such a switch will not be needed. I can
imagine that such a switch provides relief for the kinds of loads that may
happen in very special circumstances (like DoS attacks, for example, that
don't re-use SSL sessions), but in normal operation such a switch would
degrade performance, wouldn't it?
The problem I see is that if you provide such a switch now, users may set
it and later, when they upgrade to a newer JDK with a scalable cache, not
realize that they don't need it any more, and consequently keep paying for
it unnecessarily in the future...
Regards, Peter
Let me know if you find it useful and/or whether it solves the scalability
bottleneck.
Yes, I think it's very useful!
However, as I wrote above, I think that the issue needs to be split into
two parts: an option to turn the caching off (which can be easily
backported) and improving the cache implementation (which may even
relax the requirements, such as the FIFO order or an absolutely hard upper
bound on the cache size).
With kind regards,
Ivan