Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. Thanks for reviewing! - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. Marked as reviewed by xuelei (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 16:04:22 GMT, Anthony Scarpino wrote: > It also shows that more caching probably would help further. +1. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 17:22:57 GMT, Daniel Jeliński wrote: >>> With all the above in mind I decided not to use `sun.security.util.Cache` >>> here >> >> I was not meant to use Cache and timeout for this update. >> >> SoftReference and this patch should work good in larger memory boxes. >> Performance improvement sometimes is a trade-off game between memory and >> CPU. Did you have a chance to check the performance in the limited memory >> circumstance? like '-mx64M". > > I run a few tests: > `-Xmx16m`, with this patch, still better than the original version: > > Benchmark (resume) (tlsVersion) Mode Cnt Score > Error Units > SSLHandshake.doHandshake true TLSv1.2 thrpt5 4477.444 ± > 375.975 ops/s > SSLHandshake.doHandshake true TLS thrpt5 681.471 ± > 72.531 ops/s > SSLHandshake.doHandshake false TLSv1.2 thrpt5 335.366 ± > 89.056 ops/s > SSLHandshake.doHandshake false TLS thrpt5 308.711 ± > 90.134 ops/s > > > `-Xmx8m`, before patch: > > Benchmark (resume) (tlsVersion) Mode CntScore > Error Units > SSLHandshake.doHandshake true TLSv1.2 thrpt5 153.760 ± > 12.025 ops/s > SSLHandshake.doHandshake true TLS thrpt5 70.700 ± > 7.506 ops/s > SSLHandshake.doHandshake false TLSv1.2 thrpt5 67.459 ± > 4.325 ops/s > SSLHandshake.doHandshake false TLS thrpt5 64.695 ± > 1.647 ops/s > > after: > > Benchmark (resume) (tlsVersion) Mode CntScore > Error Units > SSLHandshake.doHandshake true TLSv1.2 thrpt5 557.324 ± > 50.269 ops/s > SSLHandshake.doHandshake true TLS thrpt5 155.258 ± > 13.866 ops/s > SSLHandshake.doHandshake false TLSv1.2 thrpt5 181.755 ± > 29.319 ops/s > SSLHandshake.doHandshake false TLS thrpt5 120.627 ± > 25.832 ops/s > > Much slower, but still faster with the patch. > With `-Xmx4m` the test failed with OOM. Thanks for the additional testing. The improvement is impressive. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 14:33:13 GMT, Xue-Lei Andrew Fan wrote: >> `SoftReference`s are guaranteed to survive one GC after use; beyond that >> their lifespan is determined by `SoftRefLRUPolicyMSPerMB` and the amount of >> memory available. > >> With all the above in mind I decided not to use `sun.security.util.Cache` >> here > > I was not meant to use Cache and timeout for this update. > > SoftReference and this patch should work good in larger memory boxes. > Performance improvement sometimes is a trade-off game between memory and CPU. > Did you have a chance to check the performance in the limited memory > circumstance? like '-mx64M". I run a few tests: `-Xmx16m`, with this patch, still better than the original version: Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 4477.444 ± 375.975 ops/s SSLHandshake.doHandshake true TLS thrpt5 681.471 ± 72.531 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5 335.366 ± 89.056 ops/s SSLHandshake.doHandshake false TLS thrpt5 308.711 ± 90.134 ops/s `-Xmx8m`, before patch: Benchmark (resume) (tlsVersion) Mode CntScoreError Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 153.760 ± 12.025 ops/s SSLHandshake.doHandshake true TLS thrpt5 70.700 ± 7.506 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5 67.459 ± 4.325 ops/s SSLHandshake.doHandshake false TLS thrpt5 64.695 ± 1.647 ops/s after: Benchmark (resume) (tlsVersion) Mode CntScoreError Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 557.324 ± 50.269 ops/s SSLHandshake.doHandshake true TLS thrpt5 155.258 ± 13.866 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5 181.755 ± 29.319 ops/s SSLHandshake.doHandshake false TLS thrpt5 120.627 ± 25.832 ops/s Much slower, but still faster with the patch. With `-Xmx4m` the test failed with OOM. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. Another nice performance boost in this area. Looks good to me. - Marked as reviewed by coffeys (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. I'm good with this idea. It's not caching the whole list like I struggled with, but it's a good solution for the algorithm name based constraints. It also shows that more caching probably would help further. I'll let Xuelei finish his review comments - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 13:59:44 GMT, Daniel Jeliński wrote: >> FWIW: I wouldn't expect `SoftReference` (as opposed to `WeakReference`) to >> be eagerly cleaned. > > `SoftReference`s are guaranteed to survive one GC after use; beyond that > their lifespan is determined by `SoftRefLRUPolicyMSPerMB` and the amount of > memory available. > With all the above in mind I decided not to use `sun.security.util.Cache` here I was not meant to use Cache and timeout for this update. SoftReference and this patch should work good in larger memory boxes. Performance improvement sometimes is a trade-off game between memory and CPU. Did you have a chance to check the performance in the limited memory circumstance? like '-mx64M". - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 13:22:34 GMT, Daniel Fuchs wrote: >> Right, soft references are likely to be cleaned if they are not used in an >> entire GC cycle. >> Using a soft reference for each map entry would not help here; note that all >> keys and all values in this map are GC roots (keys are enum names, values >> are `TRUE` and `FALSE`), so in practice they would never be collected. >> Even if we made the entries collectable, it would not help with the TLS >> case; SSLEngine & SSLSocket only check the constraints once at the beginning >> of the handshake, so they would all expire at the same time anyway. What's >> even worse, we would then have to clear the expired entries from the map. >> >> I also considered limiting the size of the map. That's a tricky subject; we >> would need to get the size limit exactly right. Given that the algorithms >> are always checked in the same order, if the cache were too small, we would >> get zero hits. >> With all the above in mind I decided not to use `sun.security.util.Cache` >> here. > > FWIW: I wouldn't expect `SoftReference` (as opposed to `WeakReference`) to be > eagerly cleaned. `SoftReference`s are guaranteed to survive one GC after use; beyond that their lifespan is determined by `SoftRefLRUPolicyMSPerMB` and the amount of memory available. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Mon, 25 Apr 2022 06:46:20 GMT, Daniel Jeliński wrote: >> src/java.base/share/classes/sun/security/util/DisabledAlgorithmConstraints.java >> line 105: >> >>> 103: private final Set disabledAlgorithms; >>> 104: private final Constraints algorithmConstraints; >>> 105: private volatile SoftReference> cacheRef = >> >> It looks like a one-clear-for-all mechanism. Once the clearing happens, the >> full map will be collected. For SoftReferences, it clears them fairly >> eagerly. Maybe, the performance could be further improved in practice by >> using soft reference for each map entry (See sun.security.util.Cache code >> for an example). >> >> I will have another look next week. > > Right, soft references are likely to be cleaned if they are not used in an > entire GC cycle. > Using a soft reference for each map entry would not help here; note that all > keys and all values in this map are GC roots (keys are enum names, values are > `TRUE` and `FALSE`), so in practice they would never be collected. > Even if we made the entries collectable, it would not help with the TLS case; > SSLEngine & SSLSocket only check the constraints once at the beginning of the > handshake, so they would all expire at the same time anyway. What's even > worse, we would then have to clear the expired entries from the map. > > I also considered limiting the size of the map. That's a tricky subject; we > would need to get the size limit exactly right. Given that the algorithms are > always checked in the same order, if the cache were too small, we would get > zero hits. > With all the above in mind I decided not to use `sun.security.util.Cache` > here. FWIW: I wouldn't expect `SoftReference` (as opposed to `WeakReference`) to be eagerly cleaned. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Sat, 23 Apr 2022 14:57:01 GMT, Xue-Lei Andrew Fan wrote: >> Profiling the TLS handshakes using SSLHandshake benchmark shows that a large >> portion of time is spent in HandshakeContext initialization, specifically in >> DisabledAlgorithmConstraints class. >> >> There are only a few instances of that class, and they are immutable. >> Caching the results should be a low-risk operation. >> >> The cache is implemented as a softly reachable ConcurrentHashMap; this way >> it can be removed from memory after a period of inactivity. Under normal >> circumstances the cache holds no more than 100 algorithms. > > src/java.base/share/classes/sun/security/util/DisabledAlgorithmConstraints.java > line 105: > >> 103: private final Set disabledAlgorithms; >> 104: private final Constraints algorithmConstraints; >> 105: private volatile SoftReference> cacheRef = > > It looks like a one-clear-for-all mechanism. Once the clearing happens, the > full map will be collected. For SoftReferences, it clears them fairly > eagerly. Maybe, the performance could be further improved in practice by > using soft reference for each map entry (See sun.security.util.Cache code for > an example). > > I will have another look next week. Right, soft references are likely to be cleaned if they are not used in an entire GC cycle. Using a soft reference for each map entry would not help here; note that all keys and all values in this map are GC roots (keys are enum names, values are `TRUE` and `FALSE`), so in practice they would never be collected. Even if we made the entries collectable, it would not help with the TLS case; SSLEngine & SSLSocket only check the constraints once at the beginning of the handshake, so they would all expire at the same time anyway. What's even worse, we would then have to clear the expired entries from the map. I also considered limiting the size of the map. That's a tricky subject; we would need to get the size limit exactly right. Given that the algorithms are always checked in the same order, if the cache were too small, we would get zero hits. With all the above in mind I decided not to use `sun.security.util.Cache` here. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. src/java.base/share/classes/sun/security/util/DisabledAlgorithmConstraints.java line 105: > 103: private final Set disabledAlgorithms; > 104: private final Constraints algorithmConstraints; > 105: private volatile SoftReference> cacheRef = It looks like a one-clear-for-all mechanism. Once the clearing happens, the full map will be collected. For SoftReferences, it clears them fairly eagerly. Maybe, the performance could be further improved in practice by using soft reference for each map entry (See sun.security.util.Cache code for an example). I will have another look next week. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Fri, 22 Apr 2022 02:13:14 GMT, David Schlosnagle wrote: >> Profiling the TLS handshakes using SSLHandshake benchmark shows that a large >> portion of time is spent in HandshakeContext initialization, specifically in >> DisabledAlgorithmConstraints class. >> >> There are only a few instances of that class, and they are immutable. >> Caching the results should be a low-risk operation. >> >> The cache is implemented as a softly reachable ConcurrentHashMap; this way >> it can be removed from memory after a period of inactivity. Under normal >> circumstances the cache holds no more than 100 algorithms. > > src/java.base/share/classes/sun/security/util/DisabledAlgorithmConstraints.java > line 969: > >> 967: result = checkAlgorithm(disabledAlgorithms, algorithm, >> decomposer); >> 968: cache.put(algorithm, result); >> 969: return result; > > Would it be worth using `cache.computeIfAbsent` or do you want to avoid > lambda allocation overhead and potentially blocking concurrent handshakes on > writer thread? > > Suggestion: > > return cache.computeIfAbsent(algorithm, algo -> > checkAlgorithm(disabledAlgorithms, algo, decomposer)); I generally prefer using `get` over `computeIfAbsent` when optimizing for performance. While `computeIfAbsent` is more concise, it's also much slower than `get` when the entry is already present. - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. src/java.base/share/classes/sun/security/util/DisabledAlgorithmConstraints.java line 969: > 967: result = checkAlgorithm(disabledAlgorithms, algorithm, > decomposer); > 968: cache.put(algorithm, result); > 969: return result; Would it be worth using `cache.computeIfAbsent` or do you want to avoid lambda allocation overhead and potentially blocking concurrent handshakes on writer thread? Suggestion: return cache.computeIfAbsent(algorithm, algo -> checkAlgorithm(disabledAlgorithms, algo, decomposer)); - PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
Nice. On 22/04/2022 6:47 am, Daniel Jeliński wrote: On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: Profiling the TLS handshakes using SSLHandshake benchmark shows that a large portion of time is spent in HandshakeContext initialization, specifically in DisabledAlgorithmConstraints class. There are only a few instances of that class, and they are immutable. Caching the results should be a low-risk operation. The cache is implemented as a softly reachable ConcurrentHashMap; this way it can be removed from memory after a period of inactivity. Under normal circumstances the cache holds no more than 100 algorithms. before: Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 2165.081 ± 440.204 ops/s SSLHandshake.doHandshake true TLS thrpt5 534.681 ± 146.931 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5 369.104 ± 11.143 ops/s SSLHandshake.doHandshake false TLS thrpt5 253.903 ± 58.056 ops/s after: Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 10440.501 ± 478.177 ops/s SSLHandshake.doHandshake true TLS thrpt5762.995 ± 33.842 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5440.471 ± 52.867 ops/s SSLHandshake.doHandshake false TLS thrpt5305.928 ± 57.847 ops/s After this patch the HandshakeContext initialization practically disappears from the CPU profile; it only takes ~5% in TLS1.2 resumption, and much less in the remaining scenarios. - PR: https://git.openjdk.java.net/jdk/pull/8349 -- Regards, Peter Firmstone 0498 286 363 Zeus Project Services Pty Ltd.
RFR: 8285398: Cache the results of constraint checks
Profiling the TLS handshakes using SSLHandshake benchmark shows that a large portion of time is spent in HandshakeContext initialization, specifically in DisabledAlgorithmConstraints class. There are only a few instances of that class, and they are immutable. Caching the results should be a low-risk operation. The cache is implemented as a softly reachable ConcurrentHashMap; this way it can be removed from memory after a period of inactivity. Under normal circumstances the cache holds no more than 100 algorithms. - Commit messages: - Implement constraint cache Changes: https://git.openjdk.java.net/jdk/pull/8349/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8349=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8285398 Stats: 26 lines in 1 file changed: 23 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8349.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8349/head:pull/8349 PR: https://git.openjdk.java.net/jdk/pull/8349
Re: RFR: 8285398: Cache the results of constraint checks
On Thu, 21 Apr 2022 19:58:39 GMT, Daniel Jeliński wrote: > Profiling the TLS handshakes using SSLHandshake benchmark shows that a large > portion of time is spent in HandshakeContext initialization, specifically in > DisabledAlgorithmConstraints class. > > There are only a few instances of that class, and they are immutable. Caching > the results should be a low-risk operation. > > The cache is implemented as a softly reachable ConcurrentHashMap; this way it > can be removed from memory after a period of inactivity. Under normal > circumstances the cache holds no more than 100 algorithms. before: Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 2165.081 ± 440.204 ops/s SSLHandshake.doHandshake true TLS thrpt5 534.681 ± 146.931 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5 369.104 ± 11.143 ops/s SSLHandshake.doHandshake false TLS thrpt5 253.903 ± 58.056 ops/s after: Benchmark (resume) (tlsVersion) Mode Cnt Score Error Units SSLHandshake.doHandshake true TLSv1.2 thrpt5 10440.501 ± 478.177 ops/s SSLHandshake.doHandshake true TLS thrpt5762.995 ± 33.842 ops/s SSLHandshake.doHandshake false TLSv1.2 thrpt5440.471 ± 52.867 ops/s SSLHandshake.doHandshake false TLS thrpt5305.928 ± 57.847 ops/s After this patch the HandshakeContext initialization practically disappears from the CPU profile; it only takes ~5% in TLS1.2 resumption, and much less in the remaining scenarios. - PR: https://git.openjdk.java.net/jdk/pull/8349