Hi Eirik,
Thanks for thinking about this issue. First, I agree that some changes
are probably useful, but we probably need to discuss this more, and
maybe try to address the more pressing issues first. Some of my thoughts:
- I don't think the cache should be removed. I think it can improve
performance (especially for certs), and is not an unreasonable thing for
the JDK to try to help with.
- I do think there should be a *supported* way to turn each of the
caches off individually. If an application has their own caching
mechanism, then they should be able to use it instead of the JDKs.
- We have evidence from at least 2 customers that the CRL cache is the
more problematic issue. I don't remember the history, but I think the
"750" size may have been chosen arbitrarily to match the certificate
cache. As a short-term fix, I would probably be ok with lowering this
default value to something more reasonable, but I am not sure what that
should be offhand.
- I think knobs to control the cache properties sound like they would be
useful, but I also think many users would not know what to use as
reasonable values, and may end up just turning off the cache and seeing
if that helps.
- I'm a little surprised the SoftReferences still caused so many entries
to be retained. A comment (which may be out of date) in
sun.security.util.Cache.java says that they may be purged too eagerly,
which would seem to have helped the issues that the customers reported.
Some testing may be in order to see what the current behavior is with
different VM settings.
--Sean
On 4/24/23 2:32 AM, Eirik Bjørsnøs wrote:
Hi,
When reaching out to the BouncyCastle community regarding the deprecated
javax.security.cert APIs, I got some interesting feedback from Matti
Aarnio of Methics Ltd:
Long ago we did encounter problems with JRE's X509CertImpl.java
class, and more so with X509CRLImpl.java. Both have internal
caches of binary form input that is then converted to parsed forms.
Especially the CRL cache did not (still does not?) have nice "that
version is obsoleted, drop it to garbage collection" handling of
parsed CRLs, while it had way too large cache size.
A server running for months and getting a new CRL every 12 hours
eventually grew the memory footprint of obsolete parsed data up to
ridiculous amounts of gigabytes of heap footprint.
Once we realized that also certificate parser had such a cache and
that there is no official way to control that cache, we became very
much averse of using those APIs.
It seems the lack of cache control of these static implementation caches
can be problematic, especially in the CRL case. The default for both
caches is a max number of entries of 750 with an unlimited lifetime. The
inability of expiring outdated CRLs seems particularly problematic
for Matti.
A similar issue was reported in JDK-8059007 back in 2014, with the
following comment from Sean:
We need to introduce some system properties (or something similar)
to allow the crl/cert caches to be adjusted. Changing to RFE.
While it would certainly be nice to add some form of cache control, the
solution space is large:
-1: Remove the cache. One could claim that the parser/factory level is
the wrong level to introduce parsing. There are so many ways to tune
parsing, there exists many excellent libraries. If apps need caching,
they should take care of it at the app level.
0: Do nothing, there haven't been that many complaints
1: Add an option to entirely disable the cache such that apps can do
their own caching (or not) without wasting space in the default cache.
2: Add two system properties to control max entries of the cert and crl
cache. Setting these to 0 would disable the cache.
3: Add four system properties to control max entries and max lifetime
for each of the cert and crl caches. Setting max entries to 0 would
disable a cache.
4: Tune the default max entries and lifetime to make everyone happy.
5: Add a pluggable API to control caching. Think
AbstractCacheControllingMaxEntrySizeProxyFactoryContext :-)
6: Something I did not consider.
-1 is probably not desired/possible because of the behavioural change. 1
is simple but very constrained. 2 and 3 are similar in shape where 3
allows more flexibility introducing lifetime. 4 seems like wishful
thinking. You could probably guess I'm not a big fan of 5. 6 seems
promising :-)
I have a sketch implementation of 3 available, but it would be great to
gather some opinions before I put more work into this:
https://github.com/openjdk/jdk/compare/master...eirbjo:jdk:x509factory-cache-config
<https://github.com/openjdk/jdk/compare/master...eirbjo:jdk:x509factory-cache-config>
Thanks,
Eirik.
[1] https://bugs.openjdk.org/browse/JDK-8059007
<https://bugs.openjdk.org/browse/JDK-8059007>