Hello,
I recently started using OpenSSL in an embedded Linux device and I
encountered some very harmful caching behaviour. I have attached a
patch that fixes it for me, but I think the issue is worth some
discussion.
The problem is that the SSL_CTX's internal cache of SSL_SESSION
objects can become a huge drain on memory, because expired sessions
are only flushed after every 255th new session. For example,
in my embedded product I periodically make a connection to a
user-configurable number of SSL servers (actually TLS) to securely
exchange some control information. This is done from the same
long-lived process on my embedded device (so that the internal session
cache can usually speed up connection establishment), using a separate
SSL_CTX for each destination server (so that it can be done in
parallel from different threads). The default session expiration time
in OpenSSL is 5 minutes, so each SSL_CTX will only create a new
SSL_SESSION once every 5 minutes, all of which remain in memory until
the 255th one. So it takes (5 minutes * 255) = 21.25 hours before any
SSL_SESSION objects are freed.
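As a rough check of the arithmetic above, here is a minimal sketch in plain C (no OpenSSL needed). The helper names are hypothetical. It reuses the `& 0xff` test from ssl/ssl_lib.c and assumes the counter advances by exactly one per new session, as in my use case.

```c
/* Hypothetical helper: counts successful handshakes on one SSL_CTX until
 * the low byte of the counter is 0xff, which is the condition the
 * automatic flush tests for. */
static unsigned long first_flush_count(void)
{
    unsigned long good = 0;

    do {
        good++;                          /* one more successful handshake */
    } while ((good & 0xff) != 0xff);     /* same mask as in ssl/ssl_lib.c */
    return good;
}

/* Hours until the first flush, assuming one new session is created every
 * minutes_per_session minutes (5 for the default timeout in my setup). */
static double hours_until_first_flush(double minutes_per_session)
{
    return (double)first_flush_count() * minutes_per_session / 60.0;
}
```

Calling hours_until_first_flush(5.0) reproduces the 21.25 hour figure; first_flush_count() gives the 255 sessions that pile up before the first flush.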
This matters because my embedded Linux product has only 32 MB of RAM
and no swap. If the number of configured SSL servers is, say, 2, then
there are 2*255 = 510 SSL_SESSION objects in memory after 21.25 hours.
On my device, the memory taken up by those 510 SSL_SESSION objects and
their sub-objects is about 30% of all RAM, and my system's best-case
memory usage is, coincidentally, about 70% of all RAM. So every 21.25
hours there is a high probability that the OS runs out of memory and
makes my embedded device very, very sad. :(
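To put the figures above in per-session terms, a back-of-the-envelope sketch follows. It assumes "32 MB" means 32 * 1024 * 1024 bytes and takes the "about 30%" figure at face value, so the result is only a rough estimate. The function names are hypothetical.

```c
/* Assumption: 32 MB of RAM means 32 * 1024 * 1024 bytes. */
#define TOTAL_RAM_BYTES (32UL * 1024UL * 1024UL)

/* Sessions held just before the first flush: 255 per SSL_CTX,
 * with one SSL_CTX per configured server. */
static unsigned long cached_sessions(unsigned long servers)
{
    return servers * 255UL;
}

/* Approximate bytes per cached session (including sub-objects), given the
 * share of total RAM, in percent, that the whole cache occupies. */
static unsigned long approx_bytes_per_session(unsigned long servers,
                                              unsigned long percent_of_ram)
{
    unsigned long cache_bytes = TOTAL_RAM_BYTES / 100UL * percent_of_ram;

    return cache_bytes / cached_sessions(servers);
}
```

With 2 servers and 30% of RAM this works out to somewhere around 19 KB per session, which is only as accurate as the rounded percentages it is derived from.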
I suspect this behaviour is what users are running into when they
report supposed "leaks" in OpenSSL. (I too thought it was a leak until
I found the root cause.) As a solution I have adopted the attached
patch, which simply makes OpenSSL flush expired sessions after _every_
new session if built with OPENSSL_SMALL_FOOTPRINT. This is perfectly
sufficient for me because it means the maximum number of SSL_SESSION
objects in my process with N servers is simply N, rather than N*255.
However, I'm not convinced that my patch is really the Right Thing To
Do. I realize the intention of the lazy flushing is to avoid wasting
CPU time iterating over non-expired sessions ... so for some users the
existing behaviour may in fact be preferable even in
OPENSSL_SMALL_FOOTPRINT builds, and conversely the _new_ behaviour may
be preferable for some users even in non-OPENSSL_SMALL_FOOTPRINT
builds.
It seems to me that the ideal change might be to retain the lazy
flushing but to also free individual expired sessions
opportunistically when they are found to be expired during other
searches of the internal session cache (e.g., when searching for a
session to resume). But I haven't attempted that approach.
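To make the opportunistic idea concrete, here is a minimal sketch using a toy fixed-size cache. It is not OpenSSL code: the toy_* types and functions are hypothetical, and the expiry rule (expired once now is later than created + timeout) is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TOY_CACHE_SLOTS 8

struct toy_session {
    int    in_use;       /* non-zero while the slot holds a session */
    char   id[32];       /* session identifier (NUL-terminated) */
    time_t created;      /* creation time */
    long   timeout;      /* lifetime in seconds */
};

struct toy_cache {
    struct toy_session slot[TOY_CACHE_SLOTS];
};

/* Stores a session in the first free slot; returns 1 on success, 0 if full. */
static int toy_cache_add(struct toy_cache *c, const char *id,
                         time_t now, long timeout)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++) {
        struct toy_session *s = &c->slot[i];

        if (!s->in_use) {
            strncpy(s->id, id, sizeof(s->id) - 1);
            s->id[sizeof(s->id) - 1] = '\0';
            s->created = now;
            s->timeout = timeout;
            s->in_use = 1;
            return 1;
        }
    }
    return 0;
}

/* Looks up a session by id. If the match has expired, it is freed on the
 * spot and NULL is returned, so no separate flush pass is needed for it. */
static struct toy_session *toy_cache_lookup(struct toy_cache *c,
                                            const char *id, time_t now)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++) {
        struct toy_session *s = &c->slot[i];

        if (s->in_use && strcmp(s->id, id) == 0) {
            if (now > s->created + s->timeout) {
                s->in_use = 0;           /* opportunistic free */
                return NULL;
            }
            return s;
        }
    }
    return NULL;
}

/* Number of sessions currently held. */
static int toy_cache_count(const struct toy_cache *c)
{
    size_t i;
    int n = 0;

    for (i = 0; i < TOY_CACHE_SLOTS; i++)
        n += c->slot[i].in_use ? 1 : 0;
    return n;
}
```

This only frees sessions that are actually looked up, so a periodic flush would still be needed to catch entries nobody asks for again.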
Index: ssl/ssl_lib.c
===================================================================
*** ssl/ssl_lib.c (revision 2825)
--- ssl/ssl_lib.c (working copy)
***************
*** 2235,2246 ****
--- 2235,2250 ----
      if ((!(i & SSL_SESS_CACHE_NO_AUTO_CLEAR)) &&
          ((i & mode) == mode))
          {
+ #ifndef OPENSSL_SMALL_FOOTPRINT
          if ( (((mode & SSL_SESS_CACHE_CLIENT)
              ?s->session_ctx->stats.sess_connect_good
              :s->session_ctx->stats.sess_accept_good) & 0xff) == 0xff)
              {
+ #endif
              SSL_CTX_flush_sessions(s->session_ctx,(unsigned long)time(NULL));
+ #ifndef OPENSSL_SMALL_FOOTPRINT
              }
+ #endif
          }
      }