Hi there,
On Thu, 22 Feb 2001, Bodo Moeller wrote:
> On Thu, Feb 22, 2001 at 06:41:32AM -0800, Geoff Thorpe wrote:
>
> [...]
> > However, the problem remains that if external session caching is being used,
> > even if the race in the "local" cache is resolved, the same race needs to be
> > resolved in the external cache, which is less trivial. There would need to be an
> > atomic operation that simultaneously checks if the session ID is unique *and*
> > adds it to the cache if it is. Actually it's worse than that - the point at
> > which we check for uniqueness of the session ID is too early in the handshake to
> > have a session to put in the cache - so we're actually talking about reserving a
> > session ID in the cache if it doesn't conflict.
>
> Yes.
>
> > Ugh. Trying to write this as a
> > two-phase commit, (ie. first commit to the local cache, second commit to the
> > external cache, with a rollback if necessary), together with getting the locks
> > right for local cache operation as well as external cache operation (the latter,
> > if it involves networking directly or via nfs, etc, can't be done inside a
> > global lock, can it?), could be tricky.
>
> Well, the session wouldn't be added to the local cache before we have
> word from the external cache that it is unique. If there is an
> external cache, its answers are considered authoritative, and some
> subset of the contents of the external cache can be in the local
> cache.
Well yes, but that's not much help. The external cache can say "yes, that's
unique, go ahead" (this check is performed when forming the first ServerHello),
then the SSL handshake could complete. At which point the session needs to be
added to the cache, and either *or both* of the local and external caches may
have received a conflicting session during that lapse of time and boom, we've
run up against yet another possible "race". I see no other way round *that*
problem except to tell people to write good callbacks if they're going to do it,
and to perhaps implement Holgar's suggested trick that if a cache receives a
"put" for a session that conflicts with an existing session, the "put" is
rejected and the entry in the cache it conflicted with is marked as un-resumable
to prevent garbaged resumes from the client whose session wasn't cached. NB: we
could implement this in the local cache, but I think for the external cache,
we'd simply have to suggest to authors that *they* implement it that way in any
external caches they write (eg. shared memory, distributed, or whatever).
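A minimal sketch of that rejecting "put", for illustration only. The names (session_cache, cache_put, cache_can_resume) are hypothetical and not part of OpenSSL, the fixed-size table is an arbitrary choice, and the caller is assumed to hold whatever lock protects the cache for the duration of each call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ID_LEN 32   /* SSLv3/TLS session IDs are at most 32 bytes */
#define CACHE_SLOTS 64  /* arbitrary size for this sketch */

/* Hypothetical cache entry; not an OpenSSL structure. */
typedef struct {
    unsigned char id[MAX_ID_LEN];
    size_t id_len;
    int in_use;
    int resumable;  /* cleared when a conflicting put is seen */
    int owner;      /* stand-in for the cached session data */
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
} session_cache;

typedef enum { PUT_OK, PUT_CONFLICT, PUT_FULL } put_result;

static cache_entry *cache_find(session_cache *c,
                               const unsigned char *id, size_t id_len)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache_entry *e = &c->slots[i];
        if (e->in_use && e->id_len == id_len &&
            memcmp(e->id, id, id_len) == 0)
            return e;
    }
    return NULL;
}

/* Reject a put whose ID is already present, and mark the existing entry
 * un-resumable so that neither client can resume under the shared ID. */
put_result cache_put(session_cache *c, const unsigned char *id,
                     size_t id_len, int owner)
{
    assert(id_len > 0 && id_len <= MAX_ID_LEN);
    cache_entry *hit = cache_find(c, id, id_len);
    if (hit != NULL) {
        hit->resumable = 0;
        return PUT_CONFLICT;
    }
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache_entry *e = &c->slots[i];
        if (!e->in_use) {
            memcpy(e->id, id, id_len);
            e->id_len = id_len;
            e->in_use = 1;
            e->resumable = 1;
            e->owner = owner;
            return PUT_OK;
        }
    }
    return PUT_FULL;
}

/* A resume is only allowed for an entry that never saw a conflict. */
int cache_can_resume(session_cache *c, const unsigned char *id, size_t id_len)
{
    cache_entry *e = cache_find(c, id, id_len);
    return e != NULL && e->resumable;
}
```

An external cache (shared memory, distributed, or whatever) would need the same check-and-mark to happen as one atomic step on its side.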
But the entire problem we're attempting to address here is that the
maximum-length "random" session IDs we had so far been using were largely immune
to the inherent speed wobbles of these caching race conditions - either in the
sense of multiple threads going for the same local cache, or in the sense of
multiple threads inside potentially multiple processes inside potentially
multiple machines going for the same external cache(s). The ID space was just
too big for collisions to be a problem. Actually, collisions were checked for
anyway - so to be more accurate, the ID space was just too big for collisions
inside "race condition" timespans to be a problem. :-)
The same exists now with the current callback scheme, except we can be less sure
of course that the application's own callback is choosing such well distributed
session ID candidates as RAND_pseudo_bytes() has been doing for us thus far. To an
extent we *could* just throw our hands in the air and say, "if you write a
callback, then *you* take responsibility for trying to avoid reproducible
behaviour that could result in threads, processes, and machines contesting
against each other on identical generated session IDs". That makes life a lot
simpler of course. Some would say "lazier" though ...
> We have a problem, though, if the external cache decides to expire
> sessions that have not yet expired according to the local cache. But
> there's not much that we can do about this (except make session IDs
> unique by using enough randomness) -- if there are multiple local
> caches owned by different processes, then these processes necessarily
> will have different world views.
A similar problem occurs if thread/process/machine A creates a session, stores
it locally, and then stores it in the external cache. If thread/process/machine
B does something (perhaps it resumes that same session via the external cache)
and decides that the session should be deleted - it doesn't really prevent t/p/m
A from resuming the supposedly "deleted" session because A won't even check with
the external cache (or at the very least, it will always be a race between the
delete and the resume). For scenarios where this is important, which
unfortunately includes any situation that wants to protect against possibilities
for active attacks, the only reliable way to get round it is to switch off local
caching and force all caching to be done directly to the external cache. This
means, of course, that SSL_has_matchin... will *always* return zero because the
local cache it is checking against has *no* sessions - so everything could
generate the same session ID, think it's unique, finish the handshake and then
fail to store the session in the external cache due to conflicts. That's not
just a race/collision problem stemming from not being able to adequately
"reserve" an ID in the cache atomically - it's actually a problem stemming from
not being able to check uniqueness in the external cache if there is one.
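For what it's worth, the atomic "reserve" that an external cache would have to offer might look roughly like the sketch below. Everything here is hypothetical (ext_cache, ext_reserve, ext_commit, ext_release are illustrative names, not OpenSSL API), and it assumes the external cache is a single server that handles one request at a time, so each call is atomic by construction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXT_ID_MAX 32   /* maximum session ID length */
#define EXT_SLOTS 64    /* arbitrary size for this sketch */

typedef enum { SLOT_FREE = 0, SLOT_RESERVED, SLOT_COMMITTED } slot_state;

typedef struct {
    unsigned char id[EXT_ID_MAX];
    size_t id_len;
    slot_state state;
} ext_slot;

typedef struct {
    ext_slot slots[EXT_SLOTS];
} ext_cache;

static ext_slot *ext_find(ext_cache *c, const unsigned char *id, size_t len)
{
    for (size_t i = 0; i < EXT_SLOTS; i++) {
        ext_slot *s = &c->slots[i];
        if (s->state != SLOT_FREE && s->id_len == len &&
            memcmp(s->id, id, len) == 0)
            return s;
    }
    return NULL;
}

/* Check uniqueness and reserve the ID in one step (at ServerHello time).
 * Returns 1 if reserved, 0 if the ID is already taken or the cache is full. */
int ext_reserve(ext_cache *c, const unsigned char *id, size_t len)
{
    assert(len > 0 && len <= EXT_ID_MAX);
    if (ext_find(c, id, len) != NULL)
        return 0;
    for (size_t i = 0; i < EXT_SLOTS; i++) {
        ext_slot *s = &c->slots[i];
        if (s->state == SLOT_FREE) {
            memcpy(s->id, id, len);
            s->id_len = len;
            s->state = SLOT_RESERVED;
            return 1;
        }
    }
    return 0;
}

/* Turn a reservation into a stored session once the handshake completes. */
int ext_commit(ext_cache *c, const unsigned char *id, size_t len)
{
    ext_slot *s = ext_find(c, id, len);
    if (s == NULL || s->state != SLOT_RESERVED)
        return 0;
    s->state = SLOT_COMMITTED;
    return 1;
}

/* Give a reservation back if the handshake fails before completion. */
int ext_release(ext_cache *c, const unsigned char *id, size_t len)
{
    ext_slot *s = ext_find(c, id, len);
    if (s == NULL || s->state != SLOT_RESERVED)
        return 0;
    s->state = SLOT_FREE;
    return 1;
}
```

A real cache would also need to expire reservations whose handshakes never finish, which this sketch leaves out.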
Hmm ... I'm fast closing in on the conclusion that we just *warn* people (BTW:
Lutz is already touching up the man pages for this - I think he's mostly waiting
for me to stop making mistakes and changing it :-). Ie. "if you write a callback
because you need some structure in the generated session IDs based on whatever -
thread IDs, machine names/addresses, etc - then make it good, make it random,
spread it well, and try to do it in a way where an identical piece of software
running elsewhere won't conflict ... To this end, implementors are encouraged to
harness the RAND_*** functions for good nutritious ID generation." Or something
perhaps a little more explanatory and a little less floral. But, in essence, we
need the man-page equivalent of "if you write a callback, cache collisions and
anomalies between local and external caching are *your* problem".
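As an illustration of "structure plus randomness", here is a rough sketch of how a callback might build its candidate ID: a short application-chosen prefix (thread ID, machine address, etc) followed by random bytes. The names make_session_id and fill_random_fn are hypothetical; in a real callback the random source would presumably be a thin wrapper around one of the RAND_*** functions. The 16-byte minimum of random content is an arbitrary assumption of this sketch, not a library rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SESSION_ID_LEN 32   /* maximum session ID length */
#define MIN_RANDOM_LEN 16   /* assumption: keep at least this much randomness */

/* Caller-supplied random source: fills buf with n bytes, returns 1 on success. */
typedef int (*fill_random_fn)(unsigned char *buf, size_t n);

/* Build "prefix || random bytes". Returns 1 on success, 0 if the prefix
 * would leave too little room for randomness or the random source fails. */
int make_session_id(unsigned char out[SESSION_ID_LEN],
                    const unsigned char *prefix, size_t prefix_len,
                    fill_random_fn fill_random)
{
    assert(out != NULL && fill_random != NULL);
    if (prefix_len > SESSION_ID_LEN - MIN_RANDOM_LEN)
        return 0;
    if (prefix_len > 0)
        memcpy(out, prefix, prefix_len);
    return fill_random(out + prefix_len, SESSION_ID_LEN - prefix_len);
}

/* Deterministic stand-in used only to exercise the sketch; NOT random. */
int fill_constant(unsigned char *buf, size_t n)
{
    memset(buf, 0xAB, n);
    return 1;
}
```

Two servers with distinct prefixes cannot collide with each other even if their random sources misbehave identically, while the random tail keeps collisions within one server unlikely.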
Final point (this is long enough, I know I know ...) - I can imagine that this
sort of usage is probably going to go hand-in-hand with people or applications
that are implementing specialised caching anyway. Ie. if you want to control the
generation of session IDs, it is more than likely because you are doing
something fancy involving session caching (why else care about the session ID
formation?) - and probably something fancy involving session caching with
multiple servers (virtual or otherwise) and probably a sprinkling of something
fancy involving redirections or proxies. It is perfectly straightforward to take
measures at that external level to make this problem go away (eg. assign a range
of session IDs to each machine/process/thread, or perhaps use one or more ID
servers that each have their own range of session IDs - and the generate-session
callbacks request an ID to use from one of these servers to ensure global
uniqueness across all servers). If the potential problems are understood by the
author, there need be *no* problems like the ones we've mentioned - but solving
it generically in the library code so that dumb callbacks can be used safely is
perhaps too much to expect.
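The "range of session IDs per machine/process/thread" idea could be sketched as follows, assuming each participant has somehow been handed a distinct 32-bit node number (how that assignment happens is outside this sketch). The id_allocator type and its functions are hypothetical names, and each allocator is assumed to be used by one thread at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RANGED_ID_LEN 12  /* 4-byte node number + 8-byte counter */

/* Hypothetical per-node allocator; not an OpenSSL structure. */
typedef struct {
    uint32_t node;  /* distinct per machine/process/thread (assumed) */
    uint64_t next;  /* next unused counter value within this node's range */
} id_allocator;

void id_allocator_init(id_allocator *a, uint32_t node)
{
    assert(a != NULL);
    a->node = node;
    a->next = 0;
}

/* Write the next ID as big-endian node number followed by big-endian counter.
 * IDs from different nodes differ in the first four bytes; IDs from the
 * same node differ in the counter until it wraps. */
void id_allocator_next(id_allocator *a, unsigned char out[RANGED_ID_LEN])
{
    assert(a != NULL && out != NULL);
    uint64_t n = a->next++;
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(a->node >> (24 - 8 * i));
    for (int i = 0; i < 8; i++)
        out[4 + i] = (unsigned char)(n >> (56 - 8 * i));
}
```

With the ID-server variant, the same allocator would live in the server process and the generate-session callbacks would simply ask it for the next value.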
Cheers,
Geoff
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]