Re: [openssl.org #1795] [ bug report + tentative patch ] SSL_CTX

Ger Hobbelt Sat, 29 Nov 2008 04:01:36 -0800

On Fri, Nov 28, 2008 at 7:10 PM, Peter Edwards <[EMAIL PROTECTED]> wrote:
[...]
>> a) either forego on the side effect and resort to a linear (slow)
[...]
> This isn't ideal, obviously


Yup. I know.


>> b) provide a threadsafe environment by surrounding a least each sort
[...]
> You're walking a well worn path WRT race conditions here: if you allow
> arbitrary calls to the STACK interface to add and remove items, then all

Ahhhh.. the 'arbitrary' in there... noooo-o-o-o-o-ooo, didn't say
/that/! (more precisely: didn't mention it at all; another of those
rounds where I forget to list /all/ caveats.

So: 'yup!' again. I was well aware of that, but I didn't list that
particular caveat of STACK: using insert/delete on a STACK-based
object is definitely thread UNsafe. And verily. It was before and it
will be after the suggested fix to find(), no matter what one does.

Should've mentioned it, mea culpa, but my take here is that this
caveat has nothing to do with the STACK implementation per se; the
sort of screw-up you describe happens to anyone 'intelligent' enough
(koff koff) to mutate collections in a multithreading scenario where
such collections present pointers/references to the collected
instances to the outside world.

So, in short, I wasn't planning on fixing /that/, because nobody can. Or should.
What I /did/ plan though, is fix a bug particular to the current STACK
implementation and the _fixable_ thread-unsafety threat (with 't') it
contains.

Take this scenario, which _should_ be safe when regarded from a black
box perspective:

----------------
[single thread running:]

//--------------------
// INIT phase of code
//--------------------

new STACK();
STACK.push(obj1);
STACK.push(obj2);
// insert/delete/push/pop are okay here!

kickstart_multiple_threads();

[--> multiple threads running in parallel, all doing this:]

//----------------------
// OPERATIONAL phase of the code
//----------------------

...
const type *objref = STACK.find(key);
// insert/delete/push/pop are prohibited here!
//
// yes, that means STACK, and the objects it points at, are
// immutable once those multiple threads started.
// That's implicit to the reference-instead-of-copy STACK
// storage approach.
...
-----------------

Look at the above: this can work, no locking required /anywhere/.

UNLESS you start to play smart and modify find() in such a way that,
upon hitting an UNsorted collection, it first sorts the collection,
then performs a binary search. (As OpenSSL sk_*() does). The sort() in
the find() violates the 'immutable' requirement, so you end up with a
headache.


:-)) You got me thinking again this morning with your remark (much
appreciated!) but after reconsidering my suggestion from yesterday
night, it still stands. Here's why:

1) I did /not/ try to fix 'arbitrary' so I'll consider that part
settled. See above.

2) Why would a mutating sort() work - including scenarios such as the
one you show?

Because

(a) the sort is now made to happen only once. Never again. No matter
how many times you call find() once you've entered the 'STACK is now
assumed immutable as we're in multithreaded operational run mode'.

(b) thanks to the (now atomic) {is_sorted() + sort()} collective,
EVERY thread trying to find() will run into the lock surrounding the
sort() and will *wait* there until one of them has _completed_ the
sort.
Which means that your scenario becomes (augmented a bit to show worst
case parallel calls and the free() in there is a mutating operation
which is prohibited anyway, so it can't be in there):

Note: thread1 is the thread that just happens (by random chance!) to
be the first ever to call find():

>------------------------
> Thread  1      Thread 2
> item = find()  item= find()
{
  lock()  <-- winner!
                      lock()    <-- WAIT
  if (is_sorted()) --> FALSE
  sort()
  unlock()
                      @ lock granted
                      if (is_sorted()) --> TRUE
                      unlock()

{-- from here on out, the STACK is immutable,
   so parallel internal_find() is okay from now --}
  item = internal_find()

                      item = internal_find()
>                    /* free(item) */
                      ^^^ that's mutating STACK --> prohibited!
> use(const item)
                      use(const item)
>------------------------

which is fine.

The whole point is that before my suggested fix, setting up a stack in
the init code phase will give you a short window for a fatal race
condition later on (!) when a programmer/user of sk_*() [implicitly]
calls sk_*_find() in (multithreaded) run mode.

Compare the above with the EXISTING state of the union, erm, stack.
(this is exactly the same scenario, where, just like in the above,
thread1, by random chance, is to be the first to call find() on the
(by now regarded as immutable) shared stack instance):

>------------------------
> Thread  1      Thread 2
> item = find()  item= find()
{
  // no lock
                      // no lock, so go on down
  if (is_sorted()) --> FALSE
                      if (is_sorted()) --> FALSE
  sort()

  item = internal_find().begin (bsearch)
                        sort().begin !! qsort is not repeatable, so
items are shuffled here!
  RACE! [*]
  item = internal_find().end
  // ^^^ screwed up! wrong item or even 'not found'
                       sort().end

>                    /* free(item) */
                      ^^^ that's mutating STACK --> prohibited!
> use(const item)
// ^^^ corrupted
                      use(const item)
>------------------------

[*]    The race condition happens because, while bsearch() is hopping
through the array looking for an element with matching key, qsort() in
the other thread is subtly reshuffling the array as its split points
will trigger swap()s, resulting in swap()s for thread2.sort(), which,
in this case, will ever so slightly corrupt the (already sorted)
array. If bsearch() just happens to check those same position[s],
bsearch is toast from that point on.

It all depends on your local qsort() implementation's intricacies, of
course, but most platforms I know come with a qsort() which will cause
this to happen. (No 'is this already completely sorted' precheck in
there.) For a sample from theory, see Sedgewick, Algorithms in C,
section 7.6: duplicate keys. Assume fully sorted input and you can
already see from the picture in there that that implementation is
going to blow up a parallel running bsearch() if we don't have a
critical section like suggested before.



Summing it up, the suggested fix is only meant to ensure people can
apply the /regular/ assumptions to STACK and not having to remember a
very peculiar bit of caveat, which says: "you gotta sort before you
enter the multithreading zone." (Once, I had a perfect brain. Not
anymore. It's deterioration some call 'aging'. I'm told it happens to
all of us. So I am sure I'll forget that terribly wicked caveat one
day, when I /should/ have remembered it.)

The fix makes sure you can safely use sk_*() and write code like this:

----------------
[single thread init:]

new STACK();
STACK.push(obj1);
STACK.push(obj2);
// insert/delete/push/pop are okay here!

kickstart_multiple_threads();

[--> multiple threads running in parallel, all doing this:]

// OPERATIONAL phase of the code
const type *objref = STACK.find(key);
// insert/delete/push/pop are prohibited here!
...
-----------------

without coming to harm, just like you'd expect you wouldn't when
considering the interface definition for sk_*().


As my initial email showed, not only I myself assumed code like the
above would work, it also shows the OpenSSL core dev team assumed the
same in several spots in the code ( > 50% of the locations where a
global STACK object is modified, then used to find() previously stored
(by now immutable) info).



> By using the model you are talking about, you're still assuming that once
> the sort is done, the STACK is immutable, and that's just a feature of the
> particular use of the STACK interface that's in question here. Which leads
> to the third option, which is what I was doing with the patch:

Yes to first, and, no, it's /not/ in question here.

Given the multiple uses inside OpenSSL itself, it can only be thought
of as a (yet undocumented) design decision - which makes perfect sense
to me. Besides, it's worked for me for years - apart from that nasty
sort() bit which has eluded me for probably as long because I didn't
review that bit of code /that deep/. I just went with the interface
definitions.


I am /very/ grateful you found that one, because it quite reasonable
explains why I had these utterly agonizing about-once-a-year fatal
failures in our stuff.

However, plonking in another sort() somewhere will fix only one
occurrence of a nastiness that can potentially occur at several spots
(notice the same magic is going on in s2/s3_srvr and clnt.c, for
instance!) and will recur for anyone (like me) using STACK code
implicitly or explicitly, assuming 'immutable' makes you safe in a
multithreaded environment.

To prevent this /extremely hard to track down/ issue from happening
ever again, anywhere, I suggested the internal_find() fix. It should
work for you as well as it will work for me.


... Unless my reasoning is flawed, correction of which is much
appreciated. (But see the bottom for a possibly 'better' alternative.)




> The nomenclature of the _find() operation does leave something to be desired
> (both for the fact that it can modify the object it acts on, and that its
> worst case performance is worse than the obvious O(n) you'd expect),

Mwoch. To nitpick ad nauseam: there's no performance specced for
find(), O(n) is a nice initial guess, and the bsearch in there makes
that close to O(log2(n)) for stacks with no dupe keys, approaching
O(n) for fully dupe key sets and the qsort() in there will add a /one
time/ _extreme_ worse-case O(n^2) to that. Given it only happens once
for the first find(), I'm fine with that, as it'll average out to
O(log2(n))-something over time and I don't need dynamite startup speed
- it's sustained performance I'm after.

Add Maarten Lidmaath's beautiful tweak to the internal_find() fix and,
once you're past the initial few seconds of time where the find()s get
called, you don't even have the [new] overhead of the lock/unlock
operations anymore: exactly the same speed as before, yet safe now.
(And I don't do mutable things to those stacks, nor does OpenSSL
itself, nor should anyone else.)


> not sure there's anything particularly wrong with the semantics per se. I'd
> probably prefer that it failed hard if the stack wasn't already sorted, but
> that's just me.

Yup. I looked in there and my first reaction was pushing an error and
call it a day, but that still would mean (a) several places still need
to be augmented with an extra sort() or nothing would start anymore,
and (b) the damage is done anyhow when you mix insert/delete with find
in a multithreaded setup (naughty boys get spanked).

Besides, the realization that adding sort() everywhere, like your
initial fix, would turn all those add()s into add+sort series, which
for large lists is the worst thing one can do as the qsort() in there
will hit its worst-case scenario (almost fully sorted input) on every
add -- given the interfaces of asn1/x509 push()ing methods, it's that
or some seriously crazy code flow analysis + fixing -- and ~ O(n^2)
extra overhead will be included with _each_ insert. Ouch.


And then the idea for the locking fix came up: minimal code change and
now really safe where wrongly assumed safe before.


However, given the above, I think the yet better thing to do is
completely dispense with the sort and go with my last thought of
yesterday night: turn the insert() into an elemental insertion_sort()
operation, so that STACK _NEVER_ is unsorted anyhow. It's very much
like your sort() fix, only, again, internal to sk_*() so all places
benefit from it instead of just one spot.

That would result in internal_find() staying as it is; only change
happening to sk_insert() and, yes, a slower insert() as insertion
means we've got to bsearch+memmove part of the storage array for each
insert() (and thus for every push()).
Today, I think I can live with that penalty on insert() (and thus
push() and anyone using that) and have my find() work like I expect it
to, in an otherwise by now immutable environment, just like before.

One thing to check before I code this solution: any insert() (or
push() / other outer call) MUST insert() only fully setup objects,
i.e. objects which have their 'key' part already filled in.
Currently, one /can/ write this:

--------------------
type *item = new type();
stack.push(item); // == sk_insert()
// item is now known to stack, who keeps
// a *reference*, so we can init after insert:
item->key = foo;
item->value = bar;
// continue on our merry way
...
// this works:
ref = stack.find(key);
---------------------

(Another way of doing this is to internal_insert() an element at a
specific index 'loc', then update the element at that location after
calling internal_insert().)

Turning insert() into an 'ordered insert' makes that kind of code
invalid... (OpenSSL code grepped and inspected; doesn't seem to happen
anywhere (good!), so we should be good to go to make insert() an
'ordered insert' and throw away the 'sorted' flag in the process.)



Your thoughts?


-- 
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   [EMAIL PROTECTED]
mobile: +31-6-11 120 978
--------------------------------------------------
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]

Re: [openssl.org #1795] [ bug report + tentative patch ] SSL_CTX

Reply via email to