I'd suggest checking where the bottlenecks are before making major structural changes. I'll admit we have made a few changes to the basic OpenSSL sources, but I don't see unacceptable amounts of locking even on large machines (hundreds of processing units) with thousands of threads.

Blinding and the RNGs were the hot spots, and both were relatively easy to address.
Also, you use TRNGs for things like blinding where a PRNG will do; fixing that also helps performance.

Peter


-----"openssl-dev" <openssl-dev-boun...@openssl.org> wrote: -----
To: paul.d...@oracle.com, openssl-dev@openssl.org
From: Nico Williams
Sent by: "openssl-dev"
Date: 12/01/2015 10:16AM
Subject: Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

On Tue, Dec 01, 2015 at 09:21:34AM +1000, Paul Dale wrote:
> However, the obstacle preventing 100% CPU utilisation for both stacks
> is lock contention.  The NSS folks apparently spent a lot of effort
> addressing this and they have a far more scalable locking model than
> OpenSSL: one lock per context for all the different kinds of context
> versus a small number of global locks.

I prefer APIs which state that they are "thread-safe provided the
application accesses each XYZ context from only one thread at a time".

Leave it to the application to do locking, as much as possible.  Many
threaded applications won't need locking here because they may naturally
have only one thread using a given context.

Also, for something like a TLS context, ideally it should be naturally
possible to have two threads active, as long as one thread only reads
and the other thread only writes.  There can be some dragons here with
respect to fatal events and deletion of a context, but the simplest
thing to do is to use atomics for manipulating state like "had a fatal
alert", and use reference counts to defer deletion (then if the
application developer wants it this way, each of the reader and writer
threads can have a reference and the last one to stop using the context
deletes it).
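
A minimal sketch of that idea in C11, assuming a hypothetical context
type: demo_ctx and the demo_ctx_* functions are illustrative names only,
not OpenSSL APIs. The fatal-alert flag is a single atomic, and the last
thread to drop its reference frees the context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context type for illustration; not an OpenSSL structure. */
typedef struct demo_ctx {
    atomic_int  refs;        /* number of outstanding references */
    atomic_bool fatal_alert; /* set once either side sees a fatal alert */
} demo_ctx;

/* Create a context holding one reference (the creator's). */
demo_ctx *demo_ctx_new(void)
{
    demo_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    atomic_init(&ctx->refs, 1);
    atomic_init(&ctx->fatal_alert, false);
    return ctx;
}

/* Take an extra reference, e.g. one each for a reader and a writer thread. */
void demo_ctx_ref(demo_ctx *ctx)
{
    assert(ctx != NULL);
    atomic_fetch_add_explicit(&ctx->refs, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this call freed the context. */
bool demo_ctx_unref(demo_ctx *ctx)
{
    assert(ctx != NULL);
    /* fetch_sub returns the previous value, so 1 means we were the last. */
    if (atomic_fetch_sub_explicit(&ctx->refs, 1, memory_order_acq_rel) == 1) {
        free(ctx);
        return true;
    }
    return false;
}

/* Record a fatal alert; safe to call from either thread. */
void demo_ctx_set_fatal(demo_ctx *ctx)
{
    assert(ctx != NULL);
    atomic_store(&ctx->fatal_alert, true);
}

/* Check for a fatal alert before doing further I/O on the context. */
bool demo_ctx_is_fatal(demo_ctx *ctx)
{
    assert(ctx != NULL);
    return atomic_load(&ctx->fatal_alert);
}
```

Each thread would call demo_ctx_ref() before it starts using the context
and demo_ctx_unref() when it stops, so deletion is deferred until both
are done.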

> There is definitely scope for improvement here.  My atomic operation
> suggestion is one approach which was quick and easy to validate;
> more locks might be better, since that doesn't introduce a new paradigm
> and is more widely supported (C11 notwithstanding).

A platform compatibility atomics library would be simple enough (plenty
exist, I believe).  For platforms where no suitable implementation
exists you can use a single global lock, and if there's not even that,
then you can use non-atomic implementations and pretend it's all OK or
fail to build (users of such platforms will quickly provide real
implementations).

(Most compilers have pre-C11 atomics intrinsics and many OSes have
atomics libraries.)

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

