Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-16 Thread Nico Williams
Another interesting portable atomics library is
https://github.com/mintomic/mintomic

FYI, I took a stab at a simple portable atomics that uses GCC/clang
__atomic, or __sync, or Win32 Interlocked*, or a single global lock, and
with a fallback to unsafe, non-atomic implementations for no-threads
configurations; adding C11 support will be trivial.  For just
incremdent/decrement and CAS this is really small, and I think that's
enough for OpenSSL for starters.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Kurt Roeckx
On Tue, Dec 15, 2015 at 10:35:58PM +0100, Florian Weimer wrote:
> * Kurt Roeckx:
> 
> > On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> >> * Nico Williams:
> >> 
> >> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> >> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> >> >> 
> >> >> It seems to have trouble to keep up with new architectures.
> >> >
> >> > New architectures are not really a problem because between a) decent
> >> > compilers with C11 and/or non-C11 atomic intrinsics,
> >> 
> >> Not on Windows.
> >> 
> >> > What's the alternative anyways?
> >> 
> >> Using C++11.
> >
> > I think this is a relevant article:
> > http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
> 
> Based on the MSDN documentation, this (e.g. C++11 features implemented
> in the MSVC C compiler) has not actually happened:
> 
>   

And in
http://blogs.msdn.com/b/vcblog/archive/2014/06/11/c-11-14-feature-tables-for-visual-studio-14-ctp1.aspx
someone mentions that stdatomic.h doesn't exist in the 2014
version.  I can't find a reference that says anything in the 2015
changed related to that so I assume it didn't.

The only thing I could find is that the C library hasn't been
incorparated in the C++ standard.  But all the functions actually
exist in the C++ .


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Florian Weimer
* Kurt Roeckx:

> On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
>> * Nico Williams:
>> 
>> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
>> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
>> >> 
>> >> It seems to have trouble to keep up with new architectures.
>> >
>> > New architectures are not really a problem because between a) decent
>> > compilers with C11 and/or non-C11 atomic intrinsics,
>> 
>> Not on Windows.
>> 
>> > What's the alternative anyways?
>> 
>> Using C++11.
>
> I think this is a relevant article:
> http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/

Based on the MSDN documentation, this (e.g. C++11 features implemented
in the MSVC C compiler) has not actually happened:

  
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 07:54:35PM +0100, Kurt Roeckx wrote:
> Also, if you want to use atomics we really want the C11 / C++11
> memory model which prevents certain important optimazations.

Right, because compilers can reorder some operations.  But we've been
living with this pre-C11 for decades.  We can't yet require C11 (though
I'd sure like to).

BTW, there's this:

http://blogs.msdn.com/b/vcblog/archive/2015/05/01/bringing-clang-to-windows.aspx

which, IIUC means we can get C11 on Windows with MSVC with a clang
frontend.  So maybe Windows is a non-issue, and maybe C11 is a
non-issue.  Though I'm sure there's platforms where OpenSSL can't expect
C11 yet, so I suspect we're stuck with C90+ for a while.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 06:15:32PM +, Salz, Rich wrote:
> > I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS atomic
> > function libraries, and portable open-source atomics libraries, we can cover
> > almost all the bases.
> 
> Agreed.

Thanks.  This is helpful.  I now think we'll have an easy road to
consensus on how to get thread-safety in OpenSSL.  I will try to
contribute infrastructure, but there's a lot of heavy lifting to do that
will take a long time, and I can't do it all.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Kurt Roeckx
On Tue, Dec 15, 2015 at 09:57:32AM -0600, Benjamin Kaduk wrote:
> On 12/15/2015 06:43 AM, Kurt Roeckx wrote:
> > On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> >> * Nico Williams:
> >> Not on Windows.
> >>
> >>> What's the alternative anyways?
> >> Using C++11.
> > I think this is a relevant article:
> > http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
> >
> 
> I think an article from 2012 is no longer current; something like
> http://blogs.msdn.com/b/vcblog/archive/2015/06/19/c-11-14-17-features-in-vs-2015-rtm.aspx
> might be a better source.

That's all about C++, except for library functions from C99,
offsetof, __func__, and "long long" that are now also part of
C++.

The point of the article I pointed to is that they only do C90.
They have no intention of adding support for newer C standards
except for the cases where C++ just happens to be compatible.

And the C++11 and C11 atomics are compatible, they just have a
slightly different syntax, and as the URL I pointed to says are
implemented in it.

Also, if you want to use atomics we really want the C11 / C++11
memory model which prevents certain important optimazations.


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Salz, Rich
> I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS atomic
> function libraries, and portable open-source atomics libraries, we can cover
> almost all the bases.

Agreed.

> We have a surfeit of options, not a dearth of them.  I don't think lack of
> atomics primitives is remotely a concern.  We should use atomic primitives in
> OpenSSL.

Yes.  It would be great if we could start collecting snippets so that it can 
easily be put into the next 1.1 alpha release.  Perhaps on the wiki: 
https://wiki.openssl.org/index.php/Examples
 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> * Nico Williams:
> 
> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> >> 
> >> It seems to have trouble to keep up with new architectures.
> >
> > New architectures are not really a problem because between a) decent
> > compilers with C11 and/or non-C11 atomic intrinsics,
> 
> Not on Windows.

Windows has a family of functions for atomic addition, compare-and-swap,
etcetera:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686360%28v=vs.85%29.aspx#interlocked_functions

Solaris/Illumos has its own as well.

Linux has several atomics libraries.

And there are several open-source portable atomics libraries as well.

I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS
atomic function libraries, and portable open-source atomics libraries,
we can cover almost all the bases.

> > What's the alternative anyways?
> 
> Using C++11.

Sure, but only for a C atomics library for the rest of OpenSSL.

So that makes five alternatives, plus the two stub implementations (one
with global locks, one with no locking/atomics).  Any platform not
covered will get one of the stub implementations and its users will have
to contribute a better implementation.

We have a surfeit of options, not a dearth of them.  I don't think lack
of atomics primitives is remotely a concern.  We should use atomic
primitives in OpenSSL.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 09:57:32AM -0600, Benjamin Kaduk wrote:
> On 12/15/2015 06:43 AM, Kurt Roeckx wrote:
> > On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> >> Using C++11.
> > I think this is a relevant article:
> > http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
> >
> 
> I think an article from 2012 is no longer current; something like
> http://blogs.msdn.com/b/vcblog/archive/2015/06/19/c-11-14-17-features-in-vs-2015-rtm.aspx
> might be a better source.

Yes, but not everyone will have the latest and greatest compilers.
Still, the Windows interlocked function family is enough.  See my other
post just now.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Benjamin Kaduk
On 12/15/2015 06:43 AM, Kurt Roeckx wrote:
> On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
>> * Nico Williams:
>> Not on Windows.
>>
>>> What's the alternative anyways?
>> Using C++11.
> I think this is a relevant article:
> http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
>

I think an article from 2012 is no longer current; something like
http://blogs.msdn.com/b/vcblog/archive/2015/06/19/c-11-14-17-features-in-vs-2015-rtm.aspx
might be a better source.

-Ben
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Kurt Roeckx
On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> * Nico Williams:
> 
> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> >> 
> >> It seems to have trouble to keep up with new architectures.
> >
> > New architectures are not really a problem because between a) decent
> > compilers with C11 and/or non-C11 atomic intrinsics,
> 
> Not on Windows.
> 
> > What's the alternative anyways?
> 
> Using C++11.

I think this is a relevant article:
http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Florian Weimer
* Nico Williams:

> On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
>> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
>> 
>> It seems to have trouble to keep up with new architectures.
>
> New architectures are not really a problem because between a) decent
> compilers with C11 and/or non-C11 atomic intrinsics,

Not on Windows.

> What's the alternative anyways?

Using C++11.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Nico Williams
On Thu, Dec 10, 2015 at 07:06:15AM +1000, Paul Dale wrote:
> Thanks for the clarification.  I was making an assumption that
> following the existing locking model, which did seem over complicated,
> was desirable.  Now that that is shot down, things can be much
> simpler.

Exactly :)

Sorry if I was a bit brusque.  Since inertia is strong, I figured I
needed to make a forceful argument.  However, it seems it was easy to
get consensus after all.

> It would make more sense to have a structure containing the reference
> counter and (optionally?) a lock to use for that counter.

It'd work but it'd be a complication, since now every integer to be used
with atomic increment/decrement, or CAS, or whatever, now needs to be a
struct type with the integer as one field and the rest as opaque.  It'd
be much nicer to be able to use ints normally, though I would agree that
having a special type for atomics has the benefit that it is
self-describing.

It's perfectly fine to have a worst-case atomics implementation that
uses a global lock.  Yes, that would be slow, but we need some incentive
to add true atomics for each platform, and making it slow is the exact
right incentive.

So even if for documentation and type-safety reasons we wanted to wrap
ints in structs for ints meant to be used with atomics, I'd still want
the worst-case atomics implementation to be slow.

> With atomics, the lock isn't there or at least isn't used.  Without
> them, it is.  This is because, I somewhat suspect having a fall back
> global lock for all atomic operations would be worse than the current
> situation were at least a few different locks are used.

That's a feature :)

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Paul Dale
Nico,

Thanks for the clarification.  I was making an assumption that following the 
existing locking model, which did seem over complicated, was desirable.  Now 
that that is shot down, things can be much simpler.

It would make more sense to have a structure containing the reference counter 
and (optionally?) a lock to use for that counter.
With atomics, the lock isn't there or at least isn't used.  Without them, it 
is.  This is because, I somewhat suspect having a fall back global lock for all 
atomic operations would be worse than the current situation were at least a few 
different locks are used.

There is also the possibility of only using the per reference counter lock and 
not using atomic operations at all -- this would reduce the contention a lot 
and might not hurt performance much.  It would be easy to benchmark an 
uncontested lock/add/unlock versus atomic add on the target platforms to see 
the difference.


Thanks against for the insights,

Pauli
-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia
On Wed, 9 Dec 2015 03:27:51 AM Nico Williams wrote:
> On Wed, Dec 09, 2015 at 02:33:46AM -0600, Nico Williams wrote:
> > No more installing callbacks to get locking and atomics.
> 
> I should explain why.
> 
> First, lock callbacks are a serious detriment to usability.
> 
> Second, they are an admission that OpenSSL is incomplete.
> 
> Third, if we have lock callbacks to install, then we have the risk of
> racing (by multiple libraries using OpenSSL) to install them.  Unless
> there's a single function to install *all* such callbacks, then there's
> no way to install callbacks atomically.  But every once in a while we'll
> need to add an Nth callback, thus breaking the ABI or atomicity.
> 
> So, no, no lock callbacks.  OpenSSL should work thread-safely out of the
> box like other libraries.  That means that the default configuration
> should be to use pthreads on *nix, for example.  We'll need an atomics
> library (e.g., OpenPA, or something new) with safe and sane -if not very
> performant- defaults that use global locks for platform/compiler
> combinations where there's no built-in atomics intrinsics or system
> library.  It should be possible to have a no-threads configuration where
> the locks and atomics are non-concurrent-safe implementations.
> 
> Nico
> -- 



___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Blumenthal, Uri - 0553 - MITLL
+2 to Rich and Nico.

Sent from my BlackBerry 10 smartphone on the Verizon Wireless 4G LTE network.
  Original Message  
From: Salz, Rich
Sent: Wednesday, December 9, 2015 08:37
To: paul.d...@oracle.com; openssl-dev@openssl.org; Nico Williams
Reply To: openssl-dev@openssl.org
Subject: Re: [openssl-dev] [openssl-team] Discussion: design issue: async   
and -lpthread


> The "have-atomics" is intended to test if the callback was installed by the
> user.

I want to move away from runtime callback installations. It makes it too hard 
to know what the library is doing, if it is correct, and it complicates the 
code. There is almost never any reason for the flexibility that OpenSSL "used 
to" have. :)
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev



smime.p7s
Description: S/MIME cryptographic signature
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Salz, Rich

> The "have-atomics" is intended to test if the callback was installed by the
> user.

I want to move away from runtime callback installations.  It makes it too hard 
to know what the library is doing, if it is correct, and it complicates the 
code.  There is almost never any reason for the flexibility that OpenSSL "used 
to" have. :)
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Salz, Rich

> "have-atomics" must be known at compile time.
> 
> "lock" should not be needed because we should always have atomics, even
> when we don't have true atomics: just use a global lock in a stub
> implementation of atomic_add() and such.  KISS.  Besides, this will add
> pressure to add true atomics wherever they are truly needed.

Agree.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Nico Williams
On Wed, Dec 09, 2015 at 02:33:46AM -0600, Nico Williams wrote:
> No more installing callbacks to get locking and atomics.

I should explain why.

First, lock callbacks are a serious detriment to usability.

Second, they are an admission that OpenSSL is incomplete.

Third, if we have lock callbacks to install, then we have the risk of
racing (by multiple libraries using OpenSSL) to install them.  Unless
there's a single function to install *all* such callbacks, then there's
no way to install callbacks atomically.  But every once in a while we'll
need to add an Nth callback, thus breaking the ABI or atomicity.

So, no, no lock callbacks.  OpenSSL should work thread-safely out of the
box like other libraries.  That means that the default configuration
should be to use pthreads on *nix, for example.  We'll need an atomics
library (e.g., OpenPA, or something new) with safe and sane -if not very
performant- defaults that use global locks for platform/compiler
combinations where there's no built-in atomics intrinsics or system
library.  It should be possible to have a no-threads configuration where
the locks and atomics are non-concurrent-safe implementations.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Nico Williams
No more installing callbacks to get locking and atomics.  This has to all
work out of the box (the user could be allowed tip supply their
own implementations if these things at OpenSSL build time, but that's it,
not at run-time).

Nico
--
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Paul Dale

The "have-atomics" is intended to test if the callback was installed by the 
user.  If we're using an atomic library or compiler support, then it isn't 
required since we know we've got them.

Likewise, the lock argument isn't required if atomics are used everywhere.  
However, some code will need fixing since there are places that adjust 
reference counters directly using arithmetic operators (while holding the 
appropriate lock).  These will have to be changed to atomics and the locked 
sections of code checked to see that that doesn't introduce other problems.

All possible of course.


Pauli


On Tue, 8 Dec 2015 10:01:20 PM Nico Williams wrote:
> On Wed, Dec 09, 2015 at 09:27:16AM +1000, Paul Dale wrote:
> > It will be possible to support atomics in such a way that there is no
> > performance penalty for machines without them or for single threaded
> > operation.  My sketcy design is along the lines of adding a new API
> > CRYPTO_add_atomic that takes the same arguments as CRYPTO_add (i.e.
> > reference to counter, value to add and lock to use):
> > 
> > CRYPTO_add_atomic(int *addr, int amount, int lock)
> > if have-atomics then
> > atomic_add(addr, amount)
> > else if (lock == have-lock-already)
> > *addr += amount
> > else
> > CRYPTO_add(addr, amount, lock)
> 
> "have-atomics" must be known at compile time.
> 
> "lock" should not be needed because we should always have atomics, even
> when we don't have true atomics: just use a global lock in a stub
> implementation of atomic_add() and such.  KISS.  Besides, this will add
> pressure to add true atomics wherever they are truly needed.
> 
> Nico
> -- 

-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Nico Williams
On Wed, Dec 09, 2015 at 09:27:16AM +1000, Paul Dale wrote:
> It will be possible to support atomics in such a way that there is no
> performance penalty for machines without them or for single threaded
> operation.  My sketcy design is along the lines of adding a new API
> CRYPTO_add_atomic that takes the same arguments as CRYPTO_add (i.e.
> reference to counter, value to add and lock to use):
> 
> CRYPTO_add_atomic(int *addr, int amount, int lock)
> if have-atomics then
> atomic_add(addr, amount)
> else if (lock == have-lock-already)
> *addr += amount
> else
> CRYPTO_add(addr, amount, lock)

"have-atomics" must be known at compile time.

"lock" should not be needed because we should always have atomics, even
when we don't have true atomics: just use a global lock in a stub
implementation of atomic_add() and such.  KISS.  Besides, this will add
pressure to add true atomics wherever they are truly needed.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Paul Dale
It will be possible to support atomics in such a way that there is no 
performance penalty for machines without them or for single threaded operation.
My sketcy design is along the lines of adding a new API CRYPTO_add_atomic that 
takes the same arguments as CRYPTO_add (i.e. reference to counter, value to add 
and lock to use):

CRYPTO_add_atomic(int *addr, int amount, int lock)
if have-atomics then
atomic_add(addr, amount)
else if (lock == have-lock-already)
*addr += amount
else
CRYPTO_add(addr, amount, lock)

The have-lock-already will need to be a new code that indicates that the caller 
has the relevant lock held and there is no need to lock before the add.  Some 
conditional compilation like CRYPTO_add & CRYPTO_add_lock have can be done to 
get the overhead down to zero in the single threaded case and the case where it 
is known beforehand that there are no atomic operations.  It is also possible 
for the atomic_add function to be passed in as a user call back as per the 
other locking callbacks which means OSSL doesn't actually need to know how any 
of this works underneath.

Once this is done, most instances of CRYPTO_add can be changed to 
CRYPTO_add_atomic.  Unfortunately, not all can be changed, so this would 
involve manual inspection of each lock for which CRYPTO_add is used to see if 
atomics are suitable.  I've done a partial list of which could be changed over 
(attached) but it is pretty rough and needs rechecking.

It would be prudent to have a CRYPTO_add_atomic_lock call underneath 
CRYPTO_add_atomic, like CRYPTO_add has CRYPTO_add_lock, to get the extra debug 
output.


Finally, can someone explain what the callback passed to 
CRYPTO_set_add_lock_callback is supposed to do?  Superficially, it seems like a 
way to use atomic operations instead of full locking -- but that breaks things 
due to the way the locking is done elsewhere.  So this call back needs to lock, 
add and unlock like the alternate code path in the CRYPTO_add_lock function.  
There is no obvious benefit to providing it.


Pauli

On Tue, 8 Dec 2015 11:22:01 AM Nico Williams wrote:
> On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> > > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> > 
> > It seems to have trouble to keep up with new architectures.
> 
> New architectures are not really a problem because between a) decent
> compilers with C11 and/or non-C11 atomic intrinsics, b) asm-coded
> atomics, and c) mutex-based dumb atomics, we can get full coverage.
> Anyone who's still not satisfied can then contribute missing asm-coded
> atomics to OpenPA.  I suspect that OpenSSL using OpenPA is likely to
> lead to contributions to OpenPA that will make it better anyways.
> 
> What's the alternative anyways?
> 
> We're talking about API and performance enhancements to OpenSSL to go
> faster on platforms for which there are atomics, and maybe slower
> otherwise -- or maybe not; maybe we can implement context up-/down-ref
> functions that use fine-grained (or even global) locking as a fallback
> that yields performance comparable to today's.
> 
> If OpenPA's (or some other such library's) license works for OpenSSL,
> someone might start using it.  That someone might be me.  So that seems
> like a good question to ask: is OpenPA's license compatible with
> OpenSSL's?  For inclusion into OpenSSL's tree, or for use by OpenSSL?
> 
> Nico
> 

-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia
Lock #  Note
CRYPTO_LOCK_ERR 28  one problem with decrement using 
CRYPTO_add -- leave as is
CRYPTO_LOCK_EX_DATA 7   safe, no CRYPTO_add
CRYPTO_LOCK_X50910  most likely safe, needs deeper recheck 
down call chains
CRYPTO_LOCK_X509_INFO   2   safe, only uses CRYPTO_add
CRYPTO_LOCK_X509_PKEY   2   safe, only uses CRYPTO_add
CRYPTO_LOCK_X509_CRL5   unsure
CRYPTO_LOCK_X509_REQ2   only ASN1_SEQUENCE_re
CRYPTO_LOCK_DSA 5   safe with atomic add (double check)?
CRYPTO_LOCK_RSA 19  most likely safe, needs deeper recheck 
down call chains
CRYPTO_LOCK_EVP_PKEY27  safe with atomic add
CRYPTO_LOCK_X509_STORE  35  assume unsafe, one CRYPTO_add only
CRYPTO_LOCK_SSL_CTX 26  one increment unsafe, rest seems okay 
-> make increment atomic
CRYPTO_LOCK_SSL_CERT3   safe, only uses CRYPTO_add
CRYPTO_LOCK_SSL_SESSION 9   safe if atomic add is used inside 
locked block in ssl/ssl_sess.c
CRYPTO_LOCK_SSL_SESS_CERT   1   unused
CRYPTO_LOCK_SSL 11  safe without compression, probably safe 
with but would need a double check
CRYPTO_LOCK_SSL_METHOD  1   unused
CRYPTO_LOCK_RAND16  safe, no CRYPTO_add
CRYPTO_LOCK_RAND2   11  s

Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Nico Williams
On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> 
> It seems to have trouble to keep up with new architectures.

New architectures are not really a problem because between a) decent
compilers with C11 and/or non-C11 atomic intrinsics, b) asm-coded
atomics, and c) mutex-based dumb atomics, we can get full coverage.
Anyone who's still not satisfied can then contribute missing asm-coded
atomics to OpenPA.  I suspect that OpenSSL using OpenPA is likely to
lead to contributions to OpenPA that will make it better anyways.

What's the alternative anyways?

We're talking about API and performance enhancements to OpenSSL to go
faster on platforms for which there are atomics, and maybe slower
otherwise -- or maybe not; maybe we can implement context up-/down-ref
functions that use fine-grained (or even global) locking as a fallback
that yields performance comparable to today's.

If OpenPA's (or some other such library's) license works for OpenSSL,
someone might start using it.  That someone might be me.  So that seems
like a good question to ask: is OpenPA's license compatible with
OpenSSL's?  For inclusion into OpenSSL's tree, or for use by OpenSSL?

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Florian Weimer
* Nico Williams:

> Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?

It seems to have trouble to keep up with new architectures.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-07 Thread Nico Williams
Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-07 Thread Nico Williams
On Mon, Dec 07, 2015 at 02:41:35PM +0100, Florian Weimer wrote:
> On 11/25/2015 06:48 PM, Kurt Roeckx wrote:
> > Please note that we use C, not C++.  But C11 has the same atomics
> > extentions as C++11.
> 
> C++11 support is much more widespread than C11 support.  You will have
> trouble finding reliable support for C11 atomics with the Microsoft
> toolchain.
>
> [...]
>
> It is a lot of working getting the atomics right on all supported platforms.

The MSFT toolchain has its own intrisics, as do GCC/clang.  A variety of
OSes have their own atomics libraries (e.g., Solaris/Illumos, FreeBSD,
and others).  Linux has several as well, but I am not sure that the
licensing on those will be compatible to link against (much less to
incorporate as source in OpenSSL).  Some of the BSD or CDDL licensed
libraries might be possible to incorporate as source into OpenSSL.

It's a solvable problem, but yes, a lot of work :(  Still, it seems
worth doing.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-07 Thread Florian Weimer
On 11/25/2015 06:48 PM, Kurt Roeckx wrote:
> On Wed, Nov 25, 2015 at 01:02:29PM +0100, Florian Weimer wrote:
>> On 11/23/2015 11:08 PM, Kurt Roeckx wrote:
>>
>>> I think that we currently don't do any compile / link test to
>>> detect features but that we instead explicitly say so for each
>>> platform.
>>>
>>> Anyway, the gcc the documentation is here:
>>> https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
>>>
>>> TLS support clearly isn't supported everywhere.
>>
>> The most portable approach is to switch C++11, which provides you
>> thread-local variables with destructors (and you also get portable
>> atomics).  There are simply no standards-based C solutions as long as
>> you have to support the Microsoft compiler under Windows.
> 
> Please note that we use C, not C++.  But C11 has the same atomics
> extentions as C++11.

C++11 support is much more widespread than C11 support.  You will have
trouble finding reliable support for C11 atomics with the Microsoft
toolchain.

> We're also currently still targetting C89/C90 (with some minor
> extentions), but I think we should try to use them if the platform
> supports it.

It is a lot of working getting the atomics right on all supported platforms.

Florian

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-01 Thread Paul Dale
The figures were for connection reestablishment, RSA computations etc simply 
don't feature.  For initial connection establishment, on the other hand, they 
are the single largest factor.  The crypto is definitely not the bottleneck for 
this case.


Pauli
-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-01 Thread Hubert Kario
On Tuesday 01 December 2015 09:21:34 Paul Dale wrote:
> > are you sure that the negotiated cipher suite is the same and that
> > the NSS is not configured to reuse the server key share if you're
> > using DHE or ECDHE?
> 
> There is definitely scope for improvement here.  My atomic operation
> suggestion is one approach which was quick and easy to validate,
> better might be more locks since it doesn't introduce a new paradigm
> and is more widely supported (C11 notwithstanding).

I'm not saying there is no room for improvement or that the improvements 
are useless. But as long as we're not comparing apples-to-apples the 
statistic is useless.

Other things to look for: ServerKeyExchange curve or group and signature 
algorithm used as well as the key size of server. Each of those things 
can have impact completely overshadowing the lock contention differences 
(picking big RSA key size can easily slash performance by an order of 
magnitude).
-- 
Regards,
Hubert Kario
Senior Quality Engineer, QE BaseOS Security team
Web: www.cz.redhat.com
Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno, Czech Republic

signature.asc
Description: This is a digitally signed message part.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Peter Waltenberg
 I'd suggest checking where the bottlenecks are before making major structural changes. I'll admit we have made a few changes to the basic OpenSSL sources but I don't see unacceptable amounts of locking even on large machines (100's of processing units) with thousands of threads.Blinding and the RNG's were the hot spots and relatively easy to address. Also, you use TRNG's for things like blinding where a PRNG will do, fixing that also helps performance.Peter-"openssl-dev"  wrote: -To: paul.d...@oracle.com, openssl-dev@openssl.orgFrom: Nico Williams Sent by: "openssl-dev" Date: 12/01/2015 10:16AMSubject: Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthreadOn Tue, Dec 01, 2015 at 09:21:34AM +1000, Paul Dale wrote:> However, the obstacle preventing 100% CPU utilisation for both stacks> is lock contention.  The NSS folks apparently spent a lot of effort> addressing this and they have a far more scalable locking model than> OpenSSL: one lock per context for all the different kinds of context> versus a small number of global locks.I prefer APIs which state that they are "thread-safe provided theapplication accesses each XYZ context from only one thread at a time".Leave it to the application to do locking, as much as possible.  Manythreaded applications won't need locking here because they may naturallyhave only one thread using a given context.Also, for something like a TLS context, ideally it should be naturallypossible to have two threads active, as long as one thread only readsand the other thread only writes.  There can be some dragons here withrespect to fatal events and deletion of a context, but the simplestthing to do is to use atomics for manipulating state like "had a fatalalert", and use reference counts to defer deletion (then if theapplication developer wants it this way, each of the reader and writerthreads can have a reference and the last one to stop using the contextdeletes it).> There is definitely scope for improvement here.  My atomic operation> suggestion is one approach which was quick and easy to validate,> better might be more locks since it doesn't introduce a new paradigm> and is more widely supported (C11 notwithstanding).A platform compatibility atomics library would be simple enough (plentyexist, I believe).  For platforms where no suitable implementationexists you can use a single global lock, and if there's not even that,then you can use non-atomic implementations and pretend it's all OK orfail to build (users of such platforms will quickly provide realimplementations).(Most compilers have pre-C11 atomics intrinsics and many OSes haveatomics libraries.)Nico-- ___openssl-dev mailing listTo unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Nico Williams
On Tue, Dec 01, 2015 at 09:21:34AM +1000, Paul Dale wrote:
> However, the obstacle preventing 100% CPU utilisation for both stacks
> is lock contention.  The NSS folks apparently spent a lot of effort
> addressing this and they have a far more scalable locking model than
> OpenSSL: one lock per context for all the different kinds of context
> versus a small number of global locks.

I prefer APIs which state that they are "thread-safe provided the
application accesses each XYZ context from only one thread at a time".

Leave it to the application to do locking, as much as possible.  Many
threaded applications won't need locking here because they may naturally
have only one thread using a given context.

Also, for something like a TLS context, ideally it should be naturally
possible to have two threads active, as long as one thread only reads
and the other thread only writes.  There can be some dragons here with
respect to fatal events and deletion of a context, but the simplest
thing to do is to use atomics for manipulating state like "had a fatal
alert", and use reference counts to defer deletion (then if the
application developer wants it this way, each of the reader and writer
threads can have a reference and the last one to stop using the context
deletes it).

> There is definitely scope for improvement here.  My atomic operation
> suggestion is one approach which was quick and easy to validate,
> better might be more locks since it doesn't introduce a new paradigm
> and is more widely supported (C11 notwithstanding).

A platform compatibility atomics library would be simple enough (plenty
exist, I believe).  For platforms where no suitable implementation
exists you can use a single global lock, and if there's not even that,
then you can use non-atomic implementations and pretend it's all OK or
fail to build (users of such platforms will quickly provide real
implementations).

(Most compilers have pre-C11 atomics intrinsics and many OSes have
atomics libraries.)

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Paul Dale
> are you sure that the negotiated cipher suite is the same and that the 
> NSS is not configured to reuse the server key share if you're using DHE 
> or ECDHE?

The cipher suite was the same.  I'd have to check to see exactly which was 
used.  It is certainly possible that NSS was configured as you suggest and, if 
so, this would improve its performance.


However, the obstacle preventing 100% CPU utilisation for both stacks is lock 
contention.  The NSS folks apparently spent a lot of effort addressing this and 
they have a far more scalable locking model than OpenSSL: one lock per context 
for all the different kinds of context versus a small number of global locks.

There is definitely scope for improvement here.  My atomic operation suggestion 
is one approach which was quick and easy to validate, better might be more 
locks since it doesn't introduce a new paradigm and is more widely supported 
(C11 notwithstanding).


Regards,

Pauli
-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Nico Williams
On Mon, Nov 23, 2015 at 11:56:54PM +, Viktor Dukhovni wrote:
> > It may be a good idea to rethink locking completely.
> 
> There is some glimmer of hope in that as various libcrypto structures
> become opaque, the locking moves from application code into the
> library.  For example, we now have (yet to be documented):
> 
>   X509_up_ref()

Ideally there would be very little locking in OpenSSL, and instead the
app would be responsible for most locking (if needed).

But that will be a lengthy transition, no?  Maybe we'll need functions
by which to indicate that the app will be doing locking for specific
objects.  Still, functions like RAND_bytes() that have no context object
will need locking, so new functions will be needed that take contexts so
as to minimize locking.

> Doing this requires a global review of the API, and filling in
> missing functions and documentation. :-(

Yes.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Hubert Kario
On Tuesday 24 November 2015 10:49:26 Paul Dale wrote:
> On Mon, 23 Nov 2015 11:11:37 PM Alessandro Ghedini wrote:
> > Is this TLS connections?
> 
> Yes, this is just measuring the TLS handshake.  Renegotiations
> predominately. We deliberately didn't test the bulk symmetric crypto
> phase of the connection.
> > I'd like to know more...
> 
> The data are a bit rough and ready but I've included what I can.  I
> wasn't directly involved in taking these measurements, so Chinese
> whispers are entirely possible.  I've been tasked with trying to find
> some performance enhancements.
> 
> The TLS stack results are:
> 
> stack CPU %  connections/s
> OpenSSL 85  11,935
> atomic patch22  16,465proof of concept only, the stack
> is broken elsewhere 
> NSS 47  46,507!

are you sure that the negotiated cipher suite is the same and that the 
NSS is not configured to reuse the server key share if you're using DHE 
or ECDHE?

-- 
Regards,
Hubert Kario
Senior Quality Engineer, QE BaseOS Security team
Web: www.cz.redhat.com
Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno, Czech Republic

signature.asc
Description: This is a digitally signed message part.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-29 Thread Paul Dale
On Mon, 23 Nov 2015 11:11:37 PM Alessandro Ghedini wrote:
> Is this TLS connections?

Yes, this is just measuring the TLS handshake.  Renegotiations predominately.
We deliberately didn't test the bulk symmetric crypto phase of the connection.


> I'd like to know more...

The data are a bit rough and ready but I've included what I can.  I wasn't 
directly involved in taking these measurements, so Chinese whispers are 
entirely possible.  I've been tasked with trying to find some performance 
enhancements.

The TLS stack results are:

stack CPU %  connections/s
OpenSSL 85  11,935
atomic patch22  16,465proof of concept only, the stack is 
broken elsewhere
NSS 47  46,507!




--
The top ten bottlenecks identified by Sun Studio profiler.
There are before my atomic change.

Lock % 16   CPU % 41
 +-   EVP_MD_CTX_cleanup
   +-EVP_PKEY_CTX_free
+-EVP_PKEY_free
 +-  CRYPTO_add_lock
  +-   locking_function
   +-  pthread_mutex_lock


Lock % 21   CPU % 50
+-   EVP_MD_CTX_copy_ex
  +-EVP_PKEY_CTX_dup
   +-CRYPTO_add_lock
+-  locking_function
 +- pthread_mutex_lock


Lock % 5   CPU % -
+- EVP_PKEY_CTX_dup
  +-   CRYPTO_add_lock
   +- locking_function
+- pthread_mutex_lock


Lock % 3   CPU % 3
  +- EVP_PKEY_CTX_new
 +-CRYPTO_add_lock
  +-   locking_function
   +-pthread_mutex_lock


Lock % -  CPU % 3
EVP_PKEY_free


Lock % 2   CPU % 2
pkey_hmac_copy


Lock % -   CPU % 1
 +-   Connection::destroy()
+-  Connection::close()
  +-   NZIO_Close
 +-   nzos_Destroy_Ctx
+-  SSL_free
  +-   ssl_cert_free
+-   ssl_cert_clear_certs


Lock % -   CPU % 2
+- ERR_clear_error
  +-  ERR_get_state
+-int_thread_get_item
  +-lh_retrieve


Locl % -   CPU % 1
X509_check_private_key


Lock % -   CPU % 1
sha1_block_data_order_ssse3

--

After making all CRYPTO_add calls atomic, which breaks things elsewhere in
the TLS stack, we obtained the above performance improvement and these new
bottlenecks -- take these with more of a grain of salt:

Lock % 6   CPU % 13
SSL_new 
 +-  46.000   (7%)nzos_Create_Ctx
  +-  38.530   (6%)SSL_new
  +-  35.370   (6%)CRYPTO_new_ex_data
 +-  35.310   (6%)int_new_ex_data
  +-  34.360   (5%)def_get_class
 +-  34.100   (5%)CRYPTO_lock
+-  34.020   (5%)locking_function
+-  32.620   (5%)pthread_mutex_lock


Lock % 6   CPU % 12
SSL_SESSION_new
 +-  40.090   (6%)d2i_SSL_SESSION
 +-  34.420   (6%)SSL_SESSION_new
 +-  33.930   (5%)CRYPTO_new_ex_data
 +-  33.880   (5%)int_new_ex_data
 +-  32.630   (5%)def_get_class
 +-  32.390   (5%)CRYPTO_lock
 +-  32.370   (5%)locking_function
 +-  30.860   (5%)pthread_mutex_lock


Lock % 9   CPU %22
BIO_new 
+  BIO_new  1%
+  BIO_set  4%
+  CRYPTO_new_ex_data   5%
+  int_new_ex_data 
+  def_get_class 
+  CRYPTO_lock 
+  locking_function 
+  pthread_mutex_lock


Lock % 11   CPU % 26
BIO_free
+-  BIO_free
+-  CRYPTO_free_ex_data
+-  int_free_ex_data
+-  def_get_class
+-  CRYPTO_lock
+-  locking_function
+-  pthread_mutex_lock


Lock % 17
tls1_PRF 
+- tls1_PRF
+-  tls1_P_hash

called from

+-  62.400   (10%)ssl3_get_finished
  +-  62.120   (10%)ssl3_get_message
 +-  44.940   (7%)ssl3_read_bytes
+-  28.530   (5%)ssl3_do_change_cipher_spec
+-  23.410   (4%)tls1_final_finish_mac

or

  +-  16.300   (3%)ssl3_take_mac
   +-  16.260   (3%)tls1_final_finish_mac

or

 +-  17.430   (3%)ssl3_send_finished
 +-  15.240   (2%)tls1_final_finish_mac

or

+-  65.480   (10%)tls1_setup_key_block
  +-  62.890   (10%)tls1_generate_key_block


Lock % 6
ERR_clear_error
+-   ERR_clear_error
+-   ERR_get_state
+-  int_thread_get_item
+-lh_retrieve
 +-  getrn


CPU % 18.93
SSL_free


CPU % 5.5
SSL_SESSION_free



> As Matt mentioned, I'm curently working exactly on this. Would it be possible
> for you to share your testing method and code? I'd like to verify that my
> patches actually go in the right direction before having them merged (or maybe
> you could do your tests on top of my patches, they should mostly work fine 
> even
> if the work is not complete yet, I think).

I'm not sure I can share code and associated infras

Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-25 Thread Kurt Roeckx
On Wed, Nov 25, 2015 at 01:02:29PM +0100, Florian Weimer wrote:
> On 11/23/2015 11:08 PM, Kurt Roeckx wrote:
> 
> > I think that we currently don't do any compile / link test to
> > detect features but that we instead explicitly say so for each
> > platform.
> > 
> > Anyway, the gcc the documentation is here:
> > https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
> > 
> > TLS support clearly isn't supported everywhere.
> 
> The most portable approach is to switch C++11, which provides you
> thread-local variables with destructors (and you also get portable
> atomics).  There are simply no standards-based C solutions as long as
> you have to support the Microsoft compiler under Windows.

Please note that we use C, not C++.  But C11 has the same atomics
extentions as C++11.

We're also currently still targetting C89/C90 (with some minor
extentions), but I think we should try to use them if the platform
supports it.


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-25 Thread Florian Weimer
On 11/23/2015 11:08 PM, Kurt Roeckx wrote:

> I think that we currently don't do any compile / link test to
> detect features but that we instead explicitly say so for each
> platform.
> 
> Anyway, the gcc the documentation is here:
> https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
> 
> TLS support clearly isn't supported everywhere.

The most portable approach is to switch C++11, which provides you
thread-local variables with destructors (and you also get portable
atomics).  There are simply no standards-based C solutions as long as
you have to support the Microsoft compiler under Windows.

Florian
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-24 Thread Kurt Roeckx
On Tue, Nov 24, 2015 at 03:16:59PM +, Jonathan Larmour wrote:
> On 23/11/15 20:34, Matt Caswell wrote:
> > One other option we could pursue is to use the "__thread" syntax for
> > thread local variables and avoid the need for libpthread altogether. An
> > earlier version of the code did this. I have not found a way to reliably
> > detect at compile time the capability to do this and my understanding is
> > that this is a lot less portable.
> 
> Behind the scenes, if pthreads are available on an OS, then the compiler
> will probably just end up using pthread functions anyway, meaning there's
> no advantage over having just linked against libpthread.

__thread can be used in something that doesn't use threads.  It
gets put in a different section in the elf file (.tbss, .tdata).
But that still requires that things like the dynamic loader
support it.


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-24 Thread Matt Caswell


On 24/11/15 15:16, Jonathan Larmour wrote:
> On 23/11/15 20:34, Matt Caswell wrote:
>> On 23/11/15 17:49, Nico Williams wrote:
>>
>>> Still, if -lpthread avoidance were still desired, you'd have to find an
>>> alternative to pthread_key_create(), pthread_getspecific(), and friends.
>>
>> Just a point to note about this. The async code that introduced this has
>> 3 different implementations:
>>
>> - posix
>> - windows
>> - null
>>
>> The detection code will check if you have a suitable posix or windows
>> implementation and use that. Otherwise the fallback position is to use
>> the null implementation. With "null" everything will compile and run but
>> you won't be able to use any of the new async functionality.
> 
> I hope there will be the ability to plug in different operating systems
> and it's not hard coded to just these choices. There are embedded systems
> which use OpenSSL which do not have POSIX thread APIs (POSIX is very
> heavyweight for many).
> 
> For example, GCC abstracts their threading dependenencies with a "gthread"
> API which OS implementers can write against. What would be bad would be
> direct calls to pthread* dotted around the OpenSSL code base.

We are talking specifically about the new async functionality here. The
three implementations are all you get for that. That has no impact on
the rest of OpenSSL functionality. So if you happen to be on a non-posix
and non-windows platform then OpenSSL will function as it does today.
You just don't get the new async functionality on top.

There are pthread references in one (internal) header file and one c
source code file. If the new threading API discussed elsewhere in this
thread goes ahead, then those references would be replaced with calls to
that instead.

Matt
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-24 Thread Jonathan Larmour
On 23/11/15 20:34, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> 
>> Still, if -lpthread avoidance were still desired, you'd have to find an
>> alternative to pthread_key_create(), pthread_getspecific(), and friends.
> 
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
> 
> - posix
> - windows
> - null
> 
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.

I hope there will be the ability to plug in different operating systems
and it's not hard coded to just these choices. There are embedded systems
which use OpenSSL which do not have POSIX thread APIs (POSIX is very
heavyweight for many).

For example, GCC abstracts their threading dependenencies with a "gthread"
API which OS implementers can write against. What would be bad would be
direct calls to pthread* dotted around the OpenSSL code base.

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

Behind the scenes, if pthreads are available on an OS, then the compiler
will probably just end up using pthread functions anyway, meaning there's
no advantage over having just linked against libpthread.

Jifl
-- 
eCosCentric Limited  http://www.eCosCentric.com/ The eCos experts
Barnwell House, Barnwell Drive, Cambridge, UK.   Tel: +44 1223 245571
Registered in England and Wales: Reg No 4422071.
--["Si fractum non sit, noli id reficere"]--   Opinions==mine
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Viktor Dukhovni

> On Nov 24, 2015, at 2:13 AM, Nico Williams  wrote:
> 
> If the OpenSSL team finally decides to do something about sane locking
> by default, then it will be a huge improvement.  If this thread provides
> the impetus, so much the better.

I hope that happens.  It would certainly make a big contribution to the
ecosystem around OpenSSL.  The key question is whether there is a right
someone outside the team to step forward and write the code, or whether
it needs be developed by an OpenSSL team member (and whether cycles for
that are available in the near term).  I hope that one or the other may
come to pass.

-- 
Viktor.



___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:
> I wasn't saying there was anything wrong with mmap(), just that guard pages
> only work if you can guarantee your overrun hits the guard page (and
> doesn't just step over it). Large stack allocations increase the odds of
> 'stepping over' the guard pages. It's still better than not having guard
> pages, but they aren't a hard guarantee that you won't have mysterious bugs
> still.
> 
> You obviously realize that, but bn_prime() is the classic example of
> allocating very large chunks of memory on the stack.

Sure.  Rich Salz claims this is all fixed in master, so guard pages
strike me as a plus, not a wash.

> As for fibre's, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independly developed code ends up in the same
> process.

That's... a joke, right?

OpenSSL very much does "break those big messy apps ...".  I don't see
the fibers project making that worse, nor better -- it's neutral.

Let me give you some examples.

Using Heimdal's libgssapi from Java JGSS (with the JNI GSS wrapper)
blows up due to no one initializing OpenSSL's lock callbacks.  Heimdal,
of course, uses OpenSSL.  Who should be initializing the lock callbacks?
Not the JVM -- why should it know/assume that some library used via
dlopen() will want OpenSSL?

But it's not a library's place to initialize OpenSSL's lock callbacks
either!  Suppose a Java program were loading via dlopen() *two*
libraries that use OpenSSL, and suppose different threads are racing to
do this.

This happens, and it happens because OpenSSL is used in so many
*libraries*, not just *programs*.  OpenSSL is a prime case of how a
library meant for use by programs comes to be used by libraries.
Another example is PKCS#11.  That's no excuse.  Use by libraries must be
supported.

And historically OpenSSL has been very bad at keeping its ABI
backwards-compatible, so DLL Hell cases often involve OpenSSL.

The more layers: the higher the likelihood of breakage involving
OpenSSL.  All threaded pluggable-software-using software (name services
switch, PAM, JNI, ...) is vulnerable to these problems.

If the OpenSSL team finally decides to do something about sane locking
by default, then it will be a huge improvement.  If this thread provides
the impetus, so much the better.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Viktor Dukhovni
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:

> As for fibre's, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independly developed code ends up in the same
> process.

It'll work because it has to be explicitly enabled by the application,
the OpenSSL library will provide the support on sufficiently capable
platforms, but they won't be used unless explicitly requested, and
they're only useful if there's some sort of asynchronous crypto
engine available.  They're light-weight glue to facilitate new
kinds of low-overhead asynchronous operations.  The use of fibres
will be rather rare until such engines become commonplace.

-- 
Viktor.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Peter Waltenberg

I wasn't saying there was anything wrong with mmap(), just that guard pages
only work if you can guarantee your overrun hits the guard page (and
doesn't just step over it). Large stack allocations increase the odds of
'stepping over' the guard pages. It's still better than not having guard
pages, but they aren't a hard guarantee that you won't have mysterious bugs
still.

You obviously realize that, but bn_prime() is the classic example of
allocating very large chunks of memory on the stack.


As for fibre's, I doubt it'll work in general, the issue there is simply
the range of OS's OpenSSL supports. If you wire it in you still have to run
with man+dog+world in the process, that's a hard ask. One of the good
points about OpenSSL up until now, it tends to not break those big messy
apps where a whole lot of independly developed code ends up in the same
process.


Peter




From:   Nico Williams 
To: openssl-dev@openssl.org
Date:   24/11/2015 10:42
Subject:    Re: [openssl-dev] [openssl-team] Discussion: design issue:
async and -lpthread
Sent by:"openssl-dev" 



On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wrote:
>
> "
> Please do.  It will make this much safer.  Also, you might want to run
> some experiments to find the best stack size on each platform.  The
> smaller the stack you can get away with, the better.
> "
>
> It does, but it also requires code changes in a few places.
probable_prime
> () in bn_prime.c being far and away the worst offender. We instrumented
our
> test code so we could find out what the stack usage was, for libcrypto
you
> can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

Are you saying that using mmap() would be onerous?  Something else?

> FYI, nothing elegant there, just have your code allocate and fill a large
> stack array then add check points further down to see how far you've
eaten
> into it.

Sure.

> "
> > > A guard page
> > > would allow one to safely tune down fiber stack size to the whatever
> > > OpenSSL actually needs for a given use.
> "
>
> Unless someone allocates a stack array larger than the size of the guard
> page and scribbles over another threads stack. This is another reason to
> never use large arrays on the stack.

alloca() and VLAs aren't safe for allocating more bytes than fit in a
guard page.  One should not use alloca()/VLAs for anything larger than
that.

This is no reason not to have a guard page!

This is a reason to have coding standards that address alloca()/VLAs.

> "
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??
> "
>
> Only that it's a compile time choice and OpenSSL is currently 'thread
> neutral' at runtime, not at compile time ?.

OpenSSL is "thread-neutral" at run-time as to locks and thread IDs
because of the lock/threadid callbacks.  But here we're talking about a
new feature (fibers) that uses thread-locals, and here using pthread
thread locals (pthread_getspecific()) clearly means no longer being
"thread-neutral" -- if I understand your definition of that term
anyways.

It's perfectly fine to use __thread in compiled code regardless of what
threading library is used, provided -of course- that __thread was
supported to begin with and that the compiler isn't lying.

> Compile time is easy, making this work at runtime is hard and
occasionally
> is really valuable - i.e. way back in the dim distant path when Linux had
> multiple thread packages available.

If the compiler accepts __thread but allows it to break at run-time
depending on the available threading libraries, then the compiler is
broken and should now have allowed __thread to begin with.  I can't find
anything describing such brokenness.  If you assert such brokenness
exists then please post links or instructions for how to reproduce it.

BTW, https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C.2B.2B
says that even Visual Studio supports thread-locals.  Though there's a
caveat that requires some care at configuration time:

  On Windows versions before Vista and Server 2008, __declspec(thread)
  works in DLLs only when those DLLs are bound to the executable, and
  will not work for those loaded with LoadLibrary() (a protection fault
  or data corruption may occur).[9]

There must, of course, be compilers that don't support thread locals
(pcc?).  Wouldn't it be fair to say that OpenSSL simply doesn't support
fibers on those compilers?  I think so.

Nico
--
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev



___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Salz, Rich
?  It does, but it also requires code changes in a few places. probable_prime() 
in bn_prime.c being far and away the worst offender.
This is fixed in master which uses malloc and free.  Actually, I think all 
egregious stack consumption has been fixed in master.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wrote:
> 
> "
> Please do.  It will make this much safer.  Also, you might want to run
> some experiments to find the best stack size on each platform.  The
> smaller the stack you can get away with, the better.
> "
> 
> It does, but it also requires code changes in a few places. probable_prime
> () in bn_prime.c being far and away the worst offender. We instrumented our
> test code so we could find out what the stack usage was, for libcrypto you
> can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

Are you saying that using mmap() would be onerous?  Something else?

> FYI, nothing elegant there, just have your code allocate and fill a large
> stack array then add check points further down to see how far you've eaten
> into it.

Sure.

> "
> > > A guard page
> > > would allow one to safely tune down fiber stack size to the whatever
> > > OpenSSL actually needs for a given use.
> "
> 
> Unless someone allocates a stack array larger than the size of the guard
> page and scribbles over another threads stack. This is another reason to
> never use large arrays on the stack.

alloca() and VLAs aren't safe for allocating more bytes than fit in a
guard page.  One should not use alloca()/VLAs for anything larger than
that.

This is no reason not to have a guard page!

This is a reason to have coding standards that address alloca()/VLAs.

> "
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??
> "
> 
> Only that it's a compile time choice and OpenSSL is currently 'thread
> neutral' at runtime, not at compile time ?.

OpenSSL is "thread-neutral" at run-time as to locks and thread IDs
because of the lock/threadid callbacks.  But here we're talking about a
new feature (fibers) that uses thread-locals, and here using pthread
thread locals (pthread_getspecific()) clearly means no longer being
"thread-neutral" -- if I understand your definition of that term
anyways.

It's perfectly fine to use __thread in compiled code regardless of what
threading library is used, provided -of course- that __thread was
supported to begin with and that the compiler isn't lying.

> Compile time is easy, making this work at runtime is hard and occasionally
> is really valuable - i.e. way back in the dim distant path when Linux had
> multiple thread packages available.

If the compiler accepts __thread but allows it to break at run-time
depending on the available threading libraries, then the compiler is
broken and should now have allowed __thread to begin with.  I can't find
anything describing such brokenness.  If you assert such brokenness
exists then please post links or instructions for how to reproduce it.

BTW, https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C.2B.2B
says that even Visual Studio supports thread-locals.  Though there's a
caveat that requires some care at configuration time:

  On Windows versions before Vista and Server 2008, __declspec(thread)
  works in DLLs only when those DLLs are bound to the executable, and
  will not work for those loaded with LoadLibrary() (a protection fault
  or data corruption may occur).[9]

There must, of course, be compilers that don't support thread locals
(pcc?).  Wouldn't it be fair to say that OpenSSL simply doesn't support
fibers on those compilers?  I think so.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Viktor Dukhovni
On Mon, Nov 23, 2015 at 05:28:18PM -0600, Nico Williams wrote:

> It may be a good idea to rethink locking completely.

There is some glimmer of hope in that as various libcrypto structures
become opaque, the locking moves from application code into the
library.  For example, we now have (yet to be documented):

X509_up_ref()

function, because applications can no longer directly manipulate
the reference count.  For compatibility with older releases Postfix
now defines:

src/tls/tls.h:
#if OPENSSL_VERSION_NUMBER < 0x1010L
#define X509_up_ref(x) CRYPTO_add(&((x)->references), 1, 
CRYPTO_LOCK_X509)
#endif

Mind you, both the old and new interfaces assume that the X509
object (presumably freshly returned by some function) that the
application wants to hang on somewhat longer than default, has not
been deallocated by any other thread in the mean time!  So in
particular we are assuming that the caller's thread still "owns"
structures that hold and are responsible for releasing at least
one of the references.

Anyway, with the locking moving into X509_up_ref(), the implementation
moves into the library, where it becomes easier to switch to
fine-grained locking.

We may in more cases need "get1" analogues of existing "get0"
interfaces, so that the user can obtain an already up-refed object
from the library, and just have to call the appropriate "free"
function at the end.

Doing this requires a global review of the API, and filling in
missing functions and documentation. :-(
 
-- 
Viktor.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Peter Waltenberg

"
Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.
"

It does, but it also requires code changes in a few places. probable_prime
() in bn_prime.c being far and away the worst offender. We instrumented our
test code so we could find out what the stack usage was, for libcrypto you
can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

FYI, nothing elegant there, just have your code allocate and fill a large
stack array then add check points further down to see how far you've eaten
into it.

"
> > A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.
"

Unless someone allocates a stack array larger than the size of the guard
page and scribbles over another threads stack. This is another reason to
never use large arrays on the stack.

"
Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??
"

Only that it's a compile time choice and OpenSSL is currently 'thread
neutral' at runtime, not at compile time ?.
Compile time is easy, making this work at runtime is hard and occasionally
is really valuable - i.e. way back in the dim distant path when Linux had
multiple thread packages available.

Peter






From:   Nico Williams 
To:     openssl-dev@openssl.org
Date:   24/11/2015 06:49
Subject:    Re: [openssl-dev] [openssl-team] Discussion: design issue:
async and -lpthread
Sent by:"openssl-dev" 



On Mon, Nov 23, 2015 at 08:34:29PM +, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.
>
> Interesting. I'll take a look at that.

Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and
friends.
>
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
>
> - posix
> - windows
> - null
>
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
>
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see.  I think that's fine.  Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

  dnl Thread local storage
  have___thread=no
  AC_MSG_CHECKING(for thread-local storage)
  AC_LINK_IFELSE([AC_LANG_SOURCE([
  static __thread int x ;
  int main () { x = 123; return x; }
  ])], have___thread=yes)
  if test $have___thread = yes; then
 AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports
__thread])
  fi
  AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??

Nico
--
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev



___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 10:18:27PM +, Matt Caswell wrote:
> On 23/11/15 21:56, Paul Dale wrote:
> > Somewhat tangentially related to this is the how thread locking in
> > OpenSSL is slowing things up.
> 
> Alessandro has submitted an interesting patch to provide a much better
> threading API. See:
> 
> https://github.com/openssl/openssl/pull/451
> 
> I'm not sure what the current status of this is though.

I'll have to look myself.  A while back I posted a link to a
not-yet-complete attempt to make all the lock (and threadid...)
callbacks automatically initialized correctly/safely on Windows and
Unix.

It may be a good idea to rethink locking completely.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Alessandro Ghedini
On Tue, Nov 24, 2015 at 07:56:15am +1000, Paul Dale wrote:
> Somewhat tangentially related to this is the how thread locking in OpenSSL is
> slowing things up.
> 
> We've been doing some connection establishment performance analysis recently
> and have discovered a lot of waiting on locks is occurring.  By far the worst
> culprit is CRYPTO_LOCK_EVP_PKEY in CRYPTO_add_lock calls.  Changing these to
> gcc's atomic add operations (__sync_add_and_fetch) improved things
> significantly:
> 
> base OpenSSL11935 connections/s85% CPU utilisation
> with Atomic Change  16465 connections/s22% CPU utilisation
> 
> So fifty percent more connections for a quarter the CPU.

Is this TLS connections?

> At this point a number of other locks are causing the slow down.

I'd like to know more...

> Now, I'm not sure if such a change would be interesting to the community or
> not, but there definitely  is room for significant gains in the multi threaded
> locking.  Ignoring the atomic operations and moving to a separate lock per
> reference count would likely save a an amount of blocking -- is this a
> suitable use for dynamic locks?

As Matt mentioned, I'm curently working exactly on this. Would it be possible
for you to share your testing method and code? I'd like to verify that my
patches actually go in the right direction before having them merged (or maybe
you could do your tests on top of my patches, they should mostly work fine even
if the work is not complete yet, I think).

Thanks
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Paul Dale
Thanks for the quick reply.  That patch looks much improved on this front.

We'll wait for the changes and then retest performance.


Thanks again,

Pauli

On Mon, 23 Nov 2015 10:18:27 PM Matt Caswell wrote:
> 
> On 23/11/15 21:56, Paul Dale wrote:
> > Somewhat tangentially related to this is the how thread locking in
> > OpenSSL is slowing things up.
> 
> Alessandro has submitted an interesting patch to provide a much better
> threading API. See:
> 
> https://github.com/openssl/openssl/pull/451
> 
> I'm not sure what the current status of this is though.
> 
> Matt
> ___
> openssl-dev mailing list
> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Salz, Rich
> https://github.com/openssl/openssl/pull/451
> I'm not sure what the current status of this is though.

I've made several comments I think need to be addressed before we should merge 
it.
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Matt Caswell


On 23/11/15 21:56, Paul Dale wrote:
> Somewhat tangentially related to this is the how thread locking in
> OpenSSL is slowing things up.

Alessandro has submitted an interesting patch to provide a much better
threading API. See:

https://github.com/openssl/openssl/pull/451

I'm not sure what the current status of this is though.

Matt
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Kurt Roeckx
On Mon, Nov 23, 2015 at 02:48:25PM -0600, Nico Williams wrote:
> 
> I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):
> 
>   dnl Thread local storage
>   have___thread=no
>   AC_MSG_CHECKING(for thread-local storage)
>   AC_LINK_IFELSE([AC_LANG_SOURCE([
>   static __thread int x ;
>   int main () { x = 123; return x; }
>   ])], have___thread=yes)
>   if test $have___thread = yes; then
>  AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports 
> __thread])
>   fi
>   AC_MSG_RESULT($have___thread)
> 
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??

I think that we currently don't do any compile / link test to
detect features but that we instead explicitly say so for each
platform.

Anyway, the gcc the documentation is here:
https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

TLS support clearly isn't supported everywhere.


Kurt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Paul Dale
Somewhat tangentially related to this is the how thread locking in OpenSSL is 
slowing things up.

We've been doing some connection establishment performance analysis recently 
and have discovered a lot of waiting on locks is occurring.  By far the worst 
culprit is CRYPTO_LOCK_EVP_PKEY in CRYPTO_add_lock calls.  Changing these to 
gcc's atomic add operations (__sync_add_and_fetch) improved things 
significantly:

base OpenSSL11935 connections/s85% CPU utilisation
with Atomic Change  16465 connections/s22% CPU utilisation

So fifty percent more connections for a quarter the CPU.  At this point a 
number of other locks are causing the slow down.

Now, I'm not sure if such a change would be interesting to the community or 
not, but there definitely  is room for significant gains in the multi threaded 
locking.  Ignoring the atomic operations and moving to a separate lock per 
reference count would likely save a an amount of blocking -- is this a suitable 
use for dynamic locks?


I also submitted a bug report and fix recently [openssl.org #4135] to do with 
threading, which will hopefully get included eventually.


Regards,

Pauli

-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 08:34:29PM +, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.
> 
> Interesting. I'll take a look at that.

Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and friends.
> 
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
> 
> - posix
> - windows
> - null
> 
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
> 
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see.  I think that's fine.  Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

  dnl Thread local storage
  have___thread=no
  AC_MSG_CHECKING(for thread-local storage)
  AC_LINK_IFELSE([AC_LANG_SOURCE([
  static __thread int x ;
  int main () { x = 123; return x; }
  ])], have___thread=yes)
  if test $have___thread = yes; then
 AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
  fi
  AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Matt Caswell


On 23/11/15 17:49, Nico Williams wrote:
> [Resend, with slight edits.]
> 
> [Viktor asked me for my advice on this issue and bounced me the post
>  that I'm following up to.  -Nico]
> 
> The summary of what I've to say is that making libcrypto and libssl need
> -lpthread is something that does require discussion, as it will have
> detrimental effects on some users.  Personally, I think that those
> detrimental effects are a good thing (see below), but nonetheless I
> encourage you to discuss whether this is actually what OpenSSL should
> do.  In particular, it may be possible to avoid -lpthread on some
> systems and still get a subset of lipthread functionality from libc or
> the compiler (e.g., thread-locals), and that may be worth doing.
> 
> On a slightly related note, I asked and Viktor tells me that fiber
> stacks are allocated with malloc().  I would prefer that they were
> allocated with mmap(), because then you get a guard page.  A guard page
> would allow one to safely tune down fiber stack size to the whatever
> OpenSSL actually needs for a given use.

Interesting. I'll take a look at that.

> Still, if -lpthread avoidance were still desired, you'd have to find an
> alternative to pthread_key_create(), pthread_getspecific(), and friends.

Just a point to note about this. The async code that introduced this has
3 different implementations:

- posix
- windows
- null

The detection code will check if you have a suitable posix or windows
implementation and use that. Otherwise the fallback position is to use
the null implementation. With "null" everything will compile and run but
you won't be able to use any of the new async functionality.

Only the posix implementation uses the pthread* functions (and only for
thread local storage). Part of the requirement of the posix detection
code is that you have "Configured" with "threads" enabled. This is the
default. However it is possible to explicitly configure with
"no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
now will also force the use of the null implementation for async and
hence will not use any of the pthread functions.

One other option we could pursue is to use the "__thread" syntax for
thread local variables and avoid the need for libpthread altogether. An
earlier version of the code did this. I have not found a way to reliably
detect at compile time the capability to do this and my understanding is
that this is a lot less portable.

Matt

___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:

[NetBSD header commentary extracts:]

> /*
>  * Use macros to rename many pthread functions to the corresponding
>  * libc symbols which are either trivial/no-op stubs or the real

No renaming is necessary if one's link-editor and RTLD support
filters...

  An ELF filter is a forwarding saying that the implementation of a
  symbol in the filter object is to be found elsewhere, e.g., in some
  other object.

  In Solaris/Illumos this is used to maintain backwards compatibility
  when symbols get moved from one library to another.

  E.g., libpthread and libdl moved into libc, but they remain as filters
  so that objects linked with those old libraries will a) still find
  them, b) still find the symbols the expect in them, c) get the correct
  implementations of those symbols from the object now providing them
  (here: libc).

  Filters are awesome.  Lack of universal support for them is very
  frustrating.  On Linux, for example, it's possible to create filters
  with strong link-editor-fu, but the RTLD does not support them.

>  * thing, depending on whether libpthread is linked in to the
>  * program. This permits code, particularly libraries that do not
>  * directly use threads but want to be thread-safe in the presence of
>  * threaded callers, to use pthread mutexes and the like without
>  * unnecessairly including libpthread in their linkage.

Just move these into libc, lock stock and barrel, and if you want to
have fast versions for the single-threaded case, just arrange for slower
versions to get hot-patched-in when pthread_create() is first called.

Or even just use a branch/computed jump (whichever is faster) to avoid
having to hot-patch.

It's important that pthread_mutex_init/lock/trylock/unlock/destroy work
correctly even in the optimized single-threaded case.  The main thread
might init and acquire some locks then create a second thread that will
block acquiring those locks.

>  * Left out of this list are functions that can't sensibly be trivial
>  * or no-op stubs in a single-threaded process (pthread_create,
>  * pthread_kill, pthread_detach), functions that normally block and
>  * wait for another thread to do something (pthread_join), and

Just move them into libc anyways.

>  * functions that don't make sense without the previous functions
>  * (pthread_attr_*). The pthread_cond_wait and pthread_cond_timedwait
>  * functions are useful in implementing certain protection mechanisms,
>  * though a non-buggy app shouldn't end up calling them in
>  * single-threaded mode.
>  *
>  * The rename is done as:
>  * #define pthread_foo__libc_foo
>  * instead of
>  * #define pthread_foo(x) __libc_foo((x))
>  * in order that taking the address of the function ("func =
>  * &pthread_foo;") continue to work.
>  *
>  * POSIX/SUSv3 requires that its functions exist as functions (even if
>  * macro versions exist) and specifically that "#undef pthread_foo" is
>  * legal and should not break anything. Code that does such will not
>  * successfully get the stub behavior implemented here and will
>  * require libpthread to be linked in.
>  */

All the more reason to not rename these symbols!  All you need is ELF
filter support.

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
[Resend, with slight edits.]

[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of lipthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On a slightly related note, I asked and Viktor tells me that fiber
stacks are allocated with malloc().  I would prefer that they were
allocated with mmap(), because then you get a guard page.  A guard page
would allow one to safely tune down fiber stack size to the whatever
OpenSSL actually needs for a given use.

Comments below.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lphtread", and defines
> macros that enable multi-theaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
> 
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell 
> Date:   Thu Nov 19 14:55:09 2015 +
> 
>   Add pthread support
> 
>   The forthcoming async code needs to use pthread thread local variables. 
> This
>   updates the various Configurations to add the necessary flags. In many 
> cases
>   this is an educated guess as I don't have access to most of these
>   environments! There is likely to be some tweaking needed.
> 
>   Reviewed-by: Kurt Roeckx 
> 
> This is quite possibly not be the right thing to do, and deserves
> some attention from the team.  We might even seek outside some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
> 
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rats nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  Now with this commit OpenSSL is now taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler
thread-local variable support.  This might or might not be available;
this would have to be determined at build configuration ti

Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of lipthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lphtread", and defines
> macros that enable multi-theaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
> 
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell 
> Date:   Thu Nov 19 14:55:09 2015 +
> 
>   Add pthread support
> 
>   The forthcoming async code needs to use pthread thread local variables. 
> This
>   updates the various Configurations to add the necessary flags. In many 
> cases
>   this is an educated guess as I don't have access to most of these
>   environments! There is likely to be some tweaking needed.
> 
>   Reviewed-by: Kurt Roeckx 
> 
> This is quite possibly not be the right thing to do, and deserves
> some attention from the team.  We might even seek outside som
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
> 
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rats nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  Now with this commit OpenSSL is now taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
disucssed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler
thread-local variable support.  This might or might not be available;
this would have to be determined at build configuration time.  Still, if
where the compiler supports thread-locals, OpenSSL could avoid
-lpthread.

For pthread mutex functions (for lock callbacks) and, perhaps,
pthread_once() (for automatic initialization of lock callbacks,
perhaps), it's very difficult to create alternatives to -lpthread, but
it can conceivably be done within a library like OpenSSL.  I don't think
it'