Re: [openssl-project] DRBGs, threads and locking

2018-03-14 Thread Paul Dale
Consider a hypothetical scenario of a large high performance multi-user 
database.  All connections are via TLS.  A lot of other cryptographic 
operations are done, some involving random numbers.  This is an example where 
the Dual EC attack could have been partially mitigated with per-TLS DRBGs.  
Each TLS connection would reveal enough over the wire to be easily broken.  Yes 
bad.  Very bad.  However, the majority of the database would remain secure 
simply because there is no way to expand the attack from the broken DRBG to its 
parent as this requires reversing a hash.

The extra separation of per-SSL DRBGs could have helped in the past; nobody 
knows if it would again.  I'd like to see this but can live without it (our 
current DRBGs are secure after all :)

As I mentioned in my previous email, non-locking is important for performance, 
I definitely want this.  I've not found the numbers yet :(


Pauli
-- 
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption 
Phone +61 7 3031 7217
Oracle Australia


Re: [openssl-project] DRBGs, threads and locking

2018-03-14 Thread Kurt Roeckx
On Wed, Mar 14, 2018 at 12:49:46PM +, Salz, Rich wrote:
> So is having a high-quality, lockless (per-thread) CSPRNG good enough for 
> now?  Phrased like that, I think so.  We have enough other stuff to do.  So 
> +1 to Kurt's per-thread approach.

I think it's better than what we have in 1.1.0. And if we think we
can improve it, I suggest we improve it after 1.1.1.

So I think the discussion is both about speed and security.

From what I understand from various sources, the random
number generator is now, for some workloads at least, a limiting
factor. Having it lockless and per thread is both the easiest
thing to do and gives the best performance.

When it comes to security, there seems to be a concern that from
the public data it might be possible to determine the internal
state, and that this might possibly have an effect on the security
of a different connection. But we have the same situation now in
1.1.0. And I'm still waiting for people to properly explain whether
having it per SSL is better or not; there at least doesn't seem
to be agreement on that part.


Kurt

___
openssl-project mailing list
openssl-project@openssl.org
https://mta.openssl.org/mailman/listinfo/openssl-project


Re: [openssl-project] DRBGs, threads and locking

2018-03-14 Thread Salz, Rich
So is having a high-quality, lockless (per-thread) CSPRNG good enough for now?  
Phrased like that, I think so.  We have enough other stuff to do.  So +1 to 
Kurt's per-thread approach.



Re: [openssl-project] DRBGs, threads and locking

2018-03-14 Thread Dr. Matthias St. Pierre
It is good that Tim hit the brakes and requested a discussion. That was
overdue and it is unfortunate that we did not start it much earlier. I
think Tim brought up two important points:

1. We need to pause for a discussion to determine the direction to
go. Otherwise the DRBG implementation will become an ever-moving target.

2. Instead of guessing about the performance impacts of our changes we
should rely more on measuring them.

Ad 1: Unfortunately, it was not clear until a few days ago that there
was so much disagreement on how to do it right. In view of the upcoming
beta freeze, it would probably be the best to leave the status-quo on
master for 1.1.1, and continue the discussion under the premise that it
will be implemented in 1.1.2-dev or 1.2.0-dev. This would give us more
time to think about the optimal solution.

Ad 2: All the haste and the last-minute changes were in some way caused
by the fact that we did not notice the performance regressions until
#5559, because we were not measuring them on a regular basis. Having
'openssl speed' is not sufficient and it would really be great if we
could have feedback about the actual performance of "real world" high
performance web servers. But that is out of reach for ordinary persons
and can only be done by a larger company.


per-thread vs. per-ssl

As for the discussion about whether per-thread or per-ssl DRBGs are
better from a security perspective: I am not a professional
cryptographer, so I'm not in the position to decide that. But I can say
that currently the per-thread implementation is far superior when it
comes to simplicity of design. And simplicity of design is an important
countermeasure to prevent bugs and security holes. One of the reasons
why Kurt reverted the per-ssl implementation was that it was a bit ugly
and unsatisfactory, because a lot of changes had to be made to functions
for handing down the correct DRBG through the callstack down to the low
level functions that needed to use it.


If per-ssl DRBGs are desired, I propose the following solution which
reconciles the two approaches without losing simplicity: If the public
and private DRBGs are thread-local anyway, then it is easy to implement
them as a stack so that the per-thread DRBG can be exchanged for the
per-ssl DRBG within the scope of a function. The correct DRBG would then
be picked up automatically further down in the stack when RAND_bytes()
or RAND_priv_bytes() is called.

I am thinking of an API like

    RAND_DRBG_push_public(RAND_DRBG *public);
    RAND_DRBG_push_private(RAND_DRBG *private);

    RAND_DRBG_pop_public();
    RAND_DRBG_pop_private();

As said before, this does not imply a preference in one of the two
directions, it's only a suggestion about how it could be implemented.
And I will not throw in a quick pull request for this... ;-)
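
A minimal sketch of how such a thread-local stack might behave, in plain
C11. Everything here is hypothetical: TOY_DRBG stands in for RAND_DRBG, the
toy_* functions are not OpenSSL API, and the fixed stack depth is an
arbitrary assumption. Only the public stack is shown; a private one would
look the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RAND_DRBG; only object identity matters here. */
typedef struct toy_drbg { const char *name; } TOY_DRBG;

#define DRBG_STACK_MAX 8   /* arbitrary depth chosen for this sketch */

typedef struct {
    TOY_DRBG *slot[DRBG_STACK_MAX];
    size_t depth;
} DRBG_STACK;

/* One stack per thread, so push and pop never need a lock. */
static _Thread_local DRBG_STACK public_stack;

/* Bottom of the stack: the thread's own default public DRBG. */
static _Thread_local TOY_DRBG thread_public = { "per-thread public" };

/* Push an override. Returns 1 on success, 0 if drbg is NULL or the stack is full. */
static int toy_push_public(TOY_DRBG *drbg)
{
    if (drbg == NULL || public_stack.depth == DRBG_STACK_MAX)
        return 0;
    public_stack.slot[public_stack.depth++] = drbg;
    return 1;
}

/* Pop the most recent override. Returns NULL if nothing was pushed. */
static TOY_DRBG *toy_pop_public(void)
{
    if (public_stack.depth == 0)
        return NULL;
    return public_stack.slot[--public_stack.depth];
}

/* What a RAND_bytes()-like function would consult further down the call stack. */
static TOY_DRBG *toy_current_public(void)
{
    assert(public_stack.depth <= DRBG_STACK_MAX);
    if (public_stack.depth == 0)
        return &thread_public;
    return public_stack.slot[public_stack.depth - 1];
}
```

An SSL function would push its per-ssl DRBG on entry and pop it on every
exit path; a missed pop would leave the override in place for unrelated
callers on that thread, which is the main hazard of this design.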

Matthias



Re: [openssl-project] DRBGs, threads and locking

2018-03-13 Thread Paul Dale
We did a performance analysis for Oracle’s equivalent of Nginx, OTD, about two 
years ago.  We were looking at the number of connections per second that could 
be established and the limiting factor was locking in ssleay_rand_bytes.  
From memory, approximately a third of the CPU time and over 90% of the lock 
wait time was spent there.  For this kind of workload (many threads, many connections) a 
non-locking RNG would have been an improvement.  I’ll see if I can find the 
analyses and then find out what I can release.

This was using 1.0.2 not 1.1 so things might have changed.


Pauli


Re: [openssl-project] DRBGs, threads and locking

2018-03-13 Thread Paul Dale
I think the intention is to ditch the drbg from the ssl object and then call 
the global function (either public or private) which has been changed to use 
the current thread's drbg rather than being global.

I'm still in favour of a single per-SSL DRBG; I'm just not sure (yet) what a 
clean way to hook it up without locks would be.


Pauli

-----Original Message-----
From: Salz, Rich [mailto:rs...@akamai.com] 
Sent: Wednesday, 14 March 2018 11:27 AM
To: openssl-project@openssl.org
Subject: Re: [openssl-project] DRBGs, threads and locking

So a major reason, as you explained, for having per-thread DRBG's is to reduce 
contention.  When threadA creates an SSL object, the parent DRBG will be the 
threadA one. Therefore you have to introduce locking, since threadA might 
create two SSL objects and they could end up being used in threadB and threadC 
and each need to reseed from their parent.  In order to do that safely, threadA 
also has to do the locking to avoid conflict. That defeats the major gain of 
per-thread.

I think having the SSL object parent be whatever the *current* thread DRBG is 
seems like the best, if not only, way to go.

 



Re: [openssl-project] DRBGs, threads and locking

2018-03-13 Thread Tim Hudson
We have to keep in mind what threats we care about and their practicality.
The security of a DRBG is dependent on both the secrecy of the seed
material provided and the security of the algorithm in terms of its output
not leaking information that materially leaks the internal state in a
manner that enables it to be discovered or reversed in a manner to enable
determination of previous or future outputs.

For some of the arguments used to date there appears to be an assumption
that separating out the DRBG instances on a per SSL connection basis makes
a broken DRBG algorithm less of a security issue.
In real terms if a DRBG is broken and its state is able to be determined
remotely there is no practical difference in separating DRBG instances -
they are all equally vulnerable in the same manner.
In the case of the DualEC-DRBG this was clear - and no one I've seen has
ever suggested that you were safer if you had separate instances of a
broken algorithm for a DRBG - it makes no practical difference to the
security at all.
Sure there is a slight technical difference - but from a security
perspective there is no difference - you are susceptible to the same attack
- so the minor technical difference offers no actual meaningful security
value - and everyone that has referenced this to date has also indicated
that they don't think that there is actually any real practical value to
the difference - it has been more of a "it cannot harm" sort of comment.

In more general terms we need to have a clear view on what we think about
our threat model - what is considered inside the scope of what we care to
address - and what is frankly outside the scope (for our view).


   - We don't consider attacks from the same process against itself within
   our threat model.
   - Correspondingly we don't consider attacks from one thread against
   another thread within our threat model.
   - We don't consider privileged user attacks against the user in our
   threat model (i.e. root can read the memory of the process on most
   Unix-like systems).
   - We also don't actually consider a need to protect all secret
   information from every possible other bug that might leak arbitrary parts
   of memory. We could. But we don't. And if we did we would need to protect
   both the seeding material for the DRBG and its internal state and
   potentially its output. We don't do that - because that isn't within our
   threat model.


Typical applications share an SSL_CTX between multiple SSL instances and we
maintain the session cache against the SSL_CTX. This may be in a single
process (thread) or shared across multiple threads - or even shared across
multiple processes (which is simply the same as being in a single process
from our perspective where the "magic" to coordinate the session id cache
between processes is left to the developer/user).

In a FIPS context, every DRBG has requirements on its inputs (seeding) and
on maintaining a continuous RNG test (block-based compare for non-repeating
outputs at a block level).
All of these would be a per-instance requirement on the DRBG. They have to
be factored in.
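
To illustrate why this is per-instance state, here is a minimal sketch of
a block-compare continuous test in C. It follows my understanding of the
FIPS 140-2 continuous test (first block saved and not used, each later
block compared with its predecessor); the names CRNG_TEST and crng_check
and the 16-byte block length are assumptions made for this example, not
OpenSSL code.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_LEN 16   /* block size assumed for this sketch */

/* Per-instance test state: every DRBG instance needs its own copy. */
typedef struct {
    unsigned char prev[BLOCK_LEN];  /* last block seen */
    int have_prev;                  /* 0 until the first block is saved */
    int failed;                     /* sticky error state */
} CRNG_TEST;

enum { CRNG_FAIL = -1, CRNG_DISCARD = 0, CRNG_OK = 1 };

/*
 * Check one freshly generated block.
 * CRNG_DISCARD: first block, saved for comparison only, must not be output.
 * CRNG_OK:      block differs from the previous one and may be output.
 * CRNG_FAIL:    block repeated, or the instance already failed earlier.
 */
static int crng_check(CRNG_TEST *t, const unsigned char block[BLOCK_LEN])
{
    assert(t != NULL && block != NULL);
    if (t->failed)
        return CRNG_FAIL;
    if (!t->have_prev) {
        memcpy(t->prev, block, BLOCK_LEN);
        t->have_prev = 1;
        return CRNG_DISCARD;
    }
    if (memcmp(t->prev, block, BLOCK_LEN) == 0) {
        t->failed = 1;
        return CRNG_FAIL;
    }
    memcpy(t->prev, block, BLOCK_LEN);
    return CRNG_OK;
}
```

With per-SSL DRBGs, each SSL object would carry one such state block and
pay the discarded first block on creation; with per-thread DRBGs that cost
is paid once per thread.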

There is also the argument that locking is bad and fewer locks are better -
and that argument needs to be backed up by looking at the overall status -
which particular application model are we concerned about? Have we measured
it? Have we figured out where the bottlenecks are? Have we worked through
optimising the actual areas of performance impact? Or are we just
prematurely optimising? Excessive locking will have an impact for certain
application models - but I don't think anyone is suggesting that what we
had previously was excessive - and given the significant performance impact
of the recent changes which went unmeasured and unaddressed I think it is
clear we haven't been measuring performance related items for the DRBG at
all to date - so there wasn't any "science" behind the choices made.

Simple, clear, well documented code with good tests and known architectural
assumptions is what we are trying to achieve - and my sense from the
conversations on this topic to date was that we don't have a consensus as
to what problem we are actually trying to solve - so the design approach
shifts, and shifts again - all of which are the authors of the PRs
responding to what are (in my view at least) conflicting suggestions based
on different assumptions.

That is why I put the -1 on the PR - to have this discussion - and
agree on what we are trying to solve - and also agree on what we are not
trying to solve. And perhaps we can actually document some of our "threat
model" - as I'm sure we have different views on that as well.

I don't think we should have per-SSL DRBGs - it offers no meaningful
security value. We could have a per-SSL_CTX - but I'm not sure that is
needed. We could have a per-thread - but again that is unclear if we
actually need that either.
My thoughts are per-SSL_CTX 

Re: [openssl-project] DRBGs, threads and locking

2018-03-13 Thread Salz, Rich
So a major reason, as you explained, for having per-thread DRBG's is to reduce 
contention.  When threadA creates an SSL object, the parent DRBG will be the 
threadA one. Therefore you have to introduce locking, since threadA might 
create two SSL objects and they could end up being used in threadB and threadC 
and each need to reseed from their parent.  In order to do that safely, threadA 
also has to do the locking to avoid conflict. That defeats the major gain of 
per-thread.

I think having the SSL object parent be whatever the *current* thread DRBG is 
seems like the best, if not only, way to go.

 



Re: [openssl-project] DRBGs, threads and locking

2018-03-13 Thread Benjamin Kaduk
On Wed, Mar 14, 2018 at 01:27:47AM +0100, Kurt Roeckx wrote:
> My solution is to just have 1 master DRBG, and a public and
> private DRBG per thread. The only lock that then is needed is when
> the public or private DRBG needs to reseed. All the rest of the
> code can stay just as it is, but we might want to change some
> places to use the (thread local) private DRBG, which is what #4665
> is about.
[...]
> So the suggestion was to still have a per SSL public DRBG, but
> then the problem is that that SSL object might have moved to a
> different thread between creating and being used and so that the
> parent DRBG might actually belong to a different thread. One
> solution there is that we just take the current thread's public
> DRBG as parent instead of the original thread's public DRBG.

This should be fine from a thread-safety point of view.  I don't
know whether it could potentially affect the standards compliance,
for the intermediate DRBG to potentially change over time (even
though it still chains to a common grandparent/master DRBG).

Per-SSL DRBGs (especially if split to public and private) seem
excessive to me, so the architecture described in the quoted text seems
like the best option to me.

-Ben


[openssl-project] DRBGs, threads and locking

2018-03-13 Thread Kurt Roeckx
So Tim has voted -1 on PR #5547 and wants us to discuss it here
and vote on it.

I don't know if it's clear to everybody what this is about. If
something is not clear, please ask. PR #5461 contains a
lot of documentation updates that are related to it, and it might
be useful to read it as background information. There are many
other related issues and pull requests. I will try to make a basic
summary here.

The DRBGs are what we use to generate random numbers. They have the
possibility to chain, where 1 DRBG gets its entropy from its
parent. You can make long chains with this.

The current state in master is that you have 1 master DRBG that
gets its entropy from the OS, and then 1 public and 1 private
DRBG that get their entropy from the master. Then there is a DRBG
in the SSL struct that gets its entropy from the public DRBG.

I have 2 problems with the current setup that I would like to
solve:
- On SSL_new() we create a new DRBG. That needs to get initialized
  and that requires getting entropy from somewhere, where that's
  currently the global public DRBG. This requires taking a lock.
- If we actually want to use the DRBG for everything related to
  that SSL connection, to avoid having to lock a global DRBG,
  everything that SSL code calls needs to be able to say with
  which DRBG it needs to generate random data. There was already 1
  PR related to this merged. PR #5510, which is still open, deals with
  at least most of it that I know about. I find that a very ugly
  hack and really don't see an easy way to improve it.

My solution is to just have 1 master DRBG, and a public and
private DRBG per thread. The only lock that then is needed is when
the public or private DRBG needs to reseed. All the rest of the
code can stay just as it is, but we might want to change some
places to use the (thread local) private DRBG, which is what #4665
is about.
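
A minimal sketch of that layout in C11 with a pthread mutex, to show where
the single lock sits. This is an illustration under stated assumptions and
not the code in the PR: TOY_DRBG, the toy_* functions, the reseed interval
of 4 and the fixed master seed are all made up, and the splitmix64-style
mixer is NOT cryptographically secure; it only stands in for a real DRBG.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy generator state; a placeholder for a real DRBG. */
typedef struct {
    uint64_t state;
    unsigned generated;   /* outputs produced since the last reseed */
    int seeded;
} TOY_DRBG;

#define RESEED_INTERVAL 4  /* arbitrary, small so the effect is visible */

/* splitmix64-style step: NOT secure, illustration only. */
static uint64_t toy_next(TOY_DRBG *d)
{
    uint64_t z = (d->state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* One shared master. A real one would seed from the OS, not a constant. */
static TOY_DRBG master = { 0x0123456789ABCDEFULL, 0, 1 };
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned master_lock_count;  /* how often the lock was taken */

/* One public and one private child per thread, zero-initialised (unseeded). */
static _Thread_local TOY_DRBG thread_public;
static _Thread_local TOY_DRBG thread_private;

/* The only place a lock is taken: pulling fresh seed material from the master. */
static void toy_reseed(TOY_DRBG *child)
{
    pthread_mutex_lock(&master_lock);
    master_lock_count++;
    child->state = toy_next(&master);
    pthread_mutex_unlock(&master_lock);
    child->generated = 0;
    child->seeded = 1;
}

/* Generation touches only thread-local state unless a reseed is due. */
static uint64_t toy_generate(TOY_DRBG *child)
{
    if (!child->seeded || child->generated >= RESEED_INTERVAL)
        toy_reseed(child);
    child->generated++;
    return toy_next(child);
}

static uint64_t toy_rand_public(void)  { return toy_generate(&thread_public); }
static uint64_t toy_rand_private(void) { return toy_generate(&thread_private); }
```

With these numbers, four consecutive public requests on one thread take
the master lock once; the fifth triggers a reseed and takes it again.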

Some people seem to have a desire to have separate DRBGs per
SSL connection, at least for the public data. This is for cases
where from that data it would be possible to get the internal
state of the DRBG, which is at least allegedly possible with the
DUAL_EC_DRBG. I believe this is mitigated somewhat by our mixing
in additional data when calling RAND_bytes() (or RAND_priv_bytes()),
but I'm not an expert in this and will leave this to the others to
comment on.

So the suggestion was to still have a per SSL public DRBG, but
then the problem is that that SSL object might have moved to a
different thread between creating and being used and so that the
parent DRBG might actually belong to a different thread. One
solution there is that we just take the current thread's public
DRBG as parent instead of the original thread's public DRBG.


I hope I at least covered most of it.


Kurt
