Re: [openssl-project] DRBGs, threads and locking
So it's currently not really clear whether anyone is actually opposed to the per-thread DRBGs. Should we hold a vote on this? Or Tim, do you want to retract your -1?

Kurt
___
openssl-project mailing list
openssl-project@openssl.org
https://mta.openssl.org/mailman/listinfo/openssl-project
Re: [openssl-project] DRBGs, threads and locking
Consider a hypothetical scenario of a large, high-performance multi-user database. All connections are via TLS. A lot of other cryptographic operations are done, some involving random numbers.

This is an example where the Dual EC attack could have been partially mitigated with per-TLS DRBGs. Each TLS connection would reveal enough over the wire to be easily broken. Yes, bad. Very bad. However, the majority of the database would remain secure, simply because there is no way to expand the attack from the broken DRBG to its parent, as this requires reversing a hash.

The extra separation of per-SSL DRBGs could have helped in the past; nobody knows if it would again. I'd like to see this but can live without it (our current DRBGs are secure after all :)

As I mentioned in my previous email, non-locking is important for performance, and I definitely want this. I've not found the numbers yet :(

Pauli
--
Oracle
Dr Paul Dale | Cryptographer | Network Security & Encryption
Phone +61 7 3031 7217
Oracle Australia

From: Tim Hudson [mailto:t...@cryptsoft.com]
Sent: Wednesday, 14 March 2018 1:15 PM
To: openssl-project@openssl.org
Subject: Re: [openssl-project] DRBGs, threads and locking
> We have to keep in mind what threats we care about and their practicality.
> [...]
Re: [openssl-project] DRBGs, threads and locking
On Wed, Mar 14, 2018 at 12:49:46PM +, Salz, Rich wrote:
> So is having a high-quality, lockless (per-thread) CSPRNG good enough for now? Phrased like that, I think so. We have enough other stuff to do. So +1 to Kurt's per-thread approach.

I think it's better than what we have in 1.1.0. And if we think we can improve it, I suggest we improve it after 1.1.1.

So I think the discussion is about both speed and security.

From what I understand from various sources, the random number generator is now a limiting factor, at least for some workloads. Having it lockless and per thread is both the easiest thing to do and gives the best performance.

When it comes to security, there seems to be a concern that it might be possible to determine the internal state from the public data, and that this might possibly have an effect on the security of a different connection. But we have the same situation now in 1.1.0. And I'm still waiting for someone to properly explain whether having it per SSL is better or not; there at least doesn't seem to be agreement on that part.

Kurt
Re: [openssl-project] DRBGs, threads and locking
So is having a high-quality, lockless (per-thread) CSPRNG good enough for now? Phrased like that, I think so. We have enough other stuff to do. So +1 to Kurt's per-thread approach.
Re: [openssl-project] DRBGs, threads and locking
It is good that Tim hit the brakes and requested a discussion. That was overdue and it is unfortunate that we did not start it much earlier. I think Tim brought up two important points:

1. We need to pause for a discussion to determine the direction to go. Otherwise the DRBG implementation will become an ever-moving target.
2. Instead of guessing about the performance impact of our changes, we should rely more on measuring it.

Ad 1: Unfortunately, it was not clear until a few days ago that there was so much disagreement on how to do it right. In view of the upcoming beta freeze, it would probably be best to leave the status quo on master for 1.1.1, and continue the discussion under the premise that it will be implemented in 1.1.2-dev or 1.2.0-dev. This would give us more time to think about the optimal solution.

Ad 2: All the haste and the last-minute changes were in some way caused by the fact that we did not notice the performance regressions until #5559, because we were not measuring them on a regular basis. Having 'openssl speed' is not sufficient, and it would really be great if we could have feedback about the actual performance of "real world" high-performance web servers. But that is out of reach for ordinary persons and can only be done by a larger company.

per-thread vs. per-ssl

As for the discussion about whether per-thread or per-ssl DRBGs are better from a security perspective: I am not a professional cryptographer, so I'm not in a position to decide that. But I can say that currently the per-thread implementation is far superior when it comes to simplicity of design. And simplicity of design is an important countermeasure against bugs and security holes.

One of the reasons why Kurt reverted the per-ssl implementation was that it was a bit ugly and unsatisfactory, because a lot of changes had to be made to functions to hand the correct DRBG down through the call stack to the low-level functions that needed to use it.

If per-ssl DRBGs are desired, I propose the following solution, which reconciles the two approaches without losing simplicity: if the public and private DRBGs are thread-local anyway, then it is easy to implement them as a stack, so that the per-thread DRBG can be exchanged for the per-ssl DRBG within the scope of a function. The correct DRBG would then be picked up automatically further down in the call stack when RAND_bytes() or RAND_priv_bytes() is called.

I am thinking of an API like

    RAND_DRBG_push_public(RAND_DRBG *public);
    RAND_DRBG_push_private(RAND_DRBG *private);
    RAND_DRBG_pop_public();
    RAND_DRBG_pop_private();

As said before, this does not imply a preference for either of the two directions; it's only a suggestion about how it could be implemented. And I will not throw in a quick pull request for this... ;-)

Matthias
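To make the stack idea above more concrete, here is a minimal sketch in C11. It is not OpenSSL code: `demo_drbg`, `demo_push_public()`, `demo_pop_public()`, `demo_current_public()` and the fixed stack depth are all hypothetical names and choices made up for illustration, and only the public DRBG is shown (a private stack would mirror it). The point is only that a low-level function can pick up the right DRBG without it being passed down through every caller.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_MAX 8   /* arbitrary depth limit for this sketch */

/* Hypothetical stand-in for RAND_DRBG; the id only tells instances apart. */
typedef struct demo_drbg { int id; } demo_drbg;

/* One stack per thread, so no lock is needed to push, pop or look up. */
static _Thread_local demo_drbg *public_stack[DEMO_STACK_MAX];
static _Thread_local size_t public_depth = 0;
/* The per-thread default DRBG, used whenever the stack is empty. */
static _Thread_local demo_drbg thread_default = { 0 };

/* Temporarily make d the public DRBG of the calling thread. */
int demo_push_public(demo_drbg *d)
{
    if (d == NULL || public_depth == DEMO_STACK_MAX)
        return 0;
    public_stack[public_depth++] = d;
    return 1;
}

/* Restore whatever was active before the matching push. */
int demo_pop_public(void)
{
    assert(public_depth <= DEMO_STACK_MAX);
    if (public_depth == 0)
        return 0;
    public_depth--;
    return 1;
}

/* What a RAND_bytes()-like function would consult. */
demo_drbg *demo_current_public(void)
{
    return public_depth > 0 ? public_stack[public_depth - 1] : &thread_default;
}

/* Low-level code: takes no DRBG argument, yet finds the right one. */
static int demo_low_level_random(void)
{
    return demo_current_public()->id;
}

/* High-level code: swaps in the per-ssl DRBG for the scope of one call. */
int demo_handshake_step(demo_drbg *per_ssl)
{
    int used;

    if (!demo_push_public(per_ssl))
        return -1;
    used = demo_low_level_random();
    demo_pop_public();
    return used;
}
```

Calling `demo_handshake_step()` with a per-ssl instance makes the low-level function see that instance, and after the call the thread falls back to its default again. Every early return between push and pop would need its own pop, which is one cost of this design.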
Re: [openssl-project] DRBGs, threads and locking
We did a performance analysis for Oracle’s equivalent of Nginx, OTD, about two years ago. We were looking at the number of connections per second that could be established, and the limiting factor was locking in ssleay_rand_bytes. From memory, approximately a third of the CPU time and over 90% of the lock wait time was spent there.

For this kind of workload (many threads, many connections) a non-locking RNG would have been an improvement.

I’ll see if I can find the analyses and then find out what I can release. This was using 1.0.2, not 1.1, so things might have changed.

Pauli

From: Tim Hudson [mailto:t...@cryptsoft.com]
Sent: Wednesday, 14 March 2018 1:15 PM
To: openssl-project@openssl.org
Subject: Re: [openssl-project] DRBGs, threads and locking
> We have to keep in mind what threats we care about and their practicality.
> [...]
Re: [openssl-project] DRBGs, threads and locking
I think the intention is to ditch the DRBG from the SSL object and then call the global function (either public or private), which has been changed to use the current thread's DRBG rather than a global one.

I'm still in favour of a single per-SSL DRBG, but I'm not sure (yet) what a clean way to hook it up without locks would be.

Pauli

-----Original Message-----
From: Salz, Rich [mailto:rs...@akamai.com]
Sent: Wednesday, 14 March 2018 11:27 AM
To: openssl-project@openssl.org
Subject: Re: [openssl-project] DRBGs, threads and locking
> So a major reason, as you explained, for having per-thread DRBG's is to reduce contention. When threadA creates an SSL object, the parent DRBG will be the threadA one. Therefore you have to introduce locking, since threadA might create two SSL objects and they could end up being used in threadB and threadC and each need to reseed from their parent. In order to do that safely, threadA also has to do the locking to avoid conflict. That defeats the major gain of per-thread.
>
> I think having the SSL object parent be whatever the *current* thread DRBG is seems like the best, if not only, way to go.
Re: [openssl-project] DRBGs, threads and locking
We have to keep in mind what threats we care about and their practicality.

The security of a DRBG depends on both the secrecy of the seed material provided and the security of the algorithm, in the sense that its output must not leak information about the internal state that would allow that state to be discovered or reversed and previous or future outputs to be determined.

Some of the arguments used to date appear to assume that a broken DRBG algorithm is less of a security issue if we separate out the DRBG instances on a per-SSL-connection basis.

In real terms, if a DRBG is broken and its state is able to be determined remotely, there is no practical difference in separating DRBG instances - they are all equally vulnerable in the same manner. In the case of the DualEC-DRBG this was clear - and no one I've seen has ever suggested that you were safer if you had separate instances of a broken algorithm for a DRBG - it makes no practical difference to the security at all. Sure there is a slight technical difference - but from a security perspective there is no difference - you are susceptible to the same attack - so the minor technical difference offers no actual meaningful security value - and everyone that has referenced this to date has also indicated that they don't think that there is actually any real practical value to the difference - it has been more of an "it cannot harm" sort of comment.

In more general terms, we need to have a clear view of our threat model - what is considered inside the scope of what we care to address - and what is frankly outside the scope (for our view).

- We don't consider attacks from the same process against itself within our threat model.
- Correspondingly, we don't consider attacks from one thread against another thread within our threat model.
- We don't consider privileged user attacks against the user in our threat model (i.e. root can read the memory of the process on most Unix-like systems).
- We also don't actually consider a need to protect all secret information from every possible other bug that might leak arbitrary parts of memory. We could. But we don't. And if we did, we would need to protect both the seeding material for the DRBG and its internal state, and potentially its output. We don't do that - because that isn't within our threat model.

Typical applications share an SSL_CTX between multiple SSL instances, and we maintain the session cache against the SSL_CTX. This may be in a single process (thread) or shared across multiple threads - or even shared across multiple processes (which is simply the same as being in a single process from our perspective, where the "magic" to coordinate the session id cache between processes is left to the developer/user).

In a FIPS context, every DRBG has requirements on its inputs (seeding) and on maintaining a continuous RNG test (block-based compare for non-repeating outputs at a block level). All of these would be a per-instance requirement on the DRBG. They have to be factored in.

There is also the argument that locking is bad and fewer locks are better - and that argument needs to be backed up by looking at the overall status - which particular application model are we concerned about? Have we measured it? Have we figured out where the bottlenecks are? Have we worked through optimising the actual areas of performance impact? Or are we just prematurely optimising? Excessive locking will have an impact for certain application models - but I don't think anyone is suggesting that what we had previously was excessive - and given the significant performance impact of the recent changes, which went unmeasured and unaddressed, I think it is clear we haven't been measuring performance related items for the DRBG at all to date - so there wasn't any "science" behind the choices made.

Simple, clear, well documented code with good tests and known architectural assumptions is what we are trying to achieve - and my sense from the conversations on this topic to date was that we don't have a consensus as to what problem we are actually trying to solve - so the design approach shifts, and shifts again - all of which is the authors of the PRs responding to what are (in my view at least) conflicting suggestions based on different assumptions.

That is why I put the -1 on the PR - to have this discussion - and agree on what we are trying to solve - and also agree on what we are not trying to solve.

And perhaps we can actually document some of our "threat model" - as I'm sure we have different views on that as well.

I don't think we should have per-SSL DRBGs - it offers no meaningful security value. We could have a per-SSL_CTX - but I'm not sure that is needed. We could have a per-thread - but again it is unclear if we actually need that either.

My thoughts are per-SSL_CTX mig
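As an illustration of why the continuous RNG test mentioned above is a per-instance cost, here is a minimal sketch in C. It is an assumption-laden toy, not the FIPS procedure and not OpenSSL code: `demo_crngt`, `demo_crngt_check()` and the 16-byte block size are hypothetical, and the handling of the very first block (which a real implementation may have to discard) is simplified. What it shows is that each DRBG instance has to carry its own "last block" state, so more instances mean more of this state and more of these checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK 16   /* block size chosen arbitrarily for this sketch */

/* Per-instance state: every DRBG instance needs its own copy of this. */
typedef struct demo_crngt {
    unsigned char last[DEMO_BLOCK];  /* previous output block */
    int have_last;                   /* 0 until the first block was seen */
} demo_crngt;

/*
 * Compare the new block against the previous one from the same instance.
 * Returns 1 if the block is acceptable, 0 if it repeats the previous block.
 * The new block is remembered in either case.
 */
int demo_crngt_check(demo_crngt *t, const unsigned char block[DEMO_BLOCK])
{
    int ok;

    assert(t != NULL && block != NULL);
    ok = !(t->have_last && memcmp(t->last, block, DEMO_BLOCK) == 0);
    memcpy(t->last, block, DEMO_BLOCK);
    t->have_last = 1;
    return ok;
}
```

With one instance per SSL object, this state and comparison would exist once per connection; with one instance per thread or per SSL_CTX, far fewer copies are needed.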
Re: [openssl-project] DRBGs, threads and locking
> Either that or just always use the per-thread DRBG for the current thread, and don't bother to do per-SSL at all.

There is appeal to isolating each SSL connection so that an adversary can't use information it has about *its* connection to attack another. Granted, this might not be practical, but still...
Re: [openssl-project] DRBGs, threads and locking
On Wed, Mar 14, 2018 at 01:26:38AM +, Salz, Rich wrote:
> So a major reason, as you explained, for having per-thread DRBG's is to reduce contention. When threadA creates an SSL object, the parent DRBG will be the threadA one. Therefore you have to introduce locking, since threadA might create two SSL objects and they could end up being used in threadB and threadC and each need to reseed from their parent. In order to do that safely, threadA also has to do the locking to avoid conflict. That defeats the major gain of per-thread.
>
> I think having the SSL object parent be whatever the *current* thread DRBG is seems like the best, if not only, way to go.

Either that or just always use the per-thread DRBG for the current thread, and don't bother to do per-SSL at all.

-Ben
Re: [openssl-project] DRBGs, threads and locking
So a major reason, as you explained, for having per-thread DRBG's is to reduce contention. When threadA creates an SSL object, the parent DRBG will be the threadA one. Therefore you have to introduce locking, since threadA might create two SSL objects and they could end up being used in threadB and threadC and each need to reseed from their parent. In order to do that safely, threadA also has to do the locking to avoid conflict. That defeats the major gain of per-thread.

I think having the SSL object parent be whatever the *current* thread DRBG is seems like the best, if not only, way to go.
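The "parent is whatever the current thread's DRBG is" idea above can be sketched as follows. This is a hypothetical illustration in C11 with pthreads, not OpenSSL code: `demo_ssl`, `demo_drbg`, `demo_ssl_parent()` and the probe helper are invented names. The SSL-like object deliberately stores no parent pointer; the parent is looked up at the time of use, so an object created in one thread and used in another never touches the creating thread's DRBG and needs no lock for that.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; neither type is an OpenSSL type. */
typedef struct demo_drbg { int reseed_count; } demo_drbg;
typedef struct demo_ssl { int id; } demo_ssl;   /* note: no parent pointer */

/* Each thread owns exactly one public DRBG. */
static _Thread_local demo_drbg thread_public_drbg;

/* Resolve the parent at time of use: always the calling thread's DRBG. */
demo_drbg *demo_ssl_parent(const demo_ssl *ssl)
{
    assert(ssl != NULL);
    return &thread_public_drbg;
}

struct demo_probe {
    const demo_ssl *ssl;
    const demo_drbg *creator_parent;  /* parent as seen by the creating thread */
    int differs;                      /* set by the worker thread */
};

/* Runs in a second thread while the creating thread is still alive. */
static void *demo_worker(void *arg)
{
    struct demo_probe *p = arg;

    p->differs = (demo_ssl_parent(p->ssl) != p->creator_parent);
    return NULL;
}

/*
 * Returns 1 if the same SSL object resolves to a different parent DRBG
 * when used from another thread, 0 if not, -1 on thread errors.
 */
int demo_parent_follows_thread(const demo_ssl *ssl)
{
    struct demo_probe p = { ssl, demo_ssl_parent(ssl), 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, demo_worker, &p) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return p.differs;
}
```

The trade-off is that the intermediate DRBG of a connection can change over its lifetime whenever the object migrates between threads.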
Re: [openssl-project] DRBGs, threads and locking
On Wed, Mar 14, 2018 at 01:27:47AM +0100, Kurt Roeckx wrote:
> My solution is to just have 1 master DRBG, and a public and private DRBG per thread. The only lock that then is needed is when the public or private DRBG needs to reseed. All the rest of the code can stay just as it is, but we might want to change some places to use the (thread local) private DRBG, which is what #4665 is about.
> [...]
> So the suggestion was to still have a per SSL public DRBG, but then the problem is that that SSL object might have moved to a different thread between creating and being used and so that the parent DRBG might actually belong to a different thread. One solution there is that we just take the current thread's public DRBG as parent instead of the original thread's public DRBG.

This should be fine from a thread-safety point of view. I don't know whether it could potentially affect the standards compliance for the intermediate DRBG to potentially change over time (even though it still chains to a common grandparent/master DRBG).

Per-SSL DRBGs (especially if split into public and private) seem excessive to me, so the architecture described in the quoted text seems like the best option to me.

-Ben
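A rough sketch of the quoted architecture (one master DRBG, a public and a private DRBG per thread, with a lock taken only on reseed) might look like the following. Everything here is an assumption for illustration: the names are hypothetical, the reseed interval of 4 is arbitrary, and the mixing function is a splitmix64-style toy that is NOT cryptographically secure and stands in for a real DRBG only to keep the example short. The lock counter exists purely to show how rarely the master lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define DEMO_RESEED_INTERVAL 4   /* arbitrary; real intervals are far larger */

typedef struct demo_drbg {
    uint64_t state;
    unsigned generated;   /* outputs since the last reseed */
    int seeded;
} demo_drbg;

/* The single master DRBG, shared by all threads and protected by a lock. */
static demo_drbg master = { UINT64_C(0x9E3779B97F4A7C15), 0, 1 };
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned master_lock_count = 0;   /* instrumentation only */

/* Thread-local children: generating from them needs no lock at all. */
static _Thread_local demo_drbg thread_public;
static _Thread_local demo_drbg thread_private;

/* Toy splitmix64-style step. NOT a DRBG, NOT cryptographically secure. */
static uint64_t demo_step(uint64_t *s)
{
    uint64_t z = (*s += UINT64_C(0x9E3779B97F4A7C15));

    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* The only place where the shared master is touched, hence the only lock. */
static void demo_reseed(demo_drbg *child)
{
    assert(child != NULL);
    pthread_mutex_lock(&master_lock);
    child->state = demo_step(&master.state);
    master_lock_count++;
    pthread_mutex_unlock(&master_lock);
    child->generated = 0;
    child->seeded = 1;
}

static uint64_t demo_generate(demo_drbg *d)
{
    if (!d->seeded || d->generated >= DEMO_RESEED_INTERVAL)
        demo_reseed(d);
    d->generated++;
    return demo_step(&d->state);
}

uint64_t demo_rand_public(void)  { return demo_generate(&thread_public); }
uint64_t demo_rand_private(void) { return demo_generate(&thread_private); }

/* How often any thread had to take the master lock so far. */
unsigned demo_master_lock_count(void)
{
    unsigned n;

    pthread_mutex_lock(&master_lock);
    n = master_lock_count;
    pthread_mutex_unlock(&master_lock);
    return n;
}
```

In this sketch a thread takes the master lock once on first use of each of its two DRBGs and then once per reseed interval, so the fast path of generating output never contends with other threads.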