[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware
3) An accelerator device directly supports TLS/SSL record encryption/decryption and the handshake operation itself. We do many bus transactions to the accelerator (and possibly system calls into the OS kernel) where we could do one, doing every single basic cryptographic operation individually when we could actually amortize the cost over the entire record or handshake operation. This is the case for most modern accelerators used with general-purpose CPUs.

This technique is not limited to hardware accelerators. Another example is a service that accepts the whole record together with the encryption and MAC keys and processes it in a single call. Such services are used when, for security reasons, all cryptographic operations are performed in a separate process, driver, or VM, and the client operates only on handles to keys. I have seen this implemented as an extension to MS CryptoAPI. Even without such extensions, the CryptEncrypt function can encrypt and hash data at the same time. The extension I'm talking about adds the ability to pass pointers to the header, the body, and the place where the tail (i.e. the MAC value) should be written. The inability to process a TLS record in a single call means the same data has to be passed over IPC twice.

Andrey.
__ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
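The IPC cost described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `CryptoService`, `protect_record`, and `protect_two_calls` are hypothetical names, not CryptoAPI or OpenSSL interfaces; the record layout is simplified (MAC over header plus body, then encryption of body plus MAC); and the cipher is a placeholder XOR stream used only so the example runs.

```python
import hashlib
import hmac


def _placeholder_cipher(key: bytes, data: bytes) -> bytes:
    # NOT a real cipher: XOR with a SHAKE-256 keystream, used only so the
    # sketch runs with the standard library.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class CryptoService:
    """Hypothetical out-of-process crypto service; the keys stay inside it."""

    def __init__(self, enc_key: bytes, mac_key: bytes):
        self._enc_key = enc_key
        self._mac_key = mac_key
        self.round_trips = 0  # number of IPC calls made by the client
        self.bytes_in = 0     # payload bytes sent across the boundary

    def _enter(self, *parts: bytes) -> None:
        self.round_trips += 1
        self.bytes_in += sum(len(p) for p in parts)

    def mac(self, data: bytes) -> bytes:
        self._enter(data)
        return hmac.new(self._mac_key, data, hashlib.sha1).digest()

    def encrypt(self, data: bytes) -> bytes:
        self._enter(data)
        return _placeholder_cipher(self._enc_key, data)

    def protect_record(self, header: bytes, body: bytes) -> bytes:
        # Single call: MAC and encryption both happen inside the service.
        self._enter(header, body)
        tag = hmac.new(self._mac_key, header + body, hashlib.sha1).digest()
        return _placeholder_cipher(self._enc_key, body + tag)


def protect_two_calls(svc: CryptoService, header: bytes, body: bytes) -> bytes:
    # Per-primitive path: the body crosses the boundary once for the MAC
    # and once more for the encryption.
    tag = svc.mac(header + body)
    return svc.encrypt(body + tag)
```

Both paths produce the same protected record, but the per-primitive path makes two round trips and sends the body across the boundary twice, while `protect_record` makes one round trip and sends it once.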
Re: [openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware
On Tue, Nov 23, 2010 at 04:08:11AM +0100, Richard Levitte via RT wrote:

BTW, you wrote this a month ago...

[...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:
[...] I will file another bug describing these and detailing one possible solution.

Did you? I can't seem to see that bug report, if you've submitted it, please tell me the number...

No, I didn't. Got terribly busy unexpectedly. Fortunately I'm about to have two days off. Will file that bug and try to add something constructive to the rest of this thread. Thanks for your attention to this issue!

Thor
[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware
You're quite right on a number of things. Back when ENGINE was created, the goal was to interface with devices such as the nCipher box. It has slowly developed since, but obviously not enough.

1) seems simple enough.

2) hmmm, I think I'd need more details on this one, as I'm not sure what this includes... maybe pointers to what this is about? I'll have to admit I feel a bit clueless on this, and would love to be clued in.

3) now, this is a big piece to bite off. ENGINE was never designed to handle SSL level routines... I'll try to find time to have a look at the implementation you point at and see how that can be usable.

Cheers,
Richard

[...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:

The OpenSSL ENGINE interface has been showing its age for a long time. Arguably plugged into the code at the wrong layer of abstraction when it was originally written, with modern hardware it seriously hampers performance. It is largely a matter of luck that it is usable to accelerate, for example, AES with modern Intel CPUs, with decent performance. This is a matter of *luck*, not *design* -- it just so happens that the only thing Intel accelerates with special instructions is the one thing ENGINE actually handles at the right layer.

Consider three problematic cases:

1) A CPU has special instructions (or a register-mapped accelerator has single commands) which accelerate AES, SHA1, and HMAC_SHA1. ENGINEs cannot directly handle keyed hashes. (At first it looks as if ENGINE can accelerate any NID, but there is no appropriate table in the interface where an ENGINE may register the NIDs for keyed hash variants.) The result is that hashing will run at, at best, half the hardware's capability, because instead of handing the hardware the HMAC operation, it's handed multiple SHA1 operations in sequence. This is the simplest problem to fix and would simply require adding another ENGINE lookup/entry point for keyed hashes.
2) An abstract user-kernel interface to kernel-managed accelerator hardware has single operations which return both the encrypted data and a keyed hash of the data. Here, the current ENGINE interface loses even if the underlying hardware accelerates only the lowest-level raw transforms, because we pay at least three system call latencies where we could pay only one. This is why most ENGINEs don't bother to accelerate hash functions, or are not used to accelerate them: they end up so slow. This is the case for most accelerator hardware currently used with embedded or network processor CPUs. I'm not sure how best to address this issue.

3) An accelerator device directly supports TLS/SSL record encryption/decryption and the handshake operation itself. We do many bus transactions to the accelerator (and possibly system calls into the OS kernel) where we could do one, doing every single basic cryptographic operation individually when we could actually amortize the cost over the entire record or handshake operation. This is the case for most modern accelerators used with general-purpose CPUs. Fixing this would require plugging ENGINE in at the SSL layer rather than the crypto layer. This is rather complex, but at least one vendor of this kind of hardware (NBMK, formerly NetOctave) has made the source code and design/implementation documentation for their modified version of OpenSSL freely available, including changes similar to these but not using ENGINE.

There are other problems relating to use of ENGINE while SSL is in non-blocking mode. I will file another bug describing these and detailing one possible solution.

--
Richard Levitte levi...@openssl.org
[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware
BTW, you wrote this a month ago...

[...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:
[...] I will file another bug describing these and detailing one possible solution.

Did you? I can't seem to see that bug report, if you've submitted it, please tell me the number...

--
Richard Levitte levi...@openssl.org
[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware
The OpenSSL ENGINE interface has been showing its age for a long time. Arguably plugged into the code at the wrong layer of abstraction when it was originally written, with modern hardware it seriously hampers performance. It is largely a matter of luck that it is usable to accelerate, for example, AES with modern Intel CPUs, with decent performance. This is a matter of *luck*, not *design* -- it just so happens that the only thing Intel accelerates with special instructions is the one thing ENGINE actually handles at the right layer.

Consider three problematic cases:

1) A CPU has special instructions (or a register-mapped accelerator has single commands) which accelerate AES, SHA1, and HMAC_SHA1. ENGINEs cannot directly handle keyed hashes. (At first it looks as if ENGINE can accelerate any NID, but there is no appropriate table in the interface where an ENGINE may register the NIDs for keyed hash variants.) The result is that hashing will run at, at best, half the hardware's capability, because instead of handing the hardware the HMAC operation, it's handed multiple SHA1 operations in sequence. This is the simplest problem to fix and would simply require adding another ENGINE lookup/entry point for keyed hashes.

2) An abstract user-kernel interface to kernel-managed accelerator hardware has single operations which return both the encrypted data and a keyed hash of the data. Here, the current ENGINE interface loses even if the underlying hardware accelerates only the lowest-level raw transforms, because we pay at least three system call latencies where we could pay only one. This is why most ENGINEs don't bother to accelerate hash functions, or are not used to accelerate them: they end up so slow. This is the case for most accelerator hardware currently used with embedded or network processor CPUs. I'm not sure how best to address this issue.
3) An accelerator device directly supports TLS/SSL record encryption/decryption and the handshake operation itself. We do many bus transactions to the accelerator (and possibly system calls into the OS kernel) where we could do one, doing every single basic cryptographic operation individually when we could actually amortize the cost over the entire record or handshake operation. This is the case for most modern accelerators used with general-purpose CPUs. Fixing this would require plugging ENGINE in at the SSL layer rather than the crypto layer. This is rather complex, but at least one vendor of this kind of hardware (NBMK, formerly NetOctave) has made the source code and design/implementation documentation for their modified version of OpenSSL freely available, including changes similar to these but not using ENGINE.

There are other problems relating to use of ENGINE while SSL is in non-blocking mode. I will file another bug describing these and detailing one possible solution.
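To make case 1 concrete, here is a minimal Python sketch (Python only for brevity) of what happens when a device can be reached only through a plain digest hook. It builds HMAC-SHA1 from the standard RFC 2104 construction and counts device requests. `CountingDevice` and `hmac_via_digest_only` are hypothetical names invented for this illustration; they are not part of OpenSSL or the ENGINE interface, and the request counter is only a stand-in for real hardware or system call overhead.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-1 input block size in bytes


class CountingDevice:
    """Hypothetical accelerator; each method call stands for one device request."""

    def __init__(self):
        self.calls = 0

    def sha1(self, data: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha1(data).digest()

    def hmac_sha1(self, key: bytes, data: bytes) -> bytes:
        # What a keyed-hash entry point could hand to the device directly.
        self.calls += 1
        return hmac.new(key, data, hashlib.sha1).digest()


def hmac_via_digest_only(dev: CountingDevice, key: bytes, msg: bytes) -> bytes:
    # RFC 2104: HMAC(K, m) = H((K ^ opad) || H((K ^ ipad) || m))
    if len(key) > BLOCK:
        key = dev.sha1(key)  # long keys are hashed first (one more request)
    key = key.ljust(BLOCK, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = dev.sha1(ipad + msg)   # first device request
    return dev.sha1(opad + inner)  # second device request
```

With a key no longer than one block, the digest-only path issues two device requests per MAC, while `hmac_sha1` issues one for the same result. For the short messages typical of TLS records, that per-request overhead is what dominates.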