The OpenSSL ENGINE interface has been showing its age for a long time.
Arguably it was plugged into the code at the wrong layer of abstraction when
it was originally written, and with modern hardware it seriously hampers
performance. That it can be used to accelerate, for example, AES on modern
Intel CPUs with decent performance is a matter of *luck*, not *design*: it
just so happens that the only thing Intel accelerates with special
instructions is the one thing ENGINE actually handles at the right layer.
Consider three problematic cases:
1) A CPU has special instructions (or a register-mapped accelerator
has single commands) which accelerate AES, SHA1, and HMAC_SHA1.
ENGINEs cannot directly handle keyed hashes.
(At first glance it looks as though an ENGINE can accelerate any NID,
but there is no table in the interface where an ENGINE
may register the NIDs for keyed hash variants.)
The result is that hashing runs at no better than
half the hardware's capability, because instead of
being handed the HMAC operation, the hardware is handed
multiple SHA1 operations in sequence.
This is the simplest problem to fix: it would only
require adding another ENGINE lookup/entry point for
keyed hashes.
2) An abstract user-kernel interface to kernel-managed accelerator
hardware has single operations which return both the encrypted
data and a keyed hash of the data.
Here, the current ENGINE interface loses even if the
underlying hardware accelerates only the lowest-level
raw transforms, because we pay at least three system call
latencies where we could pay only one. This is why most
ENGINEs either don't bother to accelerate hash functions
or are not used to accelerate them: they end up so slow.
This is the case for most accelerator hardware currently
used with "embedded" or "network processor" CPUs.
I'm not sure how best to address this issue.
3) An accelerator device directly supports TLS/SSL record
encryption/decryption and the handshake operation itself.
We do many bus transactions to the accelerator (and
possibly system calls into the OS kernel) where we
could do one, performing every basic cryptographic
operation individually when we could amortize
the cost over the entire record or handshake operation.
This is the case for most modern accelerators used with
general-purpose CPUs.
Fixing this would require plugging ENGINE in at the
SSL layer rather than the crypto layer. This is rather
complex, but at least one vendor of this kind of hardware
(NBMK, formerly NetOctave) has made the source code and
design/implementation documentation for their modified
version of OpenSSL freely available, including changes
similar to these but not using ENGINE.
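
To make the cost argument in cases 1 and 2 concrete, here is a rough
model in C. It is only a sketch, not OpenSSL code: every name in it
(offload_level, hmac_dispatches, record_dispatches, record_overhead_us)
is made up for illustration. It assumes that an HMAC built from a plain
digest needs two digest passes (inner and outer hash), and that each
separate request to the device costs one fixed dispatch latency (a
system call or bus transaction). Under those assumptions, protecting
one record costs three dispatches through a raw-primitive interface,
two with a keyed-hash entry point, and one with a combined operation.

```c
/*
 * Hypothetical cost model, not OpenSSL code: counts how many separate
 * requests reach an accelerator to encrypt and MAC one record.
 */

enum offload_level {
    OFFLOAD_RAW_PRIMITIVES, /* cipher and plain digest only              */
    OFFLOAD_KEYED_HASH,     /* adds a single-command HMAC (case 1)       */
    OFFLOAD_CIPHER_AND_MAC  /* one request returns ciphertext and MAC
                               (case 2)                                  */
};

/*
 * HMAC(K, m) = H((K ^ opad) || H((K ^ ipad) || m)), so a device that
 * only exposes the plain digest sees two digest requests per HMAC.
 */
unsigned hmac_dispatches(enum offload_level level)
{
    return level == OFFLOAD_RAW_PRIMITIVES ? 2u : 1u;
}

/* Requests needed to encrypt one record and compute its keyed hash. */
unsigned record_dispatches(enum offload_level level)
{
    if (level == OFFLOAD_CIPHER_AND_MAC)
        return 1u;                          /* single combined request */
    return 1u + hmac_dispatches(level);     /* cipher + keyed hash     */
}

/*
 * Fixed per-record overhead, assuming every request pays the same
 * dispatch latency (in microseconds) regardless of payload size.
 */
double record_overhead_us(enum offload_level level, double per_dispatch_us)
{
    return record_dispatches(level) * per_dispatch_us;
}
```

With an assumed 10 us dispatch latency, record_overhead_us() gives 30 us
of pure overhead per record for OFFLOAD_RAW_PRIMITIVES against 10 us for
OFFLOAD_CIPHER_AND_MAC. Case 3 extends the same reasoning: the fixed
cost is paid once per record or handshake rather than once per primitive.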
There are other problems relating to use of ENGINE while SSL is in
non-blocking mode. I will file another bug describing these and detailing
one possible solution.