The OpenSSL ENGINE interface has been showing its age for a long time.
It was arguably plugged into the code at the wrong layer of abstraction
when it was originally written, and with modern hardware it seriously
hampers performance.

It is largely a matter of luck that it can be used to accelerate, for
example, AES on modern Intel CPUs with decent performance.  This is
a matter of *luck*, not *design*: it just so happens that the only thing
Intel accelerates with special instructions is the one thing ENGINE
actually handles at the right layer.

Consider three problematic cases:

        1) A CPU has special instructions (or a register-mapped accelerator
           has single commands) which accelerate AES, SHA1, and HMAC_SHA1.

                ENGINEs cannot directly handle keyed hashes.
                (At first glance it looks as though an ENGINE can
                accelerate any NID, but there is no appropriate table
                in the interface where an ENGINE may register the NIDs
                for keyed-hash variants.)

                The result is that hashing runs at, at best, half
                the hardware's capability, because instead of handing
                the hardware the HMAC operation, OpenSSL hands it
                multiple SHA1 operations in sequence.

                This is the simplest problem to fix: it would only
                require adding another ENGINE lookup/entry point for
                keyed hashes.

        2) An abstract user-kernel interface to kernel-managed accelerator
           hardware has single operations which return both encrypted
           data and a keyed hash of that data.

                Here, the current ENGINE interface loses even if the
                underlying hardware accelerates only the lowest-level
                raw transforms, because we pay at least three system call
                latencies where we could pay only one.  This is why most
                ENGINEs either don't bother to accelerate hash functions
                at all or, when they do, are not used for hashing, because
                the result ends up so slow.

                This is the case for most accelerator hardware currently
                used with "embedded" or "network processor" CPUs.

                I'm not sure how to best address this issue.

        3) An accelerator device directly supports TLS/SSL record
           encryption/decryption and the handshake operation itself.

                We do many bus transactions to the accelerator (and
                possibly system calls into the OS kernel) where we
                could do one, performing every basic cryptographic
                operation individually when we could amortize the
                cost over the entire record or handshake operation.

                This is the case for most modern accelerators used with
                general-purpose CPUs.

                Fixing this would require plugging ENGINE in at the
                SSL layer rather than the crypto layer.  This is rather
                complex, but at least one vendor of this kind of hardware
                (NBMK, formerly NetOctave) has made the source code and
                design/implementation documentation for its modified
                version of OpenSSL freely available, including changes
                similar to these but not using ENGINE.
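
The cost in case 1 follows from how HMAC is built (RFC 2104).  With H =
SHA1, block size B = 64 bytes, and the key K zero-padded to B bytes (or
first hashed with H if it is longer than B) to give K':

```latex
\mathrm{HMAC}(K, m) = H\Bigl( (K' \oplus \mathit{opad}) \,\|\, H\bigl( (K' \oplus \mathit{ipad}) \,\|\, m \bigr) \Bigr),
\qquad \mathit{ipad} = \mathtt{0x36}^{B}, \quad \mathit{opad} = \mathtt{0x5c}^{B}
```

Every MAC therefore needs two passes of H: an inner hash over the
message and an outer hash over the 20-byte inner digest.  As a rough
accounting, assuming each digest operation is submitted to the hardware
separately, an ENGINE that can register only the plain SHA1 NID issues
at least two hash requests per MAC where hardware with a native
HMAC_SHA1 command would need one.  That is consistent with the "at best,
half" figure given for case 1.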

There are other problems relating to the use of ENGINE while SSL is in
non-blocking mode.  I will file another bug describing these and detailing
one possible solution.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
