[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware

2011-12-04 Thread Andrey Kulikov via RT
 3) An accelerator device directly supports TLS/SSL record
 encryption/decryption and the handshake operation itself.

 We do many bus transactions to the accelerator (and
 possibly system calls into the OS kernel) where we
 could do one, doing every single basic cryptographic
 operation individually when we could actually amortize
 the cost over the entire record or handshake operation.

 This is the case for most modern accelerators used with
 general-purpose CPUs.


This technique is not limited to hardware accelerators.
Another example is a service that accepts the whole record together
with the encryption and MAC keys and processes it in a single call.
This is used when, for some (security) reasons, all cryptographic
operations are performed in a separate process/driver/VM, and the
client works only with handles to keys.

I have seen this implemented as an extension to MS CryptoAPI.
Even without such extensions, the CryptEncrypt function is able to
encrypt and hash data at the same time.
The extension I'm talking about adds the ability to pass pointers to
the header, the body, and the place to put the tail, i.e. the MAC value.

The inability to process a TLS record in a single call makes it
necessary to pass the same data over IPC twice.
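
A rough sketch of the "same data over IPC twice" point, in Python rather
than against any real API.  Everything here is hypothetical: the
FakeCryptoService class, its method names, and the toy XOR "cipher" are
stand-ins, and the record layout is simplified (no sequence number, no
padding).  It only counts calls and payload bytes sent to the service.

```python
import hashlib
import hmac


class FakeCryptoService:
    """Hypothetical out-of-process crypto service; keys never leave it."""

    def __init__(self):
        self._keys = {}
        self.calls = 0        # number of IPC round trips
        self.bytes_sent = 0   # payload bytes the client pushed over "IPC"

    def import_key(self, key: bytes) -> int:
        # Setup step, not counted: the client only keeps the handle.
        handle = len(self._keys) + 1
        self._keys[handle] = key
        return handle

    def _account(self, *payloads: bytes) -> None:
        self.calls += 1
        self.bytes_sent += sum(len(p) for p in payloads)

    @staticmethod
    def _toy_encrypt(key: bytes, data: bytes) -> bytes:
        # Toy XOR keystream derived from SHA-1; NOT real encryption.
        stream, counter = b"", 0
        while len(stream) < len(data):
            stream += hashlib.sha1(key + counter.to_bytes(4, "big")).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    def mac(self, mac_handle: int, data: bytes) -> bytes:
        self._account(data)
        return hmac.new(self._keys[mac_handle], data, hashlib.sha1).digest()

    def encrypt(self, enc_handle: int, data: bytes) -> bytes:
        self._account(data)
        return self._toy_encrypt(self._keys[enc_handle], data)

    def protect_record(self, enc_handle: int, mac_handle: int,
                       header: bytes, body: bytes) -> bytes:
        # Single call: header and body go in once, body plus tail come back.
        self._account(header, body)
        tail = hmac.new(self._keys[mac_handle], header + body,
                        hashlib.sha1).digest()
        return self._toy_encrypt(self._keys[enc_handle], body + tail)


header = b"H" * 5      # placeholder record header
body = b"x" * 1024     # placeholder record body

# Two calls: MAC first, then encrypt body + MAC.  The body crosses twice.
two = FakeCryptoService()
e, m = two.import_key(b"E" * 16), two.import_key(b"M" * 20)
tail = two.mac(m, header + body)
record_two = two.encrypt(e, body + tail)

# One call: the whole record is handled in a single round trip.
one = FakeCryptoService()
e, m = one.import_key(b"E" * 16), one.import_key(b"M" * 20)
record_one = one.protect_record(e, m, header, body)

assert record_two == record_one
print("two calls:", two.calls, "calls,", two.bytes_sent, "bytes sent")
print("one call: ", one.calls, "calls,", one.bytes_sent, "bytes sent")
```

In this model the two-call path sends roughly twice as many bytes as the
single-call path, because the body is sent once for the MAC and again
for encryption.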

Andrey.


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware

2010-11-26 Thread Thor Simon
On Tue, Nov 23, 2010 at 04:08:11AM +0100, Richard Levitte via RT wrote:
 BTW, you wrote this a month ago...
 
  [...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:
 [...]
  I will file another bug describing these and detailing
  one possible solution.
 
 Did you?  I can't seem to see that bug report; if you've submitted it,
 please tell me the number...

No, I didn't.  Got terribly busy unexpectedly.

Fortunately I'm about to have two days off.  Will file that bug and try
to add something constructive to the rest of this thread.  Thanks for your
attention to this issue!

Thor


[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware

2010-11-22 Thread Richard Levitte via RT
You're quite right on a number of things.  Back when ENGINE was created,
the goal was to interface with devices such as the nCipher box.  It has
slowly developed since, but obviously not enough.

1) seems simple enough.

2) hmmm, I think I'd need more details on this one, as I'm not sure what
this includes...  maybe pointers to what this is about?  I'll have to
admit I feel a bit clueless on this, and would love to be clued in.

3) now, this is a big piece to bite off.  ENGINE was never designed to
handle SSL-level routines...  I'll try to find time to have a look at the
implementation you point at and see how that can be usable.

Cheers,
Richard

 [...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:
 
 The OpenSSL ENGINE interface has been showing its age for a long time.
 Arguably plugged into the code at the wrong layer of abstraction when
 it was originally written, with modern hardware it seriously hampers
 performance.
 
 It is largely a matter of luck that it is usable to accelerate, for
 example, AES with modern Intel CPUs, with decent performance.  This is
 a matter of *luck*, not *design* -- it just so happens that the only
 thing Intel accelerates with special instructions is the one thing
 ENGINE actually handles at the right layer.
 
 Consider three problematic cases:
 
   1) A CPU has special instructions (or a register-mapped accelerator
  has single commands) which accelerate AES, SHA1, and HMAC_SHA1.
 
   ENGINEs cannot directly handle keyed hashes.
   (at first, it looks like ENGINE accelerates any NID,
   but there is no appropriate table in the interface
   where an ENGINE may register the NIDs for keyed
   hash variants)
 
   The result is that hashing will occur at, at best,
   1/2 the hardware's capability, because instead of
   handing the hardware the HMAC operation, it's handed
   multiple SHA1 operations in sequence.
 
   This is the simplest problem to fix and would simply
   require adding another ENGINE lookup/entry point for
   keyed hashes.
 
   2) An abstract user-kernel interface to kernel-managed accelerator
  hardware has single operations which return both encrypted
  data and keyed hash of the data.
 
   Here, the current ENGINE interface loses even if the
   underlying hardware accelerates only the lowest-level
   raw transforms, because we pay at least three system call
   latencies where we could pay only one.  This is why most
   ENGINEs either don't bother to accelerate hash functions
   or are not used for hashing because they end up so slow.
 
   This is the case for most accelerator hardware currently
   used with embedded or network processor CPUs.
 
   I'm not sure how to best address this issue.
 
   3) An accelerator device directly supports TLS/SSL record
  encryption/decryption and the handshake operation itself.
 
   We do many bus transactions to the accelerator (and
   possibly system calls into the OS kernel) where we
   could do one, doing every single basic cryptographic
   operation individually when we could actually amortize
   the cost over the entire record or handshake operation.
 
   This is the case for most modern accelerators used with
   general-purpose CPUs.
 
   Fixing this would require plugging ENGINE in at the
   SSL layer rather than the crypto layer.  This is rather
   complex but at least one vendor of this kind of hardware
   (NBMK, formerly NetOctave) has made the source code and
   design/implementation documentation for their modified
   version of OpenSSL freely available, including changes
   similar to these but not using ENGINE.
 
 There are other problems relating to use of ENGINE while SSL is in
 non-blocking mode.  I will file another bug describing these and
 detailing one possible solution.
 
 
-- 
Richard Levitte
levi...@openssl.org


[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware

2010-11-22 Thread Richard Levitte via RT
BTW, you wrote this a month ago...

 [...@coyotepoint.com - Sun Oct 24 14:39:15 2010]:
[...]
 I will file another bug describing these and detailing
 one possible solution.

Did you?  I can't seem to see that bug report; if you've submitted it,
please tell me the number...

-- 
Richard Levitte
levi...@openssl.org


[openssl.org #2365] Limitations of ENGINE interface hamper performance on modern hardware

2010-10-24 Thread Thor Simon via RT
The OpenSSL ENGINE interface has been showing its age for a long time.
Arguably plugged into the code at the wrong layer of abstraction when it
was originally written, with modern hardware it seriously hampers performance.

It is largely a matter of luck that it is usable to accelerate, for
example, AES with modern Intel CPUs, with decent performance.  This is
a matter of *luck*, not *design* -- it just so happens that the only thing
Intel accelerates with special instructions is the one thing ENGINE
actually handles at the right layer.

Consider three problematic cases:

1) A CPU has special instructions (or a register-mapped accelerator
   has single commands) which accelerate AES, SHA1, and HMAC_SHA1.

ENGINEs cannot directly handle keyed hashes.
(at first, it looks like ENGINE accelerates any NID,
but there is no appropriate table in the interface
where an ENGINE may register the NIDs for keyed
hash variants)

The result is that hashing will occur at, at best,
1/2 the hardware's capability, because instead of
handing the hardware the HMAC operation, it's handed
multiple SHA1 operations in sequence.

This is the simplest problem to fix and would simply
require adding another ENGINE lookup/entry point for
keyed hashes.
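
To make the "multiple SHA1 operations in sequence" point concrete, here
is a minimal sketch in Python (not OpenSSL code).  It builds HMAC-SHA1
from plain SHA-1 operations following the RFC 2104 construction and
counts how many separate hash operations are submitted.  The
CountingSha1Device class is a hypothetical stand-in for hardware that is
only ever handed unkeyed hashes.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 block size in bytes


class CountingSha1Device:
    """Hypothetical accelerator that is only ever given plain SHA-1."""

    def __init__(self):
        self.commands = 0  # separate hash operations submitted

    def sha1(self, data: bytes) -> bytes:
        self.commands += 1
        return hashlib.sha1(data).digest()


def hmac_sha1_via_plain_hashes(dev: CountingSha1Device,
                               key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA1 (RFC 2104) built from unkeyed SHA-1 operations only."""
    if len(key) > BLOCK_SIZE:
        key = dev.sha1(key)  # long keys cost one more hash operation
    key = key.ljust(BLOCK_SIZE, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = dev.sha1(ipad + msg)   # first operation handed to the device
    return dev.sha1(opad + inner)  # second operation handed to the device


dev = CountingSha1Device()
key = b"k" * 20
msg = b"one record worth of data"
tag = hmac_sha1_via_plain_hashes(dev, key, msg)

# Same result as a native HMAC, which a keyed-hash entry point could
# hand to the hardware as one command.
assert tag == hmac.new(key, msg, hashlib.sha1).digest()
print("plain SHA-1 operations used for one HMAC:", dev.commands)
```

With a key no longer than one block, each HMAC costs two separate hash
operations in this model, where hardware with an HMAC_SHA1 command
would need one.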

2) An abstract user-kernel interface to kernel-managed accelerator
   hardware has single operations which return both encrypted
   data and keyed hash of the data.

Here, the current ENGINE interface loses even if the
underlying hardware accelerates only the lowest-level
raw transforms, because we pay at least three system call
latencies where we could pay only one.  This is why most
ENGINEs either don't bother to accelerate hash functions
or are not used for hashing because they end up so slow.

This is the case for most accelerator hardware currently
used with embedded or network processor CPUs.

I'm not sure how to best address this issue.
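
As an illustration only, the following Python sketch models the
three-versus-one count.  FakeKernelCryptoDevice is hypothetical: each
public method stands for one user-kernel round trip, and the "cipher" is
a toy XOR keystream rather than AES.  One plausible reading of "at least
three" is two hash operations for the HMAC plus one cipher operation;
that breakdown is an assumption of this sketch.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 block size in bytes


class FakeKernelCryptoDevice:
    """Hypothetical kernel-managed accelerator; one method = one syscall."""

    def __init__(self):
        self.syscalls = 0

    @staticmethod
    def _toy_cipher(key: bytes, data: bytes) -> bytes:
        # Toy XOR keystream derived from SHA-1; NOT real encryption.
        stream, counter = b"", 0
        while len(stream) < len(data):
            stream += hashlib.sha1(key + counter.to_bytes(4, "big")).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    def hash(self, data: bytes) -> bytes:
        self.syscalls += 1
        return hashlib.sha1(data).digest()

    def encrypt(self, key: bytes, data: bytes) -> bytes:
        self.syscalls += 1
        return self._toy_cipher(key, data)

    def encrypt_and_mac(self, enc_key: bytes, mac_key: bytes, data: bytes):
        # Combined operation: encrypted data and keyed hash in one trip.
        self.syscalls += 1
        mac = hmac.new(mac_key, data, hashlib.sha1).digest()
        return self._toy_cipher(enc_key, data), mac


def split_path(dev, enc_key: bytes, mac_key: bytes, data: bytes):
    """Raw transforms one at a time, as a crypto-layer plug-in would."""
    padded = mac_key.ljust(BLOCK_SIZE, b"\x00")  # assumes key <= 64 bytes
    ipad = bytes(b ^ 0x36 for b in padded)
    opad = bytes(b ^ 0x5C for b in padded)
    inner = dev.hash(ipad + data)          # syscall 1
    mac = dev.hash(opad + inner)           # syscall 2
    return dev.encrypt(enc_key, data), mac  # syscall 3


enc_key, mac_key, data = b"E" * 16, b"M" * 20, b"payload " * 32

split_dev = FakeKernelCryptoDevice()
combined_dev = FakeKernelCryptoDevice()
split_result = split_path(split_dev, enc_key, mac_key, data)
combined_result = combined_dev.encrypt_and_mac(enc_key, mac_key, data)

assert split_result == combined_result
print("split:", split_dev.syscalls, "combined:", combined_dev.syscalls)
```

Both paths produce the same output; only the number of round trips
differs, which is the cost this case describes.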

3) An accelerator device directly supports TLS/SSL record
   encryption/decryption and the handshake operation itself.

We do many bus transactions to the accelerator (and
possibly system calls into the OS kernel) where we
could do one, doing every single basic cryptographic
operation individually when we could actually amortize
the cost over the entire record or handshake operation.

This is the case for most modern accelerators used with
general-purpose CPUs.

Fixing this would require plugging ENGINE in at the
SSL layer rather than the crypto layer.  This is rather
complex but at least one vendor of this kind of hardware
(NBMK, formerly NetOctave) has made the source code and
design/implementation documentation for their modified
version of OpenSSL freely available, including changes
similar to these but not using ENGINE.

There are other problems relating to use of ENGINE while SSL is in
non-blocking mode.  I will file another bug describing these and detailing
one possible solution.
