I'm more used to dealing with PKCS#11, where the call overhead is usually
measurable, but even so, doing just AES is probably not a problem.
Doing something like AES-GCM with the AES in the engine and the GCM hash in
OpenSSL, though, I'd expect to see an impact: you are basically doing the AES
a block at a time in that scenario.
That's where I'm claiming that you'll be sacrificing performance long term.
And for instructions that are wired into the CPU and unprivileged, there's
no real gain in using an engine.
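
To make the block-at-a-time concern concrete, here is a toy cost model in
Python. It is only a sketch: the function name and both cost constants are
hypothetical, picked for illustration rather than measured on any engine.
It assumes every call pays a fixed overhead plus a cost proportional to the
bytes processed.

```python
def throughput(chunk_bytes, per_call_cost, per_byte_cost):
    """Bytes processed per unit time when each call handles `chunk_bytes`."""
    return chunk_bytes / (per_call_cost + chunk_bytes * per_byte_cost)


# Hypothetical costs in arbitrary time units, chosen only for illustration.
PER_CALL = 50.0
PER_BYTE = 1.0

if __name__ == "__main__":
    bulk = throughput(8192, PER_CALL, PER_BYTE)
    for chunk in (16, 64, 256, 1024, 8192):
        share = throughput(chunk, PER_CALL, PER_BYTE) / bulk
        print(f"{chunk:5d}-byte calls: {share:.0%} of bulk throughput")
```

Under these assumptions the fixed cost dominates when each call carries a
single 16-byte AES block and fades as the chunk grows. How large the effect
is in practice depends entirely on the real per-call cost.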

The other issue, FIPS, you already covered. Yes, I care for that reason as
well: FIPS certifying with code in an engine will be more difficult, but
it really only impacts people who do their own FIPS certifications. It's
pretty much our problem to deal with.

Like I said though, your call.

Peter






From:   Andy Polyakov <ap...@openssl.org>
To:     openssl-dev@openssl.org
Date:   08/11/2011 05:00
Subject:        Re: [openssl.org #2627] SPARC T4 support for OpenSSL
Sent by:        owner-openssl-...@openssl.org



Peter Waltenberg wrote:
> There are some fairly severe performance hits in engine support unless the
> engine includes all the submodes as well.
> That includes things you are just starting to play with now, like the combined
> AES+SHA1 on x86.

??? Here is output for 'speed -engine intel-accel -evp
aes-128-cbc-hmac-sha1' for 1.0.0d, i.e. through engine.

type                         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc-hmac-sha1      202516.18k   322609.98k   432125.60k   480232.03k   496191.36k

And here is output for 'speed -evp aes-128-cbc-hmac-sha1' for HEAD, i.e.
without engine.

aes-128-cbc-hmac-sha1      237351.62k   326968.34k   432138.62k   482383.80k   497401.86k

"Engine" overhead is significant at 16-byte chunks *only* and hardly
noticeable otherwise. What severe performance hits are we talking about?
EVP has overhead, but I can't see that it's engine specific. Combined
cipher+hash implementations do minimize EVP overhead (you don't have to
make two EVP calls), but that was not the reason for implementing the
above-mentioned "stitched" modes; higher instruction-level parallelism was.
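
For reference, the gap between the two runs can be worked out directly from
the figures quoted above. The short Python sketch below does only that
arithmetic; the variable and function names are made up for illustration.
Note that the two runs come from different code bases (1.0.0d versus HEAD),
so the difference is an upper bound on engine overhead at best, not a clean
measurement of it.

```python
# Throughput figures (thousands of bytes per second) copied from the two
# 'openssl speed' runs quoted above.
SIZES = [16, 64, 256, 1024, 8192]
ENGINE_1_0_0D = [202516.18, 322609.98, 432125.60, 480232.03, 496191.36]
HEAD_NO_ENGINE = [237351.62, 326968.34, 432138.62, 482383.80, 497401.86]


def relative_gap(engine, direct):
    """Return the shortfall of `engine` relative to `direct`, in percent."""
    return [(d - e) / d * 100.0 for e, d in zip(engine, direct)]


if __name__ == "__main__":
    for size, gap in zip(SIZES, relative_gap(ENGINE_1_0_0D, HEAD_NO_ENGINE)):
        print(f"{size:5d} bytes: {gap:6.2f}% slower through the engine")
```

The engine run comes out roughly 15% behind at 16 bytes and less than 1.5%
behind at every larger size.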

> For features that are part of CPU's - rather than plug in cards - my preference
> would be that the implementation is inline so that every last drop of
> performance can eventually be wrung out of it.

As mentioned, there are other factors in play, such as maintenance,
adoption time...
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org


