Hi,

Any comments on that?
In one word, "no-o-o-o-o-o-o". :-) In more words: the preferred way to
integrate processor-specific code is laid out in the Intel AES-NI and
SPARC T4 modules. And "preferred" does not really mean "matter of
choice". [The s390x module usually comes up in this context, and the
answer is that I wish I had time to do something about it.]

Can you be more specific? What do you disagree with? Is it the way
I'm checking for the processor's capabilities, or the fact that I included
functions to encrypt and decrypt just individual blocks? Both? Or
something else?

As implied, follow the lead of the aesni-x86_64 and aest4-sparcv9 modules. The run-time switch is performed in e_aes.c, and it's not really a run-time switch but rather a run-time *link*, because you "link" to platform-specific block and stream functions by reference upon key setup. In other words, an appropriate starting point is a standalone module with single-block functions and key schedule setup. Then it's possible to add specific stream functions. If key schedule pre-load is deemed beneficial, it's possible to arrange key-length-specific entry points; see the aest4-sparcv9 module for an example.
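A minimal sketch of the "run-time link" idea in C, under stated assumptions: the capability probe is reduced to a hypothetical flag `have_hw_crypto`, and `demo_key`, `demo_set_key`, `generic_block` and `hw_block` are hypothetical names. The block function is a placeholder XOR, not AES. Only the function-pointer type mirrors the shape of `block128_f` from OpenSSL's modes.h. The point is that the choice is made once, at key setup, and every later call goes through the stored pointer.

```c
#include <assert.h>
#include <string.h>

/* Same shape as OpenSSL's block128_f: process one 16-byte block. */
typedef void (*block128_f)(const unsigned char in[16],
                           unsigned char out[16], const void *key);

/* Hypothetical key context: the schedule plus the routine linked at key setup. */
typedef struct {
    unsigned char rk[16];   /* placeholder "key schedule" */
    block128_f block;       /* filled in once, by reference, in demo_set_key */
} demo_key;

/* Placeholder cipher: XOR with the key. NOT AES, it only stands in for it. */
static void generic_block(const unsigned char in[16], unsigned char out[16],
                          const void *key)
{
    const demo_key *k = key;
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ k->rk[i];
}

/* Hypothetical "hardware" routine; a real one would use the crypto
 * instructions. Here it produces the same result via a different path. */
static void hw_block(const unsigned char in[16], unsigned char out[16],
                     const void *key)
{
    generic_block(in, out, key);
}

/* Hypothetical capability flag, e.g. set by a CPU probe at start-up. */
static int have_hw_crypto = 0;

static void demo_set_key(demo_key *k, const unsigned char user_key[16])
{
    assert(k != NULL && user_key != NULL);
    memcpy(k->rk, user_key, 16);
    /* The "switch" happens here, once per key, not once per block. */
    k->block = have_hw_crypto ? hw_block : generic_block;
}
```

Because the pointer is stored next to the key schedule, mode code never re-checks CPU capabilities per block; adding a mode-specific stream function later amounts to one more pointer filled in at the same place.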

Regarding block encryption, my idea is to first provide optimization
for it and then include optimization for the most common cipher modes
(CBC, CTR and so on). That way, any cipher mode without a specific
optimization can still get some performance improvement.
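To illustrate why single-block functions already help every mode, here is a hedged sketch of a generic CBC encryption loop that takes the block routine as a parameter, in the spirit of OpenSSL's `CRYPTO_cbc128_encrypt`. `cbc_encrypt_generic` and `toy_block` are hypothetical, and `toy_block` is a placeholder XOR, not AES.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*block128_f)(const unsigned char in[16],
                           unsigned char out[16], const void *key);

/* Placeholder single-block cipher (XOR with key), standing in for AES. */
static void toy_block(const unsigned char in[16], unsigned char out[16],
                      const void *key)
{
    const unsigned char *k = key;
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ k[i];
}

/* Generic CBC encryption over any block routine: an accelerated
 * single-block function speeds it up with no CBC-specific code.
 * len must be a multiple of 16; ivec ends up holding the last
 * ciphertext block so calls can be chained. */
static void cbc_encrypt_generic(const unsigned char *in, unsigned char *out,
                                size_t len, const void *key,
                                unsigned char ivec[16], block128_f block)
{
    unsigned char buf[16];
    assert(len % 16 == 0);
    for (size_t off = 0; off < len; off += 16) {
        for (int i = 0; i < 16; i++)
            buf[i] = in[off + i] ^ ivec[i];   /* chain with previous block */
        block(buf, out + off, key);
        memcpy(ivec, out + off, 16);
    }
}
```

CBC encryption is inherently serial (each block depends on the previous ciphertext block), so this loop cannot go faster than one block latency per 16 bytes; that is where mode-specific routines for parallelizable modes such as CTR pay off.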

Of course. As mentioned, a module with single-block functions is an appropriate starting point. And e_aes.c can accommodate it.

The performance gain is about 5x on non-final hardware.
A more important question is what the theoretical asymptotic limit is, how
far we are from it, and how to get there. Well, the answer is naturally
mode-specific subroutines, but that doesn't change the point. One
should discuss absolute numbers too, not only relative improvement.
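A simplified model of that asymptotic limit, offered as a hedged sketch: it ignores loads, stores, round-key loads and the final XORs, and assumes one round instruction per AES round. With N_r rounds, round-instruction latency L and reciprocal throughput T (both in cycles, placeholders until measured), a serially dependent mode is latency-bound, while a mode-specific routine interleaving n independent blocks can approach the throughput bound:

```latex
% N_r : number of AES rounds (10, 12 or 14 for 128-, 192-, 256-bit keys)
% L   : latency of one round instruction, in cycles (assumed)
% T   : reciprocal throughput of one round instruction, in cycles (assumed)
% n   : number of independent blocks interleaved by a mode-specific routine
\[
c_{\text{serial}} \approx \frac{N_r\,L}{16}\ \text{cycles/byte},
\qquad
c_{\text{parallel}} \approx \frac{N_r\,T}{16}\ \text{cycles/byte}
\quad \text{for } n \ge \left\lceil L/T \right\rceil
\]
```

Comparing a measured cycles-per-byte figure against these two bounds is one way to state "how far are we from it" in absolute terms.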

I understand your point. And yes, only mode-specific routines will be
able to get maximum performance out of it. I can post absolute numbers,
but in the end any notion of improvement can only be obtained by
comparing them with the current assembly or C implementation results,
especially because this is not final hardware and the results will not
reflect the performance of the final hardware.

I find it hard to believe that performance metrics for the instructions involved are not already known or will change abruptly. If there is reason to believe that non-final hardware does not live up to these metrics, we can make educated guesses based on projections.

As for ld/stxvd2x for data: the manual "threatens" penalties on
cache line and page boundaries, and it doesn't seem to actually
promise that it always works with byte alignment across page
boundaries. Yes, the OS surely handles it by serving the exception, but
we don't want that to happen. Wouldn't it be more appropriate to
stick to l/stvx? [See the just-committed vpaes-ppc.pl module for an
example.]
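For reference, a portable C emulation of how an lvx-based load copes with unaligned data: lvx ignores the low four address bits, so every access is an aligned 16-byte chunk, and an unaligned block is assembled from two such chunks with a byte permutation (lvsl supplies the shift, vperm performs the shuffle). `emu_lvx` and `load_unaligned_via_lvx` are hypothetical helpers; big-endian byte order is assumed, and unlike the hardware, the C version requires the surrounding aligned chunks to be addressable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates lvx addressing: the low 4 bits of the address are ignored,
 * so the access is always a naturally aligned 16-byte chunk. */
static void emu_lvx(unsigned char dst[16], const unsigned char *p)
{
    const unsigned char *aligned =
        (const unsigned char *)((uintptr_t)p & ~(uintptr_t)15);
    memcpy(dst, aligned, 16);
}

/* Unaligned 16-byte load built from two aligned loads plus a byte shuffle,
 * mimicking an lvx/lvsl/vperm sequence. Caller must ensure the aligned
 * chunks around p are addressable (the hardware needs no such care). */
static void load_unaligned_via_lvx(unsigned char dst[16],
                                   const unsigned char *p)
{
    unsigned char lo[16], hi[16];
    unsigned shift = (unsigned)((uintptr_t)p & 15);  /* what lvsl encodes */
    emu_lvx(lo, p);
    emu_lvx(hi, p + 15);  /* same chunk as lo when p is already aligned */
    for (unsigned i = 0; i < 16; i++)                /* what vperm does */
        dst[i] = (i + shift < 16) ? lo[i + shift] : hi[i + shift - 16];
}
```

Loading the second chunk at p + 15 rather than p + 16 means an already aligned pointer touches only one chunk. Since neither aligned 16-byte load can cross a cache line or page boundary, the penalties the manual warns about for ld/stxvd2x should not arise.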

As for page boundaries in ld/stxvd2x: the key schedule is aligned at 64
bits (in e_aes.c), and this doesn't preclude the possibility of an
ld/stxvd2x crossing a page boundary. And if there is a penalty, it might
get costly [because of the recurring nature of references to the key
schedule]. Should one consider lvx even for the key schedule?
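The arithmetic behind this concern, as a small C check: with only 8-byte alignment guaranteed, a schedule whose base sits at 8 mod 16 puts every 16-byte round-key load at 8 mod 16, and the one landing at offset 4088 of a page straddles it. A 4096-byte page is assumed here (POWER Linux systems often use 64 KiB pages, which makes crossings rarer but not impossible); `crosses_page` and `align16` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Does an access of `size` bytes (size > 0) starting at `addr` touch
 * two pages? page_size must be a power of two. */
static int crosses_page(uintptr_t addr, uintptr_t size, uintptr_t page_size)
{
    assert(size > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr / page_size) != ((addr + size - 1) / page_size);
}

/* Round up to the next multiple of 16, the alignment lvx works with. */
static uintptr_t align16(uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)15;
}
```

Aligning the schedule to 16 bytes, which lvx requires anyway, removes the possibility entirely: a 16-byte-aligned 16-byte access always stays within one page.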

They work with non-aligned data, but I will check whether there are any
issues with page boundaries.

Two points. 1. Hardware-assisted operation is so fast that even small penalties can significantly affect the result. A reference to "works" might not be good enough. 2. When you say "works" you refer to *one* specific CPU. We ought to think broader. Didn't IBM suggest doing so by founding OpenPOWER?

I'd like to include those changes incrementally, what do you
think? I think it would avoid wasting time submitting huge patches
that might need to be completely redone.

I'd still insist on a new submission formed along the above suggestions.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org