Re: [PATCH 0/2] Sparc AES crypto opcode support.

Andy Polyakov Fri, 28 Sep 2012 12:00:57 -0700

What is rationale behind choosing interleave factor of two for
parallelizable modes? Judging from aes-128 cbc encrypt benchmarks AES
round instruction latency is 4. If processor can pair together two
half-round instructions (I refer to fact that it takes two instructions
to perform single round), then optimal interleave factor should be 4. Do
you have performance metrics, specifically throughput, for instructions
in question? Did you attempt higher interleave factor?


The AES round instruction latency is 3 cycles.

As mentioned, the result looks more like 4, so it's either 4, or
something holds it back (there might be room for improvement then), or I
estimated it wrong. But question was if processor is capable of
scheduling two independent ones at same time. If it is, then higher
interleave is more appropriate and would still outweigh losses from
spilling key material and I reckon difference wouldn't be nominal. What
would be absolutely best is to know how it would look in next
generation, so that one can pick "future-safe" factor. I mean higher
than optimal interleave factor doesn't have as much negative effect as
lower than optimal one.
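
The interleave argument above can be sketched as a back-of-envelope
model. This is only an illustration, not a measurement: it assumes the
two half-round instructions of one round are independent of each other,
and that the unit stays busy once latency * issue-width independent
instructions are in flight. The function name and parameters are made up
for the example.

```python
import math

def interleave_factor(latency_cycles, issue_per_cycle, insns_per_round=2):
    """Estimate how many blocks to process in parallel.

    Assumption: each block contributes `insns_per_round` independent
    instructions per round (the two half-round instructions), and the
    pipeline is full once latency * issue-width instructions are in flight.
    """
    in_flight = latency_cycles * issue_per_cycle
    return math.ceil(in_flight / insns_per_round)

# Compare the two latencies discussed in the thread, with and without
# the ability to issue two independent instructions per cycle.
for latency in (3, 4):
    for issue in (1, 2):
        factor = interleave_factor(latency, issue)
        print(f"latency={latency} issue={issue} -> interleave {factor}")
```

Under these assumptions a latency of 4 with dual issue gives a factor of
4, while single issue gives 2 for either latency, which is why the
pairing question matters more than whether the latency is 3 or 4.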

We don't have enough registers to unroll it by another factor,


        aes01   %key0,%reg0,%reg1,%reg2
        aes23   %key1,%reg0,%reg1,%reg1 <<< 1, not 3
        aes01   %key2,%reg2,%reg1,%reg0
        aes23   %key4,%reg2,%reg1,%reg1

allows for 4x interleave up to 192-bit, right? 3*4+13*4=64? Or did I get
it wrong? Or would 3-register arrangement like above not work?
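
The register arithmetic above can be checked for the other key sizes
with a small sketch. It simply reproduces the sum as written in this
message: 3 registers per interleaved block, 4 register units per round
key, and a budget of 64 taken from the right-hand side of that sum.
Whether those units match the actual register file is exactly the open
question here, so treat the constants as assumptions. The number of
round keys is the standard AES round count plus one.

```python
# AES round counts per key size; round keys needed = rounds + 1.
ROUNDS = {128: 10, 192: 12, 256: 14}

# Assumption: budget of 64, taken from the sum in the message.
REGISTER_BUDGET = 64

def registers_needed(key_bits, interleave,
                     regs_per_block=3, regs_per_round_key=4):
    """Registers for block state plus the full key schedule, counted in
    the same units as the message (both per-unit costs are assumptions)."""
    round_keys = ROUNDS[key_bits] + 1
    return interleave * regs_per_block + round_keys * regs_per_round_key

for bits in sorted(ROUNDS):
    need = registers_needed(bits, interleave=4)
    verdict = "fits" if need <= REGISTER_BUDGET else "needs key spilling"
    print(f"AES-{bits}: {need} of {REGISTER_BUDGET} -> {verdict}")
```

With these numbers 4x interleave fits for 128-bit and 192-bit keys and
overflows for 256-bit, which matches the "up to 192-bit" remark.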

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
