What is the rationale behind choosing an interleave factor of two for
parallelizable modes? Judging from AES-128 CBC encrypt benchmarks, the
AES round instruction latency is 4 cycles. If the processor can pair
two half-round instructions (I refer to the fact that it takes two
instructions to perform a single round), then the optimal interleave
factor should be 4. Do you have performance metrics, specifically
throughput, for the instructions in question? Did you try a higher
interleave factor?

The AES round instruction latency is 3 cycles.

As mentioned, the result looks more like 4, so either it is 4, something holds it back (in which case there might be room for improvement), or I estimated it wrong. But the question was whether the processor is capable of issuing two independent AES instructions at the same time. If it is, then a higher interleave factor is more appropriate: it would still outweigh the losses from spilling key material, and I reckon the difference wouldn't be negligible. Best of all would be to know how the next generation will behave, so that one can pick a "future-safe" factor. After all, a higher-than-optimal interleave factor costs less than a lower-than-optimal one.
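The pairing question can be framed as a small issue-slot model. The sketch below is only an illustration under stated assumptions, not measured T4 behaviour: a fully pipelined AES unit with in-order issue, two instructions per round (aes01 + aes23) that both read the whole previous-round state, a block's two half-round instructions issued back to back, and a configurable number of AES instructions accepted per cycle. `min_interleave` and `ceil_div` are hypothetical helpers.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def min_interleave(latency: int, width: int, per_round: int = 2) -> int:
    """Smallest number of independent blocks that keeps the AES unit busy.

    Assumed model (not measured hardware data): `width` AES instructions
    issue per cycle, results return `latency` cycles after issue, and each
    round of each block needs `per_round` instructions, all of which read
    the complete previous-round state.
    """
    # Cycles between the first and the last issue of one block's round.
    spread = ceil_div(per_round, width) - 1
    n = 1
    while True:
        # One round of all n blocks occupies this many issue cycles.
        period = ceil_div(per_round * n, width)
        # A block's next round may start only once its last result is back.
        if period >= spread + latency:
            return n
        n += 1


for latency in (3, 4):
    for width in (1, 2):
        print(f"latency {latency}, {width} AES insn/cycle -> "
              f"interleave {min_interleave(latency, width)}")
```

Under this model, pairing with a latency of 4 cycles gives the factor of 4 suggested in the question. With single issue, a block's aes23 leaves one cycle after its aes01, so (latency + 1) / 2 blocks, rounded up, are needed.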

We don't have enough registers to unroll it by another factor.

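        ! round 1: state %reg0:%reg1 -> %reg2:%reg1
        aes01   %key0,%reg0,%reg1,%reg2
        aes23   %key1,%reg0,%reg1,%reg1     ! destination %reg1, not %reg3
        ! round 2: state %reg2:%reg1 -> %reg0:%reg1
        aes01   %key2,%reg2,%reg1,%reg0
        aes23   %key3,%reg2,%reg1,%reg1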

allows for 4x interleave with keys up to 192 bits, right? 3*4+13*4=64? Or did I get it wrong? Or would a 3-register arrangement like the one above not work?
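The register question can be checked with a quick budget. This sketch assumes that the aes01/aes23 operands are 64-bit double-precision FP registers, of which SPARC V9 has 32 (%f0, %f2, ..., %f62); that each 128-bit round key occupies two of them; that each block uses the three registers of the arrangement above; and that all AES round keys (rounds + 1) stay resident. `fp_regs_needed` is a hypothetical helper.

```python
FP_DOUBLE_REGS = 32                   # SPARC V9: 32 x 64-bit FP registers
ROUNDS = {128: 10, 192: 12, 256: 14}  # AES rounds per key size


def fp_regs_needed(key_bits: int, interleave: int, regs_per_block: int = 3) -> int:
    """64-bit FP registers needed if every round key stays resident.

    Assumptions: each 128-bit round key takes two 64-bit registers and
    each in-flight block takes `regs_per_block` registers.
    """
    round_keys = ROUNDS[key_bits] + 1
    return interleave * regs_per_block + round_keys * 2


for bits in (128, 192, 256):
    for n in (2, 4):
        need = fp_regs_needed(bits, n)
        verdict = "fits" if need <= FP_DOUBLE_REGS else "does not fit"
        print(f"AES-{bits}, {n}x: {need} of {FP_DOUBLE_REGS} -> {verdict}")
```

The outcome depends on counting data registers and round-key halves in the same unit. In 32-bit single-precision units, the AES-192, 4x case is 24 + 52 = 76 of 64.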
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]