Hi,

> 1) Firstly, I was surprised about current AES implementation timings. I
> wrote the simplest test, containing calls to 4 core AES functions (
> AES_set_encrypt_key, AES_encrypt(), AES_set_decrypt_key(), AES_decrypt)
> and measured the clocks of some OpenSSL implementations.

This has been discussed a couple of times already. In the assembly
module the above subroutines use compact 256-byte S-boxes, which means
extra calculations, and *more* of them for AES_decrypt and
AES_set_decrypt_key. Hence the results. This is a conscious choice for
the assembly module, made in order to mitigate side-channel attacks.
There are alternatives for x86[_64] platforms [besides AES-NI], namely
the SSSE3 vpaes-x86[_64] and bsaes-x86_64 modules. These are accessible
through EVP and provide adequate performance (in comparison to
aes_core.c that is, not AES-NI).
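
To illustrate where the "extra calculations" come from, here is a
minimal C sketch. It is not the code from the assembly module. With only
a byte-wide S-box there are no precomputed T-tables, so the MixColumns
arithmetic has to be done explicitly. The forward direction multiplies
by 2 and 3 only, while the inverse direction multiplies by 9, 11, 13 and
14, which is presumably why the decrypt side pays more. All function
names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11b. */
uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Generic GF(2^8) multiply built from xtime (shift-and-add). */
uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* Forward MixColumns on one column: one doubling per input byte. */
void mix_column(uint8_t c[4])
{
    uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3;
    c[1] = a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3;
    c[2] = a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3);
    c[3] = (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3);
}

/* Inverse MixColumns: coefficients 9, 11, 13, 14 need up to three
 * doublings per term, so noticeably more work than the forward case. */
void inv_mix_column(uint8_t c[4])
{
    uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9);
    c[1] = gmul(a0, 9) ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13);
    c[2] = gmul(a0, 13) ^ gmul(a1, 9) ^ gmul(a2, 14) ^ gmul(a3, 11);
    c[3] = gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9) ^ gmul(a3, 14);
}

/* Returns 1 if a known MixColumns test column round-trips correctly. */
int mixcolumns_self_check(void)
{
    static const uint8_t in[4] = { 0xdb, 0x13, 0x53, 0x45 };
    static const uint8_t expect[4] = { 0x8e, 0x4d, 0xa1, 0xbc };
    uint8_t c[4];

    memcpy(c, in, sizeof(c));
    mix_column(c);
    if (memcmp(c, expect, sizeof(c)) != 0)
        return 0;
    inv_mix_column(c);
    return memcmp(c, in, sizeof(c)) == 0;
}
```

A table-driven implementation folds these multiplications into 1KB
lookup tables, which is faster, but the table index then depends on
secret data across several cache lines.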

> 2) So I decided to improve AES_set_decrypt_key() and here is the patch.
> The result of patched code compiled with icl is 732 cycles for 256-bit
> key that is 15% faster than best current implementation. For gcc, result
> is worse - 1086 cycles (8% speed-up). I also wrote an asm implementation
> that is faster than icl for 128-bit keys only (patched C version does
> 550 cycles and an asm version does 475) and much faster than gcc code
> for any key length. Unfortunately, I was not able to make a patch for
> aes-586.pl because my assembler version actively uses all 7 registers
> and there is no more register to address constant tables Te1..4, Td1..4
> etc. I address them like
> mov    ebp,    DWORD PTR _Td2[ebp*4]
> but in your code these tables are in the code segment and require one
> more register.

It's not the fact that the tables reside in the code segment that
requires an extra register, but the fact that the code itself is
*position independent*. I mean you can 'mov ebp,_Td2[ebp*4]' even if
_Td2 is in the code segment, but placing it in the code segment won't
make the code position independent. And vice versa, I'd need an extra
register even if the table were residing elsewhere. Position
independence is a strong requirement and code that doesn't meet it is
not really interesting.

On a side note, I had a quick look and it seems to be possible to
improve ILP a little bit and improve decrypt performance by a few
percent (hopefully)... I'll save the exercise for a rainy day...

Use alternatives!!! (Well, it probably should be noted that bsaes relies
on AES_set_[en|de]crypt_key).

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
