>> ... it's however a little bit faster.
>> It means the optimization still exist, and I think there should be more
>> gain if this code is used in OpenSSL SSSE3 codepath (I use completely
>> different SSSE3 code generation that is possibly less effective).
> 
> Not necessarily. Out-of-order execution logic can be forgiving (and
> current Intel out-of-order logic *is* forgiving enough) in sense that
> different sequences, even ones considered "less effective", can be
> executed equally fast. Sometimes it's more about sheer amount of u-ops
> than specific sequence of machine codes, and this case is very much like
> this.

The optimization does indeed exist, as it reduces the number of u-ops by
2% (I overlooked that). If the execution logic is forgiving enough, this
translates to a 2% performance improvement.
http://cvs.openssl.org/chngview?cn=22725 effectively shows that Core2
(minor regression in 64-bit code), Sandy Bridge (no effect in 32-bit
mode(*)) and VIA Nano (minor regression in 32-bit code) are not always
forgiving. It's a really minor improvement otherwise, but once done it
hardly makes sense to let it be...

(*) The 32-bit AVX result changed from 5.1 to 5.2, but 5.1 must have been
a typo, because I couldn't reproduce it, while the relative improvement,
70%, was reproducible.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
