>> Now I agree ;) 1.8 version is "best-balanced" for all architectures.
>>
>
> I'm not sure I agree: I've grabbed the 1.8 version and rebuilt openssl
> 1.0.1c and tested it on an i5
"i5" says exactly nothing about the microarchitecture, so please don't
use it. Say Nehalem, Sandy Bridge, whatever, but not i5!
> and a Core 2 Duo; performance is better
> than the non-patched version but it is WORSE compared to the original
> version of the sha256-586.pl script that was posted here before on May
> 11th.
>
> version 11/05/2015:
> sha256 39017.64k 87648.54k 150106.58k 183705.94k 197330.99k
>
> version 1.8:
> sha256 33560.42k 73153.83k 121472.43k 167948.67k 180955.23k
It sounds like we're talking about Nehalem, as it's very close to the
difference reported by Pavel:
> i5 Lynnfield 1250 / 1426 / 1271 / 1121 / 1033
> 1100
Indeed, you observe a ~8% difference, and the difference above is about
the same. To tell you the truth, Nehalem is ... special. I'm not
suggesting that you actually do it, but if you take the latest
sha512-x86_64.pl and move the last rotate instruction in ROUND_00_15,
you can observe as much as a 10% performance *drop* on Nehalem. How
come? This just doesn't make sense; it shouldn't matter that much on an
out-of-order execution core! But it does... What I'm trying to say here
is that it's not impossible that the OpenSSL 32-bit code falls victim to
a similar "phenomenon". But because the 32-bit register bank is limited,
it might be impossible to work around the bottleneck. Recall that I've
taken a completely different approach to full unroll, but can one
consider it inherently inferior? It might be ~8% slower on Nehalem, but
it's more than 40% faster on Atom... I can even add that I was able to
squeeze an extra 2-3% out of Core 2, but it was killing my AMD by 6%.
It's a trade-off...
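
For reference, the ~8% figure can be checked against the quoted numbers
with a few lines of Python. This is only a rough sketch: the throughput
values are copied from the two sha256 rows quoted above, while the block
sizes are an assumption (the usual "openssl speed" columns of 16, 64,
256, 1024 and 8192 bytes), since the column headers were not quoted. The
names old, new and slowdown_percent are made up for illustration.

```python
# Throughput in thousands of bytes per second, copied from the quoted
# sha256 rows for the earlier script ("old") and version 1.8 ("new").
old = [39017.64, 87648.54, 150106.58, 183705.94, 197330.99]
new = [33560.42, 73153.83, 121472.43, 167948.67, 180955.23]

# Assumption: the columns are the default "openssl speed" block sizes.
# The quoted output does not include the header line.
block_sizes = [16, 64, 256, 1024, 8192]


def slowdown_percent(before, after):
    """Return how much slower 'after' is than 'before', in percent."""
    return (before - after) / before * 100.0


slowdowns = [slowdown_percent(o, n) for o, n in zip(old, new)]

for size, pct in zip(block_sizes, slowdowns):
    print(f"{size:5d} bytes: version 1.8 is {pct:.1f}% slower")
```

If the block-size assumption holds, the gap is a bit over 8% for the two
largest block sizes and noticeably larger for the smaller ones, so the
~8% above describes the large-block end of the table.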
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]