version 11/05/2015:
sha256 39017.64k 87648.54k 150106.58k 183705.94k 197330.99k
version 1.8:
sha256 33560.42k 73153.83k 121472.43k 167948.67k 180955.23k
It sounds like we're talking about Nehalem, as it's very close to the
difference reported by Pavel:
i5 Lynnfield 1250 / 1426 / 1271 / 1121 / 1033
1100
Indeed, you observe a ~8% difference, and the difference above is about the same.
It occurred to me that you might also be referring to the bigger than 8%
difference for blocks shorter than 1KB. While it looks good in a specific
benchmark, a fully unrolled loop can hurt overall performance, because it's
likely to evict other code from cache. I mean, in real life you don't do
just SHA256 and nothing else, do you? For the moment the fully unrolled
loop is taken for inputs larger than 1KB. The limit was more or less
arbitrarily chosen, but the intention is to eventually quantify the cost of
bringing code into cache and adjust the value accordingly.
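
To illustrate the length-based choice described above, here is a minimal
sketch in C. It is not the actual OpenSSL code: the names UNROLL_THRESHOLD,
sha256_path and choose_sha256_path are made up for illustration, and the
cutoff of 1024 bytes is only my reading of "larger than 1KB".

```c
#include <stddef.h>

/* Assumed cutoff: the text only says "inputs larger than 1KB". */
#define UNROLL_THRESHOLD 1024

/* Hypothetical labels for the two code paths. */
enum sha256_path {
    PATH_COMPACT,   /* small loop body, cheap to bring into cache */
    PATH_UNROLLED   /* fully unrolled rounds, faster once resident */
};

/*
 * Hypothetical selector, not an OpenSSL function: pick the unrolled
 * path only when the input is long enough to amortize the cost of
 * pulling the larger code into the instruction cache.
 */
enum sha256_path choose_sha256_path(size_t len)
{
    return len > UNROLL_THRESHOLD ? PATH_UNROLLED : PATH_COMPACT;
}
```

With this cutoff, the 16, 64, 256 and 1024 byte blocks would take the
compact path and only the 8192 byte block would take the unrolled one.
Adjusting the limit later would then mean changing a single constant.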
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]