version 11/05/2015:
sha256 39017.64k 87648.54k 150106.58k 183705.94k 197330.99k

version 1.8:
sha256 33560.42k 73153.83k 121472.43k 167948.67k 180955.23k

It sounds like we're talking about Nehalem, as it's very close to the
difference reported by Pavel:

i5 Lynnfield       1250 / 1426 / 1271 / 1121 / 1033
                                          1100

Indeed, you observe ~8% difference and above difference

It occurred to me that you might also be referring to the bigger-than-8% difference for blocks shorter than 1KB. While it looks good in a specific benchmark, a fully unrolled loop can hurt overall performance, because it is likely to evict other code from the cache. I mean, in real life you don't do just SHA256 and nothing else, do you? For the moment the fully unrolled loop is taken only for inputs larger than 1KB. The limit was chosen more or less arbitrarily, but the intention is to eventually quantify the cost of bringing code into the cache and adjust the value accordingly.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
