Hi,
> But since contemporary processors are SSSE3-capable it makes more sense to benchmark *older* processors when evaluating integer-only optimizations.
You're right (except for the K10, which lacks SSSE3); the main goal is to use this code in the SSSE3 code path.
> How do you measure on i5? Specifically if so called Turbo Boost is off or on? And if on, do you compensate for it? I mean 4.9 does sound impressive, but at the same time it sounds too good
You're right again. I measured my code with Turbo Boost on, using RDTSC. With Turbo Boost off it gives 5.6 cpb, which is still slightly faster than the OpenSSL implementation (361 vs. 367 clocks for a full block). So the optimization still holds, and I think the gain should be larger if this code is used in the OpenSSL SSSE3 code path (I use a completely different SSSE3 code generation that is possibly less efficient).
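For anyone who wants to check the arithmetic, here is a rough sketch of how the figures above relate. It assumes a 64-byte block (my assumption here; it is what makes 361 clocks come out at about 5.6 cpb), and the function names are only illustrative. The last value is the apparent speed-up that appears when RDTSC keeps ticking at its fixed reference rate while the core runs faster under Turbo Boost, which is probably why 4.9 looked too good.

```python
# Sketch only: relates the clock counts quoted above to cycles per byte (cpb).
# BLOCK_BYTES = 64 is an assumption, not something stated in this message.
BLOCK_BYTES = 64


def cycles_per_byte(clocks_per_block, block_bytes=BLOCK_BYTES):
    """Convert a per-block clock count into cycles per byte."""
    return clocks_per_block / block_bytes


def apparent_speedup(cpb_turbo_off, cpb_turbo_on):
    """Ratio between two RDTSC-based figures taken with Turbo Boost off and on."""
    return cpb_turbo_off / cpb_turbo_on


mine = cycles_per_byte(361)       # my code, Turbo Boost off
openssl = cycles_per_byte(367)    # OpenSSL implementation, Turbo Boost off
gain_pct = (367 - 361) / 367 * 100
turbo_factor = apparent_speedup(5.6, 4.9)

print(f"mine:    {mine:.2f} cpb")
print(f"openssl: {openssl:.2f} cpb")
print(f"gain:    {gain_pct:.1f}%")
print(f"turbo inflated the result by about x{turbo_factor:.2f}")
```

So the real gain over the integer-only OpenSSL code is under 2%, not the much larger figure the Turbo Boost measurement suggested.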
Thanks, Andy, for the hint on how to measure correctly. In a few days I'll provide new SHA-256 code that is 20% faster, as compensation. ;-)
--
SY / C4acT/\uBo
Pavel Semjanov
http://www.semjanov.com
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List
