Hi,

> But since contemporary processors are SSSE3-capable it makes more sense
> to benchmark *older* processors when evaluating integer-only
> optimizations.

You're right (except for the K10, which lacks SSSE3); the main goal is to use this code in the SSSE3 code path.


> How do you measure on i5? Specifically if so called Turbo Boost is off
> or on? And if on, do you compensate for it? I mean 4.9 does sound
> impressive, but at the same time it sounds too good

You're right again. I measured my code with Turbo Boost on, using RDTSC. With Turbo Boost off it gives 5.6 cpb, which is still slightly faster than the OpenSSL implementation (361 vs 367 clocks for a full block). So the optimization still exists, and I think there should be more gain if this code is used in the OpenSSL SSSE3 code path (I use a completely different SSSE3 code generation that is possibly less efficient).
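
A rough sketch of the arithmetic behind these numbers, in case it helps to check them. It assumes a 64-byte block (the block size is not stated above), and the helper names are made up for illustration. The last helper attributes the whole difference between the 4.9 and 5.6 cpb readings to Turbo Boost, which is only an assumption.

```c
#include <stdint.h>

/* Assumption: one "full block" is 64 bytes; this is not stated in the
   message itself. */
#define BLOCK_BYTES 64

/* Convert a clock count for one block into cycles per byte (cpb). */
static double cycles_per_byte(uint64_t clocks_per_block)
{
    return (double)clocks_per_block / BLOCK_BYTES;
}

/* Relative gain of one implementation over a reference, in percent.
   Both arguments are clock counts for the same amount of data. */
static double gain_percent(uint64_t mine, uint64_t reference)
{
    return 100.0 * ((double)reference - (double)mine) / (double)reference;
}

/* Ratio between the cpb measured with Turbo Boost off and on.
   If RDTSC ticks at a fixed rate while the core runs faster under
   Turbo Boost, this ratio approximates the clock speed-up that the
   "Turbo on" measurement failed to account for (hypothetical
   interpretation, assuming nothing else changed between the runs). */
static double implied_turbo_ratio(double cpb_turbo_off, double cpb_turbo_on)
{
    return cpb_turbo_off / cpb_turbo_on;
}
```

With the figures above, cycles_per_byte(361) is about 5.64, gain_percent(361, 367) is about 1.6, and implied_turbo_ratio(5.6, 4.9) is about 1.14.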

Thanks, Andy, for the hint on how to measure correctly. In a few days I'll provide new SHA-256 code that is 20% faster, as compensation. ;-)


--

   SY / C4acT/\uBo             Pavel Semjanov
   _   _         _        http://www.semjanov.com
  | | |-| |_|_| |-|
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
