openssl speed sha-512:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
With SSE2 disabled:
sha-512           1050.62k     4223.53k     6141.97k     8488.01k     9480.48k
With SSE2 enabled:
sha-512           3171.75k    12757.93k    22761.88k    34514.56k    40059.42k

That's about 4x (400% of the non-SSE2 throughput) on large blocks.
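For reference, the per-block-size ratios can be recomputed from the figures quoted above. This is only a small arithmetic sketch; the variable names are made up for illustration, and the numbers are copied from the `openssl speed` output in this message.

```python
# Throughput figures (thousands of bytes per second) copied from the
# `openssl speed sha-512` output quoted in this message.
block_sizes = [16, 64, 256, 1024, 8192]
sse2_off = [1050.62, 4223.53, 6141.97, 8488.01, 9480.48]
sse2_on = [3171.75, 12757.93, 22761.88, 34514.56, 40059.42]

# Speedup = SSE2 throughput divided by non-SSE2 throughput, per block size.
speedups = [on / off for on, off in zip(sse2_on, sse2_off)]

for size, ratio in zip(block_sizes, speedups):
    print(f"{size:>5} bytes: {ratio:.2f}x")
```

The ratio grows with block size, from roughly 3.0x at 16 bytes to roughly 4.2x at 8192 bytes.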

4x? Which gcc version? The 3x mentioned in the commentary section is also for the largest block, and was measured with gcc 2.95.3. Well, not that 4x is a worse result...


Many thanks to Andy for the code.

You're welcome :-)

BTW. The method of enabling SSE2 via OPENSSL_ia32cap is IMHO
a kludge. What is 0x04000000 in decimal anyway?

As for OPENSSL_ia32cap: first of all, it's a work in progress and not final yet. But the current plan is as follows. Even though it will be possible to manipulate the variable programmatically from an application, we will *not* recommend it. Instead it will be initialized, upon the call to OPENSSL_add_all_algorithms, to the value returned in the EDX register by the CPUID instruction (that's why the value is 1<<26, which is 0x04000000, or 67108864 in decimal). To cover those unfortunate situations where a user runs the application under a kernel that does not support SSE extensions, we'll recommend setting an environment variable with the same name [most commonly to 0] prior to starting the application [or recompiling without SSE2 support]. That way *no* application source code modifications will ever be required to engage or disengage the SSE2 code.
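To make the plan above concrete, here is a rough sketch of the selection logic only: an environment variable named OPENSSL_ia32cap, when set, overrides the capability word that CPUID would have reported. This is not OpenSSL's code. The function names are hypothetical, the detected value is passed in rather than read from the CPU, and accepting both decimal and 0x-prefixed hex is an assumption, since the thread does not specify how the variable is parsed.

```python
import os

# Bit 26 of the EDX word returned by CPUID function 1 signals SSE2.
SSE2_BIT = 1 << 26  # 0x04000000, i.e. 67108864 in decimal


def effective_caps(detected_edx, env_value=None):
    """Return the capability word to use (hypothetical helper).

    detected_edx: the value CPUID reported (supplied by the caller here).
    env_value:    contents of the OPENSSL_ia32cap environment variable,
                  or None when it is unset.
    """
    if env_value is None:
        return detected_edx
    # Assumption: decimal ("0") and hex ("0x04000000") are both accepted.
    return int(env_value, 0)


def sse2_enabled(caps):
    """True when the SSE2 bit is set in the capability word."""
    return bool(caps & SSE2_BIT)


if __name__ == "__main__":
    # Pretend CPUID reported SSE2; the environment may still veto it.
    caps = effective_caps(SSE2_BIT, os.environ.get("OPENSSL_ia32cap"))
    print("SSE2 code path:", "on" if sse2_enabled(caps) else "off")
```

Under these assumptions, starting the program with OPENSSL_ia32cap=0 in the environment turns the SSE2 path off without touching the application source, which is the point of the scheme.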


On djgpp, where I tested this, we are free to use whatever CPU
instructions are supported. The only trouble is getting at the CR4 register.

As long as you run a DJGPP application under an OS such as XP, you won't be able to get at CR4, right? But what happens if you run it under real MS-DOS? Well, not that we should rush to implement SSE kernel support for MS-DOS, I'm simply curious :-)


djgpp also has a SIGILL handler, so it could fall back to the non-SSE2 method. I have some CPU detection code that could set OPENSSL_ia32cap programmatically if that's desired.

I'm sure it does, and could. It's just that "As it doesn't appear feasible to detect the latter in a way we're ready to support on multiple platforms, we choose to lift this responsibility to end user." I mean, the problem is not detecting an illegal instruction on some given platform, but supporting that in a number of multi-threaded(!) environments. A.


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
