400% on large blocks.

4x? With which gcc version? The 3x mentioned in the commentary section is also for the largest block, and with gcc 2.95.3. Well, not that 4x is a worse result...

I used gcc 3.3.1 with -O2 -fno-strength-reduce -fomit-frame-pointer.

Oh! I also get worse performance with 3.3.2: ~13 vs. 17MBps on a 2.4GHz P4, to be specific. gcc 3.3.x-generated code is 2x slower than icc-generated code and 4x slower than the SSE2 code... Aren't compilers supposed to get better with time? Oh, well...


I would suggest some API like "int OPENSSL_enable_sse2(int)".

The original plan was actually *not* to provide any API at all. Defaulting to the CPUID value, with an option to override it through an environment variable, should suffice IMO. The "extern int OPENSSL_ia32cap;" kludge is meant primarily/exclusively for debugging and benchmarking purposes. Is there any other particular reason why you would like to see a more "official" API?
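To make the "default to CPUID, allow an override" idea concrete, here is a minimal sketch in C. It is not the actual OpenSSL code: the helper names resolve_ia32cap and ia32cap_has_sse2 are hypothetical, and the override is passed in as a plain string so the sketch does not depend on any particular environment variable name. The one hard fact it relies on is that bit 26 of EDX from CPUID leaf 1 reports SSE2.

```c
#include <stdlib.h>

/* Bit 26 of EDX returned by CPUID leaf 1 reports SSE2 support. */
#define IA32CAP_SSE2 (1UL << 26)

/*
 * Hypothetical helper (not an OpenSSL function): choose the capability
 * word. cpuid_edx is the value obtained from CPUID; override is the
 * text of an optional override (for example the value of an
 * environment variable), or NULL if none was given.
 */
unsigned long resolve_ia32cap(unsigned long cpuid_edx, const char *override)
{
    char *end;
    unsigned long value;

    /* No override supplied: trust what CPUID reported. */
    if (override == NULL || *override == '\0')
        return cpuid_edx;

    /* Base 0 accepts decimal, octal (0...) and hex (0x...) input. */
    value = strtoul(override, &end, 0);

    /* Malformed override: ignore it and fall back to CPUID. */
    if (end == override || *end != '\0')
        return cpuid_edx;

    return value;
}

/* Hypothetical helper: non-zero if the capability word allows SSE2. */
int ia32cap_has_sse2(unsigned long cap)
{
    return (cap & IA32CAP_SSE2) != 0;
}
```

In a real program the second argument would presumably come from getenv() with whatever variable name is eventually chosen; passing "0" then forces the non-SSE2 code path for benchmarking without recompiling.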


And btw, shouldn't that be "unsigned long OPENSSL_ia32cap" ?

Yes, of course. Thanks.

On djgpp, where I tested this, we are free to use whatever CPU
instructions are supported. The only trouble is getting at the CR4 register.

As long as you run a DJGPP application under an OS such as XP you won't be able to get to CR4, right? But what happens if you run it under real MS-DOS? Well, not that we should rush and implement SSE kernel support for MS-DOS, I'm simply curious :-)

It depends on the DPMI host. I haven't tried it yet, but only CWSDPMI running at ring 0 (on a "GenuineIntel") would allow it. Otherwise a "mov eax, cr4" would cause an exception. Not sure what an "AuthenticAMD" would do.
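For reference, a small sketch in C of what one would do with CR4 once it can be read. The assumption here, not spelled out in the thread, is that CR4 is wanted for its OSFXSR bit. The function names are hypothetical, and the sketch works on a CR4 value passed in as a number, since the actual read or write needs ring 0 as noted above. The bit positions (OSFXSR is bit 9, OSXMMEXCPT is bit 10) are the documented IA-32 ones.

```c
/* CR4 bits relevant to SSE, per the IA-32 architecture manuals. */
#define CR4_OSFXSR     (1UL << 9)   /* OS supports FXSAVE/FXRSTOR; enables SSE */
#define CR4_OSXMMEXCPT (1UL << 10)  /* OS handles unmasked SIMD FP exceptions */

/* Hypothetical helper: non-zero if this CR4 value has SSE enabled. */
int cr4_sse_enabled(unsigned long cr4)
{
    return (cr4 & CR4_OSFXSR) != 0;
}

/* Hypothetical helper: the CR4 value a ring-0 host would write back
 * to enable SSE, leaving all other bits untouched. */
unsigned long cr4_with_sse(unsigned long cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

If OSFXSR is clear, SSE instructions fault as invalid opcodes even when CPUID reports SSE2.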

I'm pretty confident that AMD would operate with the exact same limitations. Thanks for the information. A.


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
