it's more than appropriate to #ifndef
I386_ONLY the whole engine in question, which effectively eliminates the
need for explicit test on hardware:-)

Done. Can I remove the test for CPUID availability now?

No. We have 486 left which doesn't support CPUID. I don't know why Geoff didn't ask about 486 too:-) But for the record I don't really share his concern. I mean if anybody runs into a problem with the code in question, there always is the option to compile with no-hw and disable offensive code... Secondly, it anybody runs into a problem it would be with some rare OEM, non-Intel compliant implementation, in which case configuring with 386 and/or no-asm would most likely be appropriate advice anyway...


As for the code itself. What's "pushfl; popfl; movl %ebx,-4(%esp)"?
Especially the last instruction? Is it safe to store something above
stack pointer? What if a signal is delivered? What's wrong with just
"push %ebx" and then "pop %ebx"?

Sorry, this is my habit from amd64 :-/ I'm not 100% sure about i386

That's how bugs generally slips through, don't they:-):-):-) Someone wasn't 100% sure and believed the code was good enough:-) In either case on x86 you may not put anything above(*) stack pointer.


(*) or "below" if you write on paper upwards.

FYI "pushfl; popfl" is used to clear a bit in eflags that says whether
or not to reload the encryption key from memory. For now I always force
the reloading. Probably it could be more optimized later.

I assume that you would have to detect if routine was called with different context then, right? One can argue if you have to compare it against global variable or this variable can reside in per-thread storage. Indeed, thread has all rights to expect that its EFLAGS is preserved and therefore the flag in question will be cleared automatically whenever any particular thread is scheduled for execution. Which in turn means that as long as you *know* that no execution thread interleaves several contexts, pushfl;popfl is redundant. Right?


Any other comments?

You bet:-)

Get rid of htonl. It surely mapped to inline asm on Linux, but what about other platforms? Just implement your own inline bswapl.

In my opinion aligning code abuses malloc. I'd recommend to declare an automatic buffer of say 1KB and use it as temporary aligned storage whenever nbytes is less than size of this automatic buffer, and fall to malloc only if it's larger. Once you have the code in place, see how this buffer size affects the benchmark.

Shouldn't you check if output buffer is aligned first? Because if it is and input is not, then you don't have to use intermediate aligned storage. Right?

A.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]

Reply via email to