> [note -- i changed the cc to rt because there's something preventing me
> from posting to openssl-dev... and rt seems to be one way for me to get my
> messages through.]

And my reply from yesterday didn't appear in RT...

> > "- Transition from x87 FPU to MMX technology instructions or to SSE or
> > SSE2 instructions that operate on MMX registers should be preceded by
> > saving the state of the x87 FPU.
> 
> this depends on the ABI -- if there's callee saves data in the FPU then it
> needs to be saved.  but the x86 ELF ABI defines the FPU stack to be empty
> on entry to functions, so there's nothing live to be saved.  see page 38
> of <http://www.linuxbase.org/spec/refspecs/elf/abi386-4.pdf> for example.
> 
> unfortunately that ABI is pretty old...

Well, it doesn't matter how old the specification is, as long as it's
the one currently in effect. Or does it:-) Using MMX registers indeed
appears appropriate under this ABI. But keep in mind that OpenSSL is not
exclusively about Linux, and we have to either think of a common
denominator here or make provision for both MMX and XMM (should be
trivial with perl). I admit that it might turn out to be perfectly
appropriate to use MMX registers under *all* supported OSes (it most
likely is), but the option to switch is appropriate in either case [in
my opinion], not to mention a distinct commentary section on these ABI
pitfalls:-)

> i'll go re-implement with xmm and see what happens to the perf.
> 
> the trick with xmm regs is that i'm only using 64-bits of the register,
> and opteron, pentium-m and efficeon implement their MMX/SSE2 with a dual
> pair of 64-bit units.  which generally means issuing two 64-bit ops for
> every SSE2 128-bit op.

P4 can issue an SSE2 op only every second cycle, so it works that way
too. Basically there is no point in implementing a full-width SSE2 ALU,
as the whole idea is to schedule two *ideally* pipelined instructions.

> > I haven't made up my mind about cpuid.c yet... At least it fails to
> > compile with -fPIC...

To compile with -fPIC you have to save and restore %ebx yourself:

        __asm volatile(
                "push %%ebx; cpuid; pop %%ebx"
                : "=a" (eax), "=c" (ecx), "=d" (edx)
                : "0" (1));

As the compiler wants %ebx for itself (in PIC code it holds the GOT
pointer).
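
For illustration, the same idea can be wrapped into a small helper. This
is only a sketch under stated assumptions: cpuid_leaf1 and the edx_has_*
helpers are hypothetical names (not taken from cpuid.c), GCC-style
inline assembly is assumed, the 32-bit path assumes a CPU that has the
cpuid instruction at all, and on non-x86 targets the helper simply
reports that cpuid is unavailable. The bit positions in %edx of leaf 1
(23 for MMX, 25 for SSE, 26 for SSE2) are the documented ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from cpuid.c): run CPUID leaf 1.
 * Returns 1 and fills the outputs on x86 with a GCC-style compiler,
 * returns 0 and leaves the outputs untouched elsewhere. */
static int cpuid_leaf1(unsigned int *eax, unsigned int *ecx,
                       unsigned int *edx)
{
    assert(eax != NULL && ecx != NULL && edx != NULL);
#if defined(__GNUC__) && defined(__i386__)
    unsigned int a, c, d;
    /* Under -fPIC %ebx may hold the GOT pointer, so it is saved and
     * restored by hand and never named as an operand. */
    __asm__ volatile("push %%ebx; cpuid; pop %%ebx"
                     : "=a"(a), "=c"(c), "=d"(d)
                     : "0"(1));
    *eax = a; *ecx = c; *edx = d;
    return 1;
#elif defined(__GNUC__) && defined(__x86_64__)
    unsigned int a, b, c, d;
    /* 64-bit PIC code addresses data relative to %rip, so %rbx is not
     * reserved and can be listed as an ordinary (ignored) output. */
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "0"(1));
    (void)b;
    *eax = a; *ecx = c; *edx = d;
    return 1;
#else
    return 0;
#endif
}

/* Feature bits in %edx of CPUID leaf 1. */
static int edx_has_mmx(unsigned int edx)  { return (int)((edx >> 23) & 1u); }
static int edx_has_sse(unsigned int edx)  { return (int)((edx >> 25) & 1u); }
static int edx_has_sse2(unsigned int edx) { return (int)((edx >> 26) & 1u); }
```

A caller would check the return value first and only then test, say,
edx_has_sse2(edx) before selecting an XMM code path.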

> yeah i'm sure the SIGILL and pthread_once stuff is playing havoc here.

I'm not sure I like the pthread_once solution and would like to ponder
it further... Or rather, to ponder a more common denominator than
pthread_once:-)

> (curse intel for requiring a faulting test to determine if SSE is
> enabled.)

Well, if you insist on performing even that check and catching SIGILL,
why bother with the cpuid instruction at all? Just issue xorps and see
if it traps. Well, what I am *really* trying to say with this, or rather
with "more common denominator than pthread_once," is that we might as
well decide that the cpuid check is sufficient and *demand* that a
toolkit compiled with SSE2 support be executed under an OS which
supports it.
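
To make the "issue it and see if it traps" alternative concrete, here is
a rough sketch. Everything in it is an assumption for illustration
rather than anything from the patch: the names runs_without_sigill,
try_xorps, probe_noop and probe_raises are hypothetical, POSIX sigaction
and sigsetjmp are assumed to be available, and the single static jump
buffer means it is not thread-safe (which is where something like
pthread_once would come back in). On non-x86 targets try_xorps just
raises SIGILL as a stand-in for the missing instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* One shared jump buffer: fine for a sketch, not safe with threads. */
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unwind back into the probe */
}

/* Hypothetical helper: returns 1 if fn() ran to completion,
 * 0 if it raised SIGILL (or the handler could not be installed). */
static int runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    assert(fn != NULL);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        fn();
        ok = 1;
    }
    sigaction(SIGILL, &old, NULL);  /* put the previous handler back */
    return ok;
}

/* The actual probe: one SSE instruction, nothing else. */
static void try_xorps(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile("xorps %%xmm0, %%xmm0" : : : "xmm0");
#else
    raise(SIGILL);              /* stand-in on non-x86 targets */
#endif
}

/* Two trivial probes, handy for checking the helper itself. */
static void probe_noop(void)   { }
static void probe_raises(void) { raise(SIGILL); }
```

The point of the comparison is that this needs a signal handler, a jump
buffer and some answer to the threading question, while trusting cpuid
needs none of them.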

> do you know if there's any method i can rely on to be called when PIC code
> is loaded?

On ELF platforms you can drop code into the .init section (see
http://www.openssl.org/~appro/usatomic/ for samples). A more "portable"
way is to rely on the C++ run-time environment and instantiate a static
object of a class with an appropriate constructor. A.
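
With GCC-compatible compilers there is also a route that stays in plain
C: the constructor function attribute, which arranges for a function to
run when the executable or shared object is loaded, before main(). Note
that the sketch below uses that attribute instead of hand-written .init
code or a C++ static object. The names capability_probe_done,
probe_on_load and capability_probe_ran are hypothetical, and the
attribute is a compiler extension, not standard C.

```c
#include <assert.h>

/* Hypothetical flag: set once the load-time hook has run. */
static int capability_probe_done = 0;

/* GCC/Clang extension: called at load time, before main().
 * A real hook would perform the cpuid check here. */
__attribute__((constructor))
static void probe_on_load(void)
{
    capability_probe_done = 1;
}

/* Accessor callers could use instead of touching the flag directly. */
static int capability_probe_ran(void)
{
    assert(capability_probe_done == 0 || capability_probe_done == 1);
    return capability_probe_done;
}
```

By the time any ordinary code in the program runs,
capability_probe_ran() should already return 1, so no once-only locking
is needed around the check itself.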

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
