> [note -- i changed the cc to rt because there's something preventing me
> from posting to openssl-dev... and rt seems to be one way for me to get my
> messages through.]
And my reply from yesterday didn't appear in RT...

> > "- Transition from x87 FPU to MMX technology instructions or to SSE or
> > SSE2 instructions that operate on MMX registers should be preceded by
> > saving the state of the x87 FPU.
>
> this depends on the ABI -- if there's callee saves data in the FPU then it
> needs to be saved. but the x86 ELF ABI defines the FPU stack to be empty
> on entry to functions, so there's nothing live to be saved. see page 38
> of <http://www.linuxbase.org/spec/refspecs/elf/abi386-4.pdf> for example.
>
> unfortunately that ABI is pretty old...

Well, it doesn't matter how old the specification is, as long as it's the
one currently in effect. Or does it:-) Using MMX registers indeed appears
appropriate under this ABI. But keep in mind that OpenSSL is not
exclusively about Linux, and we have to think of a common denominator
here or make provision for both MMX and XMM (should be trivial with perl).
I admit that it might turn out that it's perfectly appropriate to use MMX
registers under *all* supported OSes (it most likely is), but the option
to switch is appropriate in either case [in my opinion], not to mention
a distinct commentary section on these ABI pitfalls:-)

> i'll go re-implement with xmm and see what happens to the perf.
>
> the trick with xmm regs is that i'm only using 64-bits of the register,
> and opteron, pentium-m and efficeon implement their MMX/SSE2 with a dual
> pair of 64-bit units. which generally means issuing two 64-bit ops for
> every SSE2 128-bit op.

P4 can issue an SSE2 op only every second cycle, so it works that way too.
Basically there is no point in implementing a full-width SSE2 ALU, as the
whole idea is to schedule two *ideally* pipe-lined instructions.

> > I haven't made up my mind about cpuid.c yet... At least it fails to
> > compile with -fPIC...

To compile with -fPIC you have to

    __asm volatile( "push %%ebx; cpuid; pop %%ebx"
                    : "=a" (eax), "=c" (ecx), "=d" (edx)
                    : "0" (1));

as the compiler wants %ebx for itself (under -fPIC it holds the GOT
pointer).
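
To make the %ebx point concrete, here is a minimal sketch of how such a
check could be wrapped. It is not the code from cpuid.c; the names
cpuid_leaf1, simd_features and detect_simd are made up for illustration,
and it assumes GCC-style inline assembly. On 32-bit x86 it saves and
restores %ebx by hand as above, on x86-64 it simply lets the compiler
allocate the register, and on anything else it reports no features. The
bits tested are the CPUID leaf 1 EDX flags for MMX (23), SSE (25) and
SSE2 (26). Note that this only says what the CPU offers, not whether the
OS has enabled SSE state saving.

```c
#include <stdint.h>

/* Hypothetical container for the three flags discussed in this thread. */
struct simd_features {
    int mmx;
    int sse;
    int sse2;
};

/* Execute CPUID leaf 1; the outputs stay zero on non-x86 targets. */
static void cpuid_leaf1(uint32_t *eax, uint32_t *ecx, uint32_t *edx)
{
    uint32_t a = 0, c = 0, d = 0;
#if defined(__i386__)
    /* Under -fPIC %ebx holds the GOT pointer, so preserve it by hand. */
    __asm__ volatile("push %%ebx; cpuid; pop %%ebx"
                     : "=a"(a), "=c"(c), "=d"(d)
                     : "0"(1));
#elif defined(__x86_64__)
    /* %rbx is not reserved here, so it can be a plain output operand. */
    uint32_t b;
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "0"(1));
    (void)b;
#endif
    *eax = a;
    *ecx = c;
    *edx = d;
}

/* Decode the MMX/SSE/SSE2 capability bits from EDX of leaf 1. */
struct simd_features detect_simd(void)
{
    uint32_t eax, ecx, edx;
    struct simd_features f;

    cpuid_leaf1(&eax, &ecx, &edx);
    f.mmx  = (int)((edx >> 23) & 1u);
    f.sse  = (int)((edx >> 25) & 1u);
    f.sse2 = (int)((edx >> 26) & 1u);
    return f;
}
```

A caller would do something like "struct simd_features f = detect_simd();"
once and pick the MMX or XMM code path from f.mmx and f.sse2.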
> yeah i'm sure the SIGILL and pthread_once stuff is playing havoc here.

I'm not sure I like the pthread_once solution and would like to ponder over
it further... Or rather over a more common denominator than
pthread_once:-)

> (curse intel for requiring a faulting test to determine if SSE is
> enabled.)

Well, if you insist on performing even that check and catching SIGILL, why
bother with the cpuid instruction at all? Just issue xorps and see if it
traps. Well, what I *really* try to say with this, or rather with "more
common denominator than pthread_once," is that we might as well decide
that the cpuid check is sufficient and *demand* that a toolkit compiled
with SSE2 support be executed under an OS which supports it.

> do you know if there's any method i can rely on to be called when PIC code
> is loaded?

On ELF platforms you can drop code into the .init section (see
http://www.openssl.org/~appro/usatomic/ for samples). A more "portable" way
is to rely on the C++ run-time environment and instantiate a static object
of a class with an appropriate constructor.

A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
