"Andy Polyakov" <[EMAIL PROTECTED]> said:

> >>400% on large blocks.
> 
> 4x? What gcc version? 3x mentioned in commentary section is also for 
> largest block and with gcc 2.95.3. Well, not that 4x is worse result...

I used gcc 3.3.1 with -O2 -fno-strength-reduce -fomit-frame-pointer.

> As for OPENSSL_ia32cap. First of all, it's work in progress, it's not 
> final yet. But the current plan for it is following. Even though it will 
> be possible to manipulate the variable in question programmatically from 
> application, we will *not* recommend it. Instead it will be initialized 
> upon call to OPENSSL_add_all_algorithms to the value returned in EDX 
> register by CPUID instruction (that's why the value is 1<<26).

I got that part. But AFAICS, when strtol(env,NULL,0) is used to set 
OPENSSL_ia32cap and env = "0x04000000", strtol() treats the value
as octal. From man strtol:
  The string may begin with an arbitrary amount of white space (as
  determined by isspace(3)) followed by a single optional + or - sign. If
  base is zero or 16, the string may then include a 0x prefix, and the
  number will be read in base 16; otherwise, a zero base is taken as 10
  (decimal) unless the *next character is 0*, in which case it is taken as 8
  (octal).
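
The rule quoted above can be checked directly against whatever C library is in use. This is only a minimal sketch: parse_cap_sketch() is a hypothetical helper, not an OpenSSL function, and it assumes the value is parsed with strtol() and base 0 as described.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of OpenSSL: parse a capability mask
 * with strtol() and base 0, so the prefix of the string selects the
 * radix ("0x" -> 16, leading "0" -> 8, anything else -> 10). */
static unsigned long parse_cap_sketch(const char *env)
{
    return (unsigned long)strtol(env, NULL, 0);
}
```

With a standard-conforming strtol(), "0x04000000" is read as hex (1<<26), "04000000" without the x is read as octal (1<<20), and "67108864" is read as decimal (1<<26 again). If a given library returns something else for the 0x form, the difference lies in that library's strtol().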

> starting application [or recompile without SSE2 support]. So that *no* 
> application source code modifications will ever be required to engage or 
> disengage SSE2 code.

I would suggest some API like "int OPENSSL_enable_sse2(int)".
And btw, shouldn't that be "unsigned long OPENSSL_ia32cap" ?
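
Roughly what such an API could look like, as a sketch only. Nothing here exists in OpenSSL: ia32cap_sketch stands in for OPENSSL_ia32cap, enable_sse2_sketch() is a hypothetical name, and returning the previous state is an assumption made for illustration.

```c
/* Stand-in for OPENSSL_ia32cap, typed unsigned long as suggested. */
static unsigned long ia32cap_sketch = 0;

/* SSE2 is reported in bit 26 of EDX by CPUID, hence 1<<26. */
#define SKETCH_SSE2_BIT (1UL << 26)

/* Hypothetical API: a non-zero argument sets the SSE2 bit, zero
 * clears it. Returns the previous state (1 = was set, 0 = was clear);
 * that return convention is an assumption of this sketch. */
static int enable_sse2_sketch(int on)
{
    int prev = (ia32cap_sketch & SKETCH_SSE2_BIT) != 0;

    if (on)
        ia32cap_sketch |= SKETCH_SSE2_BIT;
    else
        ia32cap_sketch &= ~SKETCH_SSE2_BIT;
    return prev;
}
```

An application would call enable_sse2_sketch(0) once at startup to force the non-SSE2 code path, without touching the variable directly.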

> > On djgpp where I tested this, we are free to use whatever CPU
> > instructions that's supported. Only trouble is getting at the CR4 register.
> 
> As long as you run DJGPP application under OS such as XP you won't be 
> able to get to CR4, right? But what happens if you run it under real 
> MS-DOS? Well, not that we should rush and implement SSE kernel support 
> for MS-DOS, I'm simply curious:-)

It depends on the DPMI host. I haven't tried it yet, but only CWSDPMI running 
at ring-0 (on a "GenuineIntel") would allow it. Otherwise doing a "mov eax, cr4" 
would cause an exception. Not sure what an "AuthenticAMD" would do.
Under Windows' NTVDM, it's virtualised and always returns 0 :-(
 
--gv

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
