> This patch is a faster bn_mul_add_words for x86 assembly.

What's your platform? I can't get it working on Linux; it dumps core
at the indirect jump... Has anybody had better luck? If you run it on
Windows and it works there, it must be a bug in the perlasm module that
generates code for Unix...

And as for that indirect jump: it makes the code position-dependent and
therefore unsuitable for use in shared objects... Can you think of a way
to keep the code position-independent?

> For example, rsa 1024 bits sign/s improved about 4.7%.

Not really impressive... How does it do on Intel CPUs? IA-32
implementations are rather different (I myself call them "impaired in
different ways"), and I've seen several times that code optimized for one
particular implementation runs a few percent faster on the target CPU
but significantly slower on other implementations. The latest case was
compiler-generated code, but it sets a pretty good example: compared
to the code generated by GCC, the Rijndael code generated by Intel C
ran ~5% faster on PIII, but 33% slower on AMD. We have to find a
balance. At the very least, a 5% gain on AMD must not hurt PIII or P4.
Can you check?

Andy.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]