it seems somewhat fortunate that core2 CPUs track the p4 behaviour
w.r.t. these two rc4 implementations. here are the core2 results with the
stock code / HT test:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            166799.58k   180552.87k   182437.93k   183381.67k   183206.87k
for the record, the core2 64-bit code is seriously underperforming the
32-bit code... here are the 32-bit results (with cpuid test enabled):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            254164.64k   279901.10k   279364.38k   283617.62k   276690.26k
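To put a number on "seriously underperforming", here is a minimal C sketch that divides the 32-bit figures by the 64-bit ones for each block size. The throughput values are copied from the two result sets above; the array and function names are made up for illustration and are not part of OpenSSL.

```c
#include <assert.h>
#include <stddef.h>

/* Block sizes used in the two runs above. */
#define NUM_BLOCK_SIZES 5

/* rc4 throughput on core2, 64-bit build, stock code / HT test
 * (units of 1000 bytes per second, as printed by the benchmark). */
static const double core2_64bit_k[NUM_BLOCK_SIZES] = {
    166799.58, 180552.87, 182437.93, 183381.67, 183206.87
};

/* rc4 throughput on core2, 32-bit build with the cpuid test enabled. */
static const double core2_32bit_k[NUM_BLOCK_SIZES] = {
    254164.64, 279901.10, 279364.38, 283617.62, 276690.26
};

/* Hypothetical helper: ratio of 32-bit to 64-bit throughput for the
 * block size at position idx (0 = 16 bytes ... 4 = 8192 bytes). */
static double speedup_32_over_64(size_t idx)
{
    assert(idx < NUM_BLOCK_SIZES);
    return core2_32bit_k[idx] / core2_64bit_k[idx];
}
```

Calling speedup_32_over_64() for each index gives the per-block-size advantage of the 32-bit build on this one machine; it says nothing about other CPUs.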
... The key feature of the 32-bit code with the cpuid test is that the
corresponding loop is not unrolled. Can you test the following in a *64-bit*
build on Core2 hardware? Open rc4-x86_64.pl in a text editor and make the
jump to .Lcloop1 at line 154 unconditional, i.e. replace jz with jmp. Then
make, benchmark and report back. A.
small improvement...
i think this hints that the problem with the unrolled code is the manual
load/store alias avoidance -- there's fancy new hardware in core2 for
dealing with this (obviously it's not fancy enough :)... and it seems
the 32-bit code pushes the alias problem onto the hardware.
But .Lcloop1 is folded and doesn't avoid aliasing.
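
For readers following along, here is a plain C sketch of the textbook RC4 loop, only to show where the load/store aliasing under discussion comes from. It is not the code in rc4-x86_64.pl, and the rc4_sketch_* names are hypothetical. The point is that the store to S[j] goes to a data-dependent address, so the next iteration's load of S[i+1] may or may not hit a byte that was just written; a folded loop like this one leaves that to the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout, for illustration only. */
typedef struct {
    uint8_t S[256];
    uint8_t i, j;
} rc4_sketch_state;

/* Standard RC4 key schedule. */
static void rc4_sketch_init(rc4_sketch_state *st,
                            const uint8_t *key, size_t keylen)
{
    assert(keylen > 0);
    for (int n = 0; n < 256; n++)
        st->S[n] = (uint8_t)n;
    uint8_t j = 0;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->S[n] + key[(size_t)n % keylen]);
        uint8_t t = st->S[n];
        st->S[n] = st->S[j];
        st->S[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* Folded (not unrolled) RC4 loop: one output byte per iteration. */
static void rc4_sketch_crypt(rc4_sketch_state *st,
                             const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t i = st->i, j = st->j;
    for (size_t n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        uint8_t tx = st->S[i];      /* load S[i] */
        j = (uint8_t)(j + tx);
        uint8_t ty = st->S[j];      /* load S[j], address depends on data */
        st->S[j] = tx;              /* this store may land on S[i+1], which */
        st->S[i] = ty;              /* the next iteration loads right away  */
        out[n] = in[n] ^ st->S[(uint8_t)(tx + ty)];
    }
    st->i = i;
    st->j = j;
}
```

An unrolled version has to decide what to do when S[j] coincides with an element it wants to load early for a later round; the sketch above simply reloads from memory every iteration.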
oh and i tried using cmove with no luck either.
bizarre... i think i copied the 32-bit code into the 64-bit Lcloop1 case
and it's still not performing like it does in 32-bit...
The latest optimization manual says that targeting 32 bits of a register
and then using all 64 bits incurs an extra μop (look for "sign extension to
full 64-bits"). Could you try removing the occurrences of #d in the movzb
instructions in the .Lcloop1 body? Naturally, keep the unconditional jmp to
.Lcloop1 as suggested above. It's also possible to compress the loop
body by moving variables to the "upper" register half, ax-dx,si,di,bp, to
minimize usage of the rex prefix. It shouldn't make a difference though,
not in .Lcloop1, as it won't reduce the number of cache lines used. A.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager [EMAIL PROTECTED]