it seems somewhat fortunate that core2 CPUs track the p4 behaviour
w.r.t. these two rc4 implementations.  here are the core2 results with the
stock code / HT test:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             166799.58k   180552.87k   182437.93k   183381.67k   183206.87k

for the record, the core2 64-bit code is seriously underperforming the
32-bit code...  here are the 32-bit results (with cpuid test enabled):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             254164.64k   279901.10k   279364.38k   283617.62k   276690.26k

The key feature of the 32-bit code with the cpuid test is that the
corresponding loop is not unrolled. Can you test the following in a *64-bit*
build on Core2 hardware? Open rc4-x86_64.pl in a text editor and make the jump
to .Lcloop1 at line 154 unconditional, i.e. replace jz with jmp. Then make,
benchmark and report back. A.

small improvement...

i think this hints that the problem with the unrolled code is the manual
load/store alias avoidance -- there's fancy new hardware in core2 for
dealing with this (obviously it's not fancy enough :)... and it seems
the 32-bit code pushes the alias problem onto the hardware.

But .Lcloop1 is folded and doesn't avoid aliasing.
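
For readers following along, here is a minimal C sketch of a folded (one byte per iteration) RC4 loop, roughly the shape being discussed for .Lcloop1. It is not the code in rc4-x86_64.pl: the names rc4_state, rc4_init and rc4_crypt are made up for illustration, and the struct layout is an assumption, not OpenSSL's RC4_KEY. The comment in the loop marks where the load/store alias can arise: the store to S[y] may land on the very slot the next iteration loads as S[x+1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration; not OpenSSL's RC4_KEY. */
typedef struct {
    uint8_t x, y;
    uint8_t S[256];
} rc4_state;

/* Standard RC4 key schedule. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned i, j = 0;

    assert(keylen > 0);
    for (i = 0; i < 256; i++)
        st->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->S[i];
        j = (j + t + key[i % keylen]) & 0xff;
        st->S[i] = st->S[j];
        st->S[j] = t;
    }
    st->x = 0;
    st->y = 0;
}

/* Folded loop: one keystream byte per iteration, no unrolling. */
static void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out,
                      size_t len)
{
    uint8_t x = st->x, y = st->y;
    uint8_t *S = st->S;

    while (len--) {
        x = (uint8_t)(x + 1);
        uint8_t tx = S[x];
        y = (uint8_t)(y + tx);
        uint8_t ty = S[y];
        /* If y == x + 1, this store hits the slot that the next
         * iteration loads as S[x]. A folded loop simply reloads it;
         * an unrolled loop that preloads S[x+1] has to check for it. */
        S[y] = tx;
        S[x] = ty;
        *out++ = *in++ ^ S[(uint8_t)(tx + ty)];
    }
    st->x = x;
    st->y = y;
}
```

Because the folded form reloads S[x] from memory every iteration, it needs no explicit alias check and leaves store-to-load forwarding to the hardware.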

oh and i tried using cmove with no luck either.

bizarre... i think i copied the 32-bit code into the 64-bit .Lcloop1 case and it's still not performing like it does in 32-bit...

Fresh optimization manual says that targeting 32 bits of a register and then using all 64 bits incurs an extra μop (look for "sign extension to full 64-bits"). Could you try removing the occurrences of #d in the movzb instructions in the .Lcloop1 body, naturally keeping the unconditional jmp .Lcloop1 as suggested above? It's also possible to compress the loop body by moving variables to the "upper" register half, ax-dx,si,di,bp, to minimize usage of the rex prefix. It shouldn't make a difference though, not in .Lcloop1, as it won't reduce the number of cache lines used. A.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
