> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> On Behalf Of Andy Polyakov
> Sent: Wednesday, April 06, 2005 5:34 PM
> To: [email protected]
> Subject: Re: RC4 optimize for em64t
>
> >>>Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb
> >>>(%rdi,%r10),%r14b and make inter-register move between r8 and r14
> >>>conditional?
> >>>
> >>
> >> I will try it.
> >
> > I have tried it, no performance gain.
>
> Does it mean that it's the same or does it mean that it's slower? Was it
> cmov or was it a jump over the mov instruction? BTW, what is the
> latency/throughput for Intel cmov anyway? I can't find information
> anywhere...

Using cmov here slows it down a lot. Making the mov r13b, (%rdi, %rdi)
conditional instead runs at the same speed...

> Another question. Why are the rotations 32-bit? Did you try 64-bit
> rotations and find them slow? If so, by how much?

Changing to a 64-bit ror slows the throughput to around 480Mb/s.

> You may wonder why all these questions. I want to understand the code to
> make it regular enough to express the assembler unrolled loop in perl
> loop terms. It makes it easier for us to maintain and I'm even ready to
> sacrifice a few percent of performance for more regular looking code.
>
> >>>BTW, 272MBps at 3.6GHz? I get 262MBps out of [as just mentioned
> >>>virtually identical] 32-bit code at 2.4GHz P4... A.
> >>
> >> In fact, your implementation on EM64T isn't that slow if
> >> we change the inc and dec to add and sub. :)
> >>
> >> With that change the throughput is boosted from 272Mb/s to 396Mb/s.
>
> For *now* I'm committing only this change to CVS and will have a closer
> look at the unrolled loop later on [some time next week]. BTW, there is
> another idea I'd like to try, so I'm likely to send you some code for
> benchmarking on EM64T hardware. A.

I am glad to do the test for you.

I have tested changing inc and dec in the 32-bit code to add and sub and
see a 2% performance gain on a P4. It is a bit strange that you see a
slowdown. Changing inc to add will only benefit the P4, in theory.

Zou Nan hai
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [EMAIL PROTECTED]
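
For readers following the thread without the assembler in front of them, here is a minimal C sketch of the RC4 loop that the code under discussion implements. It is a plain reference version, not the OpenSSL source and not the EM64T assembler. The names rc4_state, rc4_init and rc4_crypt are made up for illustration, and the state layout is an assumption. The per-byte index update in rc4_crypt is the kind of step where an assembler version has to choose between inc and add.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration; not OpenSSL's RC4_KEY. */
typedef struct {
    uint8_t x, y;     /* the two running indices */
    uint8_t s[256];   /* the permutation table */
} rc4_state;

/* Key schedule: start from the identity permutation and mix in the key. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned i;
    uint8_t j = 0;

    for (i = 0; i < 256; i++)
        st->s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->s[i];
        j = (uint8_t)(j + t + key[i % keylen]);
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->x = 0;
    st->y = 0;
}

/* Keystream generation and XOR, one byte per iteration (not unrolled). */
static void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t x = st->x, y = st->y;
    size_t n;

    for (n = 0; n < len; n++) {
        uint8_t tx, ty;

        x = (uint8_t)(x + 1);   /* index step, wraps modulo 256 */
        tx = st->s[x];
        y = (uint8_t)(y + tx);
        ty = st->s[y];
        st->s[x] = ty;          /* swap s[x] and s[y] */
        st->s[y] = tx;
        out[n] = in[n] ^ st->s[(uint8_t)(tx + ty)];
    }
    st->x = x;
    st->y = y;
}
```

Each output byte depends on the table swap made for the previous byte, which is why instruction-level choices in this short loop (inc versus add, cmov versus a branch, rotation width) can move the throughput numbers quoted above.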
