> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Andy Polyakov
> Sent: Wednesday, April 06, 2005 5:34 PM
> To: [email protected]
> Subject: Re: RC4 optimize for em64t
> 
> >>>Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb
> >>>(%rdi,%r10),%r14b and making the inter-register move between r8 and
> >>>r14 conditional?
> >>>
> >>
> >>  I will try it.
> >
> >     I have tried it; no performance gain.
> 
> Does it mean that it's the same or does it mean that it's slower? Was it
> cmov or was it a jump over the mov instruction? BTW, what is the
> latency/throughput for Intel cmov anyway? I can't find the information
> anywhere...

  Using cmov here slows it down a lot.
  Making the mov r13b, (%rdi, %rdi) conditional instead gives the same speed...

> 
> Another question. Why are the rotations 32-bit? Did you try 64-bit
> rotations and find them slow? If so, by how much?

  Changing to 64-bit ror slows the throughput to around 480Mb/s.
> 
> You may wonder why all these questions. I want to understand the code
> to make it regular enough to express the assembler unrolled loop in
> Perl loop terms. It makes it easier for us to maintain, and I'm even
> ready to sacrifice a few percent of performance for more regular-looking
> code.
> 
> >>>BTW, 272MBps at 3.6GHz? I get 262MBps out of [as just mentioned
> >>>virtually identical] 32-bit code at 2.4GHz P4... A.
> >>
> >>  In fact, your implementation on EM64T isn't that slow if
> >>  we change the inc and dec to add and sub. :)
> >>
> >>  With that change the throughput rises from 272Mb/s to 396Mb/s.
> 
> For *now* I'm committing only this change to CVS and will have a closer
> look at the unrolled loop later on [some time next week]. BTW, there is
> another idea I'd like to try, so I'm likely to send you some code for
> benchmarking on EM64T hardware. A.


  I am glad to do the test for you.
  I have tested changing inc and dec to add and sub in the 32-bit code and
see a 2% performance gain on a P4.
  It is a bit strange that you see a slowdown. In theory, changing inc to
add should help only on the P4.
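
  For reference, here is a minimal C sketch of the RC4 byte loop this
thread is tuning. It is not the OpenSSL assembler, and the names
rc4_state, rc4_init and rc4_crypt are made up for illustration. The
comments mark the index and counter steps that the hand-written
assembler expresses as inc/dec or add/sub; in C the compiler makes that
choice, so this only shows where those steps sit in the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration; not OpenSSL's RC4_KEY. */
typedef struct {
    uint8_t x, y;
    uint8_t s[256];
} rc4_state;

/* Standard RC4 key schedule. keylen must be non-zero. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;

    for (int i = 0; i < 256; i++)
        st->s[i] = (uint8_t)i;

    for (int i = 0; i < 256; i++) {
        j = (uint8_t)(j + st->s[i] + key[i % keylen]);
        uint8_t t = st->s[i];
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->x = 0;
    st->y = 0;
}

/* One keystream byte per iteration, XORed into the data. */
static void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out,
                      size_t len)
{
    uint8_t x = st->x, y = st->y;

    while (len > 0) {
        x = (uint8_t)(x + 1);   /* index step: inc or add $1 in asm */
        uint8_t tx = st->s[x];
        y = (uint8_t)(y + tx);
        uint8_t ty = st->s[y];
        st->s[x] = ty;          /* swap s[x] and s[y] */
        st->s[y] = tx;
        *out++ = *in++ ^ st->s[(uint8_t)(tx + ty)];
        len--;                  /* counter step: dec or sub $1 in asm */
    }
    st->x = x;
    st->y = y;
}
```

  Calling rc4_init once and then rc4_crypt over a large buffer gives a
plain C baseline to compare the assembler numbers against.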

Zou Nan hai
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [EMAIL PROTECTED]
