Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-30 Thread Andy Polyakov
Hi, Thank you so much for looking into the issue with Ferenc! I'll incorporate the change into Solaris to verify the 20-30% performance improvement. The conservative approach sounds like the best approach at this point. Once the performance improvement is verified, can you commit the change

Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-21 Thread Misaki.Miyashita
Hi Andy, Thank you so much for looking into the issue with Ferenc! I'll incorporate the change into Solaris to verify the 20-30% performance improvement. The conservative approach sounds like the best approach at this point. Once the performance improvement is verified, can you commit the

Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-18 Thread Andy Polyakov
Misaki, The measurement I sent yesterday for OpenSSL (with inlined T4 instruction support) was not quite accurate. Some of the T4 specific code you committed was not enabled when we tested, and I realized that __sparc__ was not defined on our system. Thus, I changed #if defined(__sparc__) to
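
The snippet above is cut off before the replacement condition, so the change that was actually made is not shown here. As a general illustration only: GCC predefines `__sparc__` on SPARC targets, while the Sun/Oracle Studio compiler predefines `__sparc`, so a guard that tests only `__sparc__` can silently disable code under the Studio compiler. A minimal sketch of a guard that accepts either spelling follows; the names `BUILD_TARGETS_SPARC` and `build_targets_sparc` are hypothetical and do not come from OpenSSL.

```c
/*
 * Sketch only: accept both common spellings of the SPARC target macro.
 * GCC predefines __sparc__; Sun/Oracle Studio cc predefines __sparc.
 * BUILD_TARGETS_SPARC is a hypothetical name used for illustration.
 */
#if defined(__sparc) || defined(__sparc__)
# define BUILD_TARGETS_SPARC 1
#else
# define BUILD_TARGETS_SPARC 0
#endif

/* Returns 1 when compiled for a SPARC target, 0 otherwise. */
int build_targets_sparc(void)
{
    return BUILD_TARGETS_SPARC;
}
```

Calling `build_targets_sparc()` in a test build is a quick way to confirm that a guarded code path is really being compiled in before benchmarking it.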

Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-01 Thread Andy Polyakov
Another question is about the suitability of the floating-point fcmps and fmovd instructions. These are used to pick a vector from the powers table in a cache-timing-neutral manner. I have to admit I haven't done due research on whether or not they are the optimal choice in the context, and/or whether or not we are

Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-01 Thread David Miller
From: Andy Polyakov ap...@openssl.org Date: Sat, 01 Jun 2013 09:38:18 +0200 I wonder about integer conditional move on integer condition. It should be noted that sheer latency is of lesser concern, as long as the processor can efficiently handle several of them in the pipeline. Condition Codes
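
The messages in this thread discuss picking a vector from the powers table in a cache-timing-neutral manner, either with floating-point compare and conditional move or with integer conditional moves. The idea can be sketched in portable C: read every table entry and combine the entries under a mask, so that the memory access pattern does not depend on the secret index. This is not the code in sparct4-mont.pl. The table dimensions and the names `ct_eq_mask` and `ct_select` are assumptions made for illustration, and a compiler is free to turn the masking into branches or conditional moves, which is one reason real implementations do this step in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions, chosen only for this sketch. */
#define TABLE_ENTRIES 32
#define ENTRY_WORDS   8

/* Returns all-ones when a == b and zero otherwise, without branching. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                        /* zero only when a == b */
    uint64_t differ = (x | (0 - x)) >> 63;     /* 1 when a != b, else 0 */
    return differ - 1;                         /* 0 - 1 wraps to all-ones */
}

/*
 * Copies table[idx] into out while touching every entry of the table,
 * so the sequence of loads is the same for every value of idx.
 */
static void ct_select(uint64_t out[ENTRY_WORDS],
                      const uint64_t table[TABLE_ENTRIES][ENTRY_WORDS],
                      uint64_t idx)
{
    size_t i, j;

    for (j = 0; j < ENTRY_WORDS; j++)
        out[j] = 0;

    for (i = 0; i < TABLE_ENTRIES; i++) {
        uint64_t mask = ct_eq_mask((uint64_t)i, idx);
        for (j = 0; j < ENTRY_WORDS; j++)
            out[j] |= table[i][j] & mask;      /* keeps only the wanted entry */
    }
}
```

The cost is a full pass over the table per lookup, which is why the throughput of the conditional-move instructions, rather than their latency alone, matters for this step.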

Re: MONTMUL performance: t4 engine vs inlined t4

2013-06-01 Thread David Miller
I forgot to mention, the out of order execution unit renames the condition codes just like any other register.

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread Andy Polyakov
For public reference. To a certain degree it's apparent from the context, but the report is about the RSA sign performance difference between the OpenSSL SPARC T4 Montgomery multiplication module and the corresponding Solaris T4 module, with OpenSSL being significantly slower. The least one can say [at this point]

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread David Miller
From: Andy Polyakov ap...@openssl.org Date: Fri, 31 May 2013 10:29:37 +0200 Another question is about the suitability of the floating-point fcmps and fmovd instructions. These are used to pick a vector from the powers table in a cache-timing-neutral manner. I have to admit I haven't done due research

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread Andy Polyakov
Another question is about the suitability of the floating-point fcmps and fmovd instructions. These are used to pick a vector from the powers table in a cache-timing-neutral manner. I have to admit I haven't done due research on whether or not they are the optimal choice in the context, and/or whether or not we

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread Andy Polyakov
... here is something to test. In crypto/bn/asm/sparct4-mont.pl there is a register-window warm-up sequence that is executed in 32-bit application context only (benchmarking on Linux had shown that it's not necessary in 64-bit application context). Could you test engaging it even in 64-bit

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread Misaki.Miyashita
Hi Andy, The measurement I sent yesterday for OpenSSL (with inlined T4 instruction support) was not quite accurate. Some of the T4 specific code you committed was not enabled when we tested, and I realized that __sparc__ was not defined on our system. Thus, I changed #if defined(__sparc__) to

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-31 Thread Andy Polyakov
Hi, The measurement I sent yesterday for OpenSSL (with inlined T4 instruction support) was not quite accurate. Some of the T4 specific code you committed was not enabled when we tested, and I realized that __sparc__ was not defined on our system. Thus, I changed #if defined(__sparc__) to #if

Re: MONTMUL performance: t4 engine vs inlined t4

2013-05-30 Thread Misaki.Miyashita
Hi Andy, On 05/30/13 15:08, Ferenc Rakoczi wrote: Hi, Andy, Andy Polyakov wrote: First of all, RSA512 is essentially irrelevant and no attempt was made to optimize it. So let's just disregard RSA512 results (I have even removed them from above quoted part). Secondly note that our RSA