Hi,
Thank you so much for looking into the issue with Ferenc!
I'll incorporate the change into Solaris to verify the 20-30%
performance improvement.
The conservative approach sounds like the best option at this point.
Once the performance improvement is verified, can you commit the change
Misaki,
The measurement I sent yesterday for OpenSSL (with inlined T4
instruction support) was not quite accurate.
Some of the T4 specific code you committed was not enabled when we
tested, and I realized that __sparc__ was not defined on our system.
Thus, I changed #if defined(__sparc__) to
Another question is about the suitability of the floating-point fcmps and fmovd
instructions. These are used to pick a vector from the powers table in a
cache-timing-neutral manner. I have to admit I haven't done due research on
whether or not they are the optimal choice in this context, and/or whether or
not we are
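
For readers unfamiliar with the idea, here is a minimal C sketch of a
cache-timing-neutral table lookup. It is not the code in sparct4-mont.pl,
which does the selection with fcmps/fmovd in assembly. The names ct_eq_mask
and ct_gather, and the flat row-major table layout, are assumptions made for
illustration. Every row of the table is read on every call, and the wanted
row is kept by masking rather than by a secret-dependent branch or address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns all-ones if a == b, zero otherwise,
 * without branching on the comparison result. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                      /* zero iff a == b */
    uint64_t differs = (x | (0 - x)) >> 63;  /* 1 if a != b, 0 if equal */
    return differs - 1;                      /* 0 -> all-ones, 1 -> zero */
}

/* Hypothetical helper: copy row `idx` of a table with `nrows` rows of
 * `nwords` 64-bit words each into `out`. All rows are touched in the
 * same order regardless of `idx`, so the memory access pattern does
 * not depend on the (secret) index. */
static void ct_gather(uint64_t *out, const uint64_t *table,
                      size_t nrows, size_t nwords, size_t idx)
{
    size_t i, j;

    for (j = 0; j < nwords; j++)
        out[j] = 0;

    for (i = 0; i < nrows; i++) {
        uint64_t mask = ct_eq_mask((uint64_t)i, (uint64_t)idx);
        for (j = 0; j < nwords; j++)
            out[j] |= table[i * nwords + j] & mask;
    }
}
```

A conditional-move instruction plays the role of the mask-and-OR step here.
Whether a compiler keeps this C branch-free is not guaranteed, which is one
reason such selection is often written directly in assembly.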
From: Andy Polyakov ap...@openssl.org
Date: Sat, 01 Jun 2013 09:38:18 +0200
I wonder about an integer conditional move on an integer condition. It
should be noted that sheer latency is of lesser concern, as long as
the processor can efficiently handle several of them in
the pipeline. Condition Codes
I forgot to mention, the out of order execution unit renames
the condition codes just like any other register.
__
OpenSSL Project http://www.openssl.org
Development Mailing List
For public reference. To a certain degree it's apparent from the context,
but the report is about the RSA sign performance difference between the OpenSSL
SPARC T4 Montgomery multiplication module and the corresponding Solaris T4
module, with OpenSSL being significantly slower. The least one can say
[at this point]
From: Andy Polyakov ap...@openssl.org
Date: Fri, 31 May 2013 10:29:37 +0200
Another question is about the suitability of the floating-point fcmps and fmovd
instructions. These are used to pick a vector from the powers table in a
cache-timing-neutral manner. I have to admit I haven't done due research
... here is something to test. In
crypto/bn/asm/sparct4-mont.pl there is a register-window warm-up
sequence that is executed in 32-bit application context only
(benchmarking on Linux showed that it's not necessary in 64-bit
application context). Could you test engaging it even in 64-bit
Hi Andy,
On 05/30/13 15:08, Ferenc Rakoczi wrote:
Hi, Andy,
Andy Polyakov wrote:
First of all, RSA512 is essentially irrelevant and no attempt was
made to optimize it. So let's just disregard RSA512 results (I have
even removed them from above quoted part). Secondly note that our RSA