[openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2014-04-11 Thread Bodo Moeller via RT
For the record, I have reviewed Adam's versions of the code before these were
posted here, and Adam has resolved the problems that I pointed out. As of the
latest patch, I think the code is suitable for inclusion in OpenSSL. The final
missing part is support that makes it easy to build with or without this NIST
P-256 implementation, depending on whether the target platform supports it,
similar to the enable-ec_nistp_64_gcc_128 config option for the 64-bit
optimized implementations using type __uint128_t. (The current patch
unconditionally links in the new files, but we may not even be able to process
the new assembler files.)

Also, it would be nice to have Shay review our changes to his contribution (the
March 11 patch.bz2 as further changed by the April 10 patch) to make sure
that while fixing the problems we found, we didn't do unwanted changes.
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


RE: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-12 Thread Gueron, Shay via RT
 Do you have any comment from Intel on the concerns regarding the 
 scattering technique 
 (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)?

First, a comment: it is difficult to actually understand the precise claim by 
the authors, from these 6 slides. The code snippet does not include sufficient 
information to determine what the experiment really is. With this disclaimer, 
my understanding is this: the authors of the presentation claim that accessing 
different locations on a cache line shows different latencies – depending on 
the location (within the cache line). The graph shows a *significantly* larger 
latency for doubleword #0. As a result, they insinuate that the 
scattering-gathering method for accessing tables does not protect against side 
channel analysis. 

This, however, seems to me to be incorrect. 

Perhaps what the experiment shows is due to “cache line splits”, where the data 
that is being accessed does not reside completely in a single cahce line. For 
example, when aligned on 56 bytes, and reading 64-bit quadwords, the first one 
(say Q0) is in one cache line and the rest are in another. So, when flushing 
(only) the cache line that holds Q0, subsequent reading of Q0 would be “slow” 
but reading the rest would be fast. 

It is not possible to see how the array was aligned in the discussed 
presentation, because the code snippet does not include this part. 

Anyway, this phenomenon is not a vulnerability - it is only about a proper 
implementation of the gather/scatter method: the elements of the table simply 
need to be naturally aligned in a way that every single element belongs to the 
same cache line. The programmer needs make sure that the elements of the table 
are naturally aligned (e.g., aligned (64)). As long as there are no cache line 
splits, there should be no timing difference between accesses to a cache line.

Shay 

-Original Message-
From: Bodo Moeller via RT [mailto:r...@openssl.org] 
Sent: Friday, November 08, 2013 12:09
To: Gueron, Shay
Cc: openssl-dev@openssl.org
Subject: Re: [openssl.org #3149] [patch] Fast and side channel protected 
implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

 Here is an updated version of the patch.

 Addressing a) pointer to the function (to select ADCX/ADOX) and b) 
 multiple points addition

 There is (only) ~1% performance deterioration in due to the pointer 
 being passed now, instead of (originally) being static. You can choose 
 which style is preferable.


Thanks!

Alternatives would be (a) using a new lock for safe static initialization, or 
(b) more code duplication to avoid the need for an explicit pointer (there 
could be two separate implementations for the higher-level routines).  However, 
given the 1% performance penalty, that's a minor issue at this point.

Do you have any comment from Intel on the concerns regarding the scattering 
technique (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)?

Bodo

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-09 Thread Andy Polyakov via RT
 Do you have any comment from Intel on the concerns regarding the scattering
 technique (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)?
 
 As discussed off-list in this case the discrepancy is because so called 
 memory disambiguation logic attempting to move loads ahead of stores, 
 and failing when the least significant bits are same. Naturally load 
 ought to be given opportunity to *try* to get ahead. I mean if there 
 is enough work done between store and potentially conflicting load, 
 then load won't try to get ahead of the store and variation . And 
 indeed, if you add instructions to the test program the variation 
 disappears. On Intel CPUs amount worth 5 cycles appears to be 
 sufficient.

This might be misinterpreted. It's not necessarily that 5 cycles is 
sufficient for load not to try to get ahead, but it's sufficient to 
amortize eventual variations.


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-09 Thread Andy Polyakov

Not sent to RT, to openssl-dev only.


Alternatives would be (a) using a new lock for safe static initialization,
or (b) more code duplication to avoid the need for an explicit pointer
(there could be two separate implementations for the higher-level
routines).  However, given the 1% performance penalty, that's a minor issue
at this point.

While if (functiona==NULL || functionb==NULL) { asssign functiona,
functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign
functiona } followed by if (functionb) { assign functionb } is.


That's not thread-safe.  There's no memory barrier so the writes can
be reordered.  The compiler itself could reorder those instructions.


I want to remind that in this case assignments are constant in sense 
that all threads will assign same value. So that disjoint if's work out 
even if threads are reading/writing out of order. As for reorder by 
compiler. No, it won't do that. If it actually would have this option, 
then if (ptr) *ptr wouldn't be safe. If compiler [or hardware] does 
that kind of speculation it is also obliged to verify the assumption and 
act accordingly. Or do you mean that it could reorder it as 
if-if-assign-assign? This is not a problem. The only *formally* valid 
concerns are pointed by Bodo in another message, and they are that 
function[ab] are misaligned (and therefore access is not atomic), and 
that they are used for something else earlier. As for latter. If 
compiler was to use explicitly initialized shared variable for something 
else, it would have to do in a way invisible to program code. It is only 
possible after *full* inter-procedural analysis for given *binary*. But 
then analyzer would have to be aware of transition from single- to 
multi-threaded operation and stop using variable for something else 
before multi-threaded part [and even initialize it accordingly]. And as 
for former. In the context one should recognize that we are not talking 
about *universally* portable code. The code in question *is* 
platform-specific and we are free to take into consideration such 
implementation specifics as object alignment [in supported compilers].

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Bodo Moeller via RT
 Here is an updated version of the patch.

 Addressing a) pointer to the function (to select ADCX/ADOX) and b)
 multiple points addition

 There is (only) ~1% performance deterioration in due to the pointer being
 passed now, instead of (originally) being static. You can choose which
 style is preferable.


Thanks!

Alternatives would be (a) using a new lock for safe static initialization,
or (b) more code duplication to avoid the need for an explicit pointer
(there could be two separate implementations for the higher-level
routines).  However, given the 1% performance penalty, that's a minor issue
at this point.

Do you have any comment from Intel on the concerns regarding the scattering
technique (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)?

Bodo

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Nico Williams
On Fri, Nov 8, 2013 at 4:08 AM, Bodo Moeller via RT r...@openssl.org wrote:
 Alternatives would be (a) using a new lock for safe static initialization,

Maybe you could try my patches on my thread_safety branch of my github
clone of OpenSSL?  (https://github.com/nicowilliams/openssl)

Nico
--
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Andy Polyakov via RT
 Here is an updated version of the patch.

 Addressing a) pointer to the function (to select ADCX/ADOX) and b)
 multiple points addition

 There is (only) ~1% performance deterioration in due to the pointer being
 passed now, instead of (originally) being static. You can choose which
 style is preferable.

 
 Thanks!
 
 Alternatives would be (a) using a new lock for safe static initialization,
 or (b) more code duplication to avoid the need for an explicit pointer
 (there could be two separate implementations for the higher-level
 routines).  However, given the 1% performance penalty, that's a minor issue
 at this point.

While if (functiona==NULL || functionb==NULL) { asssign functiona, 
functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign 
functiona } followed by if (functionb) { assign functionb } is.

 Do you have any comment from Intel on the concerns regarding the scattering
 technique (http://cryptojedi.org/peter/data/chesrump-20130822.pdf)?

As discussed off-list in this case the discrepancy is because so called 
memory disambiguation logic attempting to move loads ahead of stores, 
and failing when the least significant bits are same. Naturally load 
ought to be given opportunity to *try* to get ahead. I mean if there 
is enough work done between store and potentially conflicting load, 
then load won't try to get ahead of the store and variation . And 
indeed, if you add instructions to the test program the variation 
disappears. On Intel CPUs amount worth 5 cycles appears to be 
sufficient. For record, update for x86_64-mont modules is being prepared...


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Nico Williams
On Fri, Nov 8, 2013 at 2:43 PM, Andy Polyakov via RT r...@openssl.org wrote:
 Alternatives would be (a) using a new lock for safe static initialization,
 or (b) more code duplication to avoid the need for an explicit pointer
 (there could be two separate implementations for the higher-level
 routines).  However, given the 1% performance penalty, that's a minor issue
 at this point.

 While if (functiona==NULL || functionb==NULL) { asssign functiona,
 functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign
 functiona } followed by if (functionb) { assign functionb } is.

That's not thread-safe.  There's no memory barrier so the writes can
be reordered.  The compiler itself could reorder those instructions.

Nico
--
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Bodo Moeller via RT
 While if (functiona==NULL || functionb==NULL) { asssign functiona,
 functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign
 functiona } followed by if (functionb) { assign functionb } is.


We're implicitly assuming here that (thanks to alignment, etc.) each
pointer can be accessed atomically, which so far seems reasonable given
the particular platform this code is for. However, the C11 memory model
also allows the compiler to assume there's no write race, and it thus
could, for example, use the same memory location to hold other temporary
values, which could then be misinterpreted as the function pointer by
concurrent threads. See
http://static.usenix.org/event/hotpar11/tech/final_files/Boehm.pdf for
ideas how this might break -- maybe not right now, but possibly with future
compilers, possibly after this code has evolved a bit.

(I'm not promising that it will actually break, but thread-safety analysis
tools are likely to complain loudly.  And at some point the code might
actually fail spectacularly.)

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


RE: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-29 Thread Gueron, Shay via RT
Thanks you Bodo, for the comments. 

Here are some quick answers

 It seems that the BN_MONT_CTX-related code
The optimization made for the computation of the modular inverse in the ECDSA 
sigh, is using const-time mod-exp. 
Indeed, this is independent of the rest of the patch, and it can be used 
independently (for other usages of the library).
We included this addition in the patch for the particular usage in ECDSA.

The paper: it will be posted soon. 

 Note that in your code, OPENSSL_ia32cap_P-dependent initialization of 
 global variables is not done in a thread-safe way.
This initialization is used for selecting a code path that would use ADCX/ADOX 
instructions when the processor supports them. The outcome depends only on the 
appropriate CPUID bits. Therefore, there is no “thread-safe” issue (because any 
thread would select the same path). 
Of course, feel free to use the patch code and modify this initialization to 
match OpenSSL conventions.  

 Your ec_p256_points_mul implementation is much worse than necessary when 
 then input comprises many points
Indeed right. However, this patch is intended to optimize ECDSA sign/verify 
(and ECDH). This usage does not require adding more than a single point. If 
there are interesting cases - optimized multi-point addition can be added.

Regards, 
Shay Gueron

-Original Message-
From: Bodo Moeller via RT [mailto:r...@openssl.org] 
Sent: Thursday, October 24, 2013 19:18
To: Gueron, Shay
Cc: openssl-dev@openssl.org
Subject: [openssl.org #3149] [patch] Fast and side channel protected 
implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms 

Thanks for the submission!

It seems that the BN_MONT_CTX-related code (used in crypto/ecdsa for 
constant-time signing) is entirely independent of the remainder of the patch, 
and should be considered separately.


Regarding your reference 'S.Gueron and V.Krasnov, Fast Prime Field Elliptic 
Curve Cryptography with 256 Bit Primes' for you NIST P-256 code, is that 
document available? (Web search only pointed me back to your patch.)

I've noticed that for secret-independent constant-time memory access, your code 
relies on the scattering approach. However 
http://cryptojedi.org/peter/data/chesrump-20130822.pdf points out that 
apparently this doesn't actually work as intended. (Dan Bernstein's earlier
references: Sections 14, 15 in http://cr.yp.to/papers.html#cachetiming;
http://cr.yp.to/mac/athlon.html.)

Note that in your code, OPENSSL_ia32cap_P-dependent initialization of global 
variables is not done in a thread-safe way. How about entirely avoiding this 
global state, and passing pointers down to the implementations?

Your ec_p256_points_mul implementation is much worse than necessary when then 
input comprises many points (more precisely, more than one point other than the 
group generator), because you call ec_p256_windowed_mul multiple times 
separately and add the results. I'd suggest instead to implement this modeled 
on ec_GFp_nistp256_points_mul instead to benefit from interleaved left-to-right 
point multiplication. (This avoids the additional point-double operations from 
the separate point multiplication algorithm executions going through each 
additional scalar.) Your approach for precomputation also is different (using 
fewer point operations based on a larger precomputed table than the one we 
currently use in ec_GFp_nistp256_points_mul) -- that table size still seems 
appropriate, so keeping that probably makes sense.

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


[openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-29 Thread Bodo Moeller via RT
 This initialization is used for selecting a code path that would use
ADCX/ADOX
 instructions when the processor supports them. The outcome depends only on
 the appropriate CPUID bits. Therefore, there is no “thread-safe” issue
(because
 any thread would select the same path).

I understand that that's the idea, and would have considered the code to be
safe a while ago (and might have written the initialization just like that),
but actually compiler transformations that are legal with the C memory model
could break this. See
http://static.usenix.org/event/hotpar11/tech/final_files/Boehm.pdf for
inspiration.


 Your ec_p256_points_mul implementation is much worse than necessary when
then input comprises many points

 Indeed right. However, this patch is intended to optimize ECDSA sign/verify
(and ECDH). This usage does not require adding more than a single point.

Sure, but there's no compelling reason to make the other (rarer) use cases
slower. Also, adapting the addition/subtraction chain used in the existing
crypto/ec/ecp_nistp256.c (modified Booth encoding instead of unsigned windows)
could make the new implementation even faster.

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


RE: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-29 Thread Gueron, Shay
Thanks you Bodo, for the comments. 

Here are some quick answers

 It seems that the BN_MONT_CTX-related code
The optimization made for the computation of the modular inverse in the ECDSA 
sigh, is using const-time mod-exp. 
Indeed, this is independent of the rest of the patch, and it can be used 
independently (for other usages of the library).
We included this addition in the patch for the particular usage in ECDSA.

The paper: it will be posted soon. 

 Note that in your code, OPENSSL_ia32cap_P-dependent initialization of 
 global variables is not done in a thread-safe way.
This initialization is used for selecting a code path that would use ADCX/ADOX 
instructions when the processor supports them. The outcome depends only on the 
appropriate CPUID bits. Therefore, there is no “thread-safe” issue (because any 
thread would select the same path). 
Of course, feel free to use the patch code and modify this initialization to 
match OpenSSL conventions.  

 Your ec_p256_points_mul implementation is much worse than necessary when 
 then input comprises many points
Indeed right. However, this patch is intended to optimize ECDSA sign/verify 
(and ECDH). This usage does not require adding more than a single point. If 
there are interesting cases - optimized multi-point addition can be added.

Regards, 
Shay Gueron

-Original Message-
From: Bodo Moeller via RT [mailto:r...@openssl.org] 
Sent: Thursday, October 24, 2013 19:18
To: Gueron, Shay
Cc: openssl-dev@openssl.org
Subject: [openssl.org #3149] [patch] Fast and side channel protected 
implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms 

Thanks for the submission!

It seems that the BN_MONT_CTX-related code (used in crypto/ecdsa for 
constant-time signing) is entirely independent of the remainder of the patch, 
and should be considered separately.


Regarding your reference 'S.Gueron and V.Krasnov, Fast Prime Field Elliptic 
Curve Cryptography with 256 Bit Primes' for you NIST P-256 code, is that 
document available? (Web search only pointed me back to your patch.)

I've noticed that for secret-independent constant-time memory access, your code 
relies on the scattering approach. However 
http://cryptojedi.org/peter/data/chesrump-20130822.pdf points out that 
apparently this doesn't actually work as intended. (Dan Bernstein's earlier
references: Sections 14, 15 in http://cr.yp.to/papers.html#cachetiming;
http://cr.yp.to/mac/athlon.html.)

Note that in your code, OPENSSL_ia32cap_P-dependent initialization of global 
variables is not done in a thread-safe way. How about entirely avoiding this 
global state, and passing pointers down to the implementations?

Your ec_p256_points_mul implementation is much worse than necessary when then 
input comprises many points (more precisely, more than one point other than the 
group generator), because you call ec_p256_windowed_mul multiple times 
separately and add the results. I'd suggest instead to implement this modeled 
on ec_GFp_nistp256_points_mul instead to benefit from interleaved left-to-right 
point multiplication. (This avoids the additional point-double operations from 
the separate point multiplication algorithm executions going through each 
additional scalar.) Your approach for precomputation also is different (using 
fewer point operations based on a larger precomputed table than the one we 
currently use in ec_GFp_nistp256_points_mul) -- that table size still seems 
appropriate, so keeping that probably makes sense.

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


[openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-24 Thread Bodo Moeller via RT
Thanks for the submission!

It seems that the BN_MONT_CTX-related code (used in crypto/ecdsa for
constant-time signing) is entirely independent of the remainder of the patch,
and should be considered separately.


Regarding your reference 'S.Gueron and V.Krasnov, Fast Prime Field Elliptic
Curve Cryptography with 256 Bit Primes' for you NIST P-256 code, is that
document available? (Web search only pointed me back to your patch.)

I've noticed that for secret-independent constant-time memory access, your code
relies on the scattering approach. However
http://cryptojedi.org/peter/data/chesrump-20130822.pdf points out that
apparently this doesn't actually work as intended. (Dan Bernstein's earlier
references: Sections 14, 15 in http://cr.yp.to/papers.html#cachetiming;
http://cr.yp.to/mac/athlon.html.)

Note that in your code, OPENSSL_ia32cap_P-dependent initialization of global
variables is not done in a thread-safe way. How about entirely avoiding this
global state, and passing pointers down to the implementations?

Your ec_p256_points_mul implementation is much worse than necessary when then
input comprises many points (more precisely, more than one point other than the
group generator), because you call ec_p256_windowed_mul multiple times
separately and add the results. I'd suggest instead to implement this modeled
on ec_GFp_nistp256_points_mul instead to benefit from interleaved left-to-right
point multiplication. (This avoids the additional point-double operations from
the separate point multiplication algorithm executions going through each
additional scalar.) Your approach for precomputation also is different (using
fewer point operations based on a larger precomputed table than the one we
currently use in ec_GFp_nistp256_points_mul) -- that table size still seems
appropriate, so keeping that probably makes sense.

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


[openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-10-22 Thread Gueron, Shay via RT
Hello all,

This patch is a contribution to OpenSSL.

It offers an efficient and constant-time implementation of EC multiplication 
for NIST P256 curve.
It accelerated ECDSA (sign and verify) as well as ECDH (compute and generate 
key), for the P256 curve.

The implementation is based on Montgomery multiplication, specifically 
optimized for the P-256 prime, using x86-64 assembly. It is intended to run on 
any x86-64 CPU (running Linux).

We measured the performance benefit of this patch on Intel(R) CPUs Codename 
Haswell and Sandy Bridge, using 'openssl speed'. Here are the numbers.

Single thread, single core:

Sandy Bridge@3.4GHz:
ECDSA sign:
OpenSSL - 8,759 sign/sec
OpenSSL with NISTP enabled - 8,822 sign/sec
This patch - 17,111 sign/sec
1.95/1.94X speedup
ECDSA verify:
OpenSSL - 2,303 verify/sec
OpenSSL with NISTP enabled - 3,831 verify/sec
This patch - 6,193 verify/sec
2.69/1.62X speedup
ECDH:
OpenSSL - 2,795 op/sec
OpenSSL with NISTP enabled - 5,189 op/sec
This patch - 8,087 op/sec
2.89X/1.56X speedup

Haswell@3.4GHz:
ECDSA sign:
OpenSSL - 10,212sign/sec
OpenSSL with NISTP enabled - 10,499 sign/sec
This patch - 22,747 sign/sec
2.22X/2.16X speedup
ECDSA verify:
OpenSSL - 2,647 verify/sec
OpenSSL with NISTP enabled - 4,384 verify/sec
This patch - 7,622 verify/sec
2.88X/1.74X speedup
ECDH:
OpenSSL - 3,193 op/sec
OpenSSL with NISTP enabled - 5,932 op/sec
This patch - 10,288 op/sec
3.22X/1.73X speedup

Two threads, single core:

Sandy Bridge@3.4GHz:
ECDSA sign:
OpenSSL - 10,076 sign/sec
OpenSSL with NISTP enabled - 8,911 sign/sec
This patch - 19,901 sign/sec
2/2.2X speedup
ECDSA verify:
OpenSSL - 2,480 verify/sec
OpenSSL with NISTP enabled - 3,840 verify/sec
This patch - 7,056 verify/sec
2.8/1.8X speedup
ECDH:
OpenSSL - 2,983 op/sec
OpenSSL with NISTP enabled - 5,290 op/sec
This patch - 9,657 op/sec
3.2X/1.8X speedup

Haswell@3.4GHz:
ECDSA sign:
OpenSSL - 11,004sign/sec
OpenSSL with NISTP enabled - 10,153 sign/sec
This patch - 25,094 sign/sec
2.3X/2.5X speedup
ECDSA verify:
OpenSSL - 2,598 verify/sec
OpenSSL with NISTP enabled - 3,947 verify/sec
This patch - 7,705 verify/sec
3X/2X speedup
ECDH:
OpenSSL - 3,166 op/sec
OpenSSL with NISTP enabled - 5,387 op/sec
This patch - 10,438 op/sec
3.3X/1.9X speedup

Reference:
[1] S. Gueron, V. Krasnov: Fast Prime Field Elliptic Curve Cryptography with 
256 Bit Primes

Developers and authors:
***
Shay Gueron (1, 2), and Vlad Krasnov (1)
(1) Intel Corporation, Israel Development Center, Haifa, Israel
(2) University of Haifa, Israel
***
Copyright(c) 2013, Intel Corp.


-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org