Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-15 Thread Andy Polyakov via RT
Hi,

 This patch is a contribution to OpenSSL.
 
 It offers an efficient and constant-time implementation of the elliptic
 curve point multiplication, for the following standard NIST/SECG binary
 elliptic curves:
 sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1,
 sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1,
 sect571k1, and sect571r1.
 
 The patch implements several improvements at the algorithmic and the
 coding levels (using SSE/AVX and PCLMULQDQ instructions).

Cool stuff! This is an initial message, primarily for reference. The 
submission will be carefully examined, with the aim of covering more 
platforms than just Intel. More and more processors support polynomial 
multiplication in hardware: SPARC, ARM, POWER8, ...
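
For readers unfamiliar with the term, the "polynomial multiplication" these
instructions provide is carry-less multiplication over GF(2). The following
is a minimal Python sketch of that operation, not code from the patch (which
is written in C with compiler intrinsics). The function name clmul is an
assumption chosen for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of a and b over GF(2).

    Bit i of each integer is read as the coefficient of x**i.
    This is a plain shift-and-XOR loop, not a hardware instruction.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    result = 0
    while b:
        if b & 1:
            result ^= a  # add the shifted copy of a; XOR means no carries
        a <<= 1
        b >>= 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), because the middle term 2x vanishes.
print(bin(clmul(0b11, 0b11)))  # prints 0b101
```

An ordinary integer product of 3 and 3 would give 9 (0b1001); the carry-less
product differs because partial products are combined with XOR instead of
addition.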



Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-04 Thread Manuel Bluhm
Dear David,

in response to your comment, the numbers below provide a comparison of
the patch with OpenSSL 1.0.1e on Haswell and Ivy Bridge. Each speedup
figure is the ratio of the throughput on a binary curve to the
throughput on the prime curve of similar bit length.

With this patch, both architectures perform many more ECDH operations
with binary curves than with prime curves. In addition, Haswell achieves
more ECDSA sign and verify operations, while Ivy Bridge achieves more
verifications but fewer signatures on all but the 163-bit curves.


Curves for speed comparison:

GF(p)  GF(2^m)
secp160r1 -  nist(b,k)163
nistp224  -  nist(b,k)233 
nistp256  -  nist(b,k)283 
nistp384  -  nist(b,k)409 
nistp521  -  nist(b,k)571 
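
As a minimal sketch of how the speedup figures in this message can be
reproduced, the snippet below divides the binary-curve throughput by the
throughput of the paired prime curve. The two pairs and their op/s values
are taken from the Haswell ECDH results in this message; the function and
variable names are assumptions chosen for illustration.

```python
def speedup(binary_ops: float, prime_ops: float) -> float:
    """Ratio of binary-curve throughput to prime-curve throughput."""
    return binary_ops / prime_ops


# Haswell ECDH throughput in op/s, copied from the results in this message.
prime = {"secp160r1": 7391.5, "nistp224": 11993.8}
binary = {"nistk163": 67212.4, "nistk233": 39102.2}

# Curves paired by similar bit length, as in the comparison table.
pairs = [("nistk163", "secp160r1"), ("nistk233", "nistp224")]

for b, p in pairs:
    print(f"{b} vs {p}: {speedup(binary[b], prime[p]):.2f}")
# prints:
# nistk163 vs secp160r1: 9.09
# nistk233 vs nistp224: 3.26
```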


The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) [1]:

./openssl speed ecdh

             ECDH op/s
(secp160r1)    7391.5
(nistp224)    11993.8
(nistp256)     6489.0
(nistp384)     1848.5
(nistp521)     1682.8

             ECDH op/s  Speedup
(nistk163)    67212.4     9.09
(nistk233)    39102.2     3.26
(nistk283)    27586.5     4.25
(nistk409)    11611.2     6.28
(nistk571)     5941.8     3.53

             ECDH op/s  Speedup
(nistb163)    61667.8     8.34
(nistb233)    35246.4     2.94
(nistb283)    24320.7     3.75
(nistb409)    10238.1     5.54
(nistb571)     5158.8     3.07


./openssl speed ecdsa

              SIGN/s   VERIFY/s
(secp160r1)  21750.8    6029.0
(nistp224)   18393.5    8345.4
(nistp256)   11391.7    4744.9
(nistp384)    6447.4    1566.0
(nistp521)    2949.5    1249.7

              SIGN/s   VERIFY/s   Speedups (sign, verify)
(nistk163)   36660.3   26646.7    1.69   4.42
(nistk233)   23142.7   15842.9    1.26   1.90
(nistk283)   16941.7   11059.7    1.49   2.33
(nistk409)    8198.4    4861.4    1.27   3.10
(nistk571)    4446.7    2547.6    1.51   2.04

              SIGN/s   VERIFY/s   Speedups (sign, verify)
(nistb163)   34738.8   25113.6    1.60   4.17
(nistb233)   21531.7   14341.8    1.17   1.72
(nistb283)   15635.1   10061.6    1.37   2.12
(nistb409)    7479.5    4390.6    1.16   2.80
(nistb571)    4029.7    2269.6    1.37   1.82


The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) [2]:

./openssl speed ecdh   

             ECDH op/s
(secp160r1)       .3
(nistp224)     7573.1
(nistp256)     3891.5
(nistp384)     1051.9
(nistp521)      971.0

             ECDH op/s  Speedup
(nistk163)    27837.0     6.26
(nistk233)    14946.4     1.97
(nistk283)     9026.5     2.32
(nistk409)     3879.5     3.69
(nistk571)     1822.3     1.88

             ECDH op/s  Speedup
(nistb163)    24043.5     5.41
(nistb233)    13057.0     1.72
(nistb283)     7754.4     1.99
(nistb409)     3319.6     3.16
(nistb571)     1565.0     1.61


./openssl speed ecdsa

              SIGN/s   VERIFY/s
(secp160r1)  12978.6    3671.5
(nistp224)   11196.0    5130.2
(nistp256)    6819.4    2829.3
(nistp384)    3727.5     849.5
(nistp521)    1712.6     723.5

              SIGN/s   VERIFY/s   Speedups (sign, verify)
(nistk163)   17794.1   11730.1    1.37   3.19
(nistk233)   10396.8    6450.8    0.93   1.26
(nistk283)    6671.7    3955.1    0.98   1.40
(nistk409)    3148.8    1754.4    0.84   2.07
(nistk571)    1560.1     836.1    0.91   1.16

              SIGN/s   VERIFY/s   Speedups (sign, verify)
(nistb163)   16278.4   10452.2    1.25   2.85
(nistb233)    9423.7    5715.4    0.84   1.11
(nistb283)    5987.4    3458.5    0.88   1.22
(nistb409)    2769.2    1523.1    0.74   1.79
(nistb571)    1358.0     724.0    0.79   1.00


The code has been compiled with gcc 4.8.1 and the following
configurations:

 [1] Core i7-4770 @ 3.40GHz (Haswell):

./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native -DFAST_PCLMUL -DOPENSSL_FAST_EC2M

 [2] Core i5-3210M @ 2.50 GHz (Ivy Bridge):

./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native -DOPENSSL_FAST_EC2M


Best regards,
Manuel
 


Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-04 Thread David Jacobson

Manuel,

Thank you very much for this data.  It is amazing how different the 
results for ECDH and ECDSA sign/verify are.  I would have thought that 
the ECDSA sign/verify times would be dominated by the point 
multiplication and hence would be very similar to ECDH, which is 
basically nothing but point multiplication, but apparently not.


--David

Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-03 Thread Manuel Bluhm
Dear Andrey,

the scope of this work is limited to binary curves only. There might be
ways to speed up prime curves with vector instructions, but not in the
same way as we did for binary curves. Most of the performance gain comes
from the fast binary field arithmetic implementation, which differs from
prime field arithmetic.
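
To illustrate why the two kinds of field arithmetic differ, here is a
minimal Python sketch, not the patch's implementation. It assumes the
standard NIST reduction polynomial for the 163-bit binary curves,
x^163 + x^7 + x^6 + x^3 + 1, and a toy prime of 251 for the prime-field
side. All function names are hypothetical.

```python
M = 163
# Reduction polynomial x^163 + x^7 + x^6 + x^3 + 1, one bit per coefficient.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_add(a: int, b: int) -> int:
    """Addition in GF(2^m): coefficient-wise XOR, with no carries."""
    return a ^ b


def gf2m_reduce(c: int) -> int:
    """Reduce polynomial c modulo F by cancelling the top bit repeatedly."""
    while c.bit_length() > M:          # degree of c is at least M
        shift = c.bit_length() - 1 - M
        c ^= F << shift                # XOR clears the current top bit
    return c


def gfp_add(a: int, b: int, p: int) -> int:
    """Addition in GF(p): integer add with carries, then reduce mod p."""
    return (a + b) % p


print(gf2m_add(0b101, 0b101))          # prints 0: every element is its own negative
print(hex(gf2m_reduce(1 << 163)))      # prints 0xc9, i.e. x^7 + x^6 + x^3 + 1
print(gfp_add(250, 5, 251))            # prints 4 (toy prime, for contrast)
```

The binary-field operations use only shifts and XORs, which is the kind of
work that vector and carry-less multiply instructions handle well, whereas
prime-field addition and reduction depend on carry propagation.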

Best regards,
Manuel 


Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-03 Thread Manuel Bluhm
Dear Andrey,

the scope of this work is limited to binary curves only. However, there
might be ways to speed up prime curves with vector instructions.

Best regards,
Manuel 

 
 

Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-03 Thread David Jacobson
Let me chime in with an amendment to Andrey's message.  It would be nice 
if the tables included performance numbers for prime modulus curves, 
even if the techniques of Manuel's patch are not applicable there.  
Many people would like to know whether there are significant performance 
gains to be had by switching from GF(p) to GF(2^k) curves.


--David Jacobson


Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors

2013-09-02 Thread Andrey Kulikov
Dear Manuel,

Exciting news!
While your paper is still unpublished, could you please advise whether
anything even remotely similar is possible for curves over prime fields
(e.g. the secp* curves)?

Best regards,
Andrey


On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

 Hello all,

 This patch is a contribution to OpenSSL.

 It offers an efficient and constant-time implementation of the elliptic
 curve point multiplication, for the following standard NIST/SECG binary
 elliptic curves:
 sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1,
 sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1,
 sect571k1, and sect571r1.

 The patch implements several improvements at the algorithmic and the
 coding levels (using SSE/AVX and PCLMULQDQ instructions).

 Depending on the curve and architecture, this patch offers a speedup of
 between 4x and 10x for ECDH and ECDSA, compared to the current
 implementation of OpenSSL 1.0.1e.
 Additionally, it adds side channel protection to avoid (cache) timing
 attacks using a number of mechanisms.

 The code is written in C and uses compiler intrinsics, for simplicity
 and portability. The following results were obtained with gcc 4.8.1.

 For detailed explanations of the rationale and algorithms of this code
 refer to [1].


 ECDH performance
 ----------------

 The performance was measured by using openssl speed utility as follows:

 $ openssl speed ecdh


 The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) in ECDH op/s:

 Curve       || OpenSSL 1.0.1e || This patch  || Speedup  ||
 ------------||----------------||-------------||----------||
 (nistk163)  ||     6586.9     ||   67029.6   ||  10.18   ||
 (nistk233)  ||     5121.9     ||   39441.3   ||   7.70   ||
 (nistk283)  ||     2825.7     ||   27718.5   ||   9.81   ||
 (nistk409)  ||     1745.8     ||   11634.2   ||   6.66   ||
 (nistk571)  ||      763.2     ||    5930.9   ||   7.77   ||
 (nistb163)  ||     6382.5     ||   60729.6   ||   9.52   ||
 (nistb233)  ||     4881.9     ||   35230.4   ||   7.22   ||
 (nistb283)  ||     2651.6     ||   24456.4   ||   9.22   ||
 (nistb409)  ||     1640.3     ||   10228.6   ||   6.24   ||
 (nistb571)  ||      693.8     ||    5172.1   ||   7.45   ||
 ------------||----------------||-------------||----------||


 The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) in ECDH op/s:

 Curve       || OpenSSL 1.0.1e || This patch  || Speedup  ||
 ------------||----------------||-------------||----------||
 (nistk163)  ||     3271.5     ||   28087.3   ||   8.59   ||
 (nistk233)  ||     2504.9     ||   15106.0   ||   6.03   ||
 (nistk283)  ||     1317.0     ||    9030.5   ||   6.86   ||
 (nistk409)  ||      772.1     ||    3880.8   ||   5.03   ||
 (nistk571)  ||      327.3     ||    1821.1   ||   5.56   ||
 (nistb163)  ||     3067.9     ||   24357.1   ||   7.94   ||
 (nistb233)  ||     2424.9     ||    3147.3   ||   5.42   ||
 (nistb283)  ||     1227.0     ||    7765.1   ||   6.33   ||
 (nistb409)  ||      709.7     ||    3319.9   ||   4.68   ||
 (nistb571)  ||      296.2     ||    1563.9   ||   5.28   ||
 ------------||----------------||-------------||----------||



 ECDSA performance
 -----------------

 The performance was measured by using openssl speed utility as follows:

 $ openssl speed ecdsa


 The results for a Core i7-4770 CPU @ 3.40GHz (Haswell):

 Curve      ||  OpenSSL 1.0.1e   ||     This patch     ||     Speedup     ||
            || sign/s   verify/s || sign/s    verify/s || sign/s verify/s ||
 -----------||-------------------||--------------------||-----------------||
 (nistk163) || 6,465.3   3,159.5 || 36,872.6  26,508.4 ||  5.70    8.39   ||
 (nistk233) || 3,259.2   2,419.8 || 22,998.4  15,557.1 ||  7.06    6.43   ||
 (nistk283) || 2,204.7   1,355.7 || 16,884.9  11,003.2 ||  7.66    8.12   ||
 (nistk409) ||   977.0     839.1 ||  8,150.0   4,845.0 ||  8.34    5.77   ||
 (nistk571) ||   466.4     368.3 ||  4,424.1   2,533.6 ||  9.49    6.88   ||
 (nistb163) || 6,487.3   3,043.9 || 35,110.0  24,904.8 ||  5.41    8.18   ||
 (nistb233) || 3,279.2   2,348.0 || 21,468.8  14,095.6 ||  6.55    6.00   ||
 (nistb283) || 2,196.4   1,283.5 || 15,602.7   9,888.5 ||  7.10    7.70   ||
 (nistb409) ||   976.3     786.9 ||  7,423.1   4,361.9 ||  7.60    5.54   ||
 (nistb571) ||   466.6     341.0 ||  3,977.0   2,251.6 ||  8.52    6.60   ||
 -----------||-------------------||--------------------||-----------------||


 The results for a Core i5-3210M CPU @ 2.50 GHz (Ivy Bridge):

 Curve