Manuel,

Thank you very much for this data. It is amazing how different the results for ECDH and ECDSA sign/verify are. I would have thought that the ECDSA sign/verify times would be dominated by the point multiplication and hence would be very similar to ECDH, which is basically nothing but point multiplication, but apparently not.

    --David

On 9/3/13 7:33 PM, Manuel Bluhm wrote:
Dear David,

in response to your comment, the numbers below compare the patch with
the prime curves of OpenSSL-1.0.1e, on Haswell and Ivy Bridge. The
speedup is the ratio of binary-curve to prime-curve performance for
curves of similar bit length.

With this patch, both architectures perform many more ECDH operations
per second with binary curves than with prime curves. Additionally,
more ECDSA sign/verify operations are achieved on Haswell, and more
verifications on Ivy Bridge (but fewer signatures for all but the
163-bit curves).


Curves for speed comparison:

GF(p)          GF(2^m)
secp160r1 <->  nist(b,k)163
nistp224  <->  nist(b,k)233
nistp256  <->  nist(b,k)283
nistp384  <->  nist(b,k)409
nistp521  <->  nist(b,k)571


The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) [1]:

./openssl speed ecdh

             ECDH op/s
(secp160r1)   7391.5
(nistp224)   11993.8
(nistp256)    6489.0
(nistp384)    1848.5
(nistp521)    1682.8

             ECDH op/s  Speedup
(nistk163)   67212.4     9.09
(nistk233)   39102.2     3.26
(nistk283)   27586.5     4.25
(nistk409)   11611.2     6.28
(nistk571)    5941.8     3.53

             ECDH op/s  Speedup
(nistb163)   61667.8     8.34
(nistb233)   35246.4     2.94
(nistb283)   24320.7     3.75
(nistb409)   10238.1     5.54
(nistb571)    5158.8     3.07
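
As a reading aid, the short sketch below recomputes the Speedup column for the Koblitz curves from the Haswell ECDH rates above. It assumes that each speedup is the binary-curve rate divided by the rate of the paired prime curve from the comparison list; the dictionary and function names are illustrative only.

```python
# Haswell ECDH rates in op/s, copied from the tables above.
PRIME = {"secp160r1": 7391.5, "nistp224": 11993.8, "nistp256": 6489.0,
         "nistp384": 1848.5, "nistp521": 1682.8}
KOBLITZ = {"nistk163": 67212.4, "nistk233": 39102.2, "nistk283": 27586.5,
           "nistk409": 11611.2, "nistk571": 5941.8}

# Pairing of curves with similar bit length, as in the comparison list.
PAIRS = {"nistk163": "secp160r1", "nistk233": "nistp224",
         "nistk283": "nistp256", "nistk409": "nistp384",
         "nistk571": "nistp521"}


def speedup(binary_curve):
    """Binary-curve rate divided by the rate of its paired prime curve."""
    return round(KOBLITZ[binary_curve] / PRIME[PAIRS[binary_curve]], 2)


for name in KOBLITZ:
    print(name, speedup(name))
```

Under that assumption the computed ratios agree with the Speedup column for the nistk curves.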


./openssl speed ecdsa

              SIGN/s  VERIFY/s
(secp160r1)  21750.8    6029.0
(nistp224)   18393.5    8345.4
(nistp256)   11391.7    4744.9
(nistp384)    6447.4    1566.0
(nistp521)    2949.5    1249.7

              SIGN/s  VERIFY/s   Speedup (sign, verify)
(nistk163)   36660.3   26646.7   1.69  4.42
(nistk233)   23142.7   15842.9   1.26  1.90
(nistk283)   16941.7   11059.7   1.49  2.33
(nistk409)    8198.4    4861.4   1.27  3.10
(nistk571)    4446.7    2547.6   1.51  2.04

              SIGN/s  VERIFY/s   Speedup (sign, verify)
(nistb163)   34738.8   25113.6   1.60  4.17
(nistb233)   21531.7   14341.8   1.17  1.72
(nistb283)   15635.1   10061.6   1.37  2.12
(nistb409)    7479.5    4390.6   1.16  2.80
(nistb571)    4029.7    2269.6   1.37  1.82


The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) [2]:

./openssl speed ecdh

             ECDH op/s
(secp160r1)   4444.3
(nistp224)    7573.1
(nistp256)    3891.5
(nistp384)    1051.9
(nistp521)     971.0

             ECDH op/s  Speedup
(nistk163)   27837.0     6.26
(nistk233)   14946.4     1.97
(nistk283)    9026.5     2.32
(nistk409)    3879.5     3.69
(nistk571)    1822.3     1.88

             ECDH op/s  Speedup
(nistb163)   24043.5     5.41
(nistb233)   13057.0     1.72
(nistb283)    7754.4     1.99
(nistb409)    3319.6     3.16
(nistb571)    1565.0     1.61


./openssl speed ecdsa

              SIGN/s  VERIFY/s
(secp160r1)  12978.6    3671.5
(nistp224)   11196.0    5130.2
(nistp256)    6819.4    2829.3
(nistp384)    3727.5     849.5
(nistp521)    1712.6     723.5

              SIGN/s  VERIFY/s   Speedup (sign, verify)
(nistk163)   17794.1   11730.1   1.37  3.19
(nistk233)   10396.8    6450.8   0.93  1.26
(nistk283)    6671.7    3955.1   0.98  1.40
(nistk409)    3148.8    1754.4   0.84  2.07
(nistk571)    1560.1     836.1   0.91  1.16

              SIGN/s  VERIFY/s   Speedup (sign, verify)
(nistb163)   16278.4   10452.2   1.25  2.85
(nistb233)    9423.7    5715.4   0.84  1.11
(nistb283)    5987.4    3458.5   0.88  1.22
(nistb409)    2769.2    1523.1   0.74  1.79
(nistb571)    1358.0     724.0   0.79  1.00


The code has been compiled with gcc 4.8.1 and the following
configurations:

  [1]  Core i7-4770  @ 3.40GHz (Haswell):

./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native
-DFAST_PCLMUL -DOPENSSL_FAST_EC2M

  [2] Core i5-3210M @ 2.50 GHz (Ivy Bridge):

./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native
-DOPENSSL_FAST_EC2M

Best regards,
Manuel

On Mon, 2013-09-02 at 19:57 -0700, David Jacobson wrote:
Let me chime in with an amendment to Andrey's message.  It would be
nice if the tables included performance numbers for prime modulus
curves, even if the techniques of Manuel's patch are not applicable
there.  Many people would like to know whether there are significant
performance gains to be had by switching from GF(p) to GF(2^k) curves.

     --David Jacobson

On 9/2/13 11:47 AM, Andrey Kulikov wrote:

Dear Manuel,


Exciting news!

While your paper is still unpublished, could you please advise whether
anything even remotely similar is possible for curves over prime
fields (e.g. the secp* curves)?


Best regards,

Andrey



On 28 August 2013 09:06, Manuel Bluhm via RT <[email protected]> wrote:
Hello all,

This patch is a contribution to OpenSSL. It offers an efficient and
constant-time implementation of elliptic curve point multiplication for
the following standard NIST/SECG binary elliptic curves:
sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1,
sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1,
sect571k1, and sect571r1.

The patch implements several improvements at the algorithmic and the
coding levels (using SSE/AVX and PCLMULQDQ instructions).

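
For readers unfamiliar with PCLMULQDQ: the instruction computes the carry-less (polynomial) product of two 64-bit operands, which is the core step of multiplication in GF(2^m). The following is a minimal pure-Python sketch of that arithmetic and is not taken from the patch; the function names are illustrative, and the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the one specified for the NIST 163-bit binary curves.

```python
def clmul(a, b):
    """Carry-less product of two non-negative ints viewed as GF(2)[x] polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2)[x] is XOR, so no carries
        a <<= 1
        b >>= 1
    return result


def reduce_poly(value, modulus):
    """Reduce value modulo the polynomial `modulus` by cancelling its top bit."""
    deg = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        value ^= modulus << (value.bit_length() - 1 - deg)
    return value


def gf2m_mul(a, b, modulus):
    """Field multiplication: carry-less product followed by reduction."""
    return reduce_poly(clmul(a, b), modulus)


# Reduction polynomial of the NIST 163-bit binary curves.
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

# x^162 * x = x^163, which reduces to x^7 + x^6 + x^3 + 1.
print(hex(gf2m_mul(1 << 162, 0b10, F163)))
```

A hardware implementation would do the product with a few PCLMULQDQ instructions on 64-bit limbs rather than a bit-by-bit loop; the loop here only shows what is being computed.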
Depending on the curve and architecture, this patch offers a speedup of
between 4x and 10x for ECDH and ECDSA, compared to the current
implementation of OpenSSL 1.0.1e. Additionally, it adds side-channel
protection against (cache) timing attacks using a number of mechanisms.

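
One common way to make the sequence of operations independent of the secret scalar is a Montgomery ladder with conditional swaps; the change list in this message mentions Montgomery point multiplication, but the sketch below is only an illustration of that general structure and not the patch's code. It uses integers modulo n under addition as a stand-in group for curve points, all names are hypothetical, and Python integers are not constant-time, so it shows the control flow only.

```python
def cswap(bit, a, b):
    """Return (b, a) if bit == 1, else (a, b), without branching on bit."""
    mask = -bit                  # 0 when bit == 0, all ones when bit == 1
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def ladder_mul(k, p, n, nbits):
    """Compute k*p mod n with exactly one add and one double per scalar bit."""
    r0, r1 = 0, p % n            # after each iteration: r1 - r0 == p (mod n)
    for i in reversed(range(nbits)):
        bit = (k >> i) & 1
        r0, r1 = cswap(bit, r0, r1)
        r1 = (r0 + r1) % n       # "point addition" in the stand-in group
        r0 = (2 * r0) % n        # "point doubling"
        r0, r1 = cswap(bit, r0, r1)
    return r0


print(ladder_mul(13, 5, 1009, 8))   # 13 * 5 = 65
```

Every iteration performs the same addition and doubling regardless of the scalar bit; only the masked swap depends on it.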
The code is written in C and uses compiler intrinsics, for simplicity
and portability. The following results were obtained with gcc 4.8.1.

For detailed explanations of the rationale and algorithms of this code,
refer to [1].

ECDH performance
--------------------------------------------------------------------------
The performance was measured by using the openssl speed utility as follows:

$ openssl speed ecdh

The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) in ECDH op/s:

Curve       || OpenSSL 1.0.1e || This patch  || Speedup  ||
------------||----------------||-------------||----------||
            ||                ||             ||          ||
(nistk163)  ||    6586.9      ||  67029.6    ||  10.18   ||
(nistk233)  ||    5121.9      ||  39441.3    ||   7.70   ||
(nistk283)  ||    2825.7      ||  27718.5    ||   9.81   ||
(nistk409)  ||    1745.8      ||  11634.2    ||   6.66   ||
(nistk571)  ||     763.2      ||   5930.9    ||   7.77   ||
(nistb163)  ||    6382.5      ||  60729.6    ||   9.52   ||
(nistb233)  ||    4881.9      ||  35230.4    ||   7.22   ||
(nistb283)  ||    2651.6      ||  24456.4    ||   9.22   ||
(nistb409)  ||    1640.3      ||  10228.6    ||   6.24   ||
(nistb571)  ||     693.8      ||   5172.1    ||   7.45   ||
            ||                ||             ||          ||
------------||----------------||-------------||----------||

The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) in ECDH op/s:

Curve       || OpenSSL 1.0.1e || This patch  || Speedup  ||
------------||----------------||-------------||----------||
            ||                ||             ||          ||
(nistk163)  ||    3271.5      ||  28087.3    ||   8.59   ||
(nistk233)  ||    2504.9      ||  15106.0    ||   6.03   ||
(nistk283)  ||    1317.0      ||   9030.5    ||   6.86   ||
(nistk409)  ||     772.1      ||   3880.8    ||   5.03   ||
(nistk571)  ||     327.3      ||   1821.1    ||   5.56   ||
(nistb163)  ||    3067.9      ||  24357.1    ||   7.94   ||
(nistb233)  ||    2424.9      ||  13147.3    ||   5.42   ||
(nistb283)  ||    1227.0      ||   7765.1    ||   6.33   ||
(nistb409)  ||     709.7      ||   3319.9    ||   4.68   ||
(nistb571)  ||     296.2      ||   1563.9    ||   5.28   ||
            ||                ||             ||          ||
------------||----------------||-------------||----------||

ECDSA performance
--------------------------------------------------------------------------
The performance was measured by using the openssl speed utility as follows:

$ openssl speed ecdsa

The results for a Core i7-4770 CPU @ 3.40GHz (Haswell):

Curve      || OpenSSL 1.0.1e  ||    This patch     ||     Speedup     ||
-----------||-----------------||-------------------||-----------------||
           || sign/s verify/s || sign/s  verify/s  || sign/s verify/s ||
           ||-----------------||-------------------||-----------------||
(nistk163) || 6,465.3 3,159.5 || 36,872.6 26,508.4 ||  5.70    8.39   ||
(nistk233) || 3,259.2 2,419.8 || 22,998.4 15,557.1 ||  7.06    6.43   ||
(nistk283) || 2,204.7 1,355.7 || 16,884.9 11,003.2 ||  7.66    8.12   ||
(nistk409) ||   977.0   839.1 ||  8,150.0  4,845.0 ||  8.34    5.77   ||
(nistk571) ||   466.4   368.3 ||  4,424.1  2,533.6 ||  9.49    6.88   ||
(nistb163) || 6,487.3 3,043.9 || 35,110.0 24,904.8 ||  5.41    8.18   ||
(nistb233) || 3,279.2 2,348.0 || 21,468.8 14,095.6 ||  6.55    6.00   ||
(nistb283) || 2,196.4 1,283.5 || 15,602.7  9,888.5 ||  7.10    7.70   ||
(nistb409) ||   976.3   786.9 ||  7,423.1  4,361.9 ||  7.60    5.54   ||
(nistb571) ||   466.6   341.0 ||  3,977.0  2,251.6 ||  8.52    6.60   ||
           ||                 ||                   ||                 ||
-----------||-----------------||-------------------||-----------------||

The results for a Core i5-3210M CPU @ 2.50 GHz (Ivy Bridge):

Curve      || OpenSSL 1.0.1e  ||    This patch     ||     Speedup     ||
-----------||-----------------||-------------------||-----------------||
           || sign/s verify/s || sign/s  verify/s  || sign/s verify/s ||
           ||-----------------||-------------------||-----------------||
(nistk163) || 3,749.9 1,578.6 || 17,721.8 11,688.1 ||  4.73    7.40   ||
(nistk233) || 1,881.7 1,211.6 || 10,359.0  6,439.4 ||  5.51    5.31   ||
(nistk283) || 1,267.5   639.3 ||  6,688.9  3,951.1 ||  5.28    6.18   ||
(nistk409) ||   542.2   361.9 ||  3,140.9  1,757.1 ||  5.79    4.86   ||
(nistk571) ||   257.6   159.9 ||  1,556.0    834.6 ||  6.04    5.22   ||
(nistb163) || 3,766.5 1,514.5 || 16,203.5 10,453.8 ||  4.30    6.90   ||
(nistb233) || 1,893.1 1,150.4 ||  9,386.5  5,711.9 ||  4.96    4.97   ||
(nistb283) || 1,265.7   594.2 ||  5,962.3  3,445.5 ||  4.71    5.80   ||
(nistb409) ||   539.3   344.2 ||  2,763.4  1,522.4 ||  5.12    4.42   ||
(nistb571) ||   257.2   145.7 ||  1,354.8    724.9 ||  5.27    4.98   ||
           ||                 ||                   ||                 ||
-----------||-----------------||-------------------||-----------------||

Changes to OpenSSL-1.0.1e
--------------------------------------------------------------------------
crypto/bn:
         bn_gf2m_xmm.c : New file, contains XMM GF2m implementation
         bn.h          : Added new function declarations
         bn_gf2m.c     : Added constant time bn operations
         Makefile      : Added bn_gf2m_xmm.c to makefile

crypto/ec:
         ec2_nist_mult.c: New file, implements Montgomery point multiplication
         ec2_nist.c     : New file, implements EC methods
         ec2_nist_prec.c: New file, implements method to get precomputed values
         ec.h           : Added function declarations (ec_methods)
         ec_lcl.h       : Added function declarations (all functions in the ec_method)
         ec_curve.c     : Added new EC methods to builtin curves
         Makefile       : Added new files to makefile

Configuration flags
--------------------------------------------------------------------------
-DOPENSSL_FAST_EC2M : Enable the fast implementation of binary curves
-DFAST_PCLMUL       : Enable the pclmul reduction for pentanomial curves

-mpclmul      : Enable pclmulqdq
-msse4        : Enable SSE4
-mavx         : Enable AVX
-mavx2        : Enable AVX2
-march=native : Enable all instruction subsets

The results above have been created with the following configurations:

(1) Core i7-4770 @ 3.40GHz (Haswell):
    ./config -mavx2 -mpclmul -DFAST_PCLMUL -DOPENSSL_FAST_EC2M

(2) Core i5-3210M @ 2.50 GHz (Ivy Bridge):
    ./config -mavx -mpclmul -DOPENSSL_FAST_EC2M

[1] M. Bluhm, S. Gueron, Fast Software Implementation of Binary Elliptic
    Curve Cryptography (2013; to be published)

Developers and authors:
***************************************************************************
Manuel Bluhm (1) and Shay Gueron (2, 3)
(1) Ruhr University Bochum, Germany
(2) Intel Corporation, Israel Development Center, Haifa, Israel
(3) University of Haifa, Israel
***************************************************************************

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
