Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Hi,

This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1, sect571k1, and sect571r1. The patch implements several improvements at the algorithmic and the coding levels (using SSE/AVX and PCLMULQDQ instructions).

Cool stuff! This is an initial message, primarily for reference. The submission will be carefully examined, aiming to cover more platforms than just Intel. More and more processors support polynomial multiplication in hardware: SPARC, ARM, POWER8, ...
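For readers unfamiliar with the "polynomial multiplication" mentioned in the reply: PCLMULQDQ multiplies two 64-bit operands as polynomials over GF(2), that is, without carries, and yields a result of up to 127 bits. The following is only a minimal bit-by-bit sketch of that operation in Python; the function name clmul is hypothetical, and the patch itself is described as using C compiler intrinsics, not this loop.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b, read as polynomials over GF(2).

    Bit i of an operand is the coefficient of x^i. Partial products are
    combined with XOR instead of addition, so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:          # coefficient of the current power of x is 1
            result ^= a    # add (XOR) the shifted multiplicand
        a <<= 1            # multiply a by x
        b >>= 1            # move to the next coefficient of b
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2); the integer product would be 9.
square = clmul(0b11, 0b11)

# Two full 64-bit operands give a product of degree at most 126.
widest = clmul(2**64 - 1, 2**64 - 1)
```

The absence of carries is what lets the operation map onto wide vector registers, where each partial product is a shift and an XOR.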
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Dear David,

in response to your comment, the numbers below compare the patch with OpenSSL-1.0.1e on Haswell and Ivy Bridge. Each speedup is the binary-curve throughput divided by the throughput of the prime curve of similar bit length. With this patch, both architectures perform many more ECDH operations with binary curves. Additionally, more ECDSA sign/verify operations are achieved on Haswell, and more verifications on Ivy Bridge (but fewer signatures).

Curves for speed comparison:

GF(p)        GF(2^m)
secp160r1  - nist(b,k)163
nistp224   - nist(b,k)233
nistp256   - nist(b,k)283
nistp384   - nist(b,k)409
nistp521   - nist(b,k)571

The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) [1]:

./openssl speed ecdh

              ECDH op/s
(secp160r1)      7391.5
(nistp224)      11993.8
(nistp256)       6489.0
(nistp384)       1848.5
(nistp521)       1682.8

              ECDH op/s   Speedup
(nistk163)      67212.4      9.09
(nistk233)      39102.2      3.26
(nistk283)      27586.5      4.25
(nistk409)      11611.2      6.28
(nistk571)       5941.8      3.53

              ECDH op/s   Speedup
(nistb163)      61667.8      8.34
(nistb233)      35246.4      2.94
(nistb283)      24320.7      3.75
(nistb409)      10238.1      5.54
(nistb571)       5158.8      3.07

./openssl speed ecdsa

                SIGN/s   VERIFY/s
(secp160r1)    21750.8     6029.0
(nistp224)     18393.5     8345.4
(nistp256)     11391.7     4744.9
(nistp384)      6447.4     1566.0
(nistp521)      2949.5     1249.7

                SIGN/s   VERIFY/s   Speedups
(nistk163)     36660.3    26646.7   1.69  4.42
(nistk233)     23142.7    15842.9   1.26  1.90
(nistk283)     16941.7    11059.7   1.49  2.33
(nistk409)      8198.4     4861.4   1.27  3.10
(nistk571)      4446.7     2547.6   1.51  2.04

                SIGN/s   VERIFY/s   Speedups
(nistb163)     34738.8    25113.6   1.60  4.17
(nistb233)     21531.7    14341.8   1.17  1.72
(nistb283)     15635.1    10061.6   1.37  2.12
(nistb409)      7479.5     4390.6   1.16  2.80
(nistb571)      4029.7     2269.6   1.37  1.82

The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) [2]:

./openssl speed ecdh

              ECDH op/s
(secp160r1)          .3
(nistp224)       7573.1
(nistp256)       3891.5
(nistp384)       1051.9
(nistp521)        971.0

              ECDH op/s   Speedup
(nistk163)      27837.0      6.26
(nistk233)      14946.4      1.97
(nistk283)       9026.5      2.32
(nistk409)       3879.5      3.69
(nistk571)       1822.3      1.88

              ECDH op/s   Speedup
(nistb163)      24043.5      5.41
(nistb233)      13057.0      1.72
(nistb283)       7754.4      1.99
(nistb409)       3319.6      3.16
(nistb571)       1565.0      1.61

./openssl speed ecdsa

                SIGN/s   VERIFY/s
(secp160r1)    12978.6     3671.5
(nistp224)     11196.0     5130.2
(nistp256)      6819.4     2829.3
(nistp384)      3727.5      849.5
(nistp521)      1712.6      723.5

                SIGN/s   VERIFY/s   Speedups
(nistk163)     17794.1    11730.1   1.37  3.19
(nistk233)     10396.8     6450.8   0.93  1.26
(nistk283)      6671.7     3955.1   0.98  1.40
(nistk409)      3148.8     1754.4   0.84  2.07
(nistk571)      1560.1      836.1   0.91  1.16

                SIGN/s   VERIFY/s   Speedups
(nistb163)     16278.4    10452.2   1.25  2.85
(nistb233)      9423.7     5715.4   0.84  1.11
(nistb283)      5987.4     3458.5   0.88  1.22
(nistb409)      2769.2     1523.1   0.74  1.79
(nistb571)      1358.0      724.0   0.79  1.00

The code has been compiled with gcc 4.8.1 and the following configurations:

[1] Core i7-4770 @ 3.40GHz (Haswell):
./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native -DFAST_PCLMUL -DOPENSSL_FAST_EC2M

[2] Core i5-3210M @ 2.50 GHz (Ivy Bridge):
./Configure linux-x86_64 enable-ec_nistp_64_gcc_128 -march=native -DOPENSSL_FAST_EC2M

Best regards,
Manuel

On Mo, 2013-09-02 at 19:57 -0700, David Jacobson wrote:

Let me chime in with an amendment to Andrey's message. It would be nice if the tables included performance numbers for prime modulus curves, even if the techniques of Manuel's patch are not applicable there. Many people would like to know whether there are significant performance gains to be had by switching from GF(p) to GF(2^k) curves.

--David Jacobson

On 9/2/13 11:47 AM, Andrey Kulikov wrote:

Dear Manuel, Exciting news! While your paper is still unpublished, could you please advise whether anything even nearly similar is possible for curves over prime fields (e.g. the secp* curves)?

Best regards, Andrey

On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1,
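The speedup figures in Manuel's benchmark message can be reproduced from the op/s columns: each one is the binary-curve rate divided by the rate of the prime curve it is paired with. The sketch below recomputes the Haswell ECDH speedups for the Koblitz curves; the dictionary and function names are made up for illustration, and the numbers are copied from the message.

```python
# Haswell ECDH op/s for the prime curves, copied from the message.
PRIME_OPS = {
    "secp160r1": 7391.5,
    "nistp224": 11993.8,
    "nistp256": 6489.0,
    "nistp384": 1848.5,
    "nistp521": 1682.8,
}

# Haswell ECDH op/s for the Koblitz curves, copied from the message.
KOBLITZ_OPS = {
    "nistk163": 67212.4,
    "nistk233": 39102.2,
    "nistk283": 27586.5,
    "nistk409": 11611.2,
    "nistk571": 5941.8,
}

# Pairing of curves with similar bit length, as listed in the message.
PAIRING = {
    "nistk163": "secp160r1",
    "nistk233": "nistp224",
    "nistk283": "nistp256",
    "nistk409": "nistp384",
    "nistk571": "nistp521",
}


def speedup(binary_curve: str) -> float:
    """Binary-curve op/s divided by the paired prime-curve op/s."""
    prime_curve = PAIRING[binary_curve]
    return round(KOBLITZ_OPS[binary_curve] / PRIME_OPS[prime_curve], 2)


for curve in KOBLITZ_OPS:
    print(f"{curve}: {speedup(curve):.2f}x vs {PAIRING[curve]}")
```

A ratio below 1.00, as in several Ivy Bridge ECDSA sign rows, means the binary curve is slower than its prime counterpart for that operation.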
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Manuel,

Thank you very much for this data. It is amazing how different the results for ECDH and ECDSA sign/verify are. I would have thought that the ECDSA sign/verify times would be dominated by the point multiplication and hence would be very similar to ECDH, which is basically nothing but point multiplication, but apparently not.

--David

On 9/3/13 7:33 PM, Manuel Bluhm wrote:

Dear David, in response to your comment, the numbers below provide a comparison for the patch, compared to OpenSSL-1.0.1e, on Haswell and Ivy Bridge.

[...]
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Dear Andrey,

the scope of this work is limited to binary curves only. There might be ways to speed up prime curves with vector instructions, but not in the same way as we did for binary curves. Most of the performance gain comes from the fast binary field arithmetic implementation, which is different from the prime field arithmetic.

Best regards,
Manuel

On Mo, 2013-09-02 at 22:47 +0400, Andrey Kulikov wrote:

Dear Manuel, Exciting news! While your paper is still unpublished, could you please advise whether anything even nearly similar is possible for curves over prime fields (e.g. the secp* curves)?

Best regards, Andrey

On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1, sect571k1, and sect571r1.

[...]
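To make Manuel's point about binary versus prime field arithmetic concrete: in GF(2^m), addition is a plain XOR with no carries, and multiplication is a carry-less product followed by reduction modulo a fixed sparse polynomial. The sketch below does this bit by bit for the 163-bit field, assuming the standard NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1. The function names are hypothetical, and this is not the patch's algorithm, which is described as using SSE/AVX and PCLMULQDQ.

```python
M = 163
# Reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 (assumed: NIST K-163/B-163 field).
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^m): coefficient-wise XOR, no carry propagation."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^163): carry-less product, then reduction mod f(x)."""
    # Step 1: carry-less (polynomial) product, up to 325 bits wide.
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # Step 2: clear every bit at position >= M by XORing in a shifted copy of f(x).
    for i in range(prod.bit_length() - 1, M - 1, -1):
        if (prod >> i) & 1:
            prod ^= F << (i - M)
    return prod


# x^162 * x = x^163, which reduces to x^7 + x^6 + x^3 + 1.
reduced = gf_mul(1 << 162, 0b10)
```

In a prime field, by contrast, addition and multiplication must propagate carries across machine words and reduce modulo an integer, which is why the same vectorization approach does not carry over directly.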
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Dear Andrey,

the scope of this work is limited to binary curves only. However, there might be ways to speed up prime curves with vector instructions.

Best regards,
Manuel

On Mo, 2013-09-02 at 22:47 +0400, Andrey Kulikov wrote:

Dear Manuel, Exciting news! While your paper is still unpublished, could you please advise whether anything even nearly similar is possible for curves over prime fields (e.g. the secp* curves)?

Best regards, Andrey

On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1, sect571k1, and sect571r1.

[...]
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Let me chime in with an amendment to Andrey's message. It would be nice if the tables included performance numbers for prime modulus curves, even if the techniques of Manuel's patch are not applicable there. Many people would like to know whether there are significant performance gains to be had by switching from GF(p) to GF(2^k) curves.

--David Jacobson

On 9/2/13 11:47 AM, Andrey Kulikov wrote:

Dear Manuel, Exciting news! While your paper is still unpublished, could you please advise whether anything even nearly similar is possible for curves over prime fields (e.g. the secp* curves)?

Best regards, Andrey

On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

Hello all, This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1, sect571k1, and sect571r1.

[...]
Re: [openssl.org #3117] [PATCH] A fast vectorized implementation of binary elliptic curves on x86-64 processors
Dear Manuel,

Exciting news! While your paper is still unpublished, could you please advise whether anything even nearly similar is possible for curves over prime fields (e.g. the secp* curves)?

Best regards,
Andrey

On 28 August 2013 09:06, Manuel Bluhm via RT r...@openssl.org wrote:

Hello all,

This patch is a contribution to OpenSSL. It offers an efficient and constant-time implementation of the elliptic curve point multiplication, for the following standard NIST/SECG binary elliptic curves: sect163k1, sect163r1, sect163r2, sect193r1, sect193r2, sect233k1, sect233r1, sect239k1, sect283k1, sect283r1, sect409k1, sect409r1, sect571k1, and sect571r1.

The patch implements several improvements at the algorithmic and the coding levels (using SSE/AVX and PCLMULQDQ instructions). Depending on the curve and architecture, this patch offers a speedup of between 4x and 10x for ECDH and ECDSA, compared to the current implementation of OpenSSL 1.0.1e. Additionally, it adds side channel protection to avoid (cache) timing attacks using a number of mechanisms.

The code is written in C and uses compiler intrinsics, for simplicity and portability. The following results were obtained with gcc 4.8.1. For detailed explanations of the rationale and algorithms of this code refer to [1].

ECDH performance
----------------

The performance was measured by using the openssl speed utility as follows:

$ openssl speed ecdh

The results for a Core i7-4770 CPU @ 3.40GHz (Haswell) in ECDH op/s:

Curve       || OpenSSL 1.0.1e || This patch || Speedup ||
------------||----------------||------------||---------||
(nistk163)  ||         6586.9 ||    67029.6 ||   10.18 ||
(nistk233)  ||         5121.9 ||    39441.3 ||    7.70 ||
(nistk283)  ||         2825.7 ||    27718.5 ||    9.81 ||
(nistk409)  ||         1745.8 ||    11634.2 ||    6.66 ||
(nistk571)  ||          763.2 ||     5930.9 ||    7.77 ||
(nistb163)  ||         6382.5 ||    60729.6 ||    9.52 ||
(nistb233)  ||         4881.9 ||    35230.4 ||    7.22 ||
(nistb283)  ||         2651.6 ||    24456.4 ||    9.22 ||
(nistb409)  ||         1640.3 ||    10228.6 ||    6.24 ||
(nistb571)  ||          693.8 ||     5172.1 ||    7.45 ||
------------||----------------||------------||---------||

The results for a Core i5-3210M @ 2.50 GHz (Ivy Bridge) in ECDH op/s:

Curve       || OpenSSL 1.0.1e || This patch || Speedup ||
------------||----------------||------------||---------||
(nistk163)  ||         3271.5 ||    28087.3 ||    8.59 ||
(nistk233)  ||         2504.9 ||    15106.0 ||    6.03 ||
(nistk283)  ||         1317.0 ||     9030.5 ||    6.86 ||
(nistk409)  ||          772.1 ||     3880.8 ||    5.03 ||
(nistk571)  ||          327.3 ||     1821.1 ||    5.56 ||
(nistb163)  ||         3067.9 ||    24357.1 ||    7.94 ||
(nistb233)  ||         2424.9 ||     3147.3 ||    5.42 ||
(nistb283)  ||         1227.0 ||     7765.1 ||    6.33 ||
(nistb409)  ||          709.7 ||     3319.9 ||    4.68 ||
(nistb571)  ||          296.2 ||     1563.9 ||    5.28 ||
------------||----------------||------------||---------||

ECDSA performance
-----------------

The performance was measured by using the openssl speed utility as follows:

$ openssl speed ecdsa

The results for a Core i7-4770 CPU @ 3.40GHz (Haswell):

Curve       ||   OpenSSL 1.0.1e   ||     This patch     ||     Speedup     ||
------------||--------------------||--------------------||-----------------||
            ||  sign/s   verify/s ||   sign/s  verify/s || sign/s verify/s ||
------------||--------------------||--------------------||-----------------||
(nistk163)  || 6,465.3    3,159.5 || 36,872.6  26,508.4 ||   5.70     8.39 ||
(nistk233)  || 3,259.2    2,419.8 || 22,998.4  15,557.1 ||   7.06     6.43 ||
(nistk283)  || 2,204.7    1,355.7 || 16,884.9  11,003.2 ||   7.66     8.12 ||
(nistk409)  ||   977.0      839.1 ||  8,150.0   4,845.0 ||   8.34     5.77 ||
(nistk571)  ||   466.4      368.3 ||  4,424.1   2,533.6 ||   9.49     6.88 ||
(nistb163)  || 6,487.3    3,043.9 || 35,110.0  24,904.8 ||   5.41     8.18 ||
(nistb233)  || 3,279.2    2,348.0 || 21,468.8  14,095.6 ||   6.55     6.00 ||
(nistb283)  || 2,196.4    1,283.5 || 15,602.7   9,888.5 ||   7.10     7.70 ||
(nistb409)  ||   976.3      786.9 ||  7,423.1   4,361.9 ||   7.60     5.54 ||
(nistb571)  ||   466.6      341.0 ||  3,977.0   2,251.6 ||   8.52     6.60 ||
------------||--------------------||--------------------||-----------------||

The results for a Core i5-3210M CPU @ 2.50 GHz (Ivy Bridge):

Curve
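The announcement says the patch adds side channel protection against (cache) timing attacks "using a number of mechanisms" but does not name them. One mechanism commonly used for this purpose is to read every entry of a precomputed table and pick the wanted one with bit masks, so that neither branches nor memory addresses depend on the secret index. The sketch below shows only that data flow; the function names are hypothetical, nothing here is taken from the patch, and Python itself gives no timing guarantees.

```python
WIDTH = 64
FULL = (1 << WIDTH) - 1  # all-ones mask of WIDTH bits


def ct_select(bit: int, a: int, b: int) -> int:
    """Return a when bit == 1 and b when bit == 0, without branching on bit."""
    mask = -bit & FULL                      # FULL if bit == 1, else 0
    return (a & mask) | (b & ~mask & FULL)


def ct_is_equal(x: int, y: int) -> int:
    """Return 1 if x == y, else 0, for values in [0, 2^WIDTH)."""
    diff = x ^ y
    # diff - 1 is negative only when diff == 0; the arithmetic shift then yields -1.
    return ((diff - 1) >> WIDTH) & 1


def ct_table_lookup(table: list[int], secret_index: int) -> int:
    """Read every entry and keep only the one at secret_index."""
    result = 0
    for i, entry in enumerate(table):
        mask = -ct_is_equal(i, secret_index) & FULL
        result |= entry & mask              # contributes only when i == secret_index
    return result


chosen = ct_table_lookup([10, 20, 30, 40], 2)
```

In C the same pattern is written with fixed-width unsigned integers or vector registers, where the masking compiles to branch-free instructions.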