Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > I think this is complete now (except updating hogweed-benchmark), just > pushed to the ed448 branch. Thanks for the patience. It seems I forgot to add the new files in the first attempt. Ooops. Fixed with a forced update on this branch. Now ubsan fails, it doesn't like calling memcpy with null ptr and zero size. I have to fix that. Not sure what's prettiest, either a conditional, or a function pointer like it was done in the original patch. In the mean time, I've updated hogweed benchmark (not yet committed). This is what I get on my old (x86_64) laptop: name size sign/ms verify/ms ecdsa 1925.21941.7914 ecdsa 2243.22971.1658 ecdsa 2563.11531.0347 ecdsa 3841.43930.4607 ecdsa 5210.72770.2308 eddsa 2556.02431.5598 eddsa 4481.74640.4595 So speedwise, ed25519 is comparable to ecdsa on secp_192r1, and ed448 is comparable to ecdsa on secp_384r1. In both cases, signing is a bit faster (15%-20%), and verify is the same or a bit slower. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > Thank you very much for all the Curve448/SHAKE256 work for merging (I'm > slowly catching up). I think this is complete now (except updating hogweed-benchmark), just pushed to the ed448 branch. Thanks for the patience. >> These corner cases are a bit hard to test. > > For what it's worth, the original issue was reliably reproducible with > the GnuTLS port[1] against the OpenSSL client. Here is a test vector > extracted from the interaction: I'm afraid this doesn't exercise the corner cases. The thing is, we have q close to 2^k (k = 2^252 for ed25519, k = 446 for ed448). Then we want to reduce r = hi 2^k + lo modulo q, canonically. If we set r' = r - hi * q then it's highly likely that 0 <= r' < q, but not certain. For ed25519, q > 2^k, so we are guaranteed that r' < 2^k < q, but we may get r' < 0. For ed448, q < 2^k, so we are guaranteed that r' > 0, and we may instead get r' >= q. For now, I've added the following logic to _eddsa_sign: if (ecc->p.bit_size == 255) { /* FIXME: Special code duplicated in ecc_25519_modq Define a suitable method for canonical reduction? */ /* q is slightly larger than 2^252, underflow from below mpn_submul_1 is unlikely. */ unsigned shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1); q = sp[ecc->p.size-1] >> shift; } else { unsigned shift; assert (ecc->p.bit_size == 448); /* q is slightly smaller than 2^446 */ shift = 446 - GMP_NUMB_BITS * (ecc->p.size - 1); /* Add one, then it's possible but unlikely that below mpn_submul_1 does *not* underflow. */ q = (sp[ecc->p.size-1] >> shift) + 1; } cy = mpn_submul_1 (sp, ecc->q.m, ecc->p.size, q); assert (cy < 2); cy -= cnd_add_n (cy, sp, ecc->q.m, ecc->p.size); assert (cy == 0); I think that's correct, but it seems tricky to find inputs to _eddsa_sign that will hit the corner cases. I've added some debug printouts to verify that mpn_submul_1 returns 0 for the ed25519 testcases, and 1 for all the ed448 testcases. If it's taken out to a separate function/method, then it gets easier to unit test. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Hello Niels, Thank you very much for all the Curve448/SHAKE256 work for merging (I'm slowly catching up). ni...@lysator.liu.se (Niels Möller) writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> Daiki Ueno writes: >> >>> For curve25519, q is defined as: >>> >>> 2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed >>> >>> whose bit pattern starts with 0x1000, so r - q * (r>>252) should >>> work. >>> >>> On the other hand, for curve448, q is defined as: >>> >>> 2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d >>> >>> whose bit pattern starts with 0x. In that case the formula (r - q * >>> (r>>445)) could be incorrect due to the accumulated errors by >>> multiplication (i.e. q * 0x7FFF...). >> >> Good catch! Right, this needs a bit more analysis. Fur curve25519, the >> subtraction can underflow (unlikely), which is addressed with the >> conditional addition a few lines down. > > For ecc_ah_to_a, this code was deleted, but it's still an issue for > eddsa_sign. Maybe need special cases for both ed25519 and ed448 for now. > Or some logic looking at the high limb of q. > > These corner cases are a bit hard to test. For what it's worth, the original issue was reliably reproducible with the GnuTLS port[1] against the OpenSSL client. Here is a test vector extracted from the interaction: test_one ("0cf87eb094bf46d161bde3b99d1d32856fecfae0142392cd98c091db206d174bbf8ef476a9cf746d94306c565f97ac50796f021eff8d779ca5" "9addde61f668f2dbc0ac24874adb47a2aa6ad59fa888bdc5d430705ed0796a8c330782b51860785be63fd79b1c7cf58fd728b2bf3d77395100:" "9addde61f668f2dbc0ac24874adb47a2aa6ad59fa888bdc5d430705ed0796a8c330782b51860785be63fd79b1c7cf58fd728b2bf3d77395100:" "20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020544c5320312e332c207365727665722043657274696669636174655665726966790090da2c6178a3019274ed029ba5ad28f25662a78d71e8429c19f96007df39d7a77d7cb80f221c76db5e1c397714f48692:" "91554b9b85058d3d6885997adf47e1f766ae780018ca26873de854fb12d789f3bf1f85d3ce5b23265d8d8900f62906e2eb4a064887beaf9cea26f0edeff35be1e969df77ab1368ced966beb0c7b6242aa0d8844d773e254cfed823d3a5e53b3ef557e716ce7cc2aaca127e86798f2b00" "20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020544c5320312e332c207365727665722043657274696669636174655665726966790090da2c6178a3019274ed029ba5ad28f25662a78d71e8429c19f96007df39d7a77d7cb80f221c76db5e1c397714f48692:"); >> It might make sense to instead add a function pointer to struct >> ecc_modulo to do canonical reduction; that's needed in a few different >> places, not only here. > > I still think think this makes sense, but it's not clear to me what the > usage really is. Regards, Footnotes: [1] https://gitlab.com/gnutls/gnutls/merge_requests/984 -- Daiki Ueno ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > Daiki Ueno writes: > >> For curve25519, q is defined as: >> >> 2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed >> >> whose bit pattern starts with 0x1000, so r - q * (r>>252) should >> work. >> >> On the other hand, for curve448, q is defined as: >> >> 2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d >> >> whose bit pattern starts with 0x. In that case the formula (r - q * >> (r>>445)) could be incorrect due to the accumulated errors by >> multiplication (i.e. q * 0x7FFF...). > > Good catch! Right, this needs a bit more analysis. Fur curve25519, the > subtraction can underflow (unlikely), which is addressed with the > conditional addition a few lines down. For ecc_ah_to_a, this code was deleted, but it's still an issue for eddsa_sign. Maybe need special cases for both ed25519 and ed448 for now. Or some logic looking at the high limb of q. These corner cases are a bit hard to test. > It might make sense to instead add a function pointer to struct > ecc_modulo to do canonical reduction; that's needed in a few different > places, not only here. I still think think this makes sense, but it's not clear to me what the usage really is. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > Implement SHAKE128/256 functions I've merged a stripped-down version of this patch, introducing a single new function sha3_256_shake. If I've understood it correctly, that's what is needed for ed448 signatures. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > I don't understand the gnutls failure. I've logged in to gitlab and > pressed the "Retry"-button. If it keeps failing, I'll need some help > investigating. The retry passed. Merged this deletion (ecdsa over curve25519) to master now. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> ni...@lysator.liu.se (Niels Möller) writes: >> >>> I'm considering the below patch. I think there's room for further >>> improvement, maybe splitting the h_to_a method up (it's called with op >>> == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe >>> adding a some ecc_mod_canonical function. But deleting this unneeded >>> code right away seems like an improvement in itself. >> >> Now pushed to master-updates. > > Failed the gnutls test "testcompat-openssl", > > ### > # Server mode tests (gnutls server-openssl cli# > ### > Check TLS 1.0 with DHE-RSA ciphersuite > %NO_ETM: Check TLS 1.0 with DHE-RSA ciphersuite > %DISABLE_SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite > %NO_TICKETS: Check TLS 1.0 with DHE-RSA ciphersuite > %COMPAT: Check TLS 1.0 with DHE-RSA ciphersuite > %SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite > HTTP Server listening on IPv4 0.0.0.0 port 18033...done > HTTP Server listening on IPv6 :: port 18033...done > HTTP Server listening on IPv4 0.0.0.0 port 22536...done > HTTP Server listening on IPv6 :: port 22536...done > HTTP Server listening on IPv4 0.0.0.0 port 15595...done > HTTP Server listening on IPv6 :: port 15595...done > HTTP Server listening on IPv4 0.0.0.0 port 22743...done > HTTP Server listening on IPv6 :: port 22743...done > HTTP Server listening on IPv4 0.0.0.0 port 10935...done > HTTP Server listening on IPv6 :: port 10935...done > HTTP Server listening on IPv4 0.0.0.0 port 43747...done > HTTP Server listening on IPv6 :: port 43747...done > Exiting via signal 15 > > Maybe unrelated to this change? I don't understand the gnutls failure. I've logged in to gitlab and pressed the "Retry"-button. If it keeps failing, I'll need some help investigating. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001 > From: Daiki Ueno > Date: Fri, 23 Jun 2017 17:26:18 +0200 > Subject: [PATCH 2/4] Implement Curve448 primitives > > This patch adds the necessary primitives for "curve448", defined in > RFC 7748. Those primitives are namely: addition, doubling, scalar > multiplication of the generator or an arbitrary point, inversion, and > square root. [...] > +/* Computes a^{(p-3)/4} = a^{2^446-2^222-1} mod m. Needs 9 * n scratch > + space. */ > +static void > +ecc_mod_pow_446m224m1 (const struct ecc_modulo *p, > +mp_limb_t *rp, const mp_limb_t *ap, > +mp_limb_t *scratch) > +{ > +#define t0 scratch > +#define t1 (scratch + 3*ECC_LIMB_SIZE) > +#define t2 (scratch + 6*ECC_LIMB_SIZE) I think 6*n scratch space should be enough (with no other changes to this function), #define t0 scratch #define t1 (scratch + 2*ECC_LIMB_SIZE) #define t2 (scratch + 4*ECC_LIMB_SIZE) (And it could possibly be trimmed down a bit further, by storing the reused value a^{2^222 - 1} first). Do you agree? Then storage for a few other things can likely be trimmed down too, in particular, curve448_mul would get the same scratch need as curve25519_mul, 12*n rather than 14*n. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > I've tried out this mod function (for 64-bit): > > static void > ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp) ... > This gives a speedup of 85% over the general ecc_mod (on my machine), > and gives about 35% speedup for scalar multiplication (both mul_g and > mul_a). So with this change, performance of mul_g and mul_1 is roughly > midway between secp384 and secp521. Tried the below first implementation of an x86_64 mod function. Gives a speedup of almost three times over the above C function. With this, the mul_g operation is 20% slower than for secp384, and the mul_a operation is slightly faster. Rgards, /Niels diff --git a/configure.ac b/configure.ac index 3547cae4..2933facf 100644 --- a/configure.ac +++ b/configure.ac @@ -476,7 +476,8 @@ asm_nettle_optional_list="gcm-hash8.asm cpuid.asm \ asm_hogweed_optional_list="" if test "x$enable_public_key" = "xyes" ; then asm_hogweed_optional_list="ecc-192-modp.asm ecc-224-modp.asm \ -ecc-25519-modp.asm ecc-256-redc.asm ecc-384-modp.asm ecc-521-modp.asm" +ecc-256-redc.asm ecc-384-modp.asm ecc-521-modp.asm \ +ecc-25519-modp.asm ecc-curve448-modp.asm" fi OPT_NETTLE_OBJS="" @@ -580,6 +581,7 @@ AH_VERBATIM([HAVE_NATIVE], #undef HAVE_NATIVE_ecc_256_redc #undef HAVE_NATIVE_ecc_384_modp #undef HAVE_NATIVE_ecc_384_redc +#undef HAVE_NATIVE_ecc_curve448_modp #undef HAVE_NATIVE_ecc_521_modp #undef HAVE_NATIVE_ecc_521_redc #undef HAVE_NATIVE_gcm_hash8 diff --git a/ecc-448.c b/ecc-448.c index 7d68e1c8..2e840024 100644 --- a/ecc-448.c +++ b/ecc-448.c @@ -45,7 +45,11 @@ #include "ecc-448.h" -#if GMP_NUMB_BITS == 64 +#if HAVE_NATIVE_ecc_curve448_modp +#define ecc_448_modp nettle_ecc_curve448_modp +void +ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp); +#elif GMP_NUMB_BITS == 64 static void ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp) { diff --git a/x86_64/ecc-curve448-modp.asm b/x86_64/ecc-curve448-modp.asm new file mode 100644 index ..5ce81960 --- /dev/null +++ b/x86_64/ecc-curve448-modp.asm @@ -0,0 +1,141 @@ +C x86_64/ecc-curve448-modp.asm + +ifelse(< + Copyright (C) 2019 Niels Möller + + This file is part of GNU Nettle. + + GNU Nettle is free software: you can redistribute it and/or + modify it under the terms of either: + + * the GNU Lesser General Public License as published by the Free + Software Foundation; either version 3 of the License, or (at your + option) any later version. + + or + + * the GNU General Public License as published by the Free + Software Foundation; either version 2 of the License, or (at your + option) any later version. + + or both in parallel, as here. + + GNU Nettle is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received copies of the GNU General Public License and + the GNU Lesser General Public License along with this program. If + not, see http://www.gnu.org/licenses/. +>) + + .file "ecc-curve448-modp.asm" + +define(, <%rsi>) +define(, <%rax>) +define(, <%rbx>) +define(, <%rcx>) +define(, <%rdx>) +define(, <%rbp>) +define(, <%rdi>) +define(, <%r8>) +define(, <%r9>) +define(, <%r10>) +define(, <%r11>) +define(, <%r12>) + +PROLOGUE(nettle_ecc_curve448_modp) + W64_ENTRY(2, 0) + + push%rbx + push%rbp + push%r12 + + C First load the values to be shifted by 32. + mov 88(RP), X1 + mov X1, X0 + mov 96(RP), X2 + mov X1, T0 + mov 104(RP), X3 + mov X2, T1 + mov 56(RP), X4 + mov X3, T2 + mov 64(RP), X5 + mov 72(RP), X6 + mov 80(RP), X7 + + C Multiply by 2^32 + shl $32, X0 + shrd $32, X2, X1 + shrd $32, X3, X2 + shrd $32, X4, X3 + shrd $32, X5, X4 + shrd $32, X6, X5 + shrd $32, X7, X6 + shr $32, X7 + + C Multiply by 2 + add T0, T0 + adc T1, T1 + adc T2, T2 + adc $0, X7 + + C Main additions + add 56(RP), X0 + adc 64(RP), X1 + adc 72(RP), X2 + adc 80(RP), X3 + adc T0, X4 + adc T1, X5 + adc T2, X6 + adc $0, X7 + + add (RP), X0 + adc 8(RP), X1 + adc 16(RP), X2 + adc 24(RP), X3 + adc 32(RP), X4 + adc 40(RP), X5 + adc 48(RP), X6 + adc $0, X7 + + mov X7, T0 + mov X7, T1 + shl $32, T0 + shr $32, T1 + xor T2, T2 + add X7, X0 + adc $0, X1 + adc $0, X2 + adc T0, X3 + adc T1, X4 + adc $0, X5 + adc $0, X6 + adc $0, T2 + + mov T2, T0 + shl $32, T0 + + add T2, X0 + mov X0, (RP) + adc $0, X1 + mov X1, 8(RP) + adc $0, X2 + mov X2, 16(RP) + adc T0, X3 + mov X3, 24(RP)
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> I'm considering the below patch. I think there's room for further >> improvement, maybe splitting the h_to_a method up (it's called with op >> == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe >> adding a some ecc_mod_canonical function. But deleting this unneeded >> code right away seems like an improvement in itself. > > Now pushed to master-updates. Failed the gnutls test "testcompat-openssl", ### # Server mode tests (gnutls server-openssl cli# ### Check TLS 1.0 with DHE-RSA ciphersuite %NO_ETM: Check TLS 1.0 with DHE-RSA ciphersuite %DISABLE_SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite %NO_TICKETS: Check TLS 1.0 with DHE-RSA ciphersuite %COMPAT: Check TLS 1.0 with DHE-RSA ciphersuite %SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite HTTP Server listening on IPv4 0.0.0.0 port 18033...done HTTP Server listening on IPv6 :: port 18033...done HTTP Server listening on IPv4 0.0.0.0 port 22536...done HTTP Server listening on IPv6 :: port 22536...done HTTP Server listening on IPv4 0.0.0.0 port 15595...done HTTP Server listening on IPv6 :: port 15595...done HTTP Server listening on IPv4 0.0.0.0 port 22743...done HTTP Server listening on IPv6 :: port 22743...done HTTP Server listening on IPv4 0.0.0.0 port 10935...done HTTP Server listening on IPv6 :: port 10935...done HTTP Server listening on IPv4 0.0.0.0 port 43747...done HTTP Server listening on IPv6 :: port 43747...done Exiting via signal 15 Maybe unrelated to this change? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > I'm considering the below patch. I think there's room for further > improvement, maybe splitting the h_to_a method up (it's called with op > == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe > adding a some ecc_mod_canonical function. But deleting this unneeded > code right away seems like an improvement in itself. Now pushed to master-updates. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> I see you've made some chenges to the needed scratch space, if I >> understand it correctly, you need to allow h_to_a_itch larger than >> mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH >> and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that? >> >> The only reason ECDSA is affected at all by curve448, is that we have >> tests for ecdsa over the curve25519 and curve448, even though that's >> not the way these curves are intended to be used. Maybe that should >> just be deleted. > > Indeed, I agree to remove the tests and affected parts in the library. I'm considering the below patch. I think there's room for further improvement, maybe splitting the h_to_a method up (it's called with op == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe adding a some ecc_mod_canonical function. But deleting this unneeded code right away seems like an improvement in itself. Regards, /Niels diff --git a/ecc-eh-to-a.c b/ecc-eh-to-a.c index 8173b887..89d2b6e3 100644 --- a/ecc-eh-to-a.c +++ b/ecc-eh-to-a.c @@ -56,6 +56,8 @@ ecc_eh_to_a (const struct ecc_curve *ecc, mp_limb_t cy; + assert(op == 0); + /* Needs 2*size + scratch for the invert call. */ ecc->p.invert (>p, izp, zp, tp + ecc->p.size); @@ -63,25 +65,6 @@ ecc_eh_to_a (const struct ecc_curve *ecc, cy = mpn_sub_n (r, tp, ecc->p.m, ecc->p.size); cnd_copy (cy, r, tp, ecc->p.size); - if (op) -{ - /* Skip y coordinate */ - if (op > 1) - { - /* Reduce modulo q. Hardcoded for curve25519, duplicates end -of ecc_25519_modq. FIXME: Is this needed at all? op > 0 -is only used by ecdsa code, and ecdsa on Edwards curves -makes little sense and is is only used by tests. */ - unsigned shift; - assert (ecc->p.bit_size == 255); - shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1); - cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size, -r[ecc->p.size-1] >> shift); - assert (cy < 2); - cnd_add_n (cy, r, ecc->q.m, ecc->p.size); - } - return; -} ecc_modp_mul (ecc, tp, yp, izp); cy = mpn_sub_n (r + ecc->p.size, tp, ecc->p.m, ecc->p.size); cnd_copy (cy, r + ecc->p.size, tp, ecc->p.size); diff --git a/testsuite/ecdsa-keygen-test.c b/testsuite/ecdsa-keygen-test.c index a96c09ef..0deb7214 100644 --- a/testsuite/ecdsa-keygen-test.c +++ b/testsuite/ecdsa-keygen-test.c @@ -78,6 +78,10 @@ test_main (void) struct ecc_point pub; struct ecc_scalar key; + if (ecc->p.bit_size == 255) + /* Exclude curve25519, which isn't supported with ECDSA. */ + continue; + if (verbose) fprintf (stderr, "Curve %d\n", ecc->p.bit_size); diff --git a/testsuite/ecdsa-sign-test.c b/testsuite/ecdsa-sign-test.c index 23275357..b240a31b 100644 --- a/testsuite/ecdsa-sign-test.c +++ b/testsuite/ecdsa-sign-test.c @@ -156,18 +156,4 @@ test_main (void) "97536710 1F67D1CF 9BCCBF2F 3D239534" "FA509E70 AAC851AE 01AAC68D 62F86647" "2660"); /* s */ - - /* Non-standard ecdsa using curve25519. Not interop-tested with - anything else. */ - test_ecdsa (&_nettle_curve25519, - "1db511101b8fd16f e0212c5679ef53f3" - "323bde77f9efa442 617314d576d1dbcb", /* z */ - "aa2fa8facfdc3a99 ec466d41a2c9211c" - "e62e1706f54037ff 8486e26153b0fa79", /* k */ - SHEX("e99df2a098c3c590 ea1e1db6d9547339" - "ae760d5331496119 5d967fd881e3b0f5"), /* h */ - " 515c3a485f57432 0daf3353a0d08110" - "64157c556296de09 4132f74865961b37", /* r */ - " 78f23367291b01 3fc430fb09322d95" - "4384723649868d8e 88effc7ac8b141d7"); /* s */ } diff --git a/testsuite/ecdsa-verify-test.c b/testsuite/ecdsa-verify-test.c index 971988c3..6a593d6f 100644 --- a/testsuite/ecdsa-verify-test.c +++ b/testsuite/ecdsa-verify-test.c @@ -145,17 +145,4 @@ test_main (void) "97536710 1F67D1CF 9BCCBF2F 3D239534" "FA509E70 AAC851AE 01AAC68D 62F86647" "2660"); /* s */ - - test_ecdsa (&_nettle_curve25519, - /* Public key corresponding to the key in ecdsa-sign-test */ - "59f8f317fd5f4e82 c02f8d4dec665fe1" - "230f83b8572638e1 b2ac34a30028e24d", /* x */ - "1902a72dc1a6525a 811b9c1845978d56" - "fd97dce5e278ebdd ec695349d7e41498", /* y */ - SHEX("e99df2a098c3c590 ea1e1db6d9547339" - "ae760d5331496119 5d967fd881e3b0f5"), /* h */ - " 515c3a485f57432 0daf3353a0d08110" - "64157c556296de09 4132f74865961b37", /* r */ - " 78f23367291b01 3fc430fb09322d95" - "4384723649868d8e 88effc7ac8b141d7"); /* s */ } -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > Performance for the scalar multiplication primitives seem to be slower > than secp384 and slightly faster than secp521, and looking at point > addition, it's slower than secp521. I hope that will be improved a quite > a bit with an optimized mod operation for the curve448 prime. I've tried out this mod function (for 64-bit): static void ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp) { /* Let B = 2^64, b = 2^32 = sqrt(B). p = B^7 - b B^3 - 1 ==> B^7 = b B^3 + 1 {x_{13}, ..., x_0} = {x_6,...,x_0} + {x_{10},...,x_7} + 2 {x_{13},x_{12}, x_{11}} B^4 + b {x_{10},...,x_7,x_{13},x_{12},x_{11} */ mp_limb_t c3, c4, c7; mp_limb_t *tp = rp + 7; c4 = mpn_add_n (rp, rp, rp + 7, 4); c7 = mpn_addmul_1 (rp + 4, rp + 11, 3, 2); c3 = mpn_addmul_1 (rp, rp + 11, 3, (mp_limb_t) 1 << 32); c7 += mpn_addmul_1 (rp + 3, rp + 7, 4, (mp_limb_t) 1 << 32); tp[0] = c7; tp[1] = tp[2] = 0; tp[3] = c3 + (c7 << 32); tp[4] = c4 + (c7 >> 32) + (tp[3] < c3); tp[5] = tp[6] = 0; c7 = mpn_add_n (rp, rp, tp, 7); c7 = cnd_add_n (c7, rp, m->B, 7); assert(c7 == 0); } This gives a speedup of 85% over the general ecc_mod (on my machine), and gives about 35% speedup for scalar multiplication (both mul_g and mul_a). So with this change, performance of mul_g and mul_1 is roughly midway between secp384 and secp521. Not sure if replacing the addmul_1 calls with shifts is worthwhile for the C code (we'll get more function calls and more passes over the data, which should still be worthwhile for machines with slow multiplication), but for assembly implementation, the addmul_1(..., 2) call should be adds only, in registers, and the addmul_1(,..., 1<<32) should be shift and add, preferably in registers. I'm going to leave randomized testing running for a few hours. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > Thank you. By the way, one thing I realized in my past rebase attempts > is that, this commit doing the final reduction of a value by mod q seems > to be incorrect for curve448 and should probably be reverted: > > commit 6cf6abd68eb3d6c8c8e5ab217be734f9c537037f > Author: Daiki Ueno > Date: Sat Aug 5 09:43:47 2017 +0200 > > ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value > > This allows the same code to be reused in curve448 and Ed448. > > Signed-off-by: Daiki Ueno > > - shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1); > + shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1); > cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size, > r[ecc->p.size-1] >> shift); > > For curve25519, q is defined as: > > 2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed > > whose bit pattern starts with 0x1000, so r - q * (r>>252) should > work. > > On the other hand, for curve448, q is defined as: > > 2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d > > whose bit pattern starts with 0x. In that case the formula (r - q * > (r>>445)) could be incorrect due to the accumulated errors by > multiplication (i.e. q * 0x7FFF...). Good catch! Right, this needs a bit more analysis. Fur curve25519, the subtraction can underflow (unlikely), which is addressed with the conditional addition a few lines down. > Therefore, I suggest using r - q * (r>>446) instead, though it would > introduce another hard-coded value. But for curve448, that subtraction will never underflow, instead it will sometimes produce a non-canonical result, r >= q. So correcting the shift isn't enough. On the other hand, this code should perhaps be deleted altogether, I think h_to_a with flags == 2 is used only for ecdsa. It might make sense to instead add a function pointer to struct ecc_modulo to do canonical reduction; that's needed in a few different places, not only here. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > Daiki Ueno writes: > >> From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001 >> From: Daiki Ueno >> Date: Fri, 23 Jun 2017 17:26:18 +0200 >> Subject: [PATCH 2/4] Implement Curve448 primitives >> >> This patch adds the necessary primitives for "curve448", defined in >> RFC 7748. Those primitives are namely: addition, doubling, scalar >> multiplication of the generator or an arbitrary point, inversion, and >> square root. > > At last, I've now merged this onto the curve448 branch. That's a great news, thank you Niels! > I see you've made some chenges to the needed scratch space, if I > understand it correctly, you need to allow h_to_a_itch larger than > mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH > and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that? > > The only reason ECDSA is affected at all by curve448, is that we have > tests for ecdsa over the curve25519 and curve448, even though that's > not the way these curves are intended to be used. Maybe that should > just be deleted. Indeed, I agree to remove the tests and affected parts in the library. > Performance for the scalar multiplication primitives seem to be slower > than secp384 and slightly faster than secp521, and looking at point > addition, it's slower than secp521. I hope that will be improved a quite > a bit with an optimized mod operation for the curve448 prime. > >> While the interface is similar to curve25519, the implementation is >> slightly different. For curve25519, the Pippenger tables are >> generated through the coordinates on the Montgomery curve. On the >> other hand, the tables for curve448 are directly generated from the >> coordinates on the corresponding Edwards curve ("edwards448"). > > This is no longer the case, since the handling curve 25519 was changed > early on, based on your patches back then. Thank you. By the way, one thing I realized in my past rebase attempts is that, this commit doing the final reduction of a value by mod q seems to be incorrect for curve448 and should probably be reverted: commit 6cf6abd68eb3d6c8c8e5ab217be734f9c537037f Author: Daiki Ueno Date: Sat Aug 5 09:43:47 2017 +0200 ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value This allows the same code to be reused in curve448 and Ed448. Signed-off-by: Daiki Ueno - shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1); + shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1); cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size, r[ecc->p.size-1] >> shift); For curve25519, q is defined as: 2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed whose bit pattern starts with 0x1000, so r - q * (r>>252) should work. On the other hand, for curve448, q is defined as: 2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d whose bit pattern starts with 0x. In that case the formula (r - q * (r>>445)) could be incorrect due to the accumulated errors by multiplication (i.e. q * 0x7FFF...). Therefore, I suggest using r - q * (r>>446) instead, though it would introduce another hard-coded value. Regards, -- Daiki Ueno ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Ueno writes: > From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001 > From: Daiki Ueno > Date: Fri, 23 Jun 2017 17:26:18 +0200 > Subject: [PATCH 2/4] Implement Curve448 primitives > > This patch adds the necessary primitives for "curve448", defined in > RFC 7748. Those primitives are namely: addition, doubling, scalar > multiplication of the generator or an arbitrary point, inversion, and > square root. At last, I've now merged this onto the curve448 branch. I see you've made some chenges to the needed scratch space, if I understand it correctly, you need to allow h_to_a_itch larger than mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that? The only reason ECDSA is affected at all by curve448, is that we have tests for ecdsa over the curve25519 and curve448, even though that's not the way these curves are intended to be used. Maybe that should just be deleted. Performance for the scalar multiplication primitives seem to be slower than secp384 and slightly faster than secp521, and looking at point addition, it's slower than secp521. I hope that will be improved a quite a bit with an optimized mod operation for the curve448 prime. > While the interface is similar to curve25519, the implementation is > slightly different. For curve25519, the Pippenger tables are > generated through the coordinates on the Montgomery curve. On the > other hand, the tables for curve448 are directly generated from the > coordinates on the corresponding Edwards curve ("edwards448"). This is no longer the case, since the handling curve 25519 was changed early on, based on your patches back then. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Hello Niels, Daiki Ueno writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> I think there are three main pieces left to integrate. >> >> 1. Curve operations to support Curve448 (i.e., diffie-hellman >>operations). I have made some progress, on my curve448 branch, >> >> 2. SHAKE 128/256. I think I had some question on the interface design. >> >> 3. EdDSA 448. >> >> Optimization of the mod p arithmetic isn't that important yet, Is there anything I can do to get this merged in upstream? Now that OpenSSL supports the curve and EdDSA, it would be interesting to add it in GnuTLS. I tried to integrate it into GnuTLS bundling the current code, and it can interoperate with OpenSSL: https://gitlab.com/gnutls/gnutls/merge_requests/984 For convenience I am attaching the remaining patches for nettle. Regards, -- Daiki Ueno ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
ni...@lysator.liu.se (Niels Möller) writes: > I think there are three main pieces left to integrate. > > 1. Curve operations to support Curve448 (i.e., diffie-hellman >operations). I have made some progress, on my curve448 branch, > > 2. SHAKE 128/256. I think I had some question on the interface design. > > 3. EdDSA 448. > > Optimization of the mod p arithmetic isn't that important yet, I see. I thought that the performance of curve operations should at least be comparable to P-521. However, even with the generic ecc_mod for mod p, those are already close. So let's look at the above items first. I have rebased my patch implementing (1) on the curve448 branch: https://gitlab.com/dueno/nettle/commits/wip/dueno/curve448-2 One thing I noticed is that the point addition formula for untwisted curves doesn't look correct: https://gitlab.com/dueno/nettle/commit/4e3a50f4a50d8d03536dc107d7b77c84462e3068#6c80341e16ba39077bf2507d8450393d7e7e677a_261_262 > but I'll nevertheless try to explain how I think about it. Thank you for the detailed explanation. I ran the benchmark for those 3 variants: (1) the original version using ecc_mod, (2) the two step reduction as you suggest, and (3) my formula optimized with single 7-limbs operations: size modp reduce modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh mul_g mul_a (us) 448 0.0727 0.0720 0.0739 44.01 1.451 52.92 1.088 1.456 1.406 299.6 557.6 521 0.0139 0.0151 0.1003 77.72 1.703 101.59 0.728 0.995 1.277 255.8 588.4 448 0.0496 0.0497 0.0764 34.77 1.500 49.59 0.923 1.158 1.169 273.5 500.1 521 0.0147 0.0144 0.1027 77.63 1.816 88.57 0.716 0.934 1.276 237.2 589.9 448 0.0641 0.0644 0.0809 52.76 1.570 49.42 1.007 1.340 1.343 288.1 570.5 521 0.0139 0.0141 0.0967 78.22 1.697 99.44 0.714 1.012 1.264 235.8 589.2 on Core i7-6600U CPU @ 2.60GHz. My code could be wrong or inefficient, but actually (2) is the fastest. (3) is slower due to the final carry handling; the carry is accumulated at most 3 and wrapping around it with cnd_add_n seems to be costly. Regards, -- Daiki Ueno ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Uenowrites: > Thank you for the suggestions and sorry for the shameless delay. No problem, I'm also quite slow. I think there are three main pieces left to integrate. 1. Curve operations to support Curve448 (i.e., diffie-hellman operations). I have made some progress, on my curve448 branch, 2. SHAKE 128/256. I think I had some question on the interface design. 3. EdDSA 448. Optimization of the mod p arithmetic isn't that important yet, but I'll nevertheless try to explain how I think about it. > Due to my ignorance, I probably don't get what you mean, but as far as I > read the implementations of other curves in Nettle, some of them seem to > use the property of generalized Mersenne numbers described in: > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2133 > and it's also applicable to the Curve448 prime. The way I think about it, we have p = 2^448 - 2^224 - 1. This implies that 2^448 = 2^224 + 1 (mod p) (since a = b (mod p) means that a - b is a multiple of p, and in this case, the difference between left and right hand is exactly p). > So a 896-bit product here: > > a0 + a1 * 2^k + a2 * 2^2k + a3 * 2^3k (k = 224) And in this notation, 2^2k = 2^k + 1 (mod p) and 2^3k = 2^2k + 2^k (mod p). If we use these relations to replace the high powers of two, we get precisely your formulas: > can be reduced to: > > b0 + b1 * 2^k (mod p) > > through modular additions of the smaller numbers: > > b0 = a0 + a2 + a3 (mod p) > b1 = a1 + a2 + a3 + a3 (mod p) I'm not sure what's the best way to organize this. I suggested doing it in two steps, first reduce the high three chunks to two and leave a0 unchanged: a1 + a2 2^k + a3 2^2k = (a1 + a3) + (a2 + a3) 2^k = b1 + b_2 2^k (mod p) where we can do the carry propagation and carry wrap-around to get b0 and b1 as 224-bit numbers. And then repeat the same procedure once more a0 + b1 2^k + b2 2^2k = (a0 + b2) + (b1 + b2) 2^k = c0 + c1 2^k This is going to work out nicely on 32-bit platforms. For 64-bit, one should get better performance the more shift operations can be avoided, and your variant might be faster then the above. On the other hand, for assembly implementation keeping values in registers, the two step way needs fewer registers, since loading the lowest limbs can be postponed until after the higher order limbs are eliminated and no longer need any registers. Looking at your formulas, > b0 = a0 + a2 + a3 (mod p) > b1 = a1 + a2 + a3 + a3 (mod p) I think it should be possible to implement with only a single 7-limb shift: In the b0 formula, only a3 needs shifting, a0 and a2 are already at the right position. And in the b1 formula, a1 and a2 are at the right position, but a2 needs a shift. I think I'd try the following: 1. Do the additions a0 += a2, a1 += a3, which I think can be done as a single 7-limb add, with plain carry propagation between the pieces. Carry out can be folded immediately, or saved for later. 2. Do the second a1 += a3 (needs some masking to extract the a3 bits, but no shift). Or maybe skip the masking, and handle the low 32 bits of a3 some other way. 3. Do an in-place 32-bit left shift of a2, a3. 4. Add the shifted terms, a0 += a3, a1 += a2. This gets even better if we make the a2, a3 shift above a rotate storing the shifted out high bits of a3 back at the position of the low bits of a2. 5. Do final carry handling. There's no need to try to find the optimal reduction strategy right away, though, other missing pieces are more important. > I tried to implement it (as attached) and got 20-30% speed-up with > mini-gmp. For good test coverage of these functions, I recommend building with the real gmp (to get access to its mpz_rrandomb function), and then let while NETTLE_TEST_SEED=0 ./ecc-mod-test ; do : ; done run over night. And possibly also hacking ecc_mod_test.c to only test the curve of interest. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Hello, Thank you for the suggestions and sorry for the shameless delay. ni...@lysator.liu.se (Niels Möller) writes: >> Also, optimized implementation of modular reduction is currently >> missing, which is beyond my expertise. I would appreciate any >> suggestions regarding that. > > If we do Euclidean reduction, we should use the property that > > 2^448 = 2^224 + 1 (mod) > > And we'd need to use this twice to reduce a 896-bit product to 448 bits. > On 64-bit machines, we'll get some shifting since 224 isn't a multiple > of 64. Due to my ignorance, I probably don't get what you mean, but as far as I read the implementations of other curves in Nettle, some of them seem to use the property of generalized Mersenne numbers described in: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2133 and it's also applicable to the Curve448 prime. So a 896-bit product here: a0 + a1 * 2^k + a2 * 2^2k + a3 * 2^3k (k = 224) can be reduced to: b0 + b1 * 2^k (mod p) through modular additions of the smaller numbers: b0 = a0 + a2 + a3 (mod p) b1 = a1 + a2 + a3 + a3 (mod p) I tried to implement it (as attached) and got 20-30% speed-up with mini-gmp. size modp reduce modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh mul_g mul_a (us) 192 0.0066 0.0061 0.0770 28.90 0.000 0.00 0.698 0.811 1.014 44.2 188.5 224 0.0091 0.0091 0.1231 47.57 0.000 0.00 0.969 1.285 1.566 90.2 329.1 255 0.0081 0.0078 0.1550 18.70 0.000 0.00 0.767 1.014 1.082 88.7 278.4 256 0.1288 0.0115 0.1464 50.22 0.000 0.00 1.004 1.457 1.701 105.6 380.1 384 0.0170 0.0170 0.1260 106.51 0.000 0.00 1.791 2.240 3.113 305.9 1015.7 448 0.1892 0.1886 0.1886 171.57 0.000 0.00 3.630 4.914 4.940 842.1 1966.8 521 0.0163 0.0158 0.2821 217.44 0.000 0.00 3.555 4.507 6.107 788.8 2637.8 size modp reduce modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh mul_g mul_a (us) 192 0.0080 0.0082 0.0748 30.61 0.000 0.00 0.663 0.810 1.007 43.9 189.9 224 0.0103 0.0108 0.1281 47.48 0.000 0.00 0.968 1.481 1.632 89.4 323.2 255 0.0095 0.0090 0.1545 18.71 0.000 0.00 0.828 1.081 1.085 90.8 284.9 256 0.1353 0.0123 0.1423 51.25 0.000 0.00 0.992 1.326 1.726 107.4 379.3 384 0.0174 0.0171 0.1253 108.82 0.000 0.00 1.760 2.233 2.991 302.9 1004.2 448 0.1382 0.1380 0.1865 147.87 0.000 0.00 3.167 4.161 4.211 738.6 1668.3 521 0.0186 0.0179 0.2750 216.30 0.000 0.00 3.438 4.249 5.756 743.5 2510.6 Do you think this is acceptable or need further optimization? Regards, -- Daiki Ueno >From bc210d94144d6b5d1e050f78b7261d58a58d8f19 Mon Sep 17 00:00:00 2001 From: Daiki UenoDate: Tue, 9 Jan 2018 16:13:53 +0100 Subject: [PATCH] ecc: Add optimized modp implementation for curve448 Signed-off-by: Daiki Ueno --- ecc-448.c | 72 +-- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/ecc-448.c b/ecc-448.c index f98616d8..d62e8e1b 100644 --- a/ecc-448.c +++ b/ecc-448.c @@ -45,6 +45,74 @@ #include "ecc-448.h" +#if HAVE_NATIVE_ecc_448_modp + +#define ecc_448_modp nettle_ecc_448_modp +void +ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp); + +#elif GMP_NUMB_BITS == 64 +/* + * Use 2^448 = 2^224 + 1 (mod p), compute: + * b0 + b1 * 2^224 = a0 + a1 * 2^224 + a2 * 2^448 + a3 * 2^672 (mod p) + * + * as p is considered as a generalized Mersenne number, b0 and b1 can + * be calculated with the following shortcut: + * + * b0 = a0 + a2 + a3 (mod p) + * b1 = a1 + a2 + a3 + a3 (mod p) + */ +static void +ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp) +{ + mp_limb_t a1[4], a3[4], cy, mask; + + /* Extract a1 */ + mpn_copyi (a1, rp + 3, 4); + mpn_rshift (a1, a1, 4, 32); + rp[3] &= 0x; + + /* Extract a3 */ + mpn_copyi (a3, rp + 10, 4); + mpn_rshift (a3, a3, 4, 32); + rp[10] &= 0x; + + /* a2 = a2 + a3 */ + mpn_add_n (rp + 7, rp + 7, a3, 4); + + /* a0 = a0 + a2 (carry propagated to a2) */ + mpn_add_n (rp, rp, rp + 7, 4); + mpn_rshift (, rp + 3, 1, 32); + sec_add_1 (rp + 7, rp + 7, 4, cy); + + /* a2 = a2 + a1 */ + mpn_add_n (rp + 7, rp + 7, a1, 4); + + /* a2 = a2 + a3 */ + mpn_add_n (rp + 7, rp + 7, a3, 4); + mpn_rshift (, rp + 10, 1, 32); + + /* Move a2 next to a0 */ + rp[6] = 0; + mpn_lshift (rp + 6, rp + 7, 4, 32); + rp[3] &= 0x; + rp[3] |= rp[6]; + mpn_copyi (rp + 4, rp + 7, 3); + + assert (cy <= 3); + mask = (1 << cy) - 1; + cy = cnd_add_n (mask & 1, rp, ecc_Bmodp, 7); + assert (cy == 0); + cy = cnd_add_n (mask & 2, rp, ecc_Bmodp, 7); + assert (cy == 0); + cy = cnd_add_n (mask & 4, rp, ecc_Bmodp, 7); + assert (cy == 0); +} + +#else +#define ecc_448_modp ecc_mod +#endif + /* Needs 2*ecc->size limbs at rp, and 2*ecc->size additional limbs of scratch space. No overlap allowed. */ static void @@ -221,8 +289,8 @@ const struct ecc_curve _nettle_curve448 =
Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448
Daiki Uenowrites: > This patch series implements the Curve448 Diffie-Hellman protocol (RFC > 7748) and the Ed448 signature scheme (RFC 8032). Although I tried to > make it as close as possible to the Curve25519 and Ed25519 > implementations, I had to add a few special cases, namely: Thanks a lot for this work. I've started by applying the first 4 easy patches (currently on the master-updates branch). > - for Curve448, eccdata directly calculates points on the Edwards > curve instead of the equivalent Montgomery curve It would be nice to do it the same way. Are Montgomery computations the same, regardless of twist/no-twist? > - untwisted versions of ecc_add_eh* and ecc_dup_eh are added Just a note that the twisted versions should at some point be updated to use more effficient formulas. > - the point decoding for Ed448 uses a different formula to recover u and v Does it have to be done differently? I'll have to read up to find out. Efficient sqrt is going to be tailored for this prime to be most efficient. > Also, optimized implementation of modular reduction is currently > missing, which is beyond my expertise. I would appreciate any > suggestions regarding that. If we do Euclidean reduction, we should use the property that 2^448 = 2^224 + 1 (mod) And we'd need to use this twice to reduce a 896-bit product to 448 bits. On 64-bit machines, we'll get some shifting since 224 isn't a multiple of 64. Redc might be slightly more efficient than the above reduction. Due to the structure, it might also work nicely to do a "half redc", reducing from both ends, and they might share the shift. I'll have to think through the details. But unlikely to be a big win over Euclidean. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
[PATCH 0/8] Implement Curve448 ECDH and Ed448
Hello, This patch series implements the Curve448 Diffie-Hellman protocol (RFC 7748) and the Ed448 signature scheme (RFC 8032). Although I tried to make it as close as possible to the Curve25519 and Ed25519 implementations, I had to add a few special cases, namely: - for Curve448, eccdata directly calculates points on the Edwards curve instead of the equivalent Montgomery curve - untwisted versions of ecc_add_eh* and ecc_dup_eh are added - the point decoding for Ed448 uses a different formula to recover u and v Also, optimized implementation of modular reduction is currently missing, which is beyond my expertise. I would appreciate any suggestions regarding that. The patches are also available on: https://gitlab.com/dueno/nettle/commits/wip/dueno/ed448 Thanks to Hubert Kario and Nikos Mavrogiannopoulos for initial reviews. Daiki Ueno (8): ecc-mul-test: Fix mpn_cmp calls eccdata: Emit correct ecc_Bmodq_shifted for curve448 eccdata: Redirect ecc_point_out to given stream, instead of stderr ecc: Add add_hh and dup members to ecc_curve ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value Implement curve448 primitives Implement SHAKE128/256 functions Implement Ed448 signature scheme .gitignore |1 + Makefile.in | 18 +- curve448-eh-to-x.c | 73 + curve448-mul-g.c| 74 + curve448-mul.c | 148 + curve448.h | 58 + ecc-192.c |5 + ecc-224.c |5 + ecc-25519.c |5 + ecc-256.c |5 + ecc-384.c |5 + ecc-448.c | 273 ++ ecc-521.c |5 + ecc-add-eh.c| 74 +- ecc-add-ehh.c | 77 +- ecc-dup-eh.c| 55 +- ecc-eh-to-a.c |4 +- ecc-internal.h | 57 +- ecc-mul-a-eh.c | 12 +- ecc-mul-g-eh.c |4 +- ecc-point-mul-g.c |7 +- ecc-point-mul.c |2 +- ecc-point.c | 15 + eccdata.c | 183 +- ecdsa-keygen.c |4 +- ed25519-sha512-sign.c | 15 + ed448-shake256-pubkey.c | 60 + ed448-shake256-sign.c | 92 + ed448-shake256-verify.c | 66 + eddsa-compress.c| 11 +- eddsa-decompress.c | 15 +- eddsa-expand.c | 20 +- eddsa-hash.c| 35 + eddsa-pubkey.c |2 +- eddsa-sign.c| 18 +- eddsa-verify.c | 16 +- eddsa.h | 24 + examples/ecc-benchmark.c|1 + nettle-internal.h |2 +- nettle-meta-hashes.c|2 + nettle-meta.h |2 + nettle.texinfo | 152 +- sha3.c | 13 + sha3.h | 56 + shake128-meta.c | 42 + shake128.c | 84 + shake256-meta.c | 42 + shake256.c | 84 + testsuite/.test-rules.make | 12 + testsuite/Makefile.in |5 +- testsuite/curve448-dh-test.c| 100 + testsuite/ecc-add-test.c| 48 +- testsuite/ecc-dup-test.c| 12 +- testsuite/ecc-mul-a-test.c |6 +- testsuite/ecc-mul-g-test.c |6 +- testsuite/ecdh-test.c | 16 +- testsuite/ecdsa-keygen-test.c | 16 + testsuite/ed448-test.c | 240 ++ testsuite/eddsa-compress-test.c | 137 +- testsuite/eddsa-sign-test.c | 66 +- testsuite/eddsa-verify-test.c | 49 +- testsuite/meta-hash-test.c |2 + testsuite/shake.awk | 14 + testsuite/shake128-test.c | 6183 +++ testsuite/shake256-test.c | 6183 +++ testsuite/testutils.c | 57 +- 66 files changed, 14976 insertions(+), 199 deletions(-) create mode 100644 curve448-eh-to-x.c create mode 100644 curve448-mul-g.c create mode 100644 curve448-mul.c create mode 100644 curve448.h create mode 100644 ecc-448.c create mode 100644 ed448-shake256-pubkey.c create mode 100644 ed448-shake256-sign.c create mode 100644 ed448-shake256-verify.c create mode 100644 shake128-meta.c create mode 100644 shake128.c create mode 100644 shake256-meta.c create mode 100644 shake256.c create mode 100644 testsuite/curve448-dh-test.c create mode 100644 testsuite/ed448-test.c create mode 100755 testsuite/shake.awk create mode 100644 testsuite/shake128-test.c create mode 100644 testsuite/shake256-test.c -- 2.13.3 ___ nettle-bugs mailing list nettle-bugs@lists.lysator.liu.se http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs