Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2020-01-02 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> I think this is complete now (except updating hogweed-benchmark), just
> pushed to the ed448 branch. Thanks for the patience.

It seems I forgot to add the new files in the first attempt. Ooops.
Fixed with a forced update on this branch.

Now ubsan fails, it doesn't like calling memcpy with null ptr and zero
size. I have to fix that. Not sure what's prettiest, either a
conditional, or a function pointer like it was done in the original patch.

In the mean time, I've updated hogweed benchmark (not yet committed).
This is what I get on my old (x86_64) laptop:

name size   sign/ms verify/ms
   ecdsa  1925.21941.7914
   ecdsa  2243.22971.1658
   ecdsa  2563.11531.0347
   ecdsa  3841.43930.4607
   ecdsa  5210.72770.2308
   eddsa  2556.02431.5598
   eddsa  4481.74640.4595

So speedwise, ed25519 is comparable to ecdsa on secp_192r1, and ed448 is
comparable to ecdsa on secp_384r1. In both cases, signing is a bit
faster (15%-20%), and verify is the same or a bit slower.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2020-01-02 Thread Niels Möller
Daiki Ueno  writes:

> Thank you very much for all the Curve448/SHAKE256 work for merging (I'm
> slowly catching up).

I think this is complete now (except updating hogweed-benchmark), just
pushed to the ed448 branch. Thanks for the patience.

>> These corner cases are a bit hard to test.
>
> For what it's worth, the original issue was reliably reproducible with
> the GnuTLS port[1] against the OpenSSL client.  Here is a test vector
> extracted from the interaction:

I'm afraid this doesn't exercise the corner cases. The thing is, we have q
close to 2^k (k = 2^252 for ed25519, k = 446 for ed448).

Then we want to reduce 

  r = hi 2^k + lo

modulo q, canonically. If we set 

  r' = r - hi * q

then it's highly likely that 0 <= r' < q, but not certain. 

For ed25519, q > 2^k, so we are guaranteed that r' < 2^k < q, but we may
get r' < 0.

For ed448, q < 2^k, so we are guaranteed that r' > 0, and we may instead
get r' >= q. 

For now, I've added the following logic to _eddsa_sign:

  if (ecc->p.bit_size == 255)
{
  /* FIXME: Special code duplicated in ecc_25519_modq
 Define a suitable method for canonical reduction? */

  /* q is slightly larger than 2^252, underflow from below
 mpn_submul_1 is unlikely. */
  unsigned shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1);
  q = sp[ecc->p.size-1] >> shift;
}
  else
{
  unsigned shift;

  assert (ecc->p.bit_size == 448);
  /* q is slightly smaller than 2^446 */
  shift = 446 - GMP_NUMB_BITS * (ecc->p.size - 1);
  /* Add one, then it's possible but unlikely that below
 mpn_submul_1 does *not* underflow. */
  q = (sp[ecc->p.size-1] >> shift) + 1;
}

  cy = mpn_submul_1 (sp, ecc->q.m, ecc->p.size, q);
  assert (cy < 2);
  cy -= cnd_add_n (cy, sp, ecc->q.m, ecc->p.size);
  assert (cy == 0);

I think that's correct, but it seems tricky to find inputs to
_eddsa_sign that will hit the corner cases. I've added some debug
printouts to verify that mpn_submul_1 returns 0 for the ed25519
testcases, and 1 for all the ed448 testcases. If it's taken out to a
separate function/method, then it gets easier to unit test.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2020-01-02 Thread Daiki Ueno
Hello Niels,

Thank you very much for all the Curve448/SHAKE256 work for merging (I'm
slowly catching up).

ni...@lysator.liu.se (Niels Möller) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> Daiki Ueno  writes:
>>
>>> For curve25519, q is defined as:
>>>
>>>   2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed
>>>
>>> whose bit pattern starts with 0x1000, so r - q * (r>>252) should
>>> work.
>>>
>>> On the other hand, for curve448, q is defined as:
>>>
>>>   2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d
>>>
>>> whose bit pattern starts with 0x.  In that case the formula (r - q *
>>> (r>>445)) could be incorrect due to the accumulated errors by
>>> multiplication (i.e. q * 0x7FFF...).
>>
>> Good catch! Right, this needs a bit more analysis. Fur curve25519, the
>> subtraction can underflow (unlikely), which is addressed with the
>> conditional addition a few lines down.
>
> For ecc_ah_to_a, this code was deleted, but it's still an issue for
> eddsa_sign. Maybe need special cases for both ed25519 and ed448 for now.
> Or some logic looking at the high limb of q.
>  
> These corner cases are a bit hard to test.

For what it's worth, the original issue was reliably reproducible with
the GnuTLS port[1] against the OpenSSL client.  Here is a test vector
extracted from the interaction:

  test_one 
("0cf87eb094bf46d161bde3b99d1d32856fecfae0142392cd98c091db206d174bbf8ef476a9cf746d94306c565f97ac50796f021eff8d779ca5"

"9addde61f668f2dbc0ac24874adb47a2aa6ad59fa888bdc5d430705ed0796a8c330782b51860785be63fd79b1c7cf58fd728b2bf3d77395100:"

"9addde61f668f2dbc0ac24874adb47a2aa6ad59fa888bdc5d430705ed0796a8c330782b51860785be63fd79b1c7cf58fd728b2bf3d77395100:"

"20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020544c5320312e332c207365727665722043657274696669636174655665726966790090da2c6178a3019274ed029ba5ad28f25662a78d71e8429c19f96007df39d7a77d7cb80f221c76db5e1c397714f48692:"

"91554b9b85058d3d6885997adf47e1f766ae780018ca26873de854fb12d789f3bf1f85d3ce5b23265d8d8900f62906e2eb4a064887beaf9cea26f0edeff35be1e969df77ab1368ced966beb0c7b6242aa0d8844d773e254cfed823d3a5e53b3ef557e716ce7cc2aaca127e86798f2b00"

"20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020544c5320312e332c207365727665722043657274696669636174655665726966790090da2c6178a3019274ed029ba5ad28f25662a78d71e8429c19f96007df39d7a77d7cb80f221c76db5e1c397714f48692:");

>> It might make sense to instead add a function pointer to struct
>> ecc_modulo to do canonical reduction; that's needed in a few different
>> places, not only here.
>
> I still think think this makes sense, but it's not clear to me what the
> usage really is.

Regards,

Footnotes:
[1]  https://gitlab.com/gnutls/gnutls/merge_requests/984

-- 
Daiki Ueno
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2020-01-02 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> Daiki Ueno  writes:
>
>> For curve25519, q is defined as:
>>
>>   2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed
>>
>> whose bit pattern starts with 0x1000, so r - q * (r>>252) should
>> work.
>>
>> On the other hand, for curve448, q is defined as:
>>
>>   2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d
>>
>> whose bit pattern starts with 0x.  In that case the formula (r - q *
>> (r>>445)) could be incorrect due to the accumulated errors by
>> multiplication (i.e. q * 0x7FFF...).
>
> Good catch! Right, this needs a bit more analysis. Fur curve25519, the
> subtraction can underflow (unlikely), which is addressed with the
> conditional addition a few lines down.

For ecc_ah_to_a, this code was deleted, but it's still an issue for
eddsa_sign. Maybe need special cases for both ed25519 and ed448 for now.
Or some logic looking at the high limb of q.
 
These corner cases are a bit hard to test.

> It might make sense to instead add a function pointer to struct
> ecc_modulo to do canonical reduction; that's needed in a few different
> places, not only here.

I still think think this makes sense, but it's not clear to me what the
usage really is.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-25 Thread Niels Möller
Daiki Ueno  writes:

>   Implement SHAKE128/256 functions

I've merged a stripped-down version of this patch, introducing a single new
function sha3_256_shake. If I've understood it correctly, that's what is
needed for ed448 signatures.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-08 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> I don't understand the gnutls failure. I've logged in to gitlab and
> pressed the "Retry"-button. If it keeps failing, I'll need some help
> investigating.

The retry passed. Merged this deletion (ecdsa over curve25519) to master
now.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-08 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> ni...@lysator.liu.se (Niels Möller) writes:
>>
>>> I'm considering the below patch. I think there's room for further
>>> improvement, maybe splitting the h_to_a method up (it's called with op
>>> == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe
>>> adding a some ecc_mod_canonical function. But deleting this unneeded
>>> code right away seems like an improvement in itself.
>>
>> Now pushed to master-updates.
>
> Failed the gnutls test "testcompat-openssl",
>
>   ###
>   # Server mode tests (gnutls server-openssl cli#
>   ###
>   Check TLS 1.0 with DHE-RSA ciphersuite
>   %NO_ETM: Check TLS 1.0 with DHE-RSA ciphersuite
>   %DISABLE_SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite
>   %NO_TICKETS: Check TLS 1.0 with DHE-RSA ciphersuite
>   %COMPAT: Check TLS 1.0 with DHE-RSA ciphersuite
>   %SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite
>   HTTP Server listening on IPv4 0.0.0.0 port 18033...done
>   HTTP Server listening on IPv6 :: port 18033...done
>   HTTP Server listening on IPv4 0.0.0.0 port 22536...done
>   HTTP Server listening on IPv6 :: port 22536...done
>   HTTP Server listening on IPv4 0.0.0.0 port 15595...done
>   HTTP Server listening on IPv6 :: port 15595...done
>   HTTP Server listening on IPv4 0.0.0.0 port 22743...done
>   HTTP Server listening on IPv6 :: port 22743...done
>   HTTP Server listening on IPv4 0.0.0.0 port 10935...done
>   HTTP Server listening on IPv6 :: port 10935...done
>   HTTP Server listening on IPv4 0.0.0.0 port 43747...done
>   HTTP Server listening on IPv6 :: port 43747...done
>   Exiting via signal 15
>
> Maybe unrelated to this change?

I don't understand the gnutls failure. I've logged in to gitlab and
pressed the "Retry"-button. If it keeps failing, I'll need some help
investigating.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-08 Thread Niels Möller
Daiki Ueno  writes:

> From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001
> From: Daiki Ueno 
> Date: Fri, 23 Jun 2017 17:26:18 +0200
> Subject: [PATCH 2/4] Implement Curve448 primitives
>
> This patch adds the necessary primitives for "curve448", defined in
> RFC 7748.  Those primitives are namely: addition, doubling, scalar
> multiplication of the generator or an arbitrary point, inversion, and
> square root.

[...]

> +/* Computes a^{(p-3)/4} = a^{2^446-2^222-1} mod m. Needs 9 * n scratch
> +   space. */
> +static void
> +ecc_mod_pow_446m224m1 (const struct ecc_modulo *p,
> +mp_limb_t *rp, const mp_limb_t *ap,
> +mp_limb_t *scratch)
> +{
> +#define t0 scratch
> +#define t1 (scratch + 3*ECC_LIMB_SIZE)
> +#define t2 (scratch + 6*ECC_LIMB_SIZE)

I think 6*n scratch space should be enough (with no other changes to
this function),

  #define t0 scratch
  #define t1 (scratch + 2*ECC_LIMB_SIZE)
  #define t2 (scratch + 4*ECC_LIMB_SIZE)

(And it could possibly be trimmed down a bit further, by storing the
reused value a^{2^222 - 1} first).

Do you agree? Then storage for a few other things can likely be trimmed
down too, in particular, curve448_mul would get the same scratch need as
curve25519_mul, 12*n rather than 14*n. 

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-07 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> I've tried out this mod function (for 64-bit):
>
> static void
> ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp)
...

> This gives a speedup of 85% over the general ecc_mod (on my machine),
> and gives about 35% speedup for scalar multiplication (both mul_g and
> mul_a). So with this change, performance of mul_g and mul_1 is roughly
> midway between secp384 and secp521.

Tried the below first implementation of an x86_64 mod function. Gives a
speedup of almost three times over the above C function. With this, the
mul_g operation is 20% slower than for secp384, and the mul_a operation
is slightly faster.

Rgards,
/Niels

diff --git a/configure.ac b/configure.ac
index 3547cae4..2933facf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -476,7 +476,8 @@ asm_nettle_optional_list="gcm-hash8.asm cpuid.asm \
 asm_hogweed_optional_list=""
 if test "x$enable_public_key" = "xyes" ; then
   asm_hogweed_optional_list="ecc-192-modp.asm ecc-224-modp.asm \
-ecc-25519-modp.asm ecc-256-redc.asm ecc-384-modp.asm ecc-521-modp.asm"
+ecc-256-redc.asm ecc-384-modp.asm ecc-521-modp.asm \
+ecc-25519-modp.asm ecc-curve448-modp.asm"
 fi
 
 OPT_NETTLE_OBJS=""
@@ -580,6 +581,7 @@ AH_VERBATIM([HAVE_NATIVE],
 #undef HAVE_NATIVE_ecc_256_redc
 #undef HAVE_NATIVE_ecc_384_modp
 #undef HAVE_NATIVE_ecc_384_redc
+#undef HAVE_NATIVE_ecc_curve448_modp
 #undef HAVE_NATIVE_ecc_521_modp
 #undef HAVE_NATIVE_ecc_521_redc
 #undef HAVE_NATIVE_gcm_hash8
diff --git a/ecc-448.c b/ecc-448.c
index 7d68e1c8..2e840024 100644
--- a/ecc-448.c
+++ b/ecc-448.c
@@ -45,7 +45,11 @@
 
 #include "ecc-448.h"
 
-#if GMP_NUMB_BITS == 64
+#if HAVE_NATIVE_ecc_curve448_modp
+#define ecc_448_modp nettle_ecc_curve448_modp
+void
+ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp);
+#elif GMP_NUMB_BITS == 64
 static void
 ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp)
 {
diff --git a/x86_64/ecc-curve448-modp.asm b/x86_64/ecc-curve448-modp.asm
new file mode 100644
index ..5ce81960
--- /dev/null
+++ b/x86_64/ecc-curve448-modp.asm
@@ -0,0 +1,141 @@
+C x86_64/ecc-curve448-modp.asm
+
+ifelse(<
+   Copyright (C) 2019 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+   Software Foundation; either version 3 of the License, or (at your
+   option) any later version.
+
+   or
+
+ * the GNU General Public License as published by the Free
+   Software Foundation; either version 2 of the License, or (at your
+   option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+>)
+
+   .file "ecc-curve448-modp.asm"
+
+define(, <%rsi>)
+define(, <%rax>)
+define(, <%rbx>)
+define(, <%rcx>)
+define(, <%rdx>)
+define(, <%rbp>)
+define(, <%rdi>)
+define(, <%r8>)
+define(, <%r9>)
+define(, <%r10>)
+define(, <%r11>)
+define(, <%r12>)
+
+PROLOGUE(nettle_ecc_curve448_modp)
+   W64_ENTRY(2, 0)
+
+   push%rbx
+   push%rbp
+   push%r12
+
+   C First load the values to be shifted by 32.
+   mov 88(RP), X1
+   mov X1, X0
+   mov 96(RP), X2
+   mov X1, T0
+   mov 104(RP), X3
+   mov X2, T1
+   mov 56(RP), X4
+   mov X3, T2
+   mov 64(RP), X5
+   mov 72(RP), X6
+   mov 80(RP), X7
+
+   C Multiply by 2^32
+   shl $32, X0
+   shrd $32, X2, X1
+   shrd $32, X3, X2
+   shrd $32, X4, X3
+   shrd $32, X5, X4
+   shrd $32, X6, X5
+   shrd $32, X7, X6
+   shr $32, X7
+
+   C Multiply by 2
+   add T0, T0
+   adc T1, T1
+   adc T2, T2
+   adc $0, X7
+
+   C Main additions
+   add 56(RP), X0
+   adc 64(RP), X1
+   adc 72(RP), X2
+   adc 80(RP), X3
+   adc T0, X4
+   adc T1, X5
+   adc T2, X6
+   adc $0, X7
+
+   add (RP), X0
+   adc 8(RP), X1
+   adc 16(RP), X2
+   adc 24(RP), X3
+   adc 32(RP), X4
+   adc 40(RP), X5
+   adc 48(RP), X6
+   adc $0, X7
+
+   mov X7, T0
+   mov X7, T1
+   shl $32, T0
+   shr $32, T1
+   xor T2, T2
+   add X7, X0
+   adc $0, X1
+   adc $0, X2
+   adc T0, X3
+   adc T1, X4
+   adc $0, X5
+   adc $0, X6
+   adc $0, T2
+
+   mov T2, T0
+   shl $32, T0
+
+   add T2, X0
+   mov X0, (RP)
+   adc $0, X1
+   mov X1, 8(RP)
+   adc $0, X2
+   mov X2, 16(RP)
+   adc T0, X3
+   mov X3, 24(RP)

Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-07 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I'm considering the below patch. I think there's room for further
>> improvement, maybe splitting the h_to_a method up (it's called with op
>> == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe
>> adding a some ecc_mod_canonical function. But deleting this unneeded
>> code right away seems like an improvement in itself.
>
> Now pushed to master-updates.

Failed the gnutls test "testcompat-openssl",

  ###
  # Server mode tests (gnutls server-openssl cli#
  ###
  Check TLS 1.0 with DHE-RSA ciphersuite
  %NO_ETM: Check TLS 1.0 with DHE-RSA ciphersuite
  %DISABLE_SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite
  %NO_TICKETS: Check TLS 1.0 with DHE-RSA ciphersuite
  %COMPAT: Check TLS 1.0 with DHE-RSA ciphersuite
  %SAFE_RENEGOTIATION: Check TLS 1.0 with DHE-RSA ciphersuite
  HTTP Server listening on IPv4 0.0.0.0 port 18033...done
  HTTP Server listening on IPv6 :: port 18033...done
  HTTP Server listening on IPv4 0.0.0.0 port 22536...done
  HTTP Server listening on IPv6 :: port 22536...done
  HTTP Server listening on IPv4 0.0.0.0 port 15595...done
  HTTP Server listening on IPv6 :: port 15595...done
  HTTP Server listening on IPv4 0.0.0.0 port 22743...done
  HTTP Server listening on IPv6 :: port 22743...done
  HTTP Server listening on IPv4 0.0.0.0 port 10935...done
  HTTP Server listening on IPv6 :: port 10935...done
  HTTP Server listening on IPv4 0.0.0.0 port 43747...done
  HTTP Server listening on IPv6 :: port 43747...done
  Exiting via signal 15

Maybe unrelated to this change?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> I'm considering the below patch. I think there's room for further
> improvement, maybe splitting the h_to_a method up (it's called with op
> == 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe
> adding a some ecc_mod_canonical function. But deleting this unneeded
> code right away seems like an improvement in itself.

Now pushed to master-updates.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-04 Thread Niels Möller
Daiki Ueno  writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I see you've made some chenges to the needed scratch space, if I
>> understand it correctly, you need to allow h_to_a_itch larger than
>> mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH
>> and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that?
>>
>> The only reason ECDSA is affected at all by curve448, is that we have
>> tests for ecdsa over the curve25519 and curve448, even though that's
>> not the way these curves are intended to be used. Maybe that should
>> just be deleted.
>
> Indeed, I agree to remove the tests and affected parts in the library.

I'm considering the below patch. I think there's room for further
improvement, maybe splitting the h_to_a method up (it's called with op
== 0, and with op == 2 from the ecdsa, but never with op == 1). Maybe
adding a some ecc_mod_canonical function. But deleting this unneeded
code right away seems like an improvement in itself.

Regards,
/Niels

diff --git a/ecc-eh-to-a.c b/ecc-eh-to-a.c
index 8173b887..89d2b6e3 100644
--- a/ecc-eh-to-a.c
+++ b/ecc-eh-to-a.c
@@ -56,6 +56,8 @@ ecc_eh_to_a (const struct ecc_curve *ecc,
 
   mp_limb_t cy;
 
+  assert(op == 0);
+
   /* Needs 2*size + scratch for the invert call. */
   ecc->p.invert (>p, izp, zp, tp + ecc->p.size);
 
@@ -63,25 +65,6 @@ ecc_eh_to_a (const struct ecc_curve *ecc,
   cy = mpn_sub_n (r, tp, ecc->p.m, ecc->p.size);
   cnd_copy (cy, r, tp, ecc->p.size);
 
-  if (op)
-{
-  /* Skip y coordinate */
-  if (op > 1)
-   {
- /* Reduce modulo q. Hardcoded for curve25519, duplicates end
-of ecc_25519_modq. FIXME: Is this needed at all? op > 0
-is only used by ecdsa code, and ecdsa on Edwards curves
-makes little sense and is is only used by tests. */
- unsigned shift;
- assert (ecc->p.bit_size == 255);
- shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1);
- cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size,
-r[ecc->p.size-1] >> shift);
- assert (cy < 2);
- cnd_add_n (cy, r, ecc->q.m, ecc->p.size);
-   }
-  return;
-}
   ecc_modp_mul (ecc, tp, yp, izp);
   cy = mpn_sub_n (r + ecc->p.size, tp, ecc->p.m, ecc->p.size);
   cnd_copy (cy, r + ecc->p.size, tp, ecc->p.size);
diff --git a/testsuite/ecdsa-keygen-test.c b/testsuite/ecdsa-keygen-test.c
index a96c09ef..0deb7214 100644
--- a/testsuite/ecdsa-keygen-test.c
+++ b/testsuite/ecdsa-keygen-test.c
@@ -78,6 +78,10 @@ test_main (void)
   struct ecc_point pub;
   struct ecc_scalar key;
 
+  if (ecc->p.bit_size == 255)
+   /* Exclude curve25519, which isn't supported with ECDSA. */
+   continue;
+
   if (verbose)
fprintf (stderr, "Curve %d\n", ecc->p.bit_size);
 
diff --git a/testsuite/ecdsa-sign-test.c b/testsuite/ecdsa-sign-test.c
index 23275357..b240a31b 100644
--- a/testsuite/ecdsa-sign-test.c
+++ b/testsuite/ecdsa-sign-test.c
@@ -156,18 +156,4 @@ test_main (void)
  "97536710 1F67D1CF 9BCCBF2F 3D239534"
  "FA509E70 AAC851AE 01AAC68D 62F86647"
  "2660"); /* s */
-
-  /* Non-standard ecdsa using curve25519. Not interop-tested with
- anything else. */
-  test_ecdsa (&_nettle_curve25519,
- "1db511101b8fd16f e0212c5679ef53f3"
- "323bde77f9efa442 617314d576d1dbcb", /* z */
- "aa2fa8facfdc3a99 ec466d41a2c9211c"
- "e62e1706f54037ff 8486e26153b0fa79", /* k */
- SHEX("e99df2a098c3c590 ea1e1db6d9547339"
-  "ae760d5331496119 5d967fd881e3b0f5"), /* h */
- " 515c3a485f57432 0daf3353a0d08110"
- "64157c556296de09 4132f74865961b37", /* r */
- "  78f23367291b01 3fc430fb09322d95"
- "4384723649868d8e 88effc7ac8b141d7"); /* s */
 }
diff --git a/testsuite/ecdsa-verify-test.c b/testsuite/ecdsa-verify-test.c
index 971988c3..6a593d6f 100644
--- a/testsuite/ecdsa-verify-test.c
+++ b/testsuite/ecdsa-verify-test.c
@@ -145,17 +145,4 @@ test_main (void)
  "97536710 1F67D1CF 9BCCBF2F 3D239534" 
  "FA509E70 AAC851AE 01AAC68D 62F86647"
  "2660"); /* s */
-
-  test_ecdsa (&_nettle_curve25519,
- /* Public key corresponding to the key in ecdsa-sign-test */
- "59f8f317fd5f4e82 c02f8d4dec665fe1"
- "230f83b8572638e1 b2ac34a30028e24d", /* x */
- "1902a72dc1a6525a 811b9c1845978d56"
- "fd97dce5e278ebdd ec695349d7e41498", /* y */
- SHEX("e99df2a098c3c590 ea1e1db6d9547339"
-  "ae760d5331496119 5d967fd881e3b0f5"), /* h */
- " 515c3a485f57432 0daf3353a0d08110"
- "64157c556296de09 4132f74865961b37", /* r */
- "  78f23367291b01 3fc430fb09322d95"
- "4384723649868d8e 88effc7ac8b141d7"); /* s */
 }

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale 

Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-02 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> Performance for the scalar multiplication primitives seem to be slower
> than secp384 and slightly faster than secp521, and looking at point
> addition, it's slower than secp521. I hope that will be improved a quite
> a bit with an optimized mod operation for the curve448 prime.

I've tried out this mod function (for 64-bit):

static void
ecc_448_modp(const struct ecc_modulo *m, mp_limb_t *rp)
{
  /* Let B = 2^64, b = 2^32 = sqrt(B).
 p = B^7 - b B^3 - 1 ==> B^7 = b B^3 + 1

 {x_{13}, ..., x_0} =
   {x_6,...,x_0} 
 + {x_{10},...,x_7}
 + 2 {x_{13},x_{12}, x_{11}} B^4 
 + b {x_{10},...,x_7,x_{13},x_{12},x_{11}
  */
  mp_limb_t c3, c4, c7;
  mp_limb_t *tp = rp + 7;

  c4 = mpn_add_n (rp, rp, rp + 7, 4);
  c7 = mpn_addmul_1 (rp + 4, rp + 11, 3, 2);
  c3 = mpn_addmul_1 (rp, rp + 11, 3, (mp_limb_t) 1 << 32);
  c7 += mpn_addmul_1 (rp + 3, rp + 7, 4, (mp_limb_t) 1 << 32);
  tp[0] = c7;
  tp[1] = tp[2] = 0;
  tp[3] = c3 + (c7 << 32);
  tp[4] = c4 + (c7 >> 32) + (tp[3] < c3);
  tp[5] = tp[6] = 0;
  c7 = mpn_add_n (rp, rp, tp, 7);
  c7 = cnd_add_n (c7, rp, m->B, 7);
  assert(c7 == 0);  
}

This gives a speedup of 85% over the general ecc_mod (on my machine),
and gives about 35% speedup for scalar multiplication (both mul_g and
mul_a). So with this change, performance of mul_g and mul_1 is roughly
midway between secp384 and secp521.

Not sure if replacing the addmul_1 calls with shifts is worthwhile for
the C code (we'll get more function calls and more passes over the data,
which should still be worthwhile for machines with slow multiplication),
but for assembly implementation, the addmul_1(..., 2) call should be adds
only, in registers, and the addmul_1(,..., 1<<32) should be shift and
add, preferably in registers.

I'm going to leave randomized testing running for a few hours.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-02 Thread Niels Möller
Daiki Ueno  writes:

> Thank you.  By the way, one thing I realized in my past rebase attempts
> is that, this commit doing the final reduction of a value by mod q seems
> to be incorrect for curve448 and should probably be reverted:
>
>   commit 6cf6abd68eb3d6c8c8e5ab217be734f9c537037f
>   Author: Daiki Ueno 
>   Date:   Sat Aug 5 09:43:47 2017 +0200
>
>   ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value
>   
>   This allows the same code to be reused in curve448 and Ed448.
>   
>   Signed-off-by: Daiki Ueno 
>
> - shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1);
> + shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1);
>   cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size,
>  r[ecc->p.size-1] >> shift);
>
> For curve25519, q is defined as:
>
>   2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed
>
> whose bit pattern starts with 0x1000, so r - q * (r>>252) should
> work.
>
> On the other hand, for curve448, q is defined as:
>
>   2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d
>
> whose bit pattern starts with 0x.  In that case the formula (r - q *
> (r>>445)) could be incorrect due to the accumulated errors by
> multiplication (i.e. q * 0x7FFF...).

Good catch! Right, this needs a bit more analysis. Fur curve25519, the
subtraction can underflow (unlikely), which is addressed with the
conditional addition a few lines down.

> Therefore, I suggest using r - q * (r>>446) instead, though it would
> introduce another hard-coded value.

But for curve448, that subtraction will never underflow, instead it will
sometimes produce a non-canonical result, r >= q. So correcting the
shift isn't enough.

On the other hand, this code should perhaps be deleted altogether, I
think h_to_a with flags == 2 is used only for ecdsa. It might make sense
to instead add a function pointer to struct ecc_modulo to do canonical
reduction; that's needed in a few different places, not only here.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-02 Thread Daiki Ueno
ni...@lysator.liu.se (Niels Möller) writes:

> Daiki Ueno  writes:
>
>> From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001
>> From: Daiki Ueno 
>> Date: Fri, 23 Jun 2017 17:26:18 +0200
>> Subject: [PATCH 2/4] Implement Curve448 primitives
>>
>> This patch adds the necessary primitives for "curve448", defined in
>> RFC 7748.  Those primitives are namely: addition, doubling, scalar
>> multiplication of the generator or an arbitrary point, inversion, and
>> square root.
>
> At last, I've now merged this onto the curve448 branch.

That's a great news, thank you Niels!

> I see you've made some chenges to the needed scratch space, if I
> understand it correctly, you need to allow h_to_a_itch larger than
> mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH
> and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that?
>
> The only reason ECDSA is affected at all by curve448, is that we have
> tests for ecdsa over the curve25519 and curve448, even though that's
> not the way these curves are intended to be used. Maybe that should
> just be deleted.

Indeed, I agree to remove the tests and affected parts in the library.

> Performance for the scalar multiplication primitives seem to be slower
> than secp384 and slightly faster than secp521, and looking at point
> addition, it's slower than secp521. I hope that will be improved a quite
> a bit with an optimized mod operation for the curve448 prime.
>
>> While the interface is similar to curve25519, the implementation is
>> slightly different.  For curve25519, the Pippenger tables are
>> generated through the coordinates on the Montgomery curve.  On the
>> other hand, the tables for curve448 are directly generated from the
>> coordinates on the corresponding Edwards curve ("edwards448").
>
> This is no longer the case, since the handling curve 25519 was changed
> early on, based on your patches back then.

Thank you.  By the way, one thing I realized in my past rebase attempts
is that, this commit doing the final reduction of a value by mod q seems
to be incorrect for curve448 and should probably be reverted:

  commit 6cf6abd68eb3d6c8c8e5ab217be734f9c537037f
  Author: Daiki Ueno 
  Date:   Sat Aug 5 09:43:47 2017 +0200

  ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value
  
  This allows the same code to be reused in curve448 and Ed448.
  
  Signed-off-by: Daiki Ueno 

- shift = 252 - GMP_NUMB_BITS * (ecc->p.size - 1);
+ shift = ecc->q.bit_size - 1 - GMP_NUMB_BITS * (ecc->p.size - 1);
  cy = mpn_submul_1 (r, ecc->q.m, ecc->p.size,
 r[ecc->p.size-1] >> shift);

For curve25519, q is defined as:

  2^252 + 0x14def9dea2f79cd65812631a5cf5d3ed

whose bit pattern starts with 0x1000, so r - q * (r>>252) should
work.

On the other hand, for curve448, q is defined as:

  2^446 - 0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d

whose bit pattern starts with 0x.  In that case the formula (r - q *
(r>>445)) could be incorrect due to the accumulated errors by
multiplication (i.e. q * 0x7FFF...).

Therefore, I suggest using r - q * (r>>446) instead, though it would
introduce another hard-coded value.

Regards,
-- 
Daiki Ueno
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-12-01 Thread Niels Möller
Daiki Ueno  writes:

> From 8bc6e735d4b40cbab5e187a28e01b63a04ecd92b Mon Sep 17 00:00:00 2001
> From: Daiki Ueno 
> Date: Fri, 23 Jun 2017 17:26:18 +0200
> Subject: [PATCH 2/4] Implement Curve448 primitives
>
> This patch adds the necessary primitives for "curve448", defined in
> RFC 7748.  Those primitives are namely: addition, doubling, scalar
> multiplication of the generator or an arbitrary point, inversion, and
> square root.

At last, I've now merged this onto the curve448 branch.

I see you've made some chenges to the needed scratch space, if I
understand it correctly, you need to allow h_to_a_itch larger than
mul_itch or mul_g_itch. You increase the value of ECC_ECDSA_SIGN_ITCH
and add a new ECC_ECDSA_KEYGEN_ITCH. Can you comment on that?

The only reason ECDSA is affected at all by curve448, is that we have
tests for ecdsa over the curve25519 and curve448, even though that's not
the way these curves are intended to be used. Maybe that should just be
deleted.

Performance for the scalar multiplication primitives seem to be slower
than secp384 and slightly faster than secp521, and looking at point
addition, it's slower than secp521. I hope that will be improved a quite
a bit with an optimized mod operation for the curve448 prime.

> While the interface is similar to curve25519, the implementation is
> slightly different.  For curve25519, the Pippenger tables are
> generated through the coordinates on the Montgomery curve.  On the
> other hand, the tables for curve448 are directly generated from the
> coordinates on the corresponding Edwards curve ("edwards448").

This is no longer the case, since the handling curve 25519 was changed
early on, based on your patches back then.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2019-04-26 Thread Daiki Ueno
Hello Niels,

Daiki Ueno  writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I think there are three main pieces left to integrate.
>>
>> 1. Curve operations to support Curve448 (i.e., diffie-hellman
>>operations). I have made some progress, on my curve448 branch,
>>
>> 2. SHAKE 128/256. I think I had some question on the interface design.
>>
>> 3. EdDSA 448.
>>
>> Optimization of the mod p arithmetic isn't that important yet,

Is there anything I can do to get this merged in upstream?  Now that
OpenSSL supports the curve and EdDSA, it would be interesting to add it
in GnuTLS.

I tried to integrate it into GnuTLS bundling the current code, and it
can interoperate with OpenSSL:
https://gitlab.com/gnutls/gnutls/merge_requests/984

For convenience I am attaching the remaining patches for nettle.

Regards,
-- 
Daiki Ueno
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2018-01-12 Thread Daiki Ueno
ni...@lysator.liu.se (Niels Möller) writes:

> I think there are three main pieces left to integrate.
>
> 1. Curve operations to support Curve448 (i.e., diffie-hellman
>operations). I have made some progress, on my curve448 branch,
>
> 2. SHAKE 128/256. I think I had some question on the interface design.
>
> 3. EdDSA 448.
>
> Optimization of the mod p arithmetic isn't that important yet,

I see.  I thought that the performance of curve operations should at
least be comparable to P-521.  However, even with the generic ecc_mod
for mod p, those are already close.  So let's look at the above items
first.  I have rebased my patch implementing (1) on the curve448 branch:
https://gitlab.com/dueno/nettle/commits/wip/dueno/curve448-2

One thing I noticed is that the point addition formula for untwisted
curves doesn't look correct:
https://gitlab.com/dueno/nettle/commit/4e3a50f4a50d8d03536dc107d7b77c84462e3068#6c80341e16ba39077bf2507d8450393d7e7e677a_261_262

> but I'll nevertheless try to explain how I think about it.

Thank you for the detailed explanation.  I ran the benchmark for those 3
variants: (1) the original version using ecc_mod, (2) the two step
reduction as you suggest, and (3) my formula optimized with single
7-limbs operations:

size   modp reduce   modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh  mul_g  
mul_a (us)
 448 0.0727 0.0720 0.0739  44.01  1.451  52.92  1.088  1.456  1.406  299.6  
557.6
 521 0.0139 0.0151 0.1003  77.72  1.703 101.59  0.728  0.995  1.277  255.8  
588.4

 448 0.0496 0.0497 0.0764  34.77  1.500  49.59  0.923  1.158  1.169  273.5  
500.1
 521 0.0147 0.0144 0.1027  77.63  1.816  88.57  0.716  0.934  1.276  237.2  
589.9

 448 0.0641 0.0644 0.0809  52.76  1.570  49.42  1.007  1.340  1.343  288.1  
570.5
 521 0.0139 0.0141 0.0967  78.22  1.697  99.44  0.714  1.012  1.264  235.8  
589.2

on Core i7-6600U CPU @ 2.60GHz.

My code could be wrong or inefficient, but actually (2) is the fastest.
(3) is slower due to the final carry handling; the carry is accumulated
at most 3 and wrapping around it with cnd_add_n seems to be costly.

Regards,
-- 
Daiki Ueno
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2018-01-09 Thread Niels Möller
Daiki Ueno  writes:

> Thank you for the suggestions and sorry for the shameless delay.

No problem, I'm also quite slow. I think there are three main pieces
left to integrate.

1. Curve operations to support Curve448 (i.e., diffie-hellman
   operations). I have made some progress, on my curve448 branch,

2. SHAKE 128/256. I think I had some question on the interface design.

3. EdDSA 448.

Optimization of the mod p arithmetic isn't that important yet, but I'll
nevertheless try to explain how I think about it.

> Due to my ignorance, I probably don't get what you mean, but as far as I
> read the implementations of other curves in Nettle, some of them seem to
> use the property of generalized Mersenne numbers described in:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2133
> and it's also applicable to the Curve448 prime.

The way I think about it, we have p = 2^448 - 2^224 - 1. This implies
that

   2^448 = 2^224 + 1 (mod p)

(since a = b (mod p) means that a - b is a multiple of p, and in this
case, the difference between left and right hand is exactly p).

> So a 896-bit product here:
>
>   a0 + a1 * 2^k + a2 * 2^2k + a3 * 2^3k (k = 224)

And in this notation, 2^2k = 2^k + 1 (mod p) and 2^3k = 2^2k + 2^k (mod p). If
we use these relations to replace the high powers of two, we get
precisely your formulas:

> can be reduced to:
>
>   b0 + b1 * 2^k (mod p)
>
> through modular additions of the smaller numbers:
>
>   b0 = a0 + a2 + a3 (mod p)
>   b1 = a1 + a2 + a3 + a3 (mod p)

I'm not sure what's the best way to organize this. I suggested doing it
in two steps, first reduce the high three chunks to two and leave a0 unchanged:

a1 +  a2 2^k + a3 2^2k 
 = (a1 + a3) + (a2 + a3) 2^k
 =  b1 + b_2 2^k  (mod p)

where we can do the carry propagation and carry wrap-around to get b0
and b1 as 224-bit numbers.

And then repeat the same procedure once more

a0 +  b1 2^k + b2 2^2k 
 = (a0 + b2) + (b1 + b2) 2^k
 =  c0 +  c1 2^k

This is going to work out nicely on 32-bit platforms. For 64-bit, one
should get better performance the more shift operations can be avoided,
and your variant might be faster then the above. 

On the other hand, for assembly implementation keeping values in
registers, the two step way needs fewer registers, since loading the
lowest limbs can be postponed until after the higher order limbs are
eliminated and no longer need any registers.

Looking at your formulas,

>   b0 = a0 + a2 + a3 (mod p)
>   b1 = a1 + a2 + a3 + a3 (mod p)

I think it should be possible to implement with only a single 7-limb
shift: In the b0 formula, only a3 needs shifting, a0 and a2 are already
at the right position. And in the b1 formula, a1 and a2 are at the right
position, but a2 needs a shift. I think I'd try the following:

1. Do the additions a0 += a2, a1 += a3, which I think can be done as a
   single 7-limb add, with plain carry propagation between the pieces.
   Carry out can be folded immediately, or saved for later.

2. Do the second a1 += a3 (needs some masking to extract the a3 bits,
   but no shift). Or maybe skip the masking, and handle the low 32 bits
   of a3 some other way.

3. Do an in-place 32-bit left shift of a2, a3.

4. Add the shifted terms, a0 += a3, a1 += a2. This gets even better if
   we make the a2, a3 shift above a rotate storing the shifted out high
   bits of a3 back at the position of the low bits of a2.

5. Do final carry handling.

There's no need to try to find the optimal reduction strategy right
away, though, other missing pieces are more important.

> I tried to implement it (as attached) and got 20-30% speed-up with
> mini-gmp.

For good test coverage of these functions, I recommend building with the real 
gmp (to get access to
its mpz_rrandomb function), and then let

  while NETTLE_TEST_SEED=0 ./ecc-mod-test ; do : ; done

run over night. And possibly also hacking ecc_mod_test.c to only test
the curve of interest.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2018-01-09 Thread Daiki Ueno
Hello,

Thank you for the suggestions and sorry for the shameless delay.

ni...@lysator.liu.se (Niels Möller) writes:

>> Also, optimized implementation of modular reduction is currently
>> missing, which is beyond my expertise.  I would appreciate any
>> suggestions regarding that.
>
> If we do Euclidean reduction, we should use the property that
>
>   2^448 = 2^224 + 1 (mod)
>
> And we'd need to use this twice to reduce a 896-bit product to 448 bits.
> On 64-bit machines, we'll get some shifting since 224 isn't a multiple
> of 64.

Due to my ignorance, I probably don't get what you mean, but as far as I
read the implementations of other curves in Nettle, some of them seem to
use the property of generalized Mersenne numbers described in:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2133
and it's also applicable to the Curve448 prime.

So a 896-bit product here:

  a0 + a1 * 2^k + a2 * 2^2k + a3 * 2^3k (k = 224)

can be reduced to:

  b0 + b1 * 2^k (mod p)

through modular additions of the smaller numbers:

  b0 = a0 + a2 + a3 (mod p)
  b1 = a1 + a2 + a3 + a3 (mod p)

I tried to implement it (as attached) and got 20-30% speed-up with
mini-gmp.

size   modp reduce   modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh  mul_g  
mul_a (us)
 192 0.0066 0.0061 0.0770  28.90  0.000   0.00  0.698  0.811  1.014   44.2  
188.5
 224 0.0091 0.0091 0.1231  47.57  0.000   0.00  0.969  1.285  1.566   90.2  
329.1
 255 0.0081 0.0078 0.1550  18.70  0.000   0.00  0.767  1.014  1.082   88.7  
278.4
 256 0.1288 0.0115 0.1464  50.22  0.000   0.00  1.004  1.457  1.701  105.6  
380.1
 384 0.0170 0.0170 0.1260 106.51  0.000   0.00  1.791  2.240  3.113  305.9 
1015.7
 448 0.1892 0.1886 0.1886 171.57  0.000   0.00  3.630  4.914  4.940  842.1 
1966.8
 521 0.0163 0.0158 0.2821 217.44  0.000   0.00  3.555  4.507  6.107  788.8 
2637.8

size   modp reduce   modq modinv mi_gcd mi_pow dup_jj ad_jja ad_hhh  mul_g  
mul_a (us)
 192 0.0080 0.0082 0.0748  30.61  0.000   0.00  0.663  0.810  1.007   43.9  
189.9
 224 0.0103 0.0108 0.1281  47.48  0.000   0.00  0.968  1.481  1.632   89.4  
323.2
 255 0.0095 0.0090 0.1545  18.71  0.000   0.00  0.828  1.081  1.085   90.8  
284.9
 256 0.1353 0.0123 0.1423  51.25  0.000   0.00  0.992  1.326  1.726  107.4  
379.3
 384 0.0174 0.0171 0.1253 108.82  0.000   0.00  1.760  2.233  2.991  302.9 
1004.2
 448 0.1382 0.1380 0.1865 147.87  0.000   0.00  3.167  4.161  4.211  738.6 
1668.3
 521 0.0186 0.0179 0.2750 216.30  0.000   0.00  3.438  4.249  5.756  743.5 
2510.6

Do you think this is acceptable or need further optimization?

Regards,
-- 
Daiki Ueno
>From bc210d94144d6b5d1e050f78b7261d58a58d8f19 Mon Sep 17 00:00:00 2001
From: Daiki Ueno 
Date: Tue, 9 Jan 2018 16:13:53 +0100
Subject: [PATCH] ecc: Add optimized modp implementation for curve448

Signed-off-by: Daiki Ueno 
---
 ecc-448.c | 72 +--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/ecc-448.c b/ecc-448.c
index f98616d8..d62e8e1b 100644
--- a/ecc-448.c
+++ b/ecc-448.c
@@ -45,6 +45,74 @@
 
 #include "ecc-448.h"
 
+#if HAVE_NATIVE_ecc_448_modp
+
+#define ecc_448_modp nettle_ecc_448_modp
+void
+ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp);
+
+#elif GMP_NUMB_BITS == 64
+/*
+ * Use 2^448 = 2^224 + 1 (mod p), compute:
+ *   b0 + b1 * 2^224 = a0 + a1 * 2^224 + a2 * 2^448 + a3 * 2^672 (mod p)
+ *
+ * as p is considered as a generalized Mersenne number, b0 and b1 can
+ * be calculated with the following shortcut:
+ *
+ *   b0 = a0 + a2 + a3 (mod p)
+ *   b1 = a1 + a2 + a3 + a3 (mod p)
+ */
+static void
+ecc_448_modp (const struct ecc_modulo *m, mp_limb_t *rp)
+{
+  mp_limb_t a1[4], a3[4], cy, mask;
+
+  /* Extract a1 */
+  mpn_copyi (a1, rp + 3, 4);
+  mpn_rshift (a1, a1, 4, 32);
+  rp[3] &= 0x;
+
+  /* Extract a3 */
+  mpn_copyi (a3, rp + 10, 4);
+  mpn_rshift (a3, a3, 4, 32);
+  rp[10] &= 0x;
+
+  /* a2 = a2 + a3 */
+  mpn_add_n (rp + 7, rp + 7, a3, 4);
+
+  /* a0 = a0 + a2 (carry propagated to a2) */
+  mpn_add_n (rp, rp, rp + 7, 4);
+  mpn_rshift (, rp + 3, 1, 32);
+  sec_add_1 (rp + 7, rp + 7, 4, cy);
+
+  /* a2 = a2 + a1 */
+  mpn_add_n (rp + 7, rp + 7, a1, 4);
+
+  /* a2 = a2 + a3 */
+  mpn_add_n (rp + 7, rp + 7, a3, 4);
+  mpn_rshift (, rp + 10, 1, 32);
+
+  /* Move a2 next to a0 */
+  rp[6] = 0;
+  mpn_lshift (rp + 6, rp + 7, 4, 32);
+  rp[3] &= 0x;
+  rp[3] |= rp[6];
+  mpn_copyi (rp + 4, rp + 7, 3);
+
+  assert (cy <= 3);
+  mask = (1 << cy) - 1;
+  cy = cnd_add_n (mask & 1, rp, ecc_Bmodp, 7);
+  assert (cy == 0);
+  cy = cnd_add_n (mask & 2, rp, ecc_Bmodp, 7);
+  assert (cy == 0);
+  cy = cnd_add_n (mask & 4, rp, ecc_Bmodp, 7);
+  assert (cy == 0);
+}
+
+#else
+#define ecc_448_modp ecc_mod
+#endif
+
 /* Needs 2*ecc->size limbs at rp, and 2*ecc->size additional limbs of
scratch space. No overlap allowed. */
 static void
@@ -221,8 +289,8 @@ const struct ecc_curve _nettle_curve448 =
 

Re: [PATCH 0/8] Implement Curve448 ECDH and Ed448

2017-09-09 Thread Niels Möller
Daiki Ueno  writes:

> This patch series implements the Curve448 Diffie-Hellman protocol (RFC
> 7748) and the Ed448 signature scheme (RFC 8032).  Although I tried to
> make it as close as possible to the Curve25519 and Ed25519
> implementations, I had to add a few special cases, namely:

Thanks a lot for this work. I've started by applying the first 4 easy
patches (currently on the master-updates branch).

> - for Curve448, eccdata directly calculates points on the Edwards
>   curve instead of the equivalent Montgomery curve

It would be nice to do it the same way. Are Montgomery computations the
same, regardless of twist/no-twist?

> - untwisted versions of ecc_add_eh* and ecc_dup_eh are added

Just a note that the twisted versions should at some point be updated to
use more effficient formulas.

> - the point decoding for Ed448 uses a different formula to recover u and v

Does it have to be done differently? I'll have to read up to find out.
Efficient sqrt is going to be tailored for this prime to be most
efficient.

> Also, optimized implementation of modular reduction is currently
> missing, which is beyond my expertise.  I would appreciate any
> suggestions regarding that.

If we do Euclidean reduction, we should use the property that

  2^448 = 2^224 + 1 (mod)

And we'd need to use this twice to reduce a 896-bit product to 448 bits.
On 64-bit machines, we'll get some shifting since 224 isn't a multiple
of 64.

Redc might be slightly more efficient than the above reduction. Due to
the structure, it might also work nicely to do a "half redc", reducing
from both ends, and they might share the shift. I'll have to think
through the details. But unlikely to be a big win over Euclidean.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


[PATCH 0/8] Implement Curve448 ECDH and Ed448

2017-08-05 Thread Daiki Ueno
Hello,

This patch series implements the Curve448 Diffie-Hellman protocol (RFC
7748) and the Ed448 signature scheme (RFC 8032).  Although I tried to
make it as close as possible to the Curve25519 and Ed25519
implementations, I had to add a few special cases, namely:

- for Curve448, eccdata directly calculates points on the Edwards
  curve instead of the equivalent Montgomery curve
- untwisted versions of ecc_add_eh* and ecc_dup_eh are added
- the point decoding for Ed448 uses a different formula to recover u and v

Also, optimized implementation of modular reduction is currently
missing, which is beyond my expertise.  I would appreciate any
suggestions regarding that.

The patches are also available on:
https://gitlab.com/dueno/nettle/commits/wip/dueno/ed448

Thanks to Hubert Kario and Nikos Mavrogiannopoulos for initial reviews.

Daiki Ueno (8):
  ecc-mul-test: Fix mpn_cmp calls
  eccdata: Emit correct ecc_Bmodq_shifted for curve448
  eccdata: Redirect ecc_point_out to given stream, instead of stderr
  ecc: Add add_hh and dup members to ecc_curve
  ecc-eh-to-a, eddsa-sign: Parameterize hard-coded value
  Implement curve448 primitives
  Implement SHAKE128/256 functions
  Implement Ed448 signature scheme

 .gitignore  |1 +
 Makefile.in |   18 +-
 curve448-eh-to-x.c  |   73 +
 curve448-mul-g.c|   74 +
 curve448-mul.c  |  148 +
 curve448.h  |   58 +
 ecc-192.c   |5 +
 ecc-224.c   |5 +
 ecc-25519.c |5 +
 ecc-256.c   |5 +
 ecc-384.c   |5 +
 ecc-448.c   |  273 ++
 ecc-521.c   |5 +
 ecc-add-eh.c|   74 +-
 ecc-add-ehh.c   |   77 +-
 ecc-dup-eh.c|   55 +-
 ecc-eh-to-a.c   |4 +-
 ecc-internal.h  |   57 +-
 ecc-mul-a-eh.c  |   12 +-
 ecc-mul-g-eh.c  |4 +-
 ecc-point-mul-g.c   |7 +-
 ecc-point-mul.c |2 +-
 ecc-point.c |   15 +
 eccdata.c   |  183 +-
 ecdsa-keygen.c  |4 +-
 ed25519-sha512-sign.c   |   15 +
 ed448-shake256-pubkey.c |   60 +
 ed448-shake256-sign.c   |   92 +
 ed448-shake256-verify.c |   66 +
 eddsa-compress.c|   11 +-
 eddsa-decompress.c  |   15 +-
 eddsa-expand.c  |   20 +-
 eddsa-hash.c|   35 +
 eddsa-pubkey.c  |2 +-
 eddsa-sign.c|   18 +-
 eddsa-verify.c  |   16 +-
 eddsa.h |   24 +
 examples/ecc-benchmark.c|1 +
 nettle-internal.h   |2 +-
 nettle-meta-hashes.c|2 +
 nettle-meta.h   |2 +
 nettle.texinfo  |  152 +-
 sha3.c  |   13 +
 sha3.h  |   56 +
 shake128-meta.c |   42 +
 shake128.c  |   84 +
 shake256-meta.c |   42 +
 shake256.c  |   84 +
 testsuite/.test-rules.make  |   12 +
 testsuite/Makefile.in   |5 +-
 testsuite/curve448-dh-test.c|  100 +
 testsuite/ecc-add-test.c|   48 +-
 testsuite/ecc-dup-test.c|   12 +-
 testsuite/ecc-mul-a-test.c  |6 +-
 testsuite/ecc-mul-g-test.c  |6 +-
 testsuite/ecdh-test.c   |   16 +-
 testsuite/ecdsa-keygen-test.c   |   16 +
 testsuite/ed448-test.c  |  240 ++
 testsuite/eddsa-compress-test.c |  137 +-
 testsuite/eddsa-sign-test.c |   66 +-
 testsuite/eddsa-verify-test.c   |   49 +-
 testsuite/meta-hash-test.c  |2 +
 testsuite/shake.awk |   14 +
 testsuite/shake128-test.c   | 6183 +++
 testsuite/shake256-test.c   | 6183 +++
 testsuite/testutils.c   |   57 +-
 66 files changed, 14976 insertions(+), 199 deletions(-)
 create mode 100644 curve448-eh-to-x.c
 create mode 100644 curve448-mul-g.c
 create mode 100644 curve448-mul.c
 create mode 100644 curve448.h
 create mode 100644 ecc-448.c
 create mode 100644 ed448-shake256-pubkey.c
 create mode 100644 ed448-shake256-sign.c
 create mode 100644 ed448-shake256-verify.c
 create mode 100644 shake128-meta.c
 create mode 100644 shake128.c
 create mode 100644 shake256-meta.c
 create mode 100644 shake256.c
 create mode 100644 testsuite/curve448-dh-test.c
 create mode 100644 testsuite/ed448-test.c
 create mode 100755 testsuite/shake.awk
 create mode 100644 testsuite/shake128-test.c
 create mode 100644 testsuite/shake256-test.c

-- 
2.13.3

___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs