Re: [curves] new 25519 measurements of formally verified implementations

2018-02-23 Thread Armando Faz Hernández


Quoting "Jason A. Donenfeld" :

Hi Armando,

I've started importing your precomputation implementation into kernel
space for use in kbench9000 (and in WireGuard and the kernel crypto
library too, of course).

- The first problem remains the license. The kernel requires
GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't
up to me at all, unfortunately, so this stuff will have to be licensed
differently in order to be useful.



The rfc7748_precomputed library is now released under LGPLv2.1.
We are happy to see our code integrated in more projects.

Quoting "Jason A. Donenfeld" :

- It looks like the precomputation implementation is failing some unit
tests! Perhaps it's not properly reducing incoming public points?

There's the vector if you'd like to play with it. The other test
vectors I have do pass, though, which is good I suppose.


Thanks for this observation. The code failed to handle some carry bits,
producing incorrect outputs for inputs between 2p and 2^256. I have now
rewritten several GF(2^255-19) operations to cover all of these cases.
More tests were added, and the code is now fuzz-tested against the HACL
implementation.
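
To illustrate why inputs between 2p and 2^256 are a special case, here is a
minimal sketch using Python big integers. It is not the library's code (the
real fix concerns carry handling in limb arithmetic), and the function names
are hypothetical; it only shows that a single conditional subtraction of p
cannot canonicalize values of 2p and above.

```python
P = 2**255 - 19  # the Curve25519 field prime p


def reduce_single_subtraction(x: int) -> int:
    """Incomplete: one conditional subtraction only covers inputs below 2p."""
    return x - P if x >= P else x


def reduce_canonical(x: int) -> int:
    """Reduce any 256-bit value to the canonical range [0, p)."""
    assert 0 <= x < 2**256
    # Fold bit 255 back in: 2^255 = p + 19, so 2^255 ≡ 19 (mod p).
    x = (x & (2**255 - 1)) + 19 * (x >> 255)
    # Now x < 2^255 + 19 = p + 38 < 2p, so one subtraction finishes the job.
    return x - P if x >= P else x


x = 2**256 - 1  # the all-0xff input, which lies in [2p, 2^256)
print(reduce_single_subtraction(x) < P)  # False: result is still >= p
print(reduce_canonical(x) == x % P)      # True
```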

Code is available at:
  https://github.com/armfazh/rfc7748_precomputed  (commit c79ca5e...)

*Disclaimer: More tests and work are needed for the GF(2^448-2^224-1)
arithmetic.



Quoting "Jason A. Donenfeld" :

On the plus side, the implementation is super fast:
With turbo on, on my E3-1505Mv5, I'm getting:

donna64: 121793 cycles per call
 hacl64: 109793 cycles per call
 fiat64: 108937 cycles per call
sandy2x: 103003 cycles per call
  amd64: 108688 cycles per call
precomp: 83391 cycles per call
 fiat32: 232835 cycles per call
donna32: 411511 cycles per call

The benchmark of your precomputation implementation has what's
referred to by medical doctors as "less digits".


Fixing the bug cost a little performance; however, other operations were
optimized as well, which offsets the loss.
Let us know about your new measurements.



--
Armando Faz Hernández, PhD Candidate.
Instituto de Computação, Unicamp.
Campinas, Brasil.

___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-02-01 Thread Jason A. Donenfeld
Hi Armando,

I've started importing your precomputation implementation into kernel
space for use in kbench9000 (and in WireGuard and the kernel crypto
library too, of course).

- The first problem remains the license. The kernel requires
GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't
up to me at all, unfortunately, so this stuff will have to be licensed
differently in order to be useful.

- It looks like the precomputation implementation is failing some unit
tests! Perhaps it's not properly reducing incoming public points?

{
.private = { 1 },
.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
},
.result = { 0xb3, 0x2d, 0x13, 0x62, 0xc2, 0x48, 0xd6, 0x2f, 0xe6,
0x26, 0x19, 0xcf, 0xf0, 0x4d, 0xd4, 0x3d, 0xb7, 0x3f, 0xfc, 0x1b,
0x63, 0x8, 0xed, 0xe3, 0xb, 0x78, 0xd8, 0x73, 0x80, 0xf1, 0xe8, 0x34 }
}

[ 8855.567043] Expected: b3 2d 13 62 c2 48 d6 2f e6 26 19 cf f0 4d d4 3d  .-.b.H./.&...M.=
[ 8855.567044] Expected: b7 3f fc 1b 63 08 ed e3 0b 78 d8 73 80 f1 e8 34  .?..cx.s...4
[ 8855.567046] Actual: eb 1b 2b df 13 6a 3e bc 30 9f a4 f7 a1 95 a7 08  ..+..j>.0...
[ 8855.567047] Actual: 11 7f 7c e4 6e 65 a4 44 48 22 4d 00 78 54 70 5b  ..|.ne.DH"M.xTp[
[ 8855.567048] kbench9000: precomp self-test 4: FAIL

There's the vector if you'd like to play with it. The other test
vectors I have do pass, though, which is good I suppose.
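
For anyone who wants to play with the vector outside the kernel, here is a
minimal sketch of X25519 following the Montgomery ladder described in
RFC 7748, written with Python big integers. It is a reference-style
illustration only: it is not constant-time, it is not the precomputation
implementation, and the names `decode_scalar`, `decode_u` and `x25519` are
chosen here for illustration.

```python
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, as in RFC 7748


def decode_scalar(k: bytes) -> int:
    """Clamp the 32-byte private scalar as RFC 7748 specifies."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    """Mask the top bit, then reduce non-canonical values modulo p."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def x25519(k: bytes, u: bytes) -> bytes:
    scalar, x1 = decode_scalar(k), decode_u(u)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:  # plain branch: NOT constant-time
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da = (x3 - z3) * a % P
        cb = (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    # Invert z2 via Fermat's little theorem; the output is fully reduced.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


# The failing vector from this message: .private = { 1 }, .public = all 0xff.
private = bytes([1] + [0] * 31)
public = bytes([0xFF] * 32)
print(x25519(private, public).hex())
```

The printed hex string can be compared by hand against the `.result` bytes
in the vector. Because `decode_u` masks the top bit and reduces modulo p,
the all-0xff public value is treated the same as u = 18.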

On the plus side, the implementation is super fast:

With turbo on, on my E3-1505Mv5, I'm getting:

donna64: 121793 cycles per call
 hacl64: 109793 cycles per call
 fiat64: 108937 cycles per call
sandy2x: 103003 cycles per call
  amd64: 108688 cycles per call
precomp: 83391 cycles per call
 fiat32: 232835 cycles per call
donna32: 411511 cycles per call

The benchmark of your precomputation implementation has what's
referred to by medical doctors as "less digits".
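
To put the listing above in relative terms, here is a small sketch that
computes cycle-count ratios from those numbers. The numbers are copied from
this message; the helper name `speedup` is made up for illustration.

```python
# Cycle counts copied from the benchmark listing above (lower is better).
cycles = {
    "donna64": 121793,
    "hacl64": 109793,
    "fiat64": 108937,
    "sandy2x": 103003,
    "amd64": 108688,
    "precomp": 83391,
    "fiat32": 232835,
    "donna32": 411511,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times more cycles `baseline` needs than `candidate`."""
    return cycles[baseline] / cycles[candidate]


fastest = min(cycles, key=cycles.get)
for name, count in sorted(cycles.items(), key=lambda kv: kv[1]):
    ratio = speedup(name, fastest)
    print(f"{name:>8}: {count:>6} cycles, {ratio:.3f}x the cycles of {fastest}")
```

On these numbers sandy2x needs roughly 1.23 times the cycles of precomp.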

Regards,
Jason


Re: [curves] new 25519 measurements of formally verified implementations

2018-02-01 Thread Jason A. Donenfeld
Hi Armando,

Sure, I'll have a look at this.

I've also found https://github.com/armfazh/hp-ecc-vec . Is this the
code related to your 2015 paper entitled, "Fast Implementation of
Curve25519 Using AVX2"? Or the presentation Dan mentioned a few posts
up? Or both at once?

Also, would you consider relicensing these as GPLv2 so that they can
be used in the Linux kernel? (Alternatively, BSD/MIT/Public Domain for
everyone else?)

Jason


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-31 Thread Jason A. Donenfeld
I've loaded fiat64 into the latest kbench curve testing branch, and
it seems to be the fastest generic C version, at least on my Skylake
laptop, inching out slightly in front of hacl64:

donna64: 121790 cycles per call
 hacl64: 109782 cycles per call
 fiat64: 108984 cycles per call
sandy2x: 102996 cycles per call
  amd64: 108563 cycles per call
 fiat32: 232826 cycles per call
donna32: 412092 cycles per call


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-27 Thread Jason A. Donenfeld
Hey Dan,

Thanks for the pointer and the link to the slides. I've heard about
this implementation before, but I was never able to get a hold of the
source to try it out. I just emailed him to see if it's available
somewhere. Looks like there's a conference paper from Latincrypt 2015
that describes its implementation.

Jason


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-26 Thread D. J. Bernstein
Tung Chou's sandy2x code was (as the name suggests) optimized for Sandy
Bridge. For Haswell and Skylake, the slides from Julio Lopez in

   https://hyperelliptic.org/tanja/lc17/ascrypto.html

report two followup implementations producing roughly 25% speedups for
Curve25519; see slide 67/83.

I do think that the hacl64 Curve25519 speeds are fast enough for pretty
much everybody, and verification is certainly a huge plus, but people
who want more speed should be aware of what's possible---and people
working on Curve25519 verification shouldn't think they're done yet!

---Dan