Re: [curves] new 25519 measurements of formally verified implementations

2018-02-23 Thread Jason A. Donenfeld
Hey Armando,

Thanks for taking the time to fix things up.

I've loaded this into my kbench9000 software
, in the branch
"jd/curve-comparison", and tested it on two Skylake systems -- a
laptop and a server. With turbo disabled, results are fairly similar
between the two:

Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
 donna64: 160942 cycles per call
  hacl64: 140902 cycles per call
  fiat64: 144106 cycles per call
 sandy2x: 136074 cycles per call
precomp_bmi2: 121350 cycles per call
 precomp_adx: 117676 cycles per call
   amd64: 143628 cycles per call
  fiat32: 307971 cycles per call
 donna32: 544254 cycles per call

Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
 donna64: 162308 cycles per call
  hacl64: 141948 cycles per call
  fiat64: 146188 cycles per call
 sandy2x: 135502 cycles per call
precomp_bmi2: 121061 cycles per call
 precomp_adx: 117636 cycles per call
   amd64: 146382 cycles per call
  fiat32: 30 cycles per call
 donna32: 548081 cycles per call

Your implementations are the two precomp_ ones. I split it into
separate paths for adx and for bmi2, so that we could test it together
on one system.

Regards,
Jason
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-02-23 Thread Armando Faz Hernández


Quoting "Jason A. Donenfeld" :

Hi Armando,

I've started importing your precomputation implementation into kernel
space for use in kbench9000 (and in WireGuard and the kernel crypto
library too, of course).

- The first problem remains the license. The kernel requires
GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't
up to me at all, unfortunately, so this stuff will have to be licensed
differently in order to be useful.



The rfc7748_precomputed library is now released under LGPLv2.1.
We are happy to see our code integrated in more projects.

Quoting "Jason A. Donenfeld" :

- It looks like the precomputation implementation is failing some unit
tests! Perhaps it's not properly reducing incoming public points?

There's the vector if you'd like to play with it. The other test
vectors I have do pass, though, which is good I suppose.


Thanks, for this observation. The code was missing to handle some carry bits,
producing incorrect outputs for numbers between 2p and 2^256. Now, I have
rewritten some operations for GF(2^255-19) considering all of these cases.
More tests were added and fuzz test against HACL implementation.

Code is available at:
  https://github.com/armfazh/rfc7748_precomputed  (commit c79ca5e...)

*Disclaimer: More test and work is needed for the GF(2^448-2^224-1)  
arithmetic.



On the plus side, the implementation is super fast:
With turbo on, on my E3-1505Mv5, I'm getting:

donna64: 121793 cycles per call
 hacl64: 109793 cycles per call
 fiat64: 108937 cycles per call
sandy2x: 103003 cycles per call
  amd64: 108688 cycles per call
precomp: 83391 cycles per call
 fiat32: 232835 cycles per call
donna32: 411511 cycles per call

The benchmark of your precomputation implementation has what's
referred to by medical doctors as "less digits".


Due to the bug's corrections, a slight loss of performance was observed;
however, other operations were optimized too counteracting the losses.
Let us know about your new measurements.



--
Armando Faz Hernández, PhD Candidate.
Instituto de Computação, Unicamp.
Campinas, Brasil.

___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-02-01 Thread Jason A. Donenfeld
Hi Armando,

I've started importing your precomputation implementation into kernel
space for use in kbench9000 (and in WireGuard and the kernel crypto
library too, of course).

- The first problem remains the license. The kernel requires
GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't
up to me at all, unfortunately, so this stuff will have to be licensed
differently in order to be useful.

- It looks like the precomputation implementation is failing some unit
tests! Perhaps it's not properly reducing incoming public points?

{
.private = { 1 },
.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
},
.result = { 0xb3, 0x2d, 0x13, 0x62, 0xc2, 0x48, 0xd6, 0x2f, 0xe6,
0x26, 0x19, 0xcf, 0xf0, 0x4d, 0xd4, 0x3d, 0xb7, 0x3f, 0xfc, 0x1b,
0x63, 0x8, 0xed, 0xe3, 0xb, 0x78, 0xd8, 0x73, 0x80, 0xf1, 0xe8, 0x34 }
}

[ 8855.567043] Expected: b3 2d 13 62 c2 48 d6 2f e6 26 19 cf f0 4d d4
3d  .-.b.H./.&...M.=
[ 8855.567044] Expected: b7 3f fc 1b 63 08 ed e3 0b 78 d8 73 80 f1 e8
34  .?..cx.s...4
[ 8855.567046] Actual: eb 1b 2b df 13 6a 3e bc 30 9f a4 f7 a1 95 a7 08
 ..+..j>.0...
[ 8855.567047] Actual: 11 7f 7c e4 6e 65 a4 44 48 22 4d 00 78 54 70 5b
 ..|.ne.DH"M.xTp[
[ 8855.567048] kbench9000: precomp self-test 4: FAIL

There's the vector if you'd like to play with it. The other test
vectors I have do pass, though, which is good I suppose.

On the plus side, the implementation is super fast:

With turbo on, on my E3-1505Mv5, I'm getting:

donna64: 121793 cycles per call
 hacl64: 109793 cycles per call
 fiat64: 108937 cycles per call
sandy2x: 103003 cycles per call
  amd64: 108688 cycles per call
precomp: 83391 cycles per call
 fiat32: 232835 cycles per call
donna32: 411511 cycles per call

The benchmark of your precomputation implementation has what's
referred to by medical doctors as "less digits".

Regards,
Jason
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-02-01 Thread Jason A. Donenfeld
Hi Armando,

Sure, I'll have a look at this.

I've also found https://github.com/armfazh/hp-ecc-vec . Is this the
code related to your 2015 paper entitled, "Fast Implementation of
Curve25519 Using AVX2"? Or the presentation Dan mentioned a few posts
up? Or both at once?

Also, would you consider relicensing these as GPLv2 so that they can
be used in the Linux kernel? (Alternatively, BSD/MIT/Public Domain for
everyone else?)

Jason
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-31 Thread Armando Faz Hernández


Quoting curves-requ...@moderncrypto.org:


Message: 1
Date: Wed, 31 Jan 2018 16:37:12 +0100
From: "Jason A. Donenfeld" 
To: curves 

I've loaded in fiat64 into the latest kbench curve testing branch, and
it seems to be the fastest generic C version, at least on my Skylake
laptop, inching out slightly in front of hacl64:



Hi Jason,

It would be interesting to see your benchmarks
using code in [1] derived from our SAC2017 paper [2].

[1] https://github.com/armfazh/rfc7748_precomputed
[2] How to (pre-)compute a ladder, SAC.
https://doi.org/10.1007/978-3-319-72565-9_9







--
Armando Faz Hernández, PhD Candidate.
Instituto de Computação, Unicamp.
Campinas, Brasil.

___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-31 Thread Jason A. Donenfeld
I've loaded in fiat64 into the latest kbench curve testing branch, and
it seems to be the fastest generic C version, at least on my Skylake
laptop, inching out slightly in front of hacl64:

donna64: 121790 cycles per call
hacl64: 109782 cycles per call
fiat64: 108984 cycles per call
sandy2x: 102996 cycles per call
 amd64: 108563 cycles per call
fiat32: 232826 cycles per call
donna32: 412092 cycles per call
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-27 Thread Jason A. Donenfeld
Hey Dan,

Thanks for the pointer and the link to the slides. I've heard about
this implementation before, but I was never able to get a hold of the
source to try it out. I just emailed him to see if it's available
somewhere. Looks like there's a conference paper from Latincrypt 2015
that describes its implementation.

Jason
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves


Re: [curves] new 25519 measurements of formally verified implementations

2018-01-26 Thread D. J. Bernstein
Tung Chou's sandy2x code was (as the name suggests) optimized for Sandy
Bridge. For Haswell and Skylake, the slides from Julio Lopez in

   https://hyperelliptic.org/tanja/lc17/ascrypto.html

report two followup implementations producing roughly 25% speedups for
Curve25519; see slide 67/83.

I do think that the hacl64 Curve25519 speeds are fast enough for pretty
much everybody, and verification is certainly a huge plus, but people
who want more speed should be aware of what's possible---and people
working on Curve25519 verification shouldn't think they're done yet!

---Dan
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves