Re: [curves] new 25519 measurements of formally verified implementations
Quoting "Jason A. Donenfeld":

> Hi Armando,
>
> I've started importing your precomputation implementation into kernel space for use in kbench9000 (and in WireGuard and the kernel crypto library too, of course).
>
> - The first problem remains the license. The kernel requires GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't up to me at all, unfortunately, so this stuff will have to be licensed differently in order to be useful.

The rfc7748_precomputed library is now released under LGPLv2.1. We are happy to see our code integrated in more projects.

Quoting "Jason A. Donenfeld":

> - It looks like the precomputation implementation is failing some unit tests! Perhaps it's not properly reducing incoming public points?
>
> There's the vector if you'd like to play with it. The other test vectors I have do pass, though, which is good I suppose.

Thanks for this observation. The code failed to handle some carry bits, producing incorrect outputs for numbers between 2p and 2^256. I have now rewritten some of the GF(2^255-19) operations to cover all of these cases. More tests were added, along with fuzz tests against the HACL implementation.

Code is available at: https://github.com/armfazh/rfc7748_precomputed (commit c79ca5e...)

*Disclaimer: more testing and work are needed for the GF(2^448-2^224-1) arithmetic.

> On the plus side, the implementation is super fast:
>
> With turbo on, on my E3-1505Mv5, I'm getting:
>
>   donna64: 121793 cycles per call
>   hacl64:  109793 cycles per call
>   fiat64:  108937 cycles per call
>   sandy2x: 103003 cycles per call
>   amd64:   108688 cycles per call
>   precomp:  83391 cycles per call
>   fiat32:  232835 cycles per call
>   donna32: 411511 cycles per call
>
> The benchmark of your precomputation implementation has what's referred to by medical doctors as "less digits".

The bug fixes caused a slight loss of performance; however, other operations were optimized as well, which counteracts the losses. Let us know about your new measurements.

--
Armando Faz Hernández, PhD Candidate.
Instituto de Computação, Unicamp. Campinas, Brasil.
___
Curves mailing list
Curves@moderncrypto.org
https://moderncrypto.org/mailman/listinfo/curves
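The remark above about incorrect outputs for numbers between 2p and 2^256 can be illustrated with a small sketch. This is not the rfc7748_precomputed code: the function names `reduce_once` and `reduce_full` are hypothetical, and the single conditional subtraction is only one plausible way such a bug can arise. With p = 2^255 - 19, there are exactly 38 values in [2p, 2^256), and one subtraction of p leaves each of them non-canonical.

```python
# Minimal sketch, assuming p = 2^255 - 19 and inputs 0 <= x < 2^256.
# Function names are hypothetical and do not come from rfc7748_precomputed.
P = 2**255 - 19
MASK255 = 2**255 - 1


def reduce_once(x: int) -> int:
    """Single conditional subtraction: only canonical when x < 2p."""
    return x - P if x >= P else x


def reduce_full(x: int) -> int:
    """Canonical reduction of any 256-bit value.

    Uses 2^255 = 19 (mod p): fold the top bit down twice, then apply
    one conditional subtraction.
    """
    x = (x & MASK255) + 19 * (x >> 255)  # now x < 2^255 + 19
    x = (x & MASK255) + 19 * (x >> 255)  # now x < 2^255
    return x - P if x >= P else x


# The 38 values in [2p, 2^256) are the ones a single subtraction gets wrong.
edge_cases = range(2 * P, 2**256)
wrong = [x for x in edge_cases if reduce_once(x) != x % P]
print(len(edge_cases), "edge cases,", len(wrong), "mishandled by reduce_once")
```

Because the failing range is so small, random testing is unlikely to hit it, which is presumably why targeted vectors and fuzzing against a second implementation are useful here.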
Re: [curves] new 25519 measurements of formally verified implementations
Hi Armando,

I've started importing your precomputation implementation into kernel space for use in kbench9000 (and in WireGuard and the kernel crypto library too, of course).

- The first problem remains the license. The kernel requires GPLv2-compatible code. GPLv3 isn't compatible with GPLv2. This isn't up to me at all, unfortunately, so this stuff will have to be licensed differently in order to be useful.

- It looks like the precomputation implementation is failing some unit tests! Perhaps it's not properly reducing incoming public points?

    {
        .private = { 1 },
        .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
        .result = { 0xb3, 0x2d, 0x13, 0x62, 0xc2, 0x48, 0xd6, 0x2f,
                    0xe6, 0x26, 0x19, 0xcf, 0xf0, 0x4d, 0xd4, 0x3d,
                    0xb7, 0x3f, 0xfc, 0x1b, 0x63, 0x8, 0xed, 0xe3,
                    0xb, 0x78, 0xd8, 0x73, 0x80, 0xf1, 0xe8, 0x34 }
    }

    [ 8855.567043] Expected: b3 2d 13 62 c2 48 d6 2f e6 26 19 cf f0 4d d4 3d .-.b.H./.&...M.=
    [ 8855.567044] Expected: b7 3f fc 1b 63 08 ed e3 0b 78 d8 73 80 f1 e8 34 .?..cx.s...4
    [ 8855.567046] Actual: eb 1b 2b df 13 6a 3e bc 30 9f a4 f7 a1 95 a7 08 ..+..j>.0...
    [ 8855.567047] Actual: 11 7f 7c e4 6e 65 a4 44 48 22 4d 00 78 54 70 5b ..|.ne.DH"M.xTp[
    [ 8855.567048] kbench9000: precomp self-test 4: FAIL

There's the vector if you'd like to play with it. The other test vectors I have do pass, though, which is good I suppose.

On the plus side, the implementation is super fast:

With turbo on, on my E3-1505Mv5, I'm getting:

    donna64: 121793 cycles per call
    hacl64:  109793 cycles per call
    fiat64:  108937 cycles per call
    sandy2x: 103003 cycles per call
    amd64:   108688 cycles per call
    precomp:  83391 cycles per call
    fiat32:  232835 cycles per call
    donna32: 411511 cycles per call

The benchmark of your precomputation implementation has what's referred to by medical doctors as "less digits".
Regards,
Jason
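For anyone who wants to play with the vector above outside the kernel, here is a minimal Python sketch of the X25519 function following the Montgomery ladder described in RFC 7748. It is not any of the benchmarked implementations, it is not constant-time, and the helper names are my own. Note that the all-0xff public value decodes (after masking the top bit) to 2^255 - 1, which is larger than p and reduces to 18, so the vector specifically exercises reduction of incoming points.

```python
# Minimal, non-constant-time sketch of X25519 after RFC 7748.
# Helper names are illustrative only; do not use this for real keys.
P = 2**255 - 19
A24 = 121665


def decode_scalar(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248    # clear the three low bits
    b[31] &= 127   # clear the top bit
    b[31] |= 64    # set bit 254
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    b = bytearray(u)
    b[31] &= 127   # mask the unused top bit
    return int.from_bytes(b, "little") % P  # accept non-canonical input


def x25519(k_bytes: bytes, u_bytes: bytes) -> bytes:
    k = decode_scalar(k_bytes)
    x1 = decode_u(u_bytes)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in range(254, -1, -1):
        kt = (k >> t) & 1
        swap ^= kt
        if swap:  # a real implementation would use a constant-time swap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = kt
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


# The vector from the message above.
private = bytes([1]) + bytes(31)
public = b"\xff" * 32
expected = bytes.fromhex(
    "b32d1362c248d62fe62619cff04dd43d"
    "b73ffc1b6308ede30b78d87380f1e834"
)
result = x25519(private, public)
print("match" if result == expected else "MISMATCH", result.hex())
```

Comparing the printed hex against the "Expected" lines in the log is a quick way to cross-check a new implementation on this edge case.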
Re: [curves] new 25519 measurements of formally verified implementations
Hi Armando,

Sure, I'll have a look at this.

I've also found https://github.com/armfazh/hp-ecc-vec . Is this the code related to your 2015 paper entitled "Fast Implementation of Curve25519 Using AVX2"? Or the presentation Dan mentioned a few posts up? Or both at once?

Also, would you consider relicensing these as GPLv2 so that they can be used in the Linux kernel? (Alternatively, BSD/MIT/Public Domain for everyone else?)

Jason
Re: [curves] new 25519 measurements of formally verified implementations
I've loaded fiat64 into the latest kbench curve testing branch, and it seems to be the fastest generic C version, at least on my Skylake laptop, edging slightly ahead of hacl64:

    donna64: 121790 cycles per call
    hacl64:  109782 cycles per call
    fiat64:  108984 cycles per call
    sandy2x: 102996 cycles per call
    amd64:   108563 cycles per call
    fiat32:  232826 cycles per call
    donna32: 412092 cycles per call
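To put "slightly ahead" into numbers, the figures above can be compared with a few lines of Python. The cycle counts are copied from this message; the split into C versus assembly implementations (`generic_c` below) is my assumption about the list, not something stated in the thread.

```python
# Cycle counts copied from the message above (Skylake laptop).
cycles = {
    "donna64": 121790,
    "hacl64": 109782,
    "fiat64": 108984,
    "sandy2x": 102996,
    "amd64": 108563,
    "fiat32": 232826,
    "donna32": 412092,
}

# Assumption: these are the C implementations; sandy2x and amd64 are assembly.
generic_c = ["donna64", "hacl64", "fiat64", "fiat32", "donna32"]


def percent_fewer_cycles(new: str, old: str) -> float:
    """How many percent fewer cycles `new` needs compared with `old`."""
    return 100.0 * (cycles[old] - cycles[new]) / cycles[old]


ranking = sorted(cycles, key=cycles.get)
fastest_c = min(generic_c, key=cycles.get)

for name in ranking:
    print(f"{name:8s} {cycles[name]:7d} cycles per call")
print(f"fastest C version: {fastest_c}")
print(f"fiat64 vs hacl64: {percent_fewer_cycles('fiat64', 'hacl64'):.2f}% fewer cycles")
```

The gap between fiat64 and hacl64 comes out below one percent, so it may well be within run-to-run noise on a laptop.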
Re: [curves] new 25519 measurements of formally verified implementations
Hey Dan,

Thanks for the pointer and the link to the slides. I've heard about this implementation before, but I was never able to get a hold of the source to try it out. I just emailed him to see if it's available somewhere. Looks like there's a conference paper from Latincrypt 2015 that describes its implementation.

Jason
Re: [curves] new 25519 measurements of formally verified implementations
Tung Chou's sandy2x code was (as the name suggests) optimized for Sandy Bridge. For Haswell and Skylake, the slides from Julio Lopez in https://hyperelliptic.org/tanja/lc17/ascrypto.html report two followup implementations producing roughly 25% speedups for Curve25519; see slide 67/83.

I do think that the hacl64 Curve25519 speeds are fast enough for pretty much everybody, and verification is certainly a huge plus, but people who want more speed should be aware of what's possible---and people working on Curve25519 verification shouldn't think they're done yet!

---Dan