Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.
On Thu, Dec 11, 2014 at 3:30 PM, Adam Langley <a...@google.com> wrote:
> Thanks. So far that version is good to ~1B random tests. I'll leave it
> going until Monday.

It's good for ~6B random tests. Of course, that's not as compelling for 64-bit code as it would be for 32-bit, but I think we can safely say that there aren't any low-hanging bugs left.

Thanks Andy.

Cheers

AGL
___
openssl-dev mailing list
openssl-dev@openssl.org
https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev
Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.
On Wed, Dec 10, 2014 at 10:05 AM, Andy Polyakov via RT <r...@openssl.org> wrote:
> Patching went wrong for you. As you seem to operate in 1.0.2 context,
> attached is the corresponding ecp_nistz256.pl.

Thanks. So far that version is good to ~1B random tests. I'll leave it going until Monday.

Cheers

AGL
Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.
On Fri, Dec 5, 2014 at 6:33 AM, Andy Polyakov via RT <r...@openssl.org> wrote:
> Attached. A little bit worse performance on some CPUs. I also took the
> opportunity to harmonize ecp_nistz256_from_mont by applying the same
> pattern for reduction. The patch is cumulative, i.e. it is not
> incremental to the previously posted one[s], and addresses both
> problems, the originally reported one and the one discovered along the
> way. The patch to ecp_nistz256.c referred to above doesn't matter.

When applying just that patch, the original test case fails. Specifically, this test code (C++):

  BIGNUM *n = nullptr, *X = nullptr, *Y = nullptr, *Z = nullptr;
  BIGNUM *x = BN_new();
  BIGNUM *y = BN_new();
  ASSERT_NE(BN_hex2bn(&n, "2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD"), 0);
  ASSERT_NE(BN_hex2bn(&X, "C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F"), 0);
  ASSERT_NE(BN_hex2bn(&Y, "3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328"), 0);
  ASSERT_NE(BN_hex2bn(&Z, "F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631"), 0);
  EC_GROUP *group = EC_GROUP_new_by_curve_name(NID_X9_62_prime256v1);
  EC_POINT *p = EC_POINT_new(group);
  BN_CTX *ctx = BN_CTX_new();
  ASSERT_EQ(1, EC_POINT_set_Jprojective_coordinates_GFp(group, p, X, Y, Z, ctx));
  EC_POINT *r = EC_POINT_new(group);
  // Set r = p*n.
  ASSERT_EQ(1, EC_POINT_mul(group, r, NULL, p, n, ctx));
  ASSERT_EQ(1, EC_POINT_get_affine_coordinates_GFp(group, r, x, y, ctx));
  char *x_out = BN_bn2hex(x);
  char *y_out = BN_bn2hex(y);
  EXPECT_STREQ("C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863", x_out);
  EXPECT_STREQ("C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF", y_out);
  free(x_out);
  free(y_out);
  BN_free(x);
  BN_free(y);
  BN_free(X);
  BN_free(Y);
  BN_free(Z);
  BN_free(n);
  EC_POINT_free(r);
  EC_POINT_free(p);
  BN_CTX_free(ctx);
  EC_GROUP_free(group);

Just to check that I'm not doing anything stupid (which is always a distinct possibility), here are the .pl[1] and resulting .s[2] files that I ended up with.
[1] https://drive.google.com/file/d/0B_OzbbAp1CG5OVdVc196QmV3bG8/view?usp=sharing
[2] https://drive.google.com/file/d/0B_OzbbAp1CG5Z3NoZzBqU09scFE/view?usp=sharing

Cheers

AGL
Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.
>> Attached. A little bit worse performance on some CPUs. I also took the
>> opportunity to harmonize ecp_nistz256_from_mont by applying the same
>> pattern for reduction. The patch is cumulative, i.e. it is not
>> incremental to the previously posted one[s], and addresses both
>> problems, the originally reported one and the one discovered along the
>> way. The patch to ecp_nistz256.c referred to above doesn't matter.
>
> When applying just that patch, the original test case fails.
> Specifically, this test code (C++): ...
>
> Just to check that I'm not doing anything stupid (which is always a
> distinct possibility), here are the .pl[1] and resulting .s[2] files
> that I ended up with.

Patching went wrong for you. As you seem to operate in 1.0.2 context, attached is the corresponding ecp_nistz256.pl.

#!/usr/bin/env perl
#
# Copyright 2014 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Developers and authors:
# Shay Gueron (1, 2), and Vlad Krasnov (1)
# (1) Intel Corporation, Israel Development Center
# (2) University of Haifa
# Reference:
# S.Gueron and V.Krasnov, "Fast Prime Field Elliptic Curve Cryptography with
# 256 Bit Primes"
#
# Further optimization by ap...@openssl.org:
#
#                this/original
# Opteron        +12-49%
# Bulldozer      +14-45%
# P4             +18-46%
# Westmere       +12-34%
# Sandy Bridge   +9-35%
# Ivy Bridge     +9-35%
# Haswell        +8-37%
# Broadwell      +18-58%
# Atom           +15-50%
# VIA Nano       +43-160%
#
# Ranges denote minimum and maximum improvement coefficients depending
# on benchmark.

$flavour = shift;
$output  = shift;
if ($flavour =~ /\./) { $output = $flavour; undef $flavour; }

$win64=0; $win64=1 if ($flavour =~ /[nm]asm|mingw64/ || $output =~ /\.asm$/);

$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate="${dir}x86_64-xlate.pl" and -f $xlate ) or
( $xlate="${dir}../../perlasm/x86_64-xlate.pl" and -f $xlate) or
die "can't locate x86_64-xlate.pl";

open OUT,"| \"$^X\" $xlate $flavour $output";
*STDOUT=*OUT;

if (`$ENV{CC} -Wa,-v -c -o /dev/null -x assembler /dev/null 2>&1`
        =~ /GNU assembler version ([2-9]\.[0-9]+)/) {
    $avx = ($1>=2.19) + ($1>=2.22);
    $addx = ($1>=2.23);
}

if (!$addx && $win64 && ($flavour =~ /nasm/ || $ENV{ASM} =~ /nasm/) &&
        `nasm -v 2>&1` =~ /NASM version ([2-9]\.[0-9]+)/) {
    $avx = ($1>=2.09) + ($1>=2.10);
    $addx = ($1>=2.10);
}

if (!$addx && $win64 && ($flavour =~ /masm/ || $ENV{ASM} =~ /ml64/) &&
        `ml64 2>&1` =~ /Version ([0-9]+)\./) {
    $avx = ($1>=10) + ($1>=11);
    $addx = ($1>=12);
}

if (!$addx && `$ENV{CC} -v 2>&1` =~ /(^clang version|based on LLVM) ([3-9])\.([0-9]+)/) {
    my $ver = $2 + $3/100.0;    # 3.1->3.01, 3.10->3.10
    $avx = ($ver>=3.0) + ($ver>=3.01);
    $addx = ($ver>=3.03);
}

$code.=<<___;
.text
.extern OPENSSL_ia32cap_P

# The polynomial
.align 64
.Lpoly:
.quad 0xffffffffffffffff, 0x00000000ffffffff, 0x0000000000000000, 0xffffffff00000001

# 2^512 mod P precomputed for NIST P256 polynomial
.LRR:
.quad 0x0000000000000003, 0xfffffffbffffffff, 0xfffffffffffffffe, 0x00000004fffffffd

.LOne:
.long 1,1,1,1,1,1,1,1
.LTwo:
.long 2,2,2,2,2,2,2,2
.LThree:
.long 3,3,3,3,3,3,3,3

.LONE_mont:
.quad 0x0000000000000001, 0xffffffff00000000, 0xffffffffffffffff, 0x00000000fffffffe
Re: [openssl.org #3607] nistz256 is broken.
>> Oops! Wrong patch! Correct one attached. If you feel like testing the
>> wrong one, go ahead, but there are some later non-essential
>> adjustments.
>>
>> diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
>> index bf3fcc6..33b07ce 100644
>> --- a/crypto/ec/ecp_nistz256.c
>> +++ b/crypto/ec/ecp_nistz256.c
>> @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * group,
>>          ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
>>          ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
>>          ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
>> -        ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
>> +        ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
>>      }
>>
>>      index = 255;
>
> I can believe that this fixes the issue, but it's just masking it, no?

The underlying problem is that the assembly routines return partially reduced results. "Partially reduced" means that a routine can return result + modulus if that still fits in 256 bits. The rationale is that ((x+m)*y)%m = (x*y+m*y)%m = x*y%m + m*y%m, and the last term is 0. While this does work for a series of multiplications, I failed to recognize that there are corner cases in the non-multiplication operations. I'm preparing an update...
__
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
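[The partial-reduction rationale above is easy to check numerically: a representative offset by one modulus is interchangeable under modular multiplication, but its raw 256-bit pattern differs, which is what bites the non-multiplication code paths. A small Python sketch with arbitrary illustrative values:]

```python
# NIST P-256 prime: p = 2^256 - 2^224 + 2^192 + 2^96 - 1
p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff

x = 0x1234567890abcdef     # arbitrary residue
y = 0xfedcba0987654321     # arbitrary residue

# A "partially reduced" representative: x + p still fits in 256 bits
# whenever x < 2^256 - p.
x_partial = x + p
assert x_partial < 2**256

# Safe in a chain of multiplications, since (p*y) % p == 0:
# ((x + p) * y) % p == (x*y + p*y) % p == (x*y) % p
assert (x_partial * y) % p == (x * y) % p

# But the two representatives of the same residue are NOT bit-identical,
# so operations that compare or branch on the raw words can go wrong.
assert x_partial != x
assert x_partial % p == x % p
```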
Re: [openssl.org #3607] nistz256 is broken.
>>> Oops! Wrong patch! Correct one attached. If you feel like testing the
>>> wrong one, go ahead, but there are some later non-essential
>>> adjustments.
>>>
>>> diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
>>> index bf3fcc6..33b07ce 100644
>>> --- a/crypto/ec/ecp_nistz256.c
>>> +++ b/crypto/ec/ecp_nistz256.c
>>> @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * group,
>>>          ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
>>>          ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
>>>          ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
>>> -        ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
>>> +        ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
>>>      }
>>>
>>>      index = 255;
>>
>> I can believe that this fixes the issue, but it's just masking it, no?
>
> The underlying problem is that the assembly routines return partially
> reduced results. "Partially reduced" means that a routine can return
> result + modulus if that still fits in 256 bits. The rationale is that
> ((x+m)*y)%m = (x*y+m*y)%m = x*y%m + m*y%m, and the last term is 0.
> While this does work for a series of multiplications, I failed to
> recognize that there are corner cases in the non-multiplication
> operations. I'm preparing an update...

Attached. A little bit worse performance on some CPUs. I also took the opportunity to harmonize ecp_nistz256_from_mont by applying the same pattern for reduction. The patch is cumulative, i.e. it is not incremental to the previously posted one[s], and addresses both problems, the originally reported one and the one discovered along the way. The patch to ecp_nistz256.c referred to above doesn't matter.
diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..cdff22a 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,16 @@
 # Further optimization by ap...@openssl.org:
 #
 #                this/original
-# Opteron        +8-33%
-# Bulldozer      +10-30%
-# P4             +14-38%
-# Westmere       +8-23%
-# Sandy Bridge   +8-24%
-# Ivy Bridge     +7-25%
-# Haswell        +5-25%
-# Atom           +10-32%
-# VIA Nano       +37-130%
+# Opteron        +12-49%
+# Bulldozer      +14-45%
+# P4             +18-46%
+# Westmere       +12-34%
+# Sandy Bridge   +9-35%
+# Ivy Bridge     +9-35%
+# Haswell        +8-37%
+# Broadwell      +18-58%
+# Atom           +15-50%
+# VIA Nano       +43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +551,20 @@ __ecp_nistz256_mul_montq:
         # and add the result to the acc.
         # Due to the special form of p256 we do some optimizations
         #
-        # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-        # then we add acc[0] and get acc[0] x 2^64
-
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc0, $acc1            # +=acc[0]*2^64
-        adc     \$0, %rdx
-        add     %rax, $acc1
-        mov     $acc0, %rax
-
-        # acc[0] x p256[2] = 0
-        adc     %rdx, $acc2
-        adc     \$0, $t0
+        # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+        # then we add acc[0] and get acc[0] x 2^96
+        mov     $acc0, $t1
+        shl     \$32, $acc0
         mulq    $poly3
-        xor     $acc0, $acc0
-        add     $t0, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
+        shr     \$32, $t1
+        add     $acc0, $acc1            # +=acc[0]<<96
+        adc     $t1, $acc2
+        adc     %rax, $acc3
         mov     8*1($b_ptr), %rax
         adc     %rdx, $acc4
         adc     \$0, $acc5
+        xor     $acc0, $acc0

         # Multiply by b[1]
@@ -608,23 +601,17 @@ __ecp_nistz256_mul_montq:
         # Second reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc1, $acc2
-        adc     \$0, %rdx
-        add     %rax, $acc2
-        mov     $acc1, %rax
-        adc     %rdx, $acc3
-        adc     \$0, $t0
-
+        mov     $acc1, $t1
+        shl     \$32, $acc1
         mulq    $poly3
-        xor     $acc1, $acc1
-        add     $t0, $acc4
-        adc     \$0, %rdx
-        add     %rax, $acc4
+        shr     \$32, $t1
+        add     $acc1, $acc2
+        adc     $t1, $acc3
+        adc     %rax, $acc4
         mov     8*2($b_ptr), %rax
         adc     %rdx, $acc5
         adc     \$0, $acc0
+        xor     $acc1, $acc1

         # Multiply by b[2]
@@ -661,23 +648,17 @@ __ecp_nistz256_mul_montq:
         # Third reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc2, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
-        mov     $acc2, %rax
-        adc     %rdx, $acc4
-        adc     \$0, $t0
-
+        mov     $acc2, $t1
+        shl     \$32, $acc2
         mulq    $poly3
-        xor     $acc2, $acc2
-        add     $t0, $acc5
-        adc     \$0, %rdx
-        add     %rax, $acc5
+        shr     \$32, $t1
+        add     $acc2, $acc3
+        adc     $t1, $acc4
+        adc     %rax, $acc5
         mov     8*3($b_ptr), %rax
         adc     %rdx, $acc0
         adc     \$0, $acc1
+        xor     $acc2, $acc2

         # Multiply by b[3]
@@ -714,20 +695,14 @@ __ecp_nistz256_mul_montq:
         # Final reduction step
-        mulq    $poly1
-        #xor    $t0, $t0
-        add     $acc3, $acc4
-        adc     \$0, %rdx
-        add     %rax, $acc4
-        mov     $acc3, %rax
-        adc     %rdx, $acc5
-        #adc    \$0, $t0                # doesn't overflow
-
+        mov     $acc3, $t1
+        shl     \$32, $acc3
         mulq    $poly3
-
Re: [openssl.org #3607] nistz256 is broken.
>> Oops! Wrong patch! Correct one attached. If you feel like testing the
>> wrong one, go ahead, but there are some later non-essential
>> adjustments.
>>
>> diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
>> index bf3fcc6..33b07ce 100644
>> --- a/crypto/ec/ecp_nistz256.c
>> +++ b/crypto/ec/ecp_nistz256.c
>> @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * group,
>>          ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
>>          ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
>>          ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
>> -        ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
>> +        ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
>>      }
>>
>>      index = 255;
>
> I can believe that this fixes the issue, but it's just masking it, no?

It's not a coincidence that I didn't say "fixes the issue" or "solves the problem", but "produces correct result". BTW, it seems to be unrelated to the original problem with carries handling in assembly.
Re: [openssl.org #3607] nistz256 is broken.
> thanks! Was away last week and so didn't have a chance to try fixing
> this. I'll patch that in and run the tests against it.
>
> I've run out of time tracking this down for today, but I got to the
> point where setting the Jacobian coordinates:
>
>   X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
>   Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
>   Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631
>
> and multiplying that point by
>
>   2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
>
> results in the affine point:
>
>   x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
>   y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7
>
> However, I believe that the result should be:
>
>   x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
>   y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF

I do get the latter...
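[For readers following along: the quoted Jacobian triple (X, Y, Z) maps to affine form via x = X*Z^-2 and y = Y*Z^-3 modulo the P-256 prime. A minimal Python sketch of just that conversion step, which exercises only plain modular arithmetic, not the assembly under discussion:]

```python
# Jacobian -> affine over the P-256 field: x = X/Z^2, y = Y/Z^3 (mod p).
p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff

X = 0xC4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
Y = 0x3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
Z = 0xF97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

# Modular inverse via Fermat's little theorem (p is prime).
z_inv = pow(Z, p - 2, p)
assert Z * z_inv % p == 1

x_affine = X * z_inv * z_inv % p
y_affine = Y * z_inv * z_inv * z_inv % p
assert 0 <= x_affine < p and 0 <= y_affine < p
```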
Re: [openssl.org #3607] nistz256 is broken.
> thanks! Was away last week and so didn't have a chance to try fixing
> this. I'll patch that in and run the tests against it.
>
> I've run out of time tracking this down for today, but I got to the
> point where setting the Jacobian coordinates:
>
>   X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
>   Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
>   Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631
>
> and multiplying that point by
>
>   2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
>
> results in the affine point:
>
>   x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
>   y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7
>
> However, I believe that the result should be:
>
>   x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
>   y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF

I do get the latter... ... in master, and I get the former in 1.0.2. Looking into it...
Re: [openssl.org #3607] nistz256 is broken.
>> thanks! Was away last week and so didn't have a chance to try fixing
>> this. I'll patch that in and run the tests against it.
>>
>> I've run out of time tracking this down for today, but I got to the
>> point where setting the Jacobian coordinates:
>>
>>   X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
>>   Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
>>   Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631
>>
>> and multiplying that point by
>>
>>   2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
>>
>> results in the affine point:
>>
>>   x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
>>   y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7
>>
>> However, I believe that the result should be:
>>
>>   x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
>>   y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF
>
> I do get the latter... ... in master, and I get the former in 1.0.2.
> Looking into it...

Attached patch produces correct result in 1.0.2. Looking further for explanation...

commit 9ca10dca3ca6bc5ea739a998258351dbfd261b42
Author: Andy Polyakov <ap...@openssl.org>
Date:   Sun Oct 26 12:45:47 2014 +0100

    Add ARMv4 ECP_NISTZ256 implementation.
diff --git a/Configure b/Configure
index 2eda5e6..777e8cf 100755
--- a/Configure
+++ b/Configure
@@ -138,7 +138,7 @@
 my $alpha_asm=alphacpuid.o:bn_asm.o alpha-mont.o::sha1-alpha.o:::ghash-
 my $mips64_asm=:bn-mips.o mips-mont.o:::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o;
 my $mips32_asm=$mips64_asm; $mips32_asm =~ s/\s*sha512\-mips\.o//;
 my $s390x_asm=s390xcap.o s390xcpuid.o:bn-s390x.o s390x-mont.o s390x-gf2m.o:::aes-s390x.o aes-ctr.o aes-xts.o:::sha1-s390x.o sha256-s390x.o sha512-s390x.o::rc4-s390x.o:ghash-s390x.o:;
-my $armv4_asm=armcap.o armv4cpuid.o:bn_asm.o armv4-mont.o armv4-gf2m.o:::aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o:::sha1-armv4-large.o sha256-armv4.o sha512-armv4.o:::ghash-armv4.o ghashv8-armx.o::void;
+my $armv4_asm=armcap.o armv4cpuid.o:bn_asm.o armv4-mont.o armv4-gf2m.o:ecp_nistz256.o ecp_nistz256-armv4.o::aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o:::sha1-armv4-large.o sha256-armv4.o sha512-armv4.o:::ghash-armv4.o ghashv8-armx.o::void;
 my $aarch64_asm=armcap.o arm64cpuid.o mem_clr.o aes_core.o aes_cbc.o aesv8-armx.o:::sha1-armv8.o sha256-armv8.o sha512-armv8.o:::ghashv8-armx.o:;
 my $parisc11_asm=pariscid.o:bn_asm.o parisc-mont.o:::aes_core.o aes_cbc.o aes-parisc.o:::sha1-parisc.o sha256-parisc.o sha512-parisc.o::rc4-parisc.o:ghash-parisc.o::32;
 my $parisc20_asm=pariscid.o:pa-risc2W.o parisc-mont.o:::aes_core.o aes_cbc.o aes-parisc.o:::sha1-parisc.o sha256-parisc.o sha512-parisc.o::rc4-parisc.o:ghash-parisc.o::64;
diff --git a/TABLE b/TABLE
index d778dac..b1d4543 100644
--- a/TABLE
+++ b/TABLE
@@ -1132,7 +1132,7 @@
 $lflags       = -ldl
 $bn_ops       = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
 $cpuid_obj    = armcap.o armv4cpuid.o
 $bn_obj       = bn_asm.o armv4-mont.o armv4-gf2m.o
-$ec_obj       =
+$ec_obj       = ecp_nistz256.o ecp_nistz256-armv4.o
 $des_obj      =
 $aes_obj      = aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o
 $bf_obj       =
@@ -4362,7 +4362,7 @@
 $lflags       = -ldl
 $bn_ops       = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
 $cpuid_obj    = armcap.o armv4cpuid.o
 $bn_obj       = bn_asm.o armv4-mont.o armv4-gf2m.o
-$ec_obj       =
+$ec_obj       = ecp_nistz256.o ecp_nistz256-armv4.o
 $des_obj      =
 $aes_obj      = aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o
 $bf_obj       =
diff --git a/crypto/ec/Makefile b/crypto/ec/Makefile
index 898e43d..e020c93 100644
--- a/crypto/ec/Makefile
+++ b/crypto/ec/Makefile
@@ -54,6 +54,9 @@
 ecp_nistz256-x86_64.s: asm/ecp_nistz256-x86_64.pl
 ecp_nistz256-avx2.s: asm/ecp_nistz256-avx2.pl
 	$(PERL) asm/ecp_nistz256-avx2.pl $(PERLASM_SCHEME) $@
+ecp_nistz256-%.S: asm/ecp_nistz256-%.pl;	$(PERL) $< $(PERLASM_SCHEME) $@
+ecp_nistz256-armv4.o: ecp_nistz256-armv4.S
+
 files:
 	$(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
diff --git a/crypto/ec/asm/ecp_nistz256-armv4.pl b/crypto/ec/asm/ecp_nistz256-armv4.pl
new file mode 100755
index 000..ad29948
--- /dev/null
+++ b/crypto/ec/asm/ecp_nistz256-armv4.pl
@@ -0,0 +1,1720 @@
+#!/usr/bin/env perl
+
+#
+# Written by Andy Polyakov <ap...@openssl.org> for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see http://www.openssl.org/~appro/cryptogams/.
+#
+#
+# ECP_NISTZ256 module for ARMv4.
+#
+# October 2014.
+#
+# Original ECP_NISTZ256 submission targeting x86_64 is detailed in
+# http://eprint.iacr.org/2013/816. In the process of adaptation
+# original .c module was made 32-bit savvy in
Re: [openssl.org #3607] nistz256 is broken.
>>> thanks! Was away last week and so didn't have a chance to try fixing
>>> this. I'll patch that in and run the tests against it.
>>>
>>> I've run out of time tracking this down for today, but I got to the
>>> point where setting the Jacobian coordinates:
>>>
>>>   X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
>>>   Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
>>>   Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631
>>>
>>> and multiplying that point by
>>>
>>>   2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
>>>
>>> results in the affine point:
>>>
>>>   x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
>>>   y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7
>>>
>>> However, I believe that the result should be:
>>>
>>>   x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
>>>   y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF
>>
>> I do get the latter... ... in master, and I get the former in 1.0.2.
>> Looking into it...
>
> Attached patch produces correct result in 1.0.2. Looking further for
> explanation...

Oops! Wrong patch! Correct one attached. If you feel like testing the wrong one, go ahead, but there are some later non-essential adjustments.

diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
index bf3fcc6..33b07ce 100644
--- a/crypto/ec/ecp_nistz256.c
+++ b/crypto/ec/ecp_nistz256.c
@@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * group,
         ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
         ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
         ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
-        ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
+        ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
     }

     index = 255;
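[The one-line patch above swaps two mathematically equivalent ways of filling the last slot of the window table: row[16] = row[15] + row[1] versus row[16] = 2*row[8], both of which equal 16P. The equivalence is easy to demonstrate with textbook affine formulas on a toy curve; the curve y^2 = x^3 + x + 1 over GF(23) with base point (3, 10) is a standard small example, not anything from the P-256 code:]

```python
# Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p), with
# p = 23, a = 1, b = 1; the point (3, 10) lies on it (100 ≡ 31 ≡ 8 mod 23).
p, a = 23, 1

def point_add(P, Q):
    # Affine chord-and-tangent addition; None represents infinity.
    # (b is never needed explicitly in the formulas.)
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mul(k, P):
    # Naive repeated addition; fine for a toy demo.
    R = None
    for _ in range(k):
        R = point_add(R, P)
    return R

G = (3, 10)
row = {i: scalar_mul(i, G) for i in range(1, 17)}

# The two formulas the patch swaps between; both must yield 16*G.
assert point_add(row[15], row[1]) == row[16]   # the old point_add path
assert point_add(row[8], row[8]) == row[16]    # the new point_double path
```

The bug was not in this identity, of course; the add path merely exposed a carry corner case in the underlying field arithmetic that the double path happens to avoid.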
Re: [openssl.org #3607] nistz256 is broken.
> 2. When will RT2574 be integrated to protect our ECC keys in the
> inevitable presence of software defects like this?
> http://rt.openssl.org/Ticket/Display.html?id=2574&user=guest&pass=guest

Reportedly, Cryptography Research (i.e., Rambus) alleges to have broad patents on techniques like this (and they might not be the only ones). I'm not going to look for specific patents and can't assess the validity of that rumor; the only thing I know for certain is that Cryptography Research and Rambus are famous, above all else, for starting patent lawsuits (see, e.g., http://www.sec.gov/Archives/edgar/data/1403161/000119312507270394/d10k.htm).

Unfortunately, this means that the OpenSSL project may not be willing to incorporate coordinate-blinding techniques at this time.

Bodo
Re: [openssl.org #3607] nistz256 is broken.
On Wed, Dec 3, 2014 at 10:12 AM, Andy Polyakov via RT <r...@openssl.org> wrote:
> Oops! Wrong patch! Correct one attached. If you feel like testing the
> wrong one, go ahead, but there are some later non-essential
> adjustments.
>
> diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
> index bf3fcc6..33b07ce 100644
> --- a/crypto/ec/ecp_nistz256.c
> +++ b/crypto/ec/ecp_nistz256.c
> @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * group,
>          ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
>          ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
>          ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
> -        ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
> +        ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
>      }
>
>      index = 255;

I can believe that this fixes the issue, but it's just masking it, no?

I'll see if I can track it down more precisely tomorrow.

Cheers

AGL
Re: [openssl.org #3607] nistz256 is broken.
On Mon, Dec 1, 2014 at 3:23 PM, Andy Polyakov via RT <r...@openssl.org> wrote:
>> (Affects 1.0.2 only.) In crypto/ec/asm/ecp_nistz256-x86_64.pl,
>> __ecp_nistz256_sqr_montq, under "Now the reduction" there are a number
>> of comments saying "doesn't overflow". Unfortunately, they aren't
>> correct.
>
> Got math wrong:-( Attached is not only fixed version, but even faster
> one. Please test attached one instead. Squaring didn't cover one case,
> and AD*X is optimized.

thanks! Was away last week and so didn't have a chance to try fixing this. I'll patch that in and run the tests against it.

Cheers

AGL

> On related note. It's possible to improve server-side DSA by ~5% by
> switching [back] to scatter-gather. [Change from scatter-gather was
> caused by concern about timing dependency, but I argue that concern is
> not valid in most cases.] There also are x86 and ARM versions pending:
>
> # with/without -DECP_NISTZ256_ASM
> # Pentium        +66-168%
> # PIII           +73-175%
> # P4             +68-140%
> # Core2          +90-215%
> # Sandy Bridge   +105-265% (contemporary i[57]-* are all close to this)
> # Atom           +66-160%
> # Opteron        +54-112%
> # Bulldozer      +99-240%
> # VIA Nano       +93-300%
>
> # with/without -DECP_NISTZ256_ASM
> # Cortex-A8      +53-173%
> # Cortex-A9      +76-205%
> # Cortex-A15     +100-316%
> # Snapdragon S4  +66-187%
>
> No, bug in question is not there. Nor is the AD*X code path affected.
diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..56f6c2b 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,15 @@
 # Further optimization by ap...@openssl.org:
 #
 #                this/original
-# Opteron        +8-33%
-# Bulldozer      +10-30%
-# P4             +14-38%
-# Westmere       +8-23%
-# Sandy Bridge   +8-24%
-# Ivy Bridge     +7-25%
-# Haswell        +5-25%
-# Atom           +10-32%
-# VIA Nano       +37-130%
+# Opteron        +10-43%
+# Bulldozer      +14-43%
+# P4             +18-50%
+# Westmere       +12-36%
+# Sandy Bridge   +9-36%
+# Ivy Bridge     +9-36%
+# Haswell        +8-37%
+# Atom           +15-50%
+# VIA Nano       +43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
         # and add the result to the acc.
         # Due to the special form of p256 we do some optimizations
         #
-        # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-        # then we add acc[0] and get acc[0] x 2^64
-
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc0, $acc1            # +=acc[0]*2^64
-        adc     \$0, %rdx
-        add     %rax, $acc1
-        mov     $acc0, %rax
-
-        # acc[0] x p256[2] = 0
-        adc     %rdx, $acc2
-        adc     \$0, $t0
+        # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+        # then we add acc[0] and get acc[0] x 2^96
+        mov     $acc0, $t1
+        shl     \$32, $acc0
         mulq    $poly3
-        xor     $acc0, $acc0
-        add     $t0, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
+        shr     \$32, $t1
+        add     $acc0, $acc1            # +=acc[0]<<96
+        adc     $t1, $acc2
+        adc     %rax, $acc3
         mov     8*1($b_ptr), %rax
         adc     %rdx, $acc4
         adc     \$0, $acc5
+        xor     $acc0, $acc0

         # Multiply by b[1]
@@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:
         # Second reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc1, $acc2
-        adc     \$0, %rdx
-        add     %rax, $acc2
-        mov     $acc1, %rax
-        adc     %rdx, $acc3
-        adc     \$0, $t0
-
+        mov     $acc1, $t1
+        shl     \$32, $acc1
         mulq    $poly3
-        xor     $acc1, $acc1
-        add     $t0, $acc4
-        adc     \$0, %rdx
-        add     %rax, $acc4
+        shr     \$32, $t1
+        add     $acc1, $acc2
+        adc     $t1, $acc3
+        adc     %rax, $acc4
         mov     8*2($b_ptr), %rax
         adc     %rdx, $acc5
         adc     \$0, $acc0
+        xor     $acc1, $acc1

         # Multiply by b[2]
@@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:
         # Third reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc2, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
-        mov     $acc2, %rax
-        adc     %rdx, $acc4
-        adc     \$0, $t0
-
+        mov     $acc2, $t1
+        shl
Re: [openssl.org #3607] nistz256 is broken.
On Tue, Dec 2, 2014 at 12:33 PM, Adam Langley <a...@google.com> wrote:
> thanks! Was away last week and so didn't have a chance to try fixing
> this. I'll patch that in and run the tests against it.

I've run out of time tracking this down for today, but I got to the point where setting the Jacobian coordinates:

  X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
  Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
  Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

and multiplying that point by

  2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD

results in the affine point:

  x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
  y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7

However, I believe that the result should be:

  x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
  y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF

Cheers

AGL
Re: [openssl.org #3607] nistz256 is broken.
> (Affects 1.0.2 only.) In crypto/ec/asm/ecp_nistz256-x86_64.pl,
> __ecp_nistz256_sqr_montq, under "Now the reduction" there are a number
> of comments saying "doesn't overflow". Unfortunately, they aren't
> correct.

Got math wrong:-( Attached is not only a fixed version, but an even faster one.

On a related note, it's possible to improve server-side DSA by ~5% by switching [back] to scatter-gather. [The change from scatter-gather was caused by concern about timing dependency, but I argue that concern is not valid in most cases.] There also are x86 and ARM versions pending:

# with/without -DECP_NISTZ256_ASM
# Pentium        +66-168%
# PIII           +73-175%
# P4             +68-140%
# Core2          +90-215%
# Sandy Bridge   +105-265% (contemporary i[57]-* are all close to this)
# Atom           +66-160%
# Opteron        +54-112%
# Bulldozer      +99-240%
# VIA Nano       +93-300%

# with/without -DECP_NISTZ256_ASM
# Cortex-A8      +53-173%
# Cortex-A9      +76-205%
# Cortex-A15     +100-316%
# Snapdragon S4  +66-187%

No, the bug in question is not there. Nor is the AD*X code path affected.

diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..f328b85 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,15 @@
 # Further optimization by ap...@openssl.org:
 #
 #                this/original
-# Opteron        +8-33%
-# Bulldozer      +10-30%
-# P4             +14-38%
-# Westmere       +8-23%
-# Sandy Bridge   +8-24%
-# Ivy Bridge     +7-25%
-# Haswell        +5-25%
-# Atom           +10-32%
-# VIA Nano       +37-130%
+# Opteron        +10-43%
+# Bulldozer      +14-43%
+# P4             +18-50%
+# Westmere       +12-36%
+# Sandy Bridge   +9-36%
+# Ivy Bridge     +9-36%
+# Haswell        +8-37%
+# Atom           +15-50%
+# VIA Nano       +43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
         # and add the result to the acc.
         # Due to the special form of p256 we do some optimizations
         #
-        # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-        # then we add acc[0] and get acc[0] x 2^64
-
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc0, $acc1            # +=acc[0]*2^64
-        adc     \$0, %rdx
-        add     %rax, $acc1
-        mov     $acc0, %rax
-
-        # acc[0] x p256[2] = 0
-        adc     %rdx, $acc2
-        adc     \$0, $t0
+        # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+        # then we add acc[0] and get acc[0] x 2^96
+        mov     $acc0, $t1
+        shl     \$32, $acc0
         mulq    $poly3
-        xor     $acc0, $acc0
-        add     $t0, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
+        shr     \$32, $t1
+        add     $acc0, $acc1            # +=acc[0]<<96
+        adc     $t1, $acc2
+        adc     %rax, $acc3
         mov     8*1($b_ptr), %rax
         adc     %rdx, $acc4
         adc     \$0, $acc5
+        xor     $acc0, $acc0

         # Multiply by b[1]
@@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:
         # Second reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc1, $acc2
-        adc     \$0, %rdx
-        add     %rax, $acc2
-        mov     $acc1, %rax
-        adc     %rdx, $acc3
-        adc     \$0, $t0
-
+        mov     $acc1, $t1
+        shl     \$32, $acc1
         mulq    $poly3
-        xor     $acc1, $acc1
-        add     $t0, $acc4
-        adc     \$0, %rdx
-        add     %rax, $acc4
+        shr     \$32, $t1
+        add     $acc1, $acc2
+        adc     $t1, $acc3
+        adc     %rax, $acc4
         mov     8*2($b_ptr), %rax
         adc     %rdx, $acc5
         adc     \$0, $acc0
+        xor     $acc1, $acc1

         # Multiply by b[2]
@@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:
         # Third reduction step
-        mulq    $poly1
-        xor     $t0, $t0
-        add     $acc2, $acc3
-        adc     \$0, %rdx
-        add     %rax, $acc3
-        mov     $acc2, %rax
-        adc     %rdx, $acc4
-        adc     \$0, $t0
-
+        mov     $acc2, $t1
+        shl     \$32, $acc2
         mulq    $poly3
-        xor     $acc2, $acc2
-        add     $t0, $acc5
-        adc     \$0, %rdx
-        add     %rax, $acc5
+        shr     \$32, $t1
+        add     $acc2, $acc3
+        adc     $t1, $acc4
+        adc     %rax, $acc5
         mov     8*3($b_ptr), %rax
         adc     %rdx, $acc0
         adc     \$0, $acc1
+        xor     $acc2, $acc2
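[The replacement code above relies on the special form of the two low words of the P-256 prime: p[0] = 2^64 - 1 and p[1] = 2^32 - 1, so acc[0]*(p[0] + p[1]*2^64) = acc[0]*(2^96 - 1), which is why one shl/shr pair can replace a mulq. The identity is easy to check; an illustrative Python snippet with an arbitrary acc[0] value:]

```python
# Low two 64-bit words of the P-256 prime.
p0 = 0xffffffffffffffff            # 2^64 - 1
p1 = 0x00000000ffffffff            # 2^32 - 1

# The comment's identity: p[0] + p[1]*2^64 == 2^96 - 1, hence
# acc0 * (p[0] + p[1]*2^64) == acc0*2^96 - acc0.
assert p0 + (p1 << 64) == (1 << 96) - 1

acc0 = 0x0123456789abcdef          # arbitrary 64-bit value
assert acc0 * (p0 + (p1 << 64)) == (acc0 << 96) - acc0

# The shl/shr pair splits acc0*2^32 across two 64-bit registers:
lo = (acc0 << 32) & (2**64 - 1)    # shl \$32, $acc0
hi = acc0 >> 32                    # shr \$32, $t1
assert (hi << 64) + lo == acc0 << 32
```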
Re: [openssl.org #3607] nistz256 is broken.
(Affects 1.0.2 only.) In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq, under "Now the reduction" there are a number of comments saying "doesn't overflow". Unfortunately, they aren't correct. Got the math wrong :-( Attached is not only a fixed version, but an even faster one. Please test the attached one instead. Squaring didn't cover one case, and AD*X is optimized.

On a related note, it's possible to improve server-side DSA by ~5% by switching [back] to scatter-gather. [The change from scatter-gather was caused by concern about timing dependency, but I argue that the concern is not valid in most cases.] There are also x86 and ARM versions pending:

# with/without -DECP_NISTZ256_ASM
# Pentium	+66-168%
# PIII		+73-175%
# P4		+68-140%
# Core2		+90-215%
# Sandy Bridge	+105-265% (contemporary i[57]-* are all close to this)
# Atom		+66-160%
# Opteron	+54-112%
# Bulldozer	+99-240%
# VIA Nano	+93-300%

# with/without -DECP_NISTZ256_ASM
# Cortex-A8	+53-173%
# Cortex-A9	+76-205%
# Cortex-A15	+100-316%
# Snapdragon S4	+66-187%

No, the bug in question is not there. Nor is the AD*X code path affected.

diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..56f6c2b 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,15 @@
 # Further optimization by ap...@openssl.org:
 #
 #		this/original
-# Opteron	+8-33%
-# Bulldozer	+10-30%
-# P4		+14-38%
-# Westmere	+8-23%
-# Sandy Bridge	+8-24%
-# Ivy Bridge	+7-25%
-# Haswell	+5-25%
-# Atom		+10-32%
-# VIA Nano	+37-130%
+# Opteron	+10-43%
+# Bulldozer	+14-43%
+# P4		+18-50%
+# Westmere	+12-36%
+# Sandy Bridge	+9-36%
+# Ivy Bridge	+9-36%
+# Haswell	+8-37%
+# Atom		+15-50%
+# VIA Nano	+43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
 	# and add the result to the acc.
 	# Due to the special form of p256 we do some optimizations
 	#
-	# acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-	# then we add acc[0] and get acc[0] x 2^64
-
-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc0, $acc1		# +=acc[0]*2^64
-	adc	\$0, %rdx
-	add	%rax, $acc1
-	mov	$acc0, %rax
-
-	# acc[0] x p256[2] = 0
-	adc	%rdx, $acc2
-	adc	\$0, $t0
+	# acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+	# then we add acc[0] and get acc[0] x 2^96
+	mov	$acc0, $t1
+	shl	\$32, $acc0
 	mulq	$poly3
-	xor	$acc0, $acc0
-	add	$t0, $acc3
-	adc	\$0, %rdx
-	add	%rax, $acc3
+	shr	\$32, $t1
+	add	$acc0, $acc1		# +=acc[0]<<96
+	adc	$t1, $acc2
+	adc	%rax, $acc3
 	mov	8*1($b_ptr), %rax
 	adc	%rdx, $acc4
 	adc	\$0, $acc5
+	xor	$acc0, $acc0

 	# Multiply by b[1]

@@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:
 	# Second reduction step

-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc1, $acc2
-	adc	\$0, %rdx
-	add	%rax, $acc2
-	mov	$acc1, %rax
-	adc	%rdx, $acc3
-	adc	\$0, $t0
-
+	mov	$acc1, $t1
+	shl	\$32, $acc1
 	mulq	$poly3
-	xor	$acc1, $acc1
-	add	$t0, $acc4
-	adc	\$0, %rdx
-	add	%rax, $acc4
+	shr	\$32, $t1
+	add	$acc1, $acc2
+	adc	$t1, $acc3
+	adc	%rax, $acc4
 	mov	8*2($b_ptr), %rax
 	adc	%rdx, $acc5
 	adc	\$0, $acc0
+	xor	$acc1, $acc1

 	# Multiply by b[2]

@@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:
 	# Third reduction step

-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc2, $acc3
-	adc	\$0, %rdx
-	add	%rax, $acc3
-	mov	$acc2, %rax
-	adc	%rdx, $acc4
-	adc	\$0, $t0
-
+	mov	$acc2, $t1
+	shl	\$32, $acc2
 	mulq	$poly3
-	xor	$acc2, $acc2
-	add	$t0, $acc5
-	adc	\$0, %rdx
-	add	%rax, $acc5
+	shr	\$32, $t1
+	add	$acc2, $acc3
+	adc	$t1, $acc4
+	adc	%rax, $acc5
 	mov	8*3($b_ptr), %rax
 	adc	%rdx, $acc0
 	adc	\$0, $acc1
+	xor	$acc2, $acc2
RE: [openssl.org #3607] nistz256 is broken.
> 2. When will RT2574 be integrated to protect our ECC keys in the inevitable presence of software defects like this? http://rt.openssl.org/Ticket/Display.html?id=2574&user=guest&pass=guest

Timing attacks on ECC aren't a very high priority right now, given all the other bigger, easier-to-exploit issues with wider deployment :( I wish it weren't so, but I do want to set your expectations properly. (Now, of course, having said that, the constant-time folks will swoop in and submit this to master next week :)
Re: [openssl.org #3607] nistz256 is broken.
Thanks for the reply, Rich.

>> 2. When will RT2574 be integrated to protect our ECC keys in the inevitable presence of software defects like this? http://rt.openssl.org/Ticket/Display.html?id=2574&user=guest&pass=guest
>
> Timing attacks on ECC aren't a very high priority right now, given all the other bigger, easier-to-exploit issues with wider deployment :(

The way I see it, the main purpose of that patch is to prevent bug attacks, not timing attacks. Hence the relation to the arithmetic bug.

BBB

__
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
Re: [openssl.org #3607] nistz256 is broken.
1. Where's the security analysis? Does https://eprint.iacr.org/2011/633 apply?

2. When will RT2574 be integrated to protect our ECC keys in the inevitable presence of software defects like this? http://rt.openssl.org/Ticket/Display.html?id=2574&user=guest&pass=guest

These questions are not necessarily for Adam, but for the OpenSSL team.

BBB

On Sun, Nov 23, 2014 at 8:09 PM, Adam Langley via RT r...@openssl.org wrote:
> (Affects 1.0.2 only.) In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq, under "Now the reduction" there are a number of comments saying "doesn't overflow". Unfortunately, they aren't correct.
>
> Let f be a field element with value 52998265219372519138277318009572834528257482223861497652862868020346603903843. In Montgomery form, it's represented in memory as f*2^256 mod p, which is 58733536287848456860684025065811053850702581988990452502702607007944524443511.
>
> When passed to ecp_nistz256_sqr_mont, this results in the intermediate value (before any reduction) 0x41dd6e8bcf7e19f499c19d0f5f3bba78272201eee64c6a44ca8a4ff275b53fa93b41d5b7035af3e40a05dc36f424ab9438cdec4fa193faebf6ce951.
>
> r10 in this case contains 0x40a05dc3, and the high-word output of the multiplication after "# First iteration" is 0xfa193fad. The addition of r8 and r9 overflows into it, leaving it as 0xfa193fae. The addition of rax and r9 also sets the carry flag, thus the final add-with-carry of rdx into r10 easily overflows and leaves r10 as 0x3ab99d72.
>
> Additionally, I'm not sure about any of the other cases in the same function that have been annotated the same way. There is also a similar annotation in ecp_nistz256_mul_mont that I've not investigated.
>
> Cheers
>
> AGL
[openssl.org #3607] nistz256 is broken.
(Affects 1.0.2 only.) In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq, under "Now the reduction" there are a number of comments saying "doesn't overflow". Unfortunately, they aren't correct.

Let f be a field element with value 52998265219372519138277318009572834528257482223861497652862868020346603903843. In Montgomery form, it's represented in memory as f*2^256 mod p, which is 58733536287848456860684025065811053850702581988990452502702607007944524443511.

When passed to ecp_nistz256_sqr_mont, this results in the intermediate value (before any reduction) 0x41dd6e8bcf7e19f499c19d0f5f3bba78272201eee64c6a44ca8a4ff275b53fa93b41d5b7035af3e40a05dc36f424ab9438cdec4fa193faebf6ce951.

r10 in this case contains 0x40a05dc3, and the high-word output of the multiplication after "# First iteration" is 0xfa193fad. The addition of r8 and r9 overflows into it, leaving it as 0xfa193fae. The addition of rax and r9 also sets the carry flag, thus the final add-with-carry of rdx into r10 easily overflows and leaves r10 as 0x3ab99d72.

Additionally, I'm not sure about any of the other cases in the same function that have been annotated the same way. There is also a similar annotation in ecp_nistz256_mul_mont that I've not investigated.

Cheers

AGL