Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.

2014-12-15 Thread Adam Langley via RT
On Thu, Dec 11, 2014 at 3:30 PM, Adam Langley a...@google.com wrote:
 Thanks. So far that version is good to ~1B random tests. I'll leave it
 going until Monday.

It's good for ~6B random tests.

Of course, that's not as compelling for 64-bit code as it would be for
32-bit, but I think we can safely say that there aren't any
low-hanging bugs left.

Thanks Andy.


Cheers

AGL


___
openssl-dev mailing list
openssl-dev@openssl.org
https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.

2014-12-11 Thread Adam Langley via RT
On Wed, Dec 10, 2014 at 10:05 AM, Andy Polyakov via RT r...@openssl.org wrote:
 Patching went wrong for you. As you seem to operate in 1.0.2 context
 attached is corresponding ecp_nistz256.pl.

Thanks. So far that version is good to ~1B random tests. I'll leave it
going until Monday.


Cheers

AGL


___
openssl-dev mailing list
openssl-dev@openssl.org
https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.

2014-12-10 Thread Adam Langley via RT
On Fri, Dec 5, 2014 at 6:33 AM, Andy Polyakov via RT r...@openssl.org wrote:
 Attached. A little bit worse performance on some CPUs. I also took
 opportunity to harmonize ecp_nistz256_from_mont by applying same pattern
 for reduction. The patch is cumulative, i.e. is not incremental to
 previously posted one[s], and addresses both problems, originally
 reported one and discovered in the course. Patch to ecp_nistz256.c
 referred above doesn't matter.

When applying just that patch, the original test case fails. Specially
this test code (C++):


  BIGNUM *n = nullptr, *X = nullptr, *Y = nullptr, *Z = nullptr;
  BIGNUM *x = BN_new();
  BIGNUM *y = BN_new();

  ASSERT_NE(BN_hex2bn(n,
2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD),
0);
  ASSERT_NE(BN_hex2bn(X,
C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F),
0);
  ASSERT_NE(BN_hex2bn(Y,
3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328),
0);
  ASSERT_NE(BN_hex2bn(Z,
F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631),
0);

  EC_GROUP *group = EC_GROUP_new_by_curve_name(NID_X9_62_prime256v1);
  EC_POINT *p = EC_POINT_new(group);
  BN_CTX *ctx = BN_CTX_new();
  ASSERT_EQ(1, EC_POINT_set_Jprojective_coordinates_GFp(group, p, X,
Y, Z, ctx));

  EC_POINT *r = EC_POINT_new(group);
  // Set r = 핡×n.
  ASSERT_EQ(1, EC_POINT_mul(group, r, NULL, p, n, ctx));

  ASSERT_EQ(1, EC_POINT_get_affine_coordinates_GFp(group, r, x, y, ctx));
  char *x_out = BN_bn2hex(x);
  char *y_out = BN_bn2hex(y);
  
EXPECT_STREQ(C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863,
x_out);
  
EXPECT_STREQ(C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF,
y_out);

  free(x_out);
  free(y_out);
  BN_free(x);
  BN_free(y);
  BN_free(X);
  BN_free(Y);
  BN_free(Z);
  BN_free(n);
  EC_POINT_free(r);
  EC_POINT_free(p);
  BN_CTX_free(ctx);
  EC_GROUP_free(group);


Just to check that I'm not doing anything stupid (which is always a
distinct possibility), here are the .pl[1] and resulting .s[2] file
that I ended up with.

[1] 
https://drive.google.com/file/d/0B_OzbbAp1CG5OVdVc196QmV3bG8/view?usp=sharing
[2] 
https://drive.google.com/file/d/0B_OzbbAp1CG5Z3NoZzBqU09scFE/view?usp=sharing


Cheers

AGL


___
openssl-dev mailing list
openssl-dev@openssl.org
https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl.org #3607] nistz256 is broken.

2014-12-10 Thread Andy Polyakov via RT
 Attached. A little bit worse performance on some CPUs. I also took
 opportunity to harmonize ecp_nistz256_from_mont by applying same pattern
 for reduction. The patch is cumulative, i.e. is not incremental to
 previously posted one[s], and addresses both problems, originally
 reported one and discovered in the course. Patch to ecp_nistz256.c
 referred above doesn't matter.
 
 When applying just that patch, the original test case fails. Specially
 this test code (C++):
 
 
   ...
 
 
 Just to check that I'm not doing anything stupid (which is always a
 distinct possibility), here are the .pl[1] and resulting .s[2] file
 that I ended up with.
 
 [1] 
 https://drive.google.com/file/d/0B_OzbbAp1CG5OVdVc196QmV3bG8/view?usp=sharing
 [2] 
 https://drive.google.com/file/d/0B_OzbbAp1CG5Z3NoZzBqU09scFE/view?usp=sharing

Patching went wrong for you. As you seem to operate in 1.0.2 context
attached is corresponding ecp_nistz256.pl.

#!/usr/bin/env perl

##
##
# Copyright 2014 Intel Corporation   #
##
# Licensed under the Apache License, Version 2.0 (the License);#
# you may not use this file except in compliance with the License.   #
# You may obtain a copy of the License at#
##
#http://www.apache.org/licenses/LICENSE-2.0  #
##
# Unless required by applicable law or agreed to in writing, software#
# distributed under the License is distributed on an AS IS BASIS,  #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   #
# See the License for the specific language governing permissions and#
# limitations under the License. #
##
##
##
#  Developers and authors:   #
#  Shay Gueron (1, 2), and Vlad Krasnov (1)  #
#  (1) Intel Corporation, Israel Development Center  #
#  (2) University of Haifa   #
#  Reference:#
#  S.Gueron and V.Krasnov, Fast Prime Field Elliptic Curve Cryptography with#
#   256 Bit Primes  #
##
##

# Further optimization by ap...@openssl.org:
#
#		this/original
# Opteron	+12-49%
# Bulldozer	+14-45%
# P4		+18-46%
# Westmere	+12-34%
# Sandy Bridge	+9-35%
# Ivy Bridge	+9-35%
# Haswell	+8-37%
# Broadwell	+18-58%
# Atom		+15-50%
# VIA Nano	+43-160%
#
# Ranges denote minimum and maximum improvement coefficients depending
# on benchmark.

$flavour = shift;
$output  = shift;
if ($flavour =~ /\./) { $output = $flavour; undef $flavour; }

$win64=0; $win64=1 if ($flavour =~ /[nm]asm|mingw64/ || $output =~ /\.asm$/);

$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate=${dir}x86_64-xlate.pl and -f $xlate ) or
( $xlate=${dir}../../perlasm/x86_64-xlate.pl and -f $xlate) or
die can't locate x86_64-xlate.pl;

open OUT,| \$^X\ $xlate $flavour $output;
*STDOUT=*OUT;

if (`$ENV{CC} -Wa,-v -c -o /dev/null -x assembler /dev/null 21`
		=~ /GNU assembler version ([2-9]\.[0-9]+)/) {
	$avx = ($1=2.19) + ($1=2.22);
	$addx = ($1=2.23);
}

if (!$addx  $win64  ($flavour =~ /nasm/ || $ENV{ASM} =~ /nasm/) 
	`nasm -v 21` =~ /NASM version ([2-9]\.[0-9]+)/) {
	$avx = ($1=2.09) + ($1=2.10);
	$addx = ($1=2.10);
}

if (!$addx  $win64  ($flavour =~ /masm/ || $ENV{ASM} =~ /ml64/) 
	`ml64 21` =~ /Version ([0-9]+)\./) {
	$avx = ($1=10) + ($1=11);
	$addx = ($1=12);
}

if (!$addx  `$ENV{CC} -v 21` =~ /(^clang version|based on LLVM) ([3-9])\.([0-9]+)/) {
	my $ver = $2 + $3/100.0;	# 3.1-3.01, 3.10-3.10
	$avx = ($ver=3.0) + ($ver=3.01);
	$addx = ($ver=3.03);
}

$code.=___;
.text
.extern	OPENSSL_ia32cap_P

# The polynomial
.align 64
.Lpoly:
.quad 0x, 0x, 0x, 0x0001

# 2^512 mod P precomputed for NIST P256 polynomial
.LRR:
.quad 0x0003, 0xfffb, 0xfffe, 0x0004fffd

.LOne:
.long 1,1,1,1,1,1,1,1
.LTwo:
.long 2,2,2,2,2,2,2,2
.LThree:
.long 3,3,3,3,3,3,3,3
.LONE_mont:
.quad 0x0001, 0x, 

Re: [openssl.org #3607] nistz256 is broken.

2014-12-05 Thread Andy Polyakov via RT
 Oops! Wrong patch! Correct one attached. If you feel like testing the
 wrong one, go ahead, but there are some later non-essential adjustments.

 diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
 index bf3fcc6..33b07ce 100644
 --- a/crypto/ec/ecp_nistz256.c
 +++ b/crypto/ec/ecp_nistz256.c
 @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * 
 group,
  ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
  ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
  ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
 -ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
 +ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
  }

  index = 255;
 I can believe that this fixes the issue, but it's just masking it, no?

The underlying problem is that assembly routines return partially
reduced results. Partially reduced means that it can return result +
modulus if it fits in 256 bits. Rationale is that
((x+m)*y)%m=(x*y+m*y)%m=x*y%m+m*y%m and last term is 0. While it does
work with series of multiplications, I failed to recognize that there
are corner cases in non-multiplication operations. I'm preparing an
update...


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-05 Thread Andy Polyakov via RT
 Oops! Wrong patch! Correct one attached. If you feel like testing the
 wrong one, go ahead, but there are some later non-essential adjustments.

 diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
 index bf3fcc6..33b07ce 100644
 --- a/crypto/ec/ecp_nistz256.c
 +++ b/crypto/ec/ecp_nistz256.c
 @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * 
 group,
  ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
  ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 
 1]);
  ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 
 1]);
 -ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 
 1]);
 +ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
  }

  index = 255;
 I can believe that this fixes the issue, but it's just masking it, no?
 
 The underlying problem is that assembly routines return partially
 reduced results. Partially reduced means that it can return result +
 modulus if it fits in 256 bits. Rationale is that
 ((x+m)*y)%m=(x*y+m*y)%m=x*y%m+m*y%m and last term is 0. While it does
 work with series of multiplications, I failed to recognize that there
 are corner cases in non-multiplication operations. I'm preparing an
 update...

Attached. A little bit worse performance on some CPUs. I also took
opportunity to harmonize ecp_nistz256_from_mont by applying same pattern
for reduction. The patch is cumulative, i.e. is not incremental to
previously posted one[s], and addresses both problems, originally
reported one and discovered in the course. Patch to ecp_nistz256.c
referred above doesn't matter.


diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..cdff22a 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,16 @@
 # Further optimization by ap...@openssl.org:
 #
 #		this/original
-# Opteron	+8-33%
-# Bulldozer	+10-30%
-# P4		+14-38%
-# Westmere	+8-23%
-# Sandy Bridge	+8-24%
-# Ivy Bridge	+7-25%
-# Haswell	+5-25%
-# Atom		+10-32%
-# VIA Nano	+37-130%
+# Opteron	+12-49%
+# Bulldozer	+14-45%
+# P4		+18-46%
+# Westmere	+12-34%
+# Sandy Bridge	+9-35%
+# Ivy Bridge	+9-35%
+# Haswell	+8-37%
+# Broadwell	+18-58%
+# Atom		+15-50%
+# VIA Nano	+43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +551,20 @@ __ecp_nistz256_mul_montq:
 	# and add the result to the acc.
 	# Due to the special form of p256 we do some optimizations
 	#
-	# acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-	# then we add acc[0] and get acc[0] x 2^64
-
-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc0, $acc1		# +=acc[0]*2^64
-	adc	\$0, %rdx
-	add	%rax, $acc1
-	mov	$acc0, %rax
-
-	# acc[0] x p256[2] = 0
-	adc	%rdx, $acc2
-	adc	\$0, $t0
+	# acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+	# then we add acc[0] and get acc[0] x 2^96
 
+	mov	$acc0, $t1
+	shl	\$32, $acc0
 	mulq	$poly3
-	xor	$acc0, $acc0
-	add	$t0, $acc3
-	adc	\$0, %rdx
-	add	%rax, $acc3
+	shr	\$32, $t1
+	add	$acc0, $acc1		# +=acc[0]96
+	adc	$t1, $acc2
+	adc	%rax, $acc3
 	 mov	8*1($b_ptr), %rax
 	adc	%rdx, $acc4
 	adc	\$0, $acc5
+	xor	$acc0, $acc0
 
 	
 	# Multiply by b[1]
@@ -608,23 +601,17 @@ __ecp_nistz256_mul_montq:
 
 	
 	# Second reduction step	
-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc1, $acc2
-	adc	\$0, %rdx
-	add	%rax, $acc2
-	mov	$acc1, %rax
-	adc	%rdx, $acc3
-	adc	\$0, $t0
-
+	mov	$acc1, $t1
+	shl	\$32, $acc1
 	mulq	$poly3
-	xor	$acc1, $acc1
-	add	$t0, $acc4
-	adc	\$0, %rdx
-	add	%rax, $acc4
+	shr	\$32, $t1
+	add	$acc1, $acc2
+	adc	$t1, $acc3
+	adc	%rax, $acc4
 	 mov	8*2($b_ptr), %rax
 	adc	%rdx, $acc5
 	adc	\$0, $acc0
+	xor	$acc1, $acc1
 
 	
 	# Multiply by b[2]
@@ -661,23 +648,17 @@ __ecp_nistz256_mul_montq:
 
 	
 	# Third reduction step	
-	mulq	$poly1
-	xor	$t0, $t0
-	add	$acc2, $acc3
-	adc	\$0, %rdx
-	add	%rax, $acc3
-	mov	$acc2, %rax
-	adc	%rdx, $acc4
-	adc	\$0, $t0
-
+	mov	$acc2, $t1
+	shl	\$32, $acc2
 	mulq	$poly3
-	xor	$acc2, $acc2
-	add	$t0, $acc5
-	adc	\$0, %rdx
-	add	%rax, $acc5
+	shr	\$32, $t1
+	add	$acc2, $acc3
+	adc	$t1, $acc4
+	adc	%rax, $acc5
 	 mov	8*3($b_ptr), %rax
 	adc	%rdx, $acc0
 	adc	\$0, $acc1
+	xor	$acc2, $acc2
 
 	
 	# Multiply by b[3]
@@ -714,20 +695,14 @@ __ecp_nistz256_mul_montq:
 
 	
 	# Final reduction step	
-	mulq	$poly1
-	#xor	$t0, $t0
-	add	$acc3, $acc4
-	adc	\$0, %rdx
-	add	%rax, $acc4
-	mov	$acc3, %rax
-	adc	%rdx, $acc5
-	#adc	\$0, $t0		# doesn't overflow
-
+	mov	$acc3, $t1
+	shl	\$32, $acc3
 	mulq	$poly3
-	

Re: [openssl.org #3607] nistz256 is broken.

2014-12-04 Thread Andy Polyakov via RT
 Oops! Wrong patch! Correct one attached. If you feel like testing the
 wrong one, go ahead, but there are some later non-essential adjustments.

 diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
 index bf3fcc6..33b07ce 100644
 --- a/crypto/ec/ecp_nistz256.c
 +++ b/crypto/ec/ecp_nistz256.c
 @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * 
 group,
  ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
  ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
  ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
 -ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
 +ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
  }

  index = 255;
 
 I can believe that this fixes the issue, but it's just masking it, no?

It's not a coincidence that I didn't say fixes the issue or solves
the problem, but produces correct result. BTW, it seems to be
unrelated to the original problem with carries handling in assembly.


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Andy Polyakov via RT
 thanks! Was away last week and so didn't have a chance to try fixing this.

 I'll patch that it and run the tests against it.
 
 I've run out of time tracking this down for today, but I got to the
 point where setting the Jacobian coordinates:
 
 X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
 Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
 Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631
 
 and multiplying that point by
 2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
 results in the affine point:
 
 x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
 y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7
 
 However, I believe that the result should be:
 
 x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
 y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF

I do get the latter...





__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Andy Polyakov via RT
 thanks! Was away last week and so didn't have a chance to try fixing this.

 I'll patch that it and run the tests against it.

 I've run out of time tracking this down for today, but I got to the
 point where setting the Jacobian coordinates:

 X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
 Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
 Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

 and multiplying that point by
 2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
 results in the affine point:

 x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
 y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7

 However, I believe that the result should be:

 x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
 y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF
 
 I do get the latter...

... in master, and I get the former in 1.0.2. Looking into it...



__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Andy Polyakov via RT
 thanks! Was away last week and so didn't have a chance to try fixing this.

 I'll patch that it and run the tests against it.

 I've run out of time tracking this down for today, but I got to the
 point where setting the Jacobian coordinates:

 X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
 Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
 Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

 and multiplying that point by
 2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
 results in the affine point:

 x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
 y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7

 However, I believe that the result should be:

 x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
 y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF

 I do get the latter...
 
 ... in master, and I get the former in 1.0.2. Looking into it...

Attached patch produces correct result in 1.0.2. Looking further for
explanation...


commit 9ca10dca3ca6bc5ea739a998258351dbfd261b42
Author: Andy Polyakov ap...@openssl.org
Date:   Sun Oct 26 12:45:47 2014 +0100

Add ARMv4 ECP_NISTZ256 implementation.

diff --git a/Configure b/Configure
index 2eda5e6..777e8cf 100755
--- a/Configure
+++ b/Configure
@@ -138,7 +138,7 @@ my $alpha_asm=alphacpuid.o:bn_asm.o alpha-mont.o::sha1-alpha.o:::ghash-
 my $mips64_asm=:bn-mips.o mips-mont.o:::aes_cbc.o aes-mips.o:::sha1-mips.o sha256-mips.o sha512-mips.o;
 my $mips32_asm=$mips64_asm; $mips32_asm =~ s/\s*sha512\-mips\.o//;
 my $s390x_asm=s390xcap.o s390xcpuid.o:bn-s390x.o s390x-mont.o s390x-gf2m.o:::aes-s390x.o aes-ctr.o aes-xts.o:::sha1-s390x.o sha256-s390x.o sha512-s390x.o::rc4-s390x.o:ghash-s390x.o:;
-my $armv4_asm=armcap.o armv4cpuid.o:bn_asm.o armv4-mont.o armv4-gf2m.o:::aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o:::sha1-armv4-large.o sha256-armv4.o sha512-armv4.o:::ghash-armv4.o ghashv8-armx.o::void;
+my $armv4_asm=armcap.o armv4cpuid.o:bn_asm.o armv4-mont.o armv4-gf2m.o:ecp_nistz256.o ecp_nistz256-armv4.o::aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o:::sha1-armv4-large.o sha256-armv4.o sha512-armv4.o:::ghash-armv4.o ghashv8-armx.o::void;
 my $aarch64_asm=armcap.o arm64cpuid.o mem_clr.oaes_core.o aes_cbc.o aesv8-armx.o:::sha1-armv8.o sha256-armv8.o sha512-armv8.o:::ghashv8-armx.o:;
 my $parisc11_asm=pariscid.o:bn_asm.o parisc-mont.o:::aes_core.o aes_cbc.o aes-parisc.o:::sha1-parisc.o sha256-parisc.o sha512-parisc.o::rc4-parisc.o:ghash-parisc.o::32;
 my $parisc20_asm=pariscid.o:pa-risc2W.o parisc-mont.o:::aes_core.o aes_cbc.o aes-parisc.o:::sha1-parisc.o sha256-parisc.o sha512-parisc.o::rc4-parisc.o:ghash-parisc.o::64;
diff --git a/TABLE b/TABLE
index d778dac..b1d4543 100644
--- a/TABLE
+++ b/TABLE
@@ -1132,7 +1132,7 @@ $lflags   = -ldl
 $bn_ops   = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
 $cpuid_obj= armcap.o armv4cpuid.o
 $bn_obj   = bn_asm.o armv4-mont.o armv4-gf2m.o
-$ec_obj   = 
+$ec_obj   = ecp_nistz256.o ecp_nistz256-armv4.o
 $des_obj  = 
 $aes_obj  = aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o
 $bf_obj   = 
@@ -4362,7 +4362,7 @@ $lflags   = -ldl
 $bn_ops   = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
 $cpuid_obj= armcap.o armv4cpuid.o
 $bn_obj   = bn_asm.o armv4-mont.o armv4-gf2m.o
-$ec_obj   = 
+$ec_obj   = ecp_nistz256.o ecp_nistz256-armv4.o
 $des_obj  = 
 $aes_obj  = aes_cbc.o aes-armv4.o bsaes-armv7.o aesv8-armx.o
 $bf_obj   = 
diff --git a/crypto/ec/Makefile b/crypto/ec/Makefile
index 898e43d..e020c93 100644
--- a/crypto/ec/Makefile
+++ b/crypto/ec/Makefile
@@ -54,6 +54,9 @@ ecp_nistz256-x86_64.s: asm/ecp_nistz256-x86_64.pl
 ecp_nistz256-avx2.s:   asm/ecp_nistz256-avx2.pl
 	$(PERL) asm/ecp_nistz256-avx2.pl $(PERLASM_SCHEME)  $@
 
+ecp_nistz256-%.S:	asm/ecp_nistz256-%.pl;  $(PERL) $ $(PERLASM_SCHEME) $@
+ecp_nistz256-armv4.o:	ecp_nistz256-armv4.S
+
 files:
 	$(PERL) $(TOP)/util/files.pl Makefile  $(TOP)/MINFO
 
diff --git a/crypto/ec/asm/ecp_nistz256-armv4.pl b/crypto/ec/asm/ecp_nistz256-armv4.pl
new file mode 100755
index 000..ad29948
--- /dev/null
+++ b/crypto/ec/asm/ecp_nistz256-armv4.pl
@@ -0,0 +1,1720 @@
+#!/usr/bin/env perl
+
+# 
+# Written by Andy Polyakov ap...@openssl.org for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see http://www.openssl.org/~appro/cryptogams/.
+# 
+#
+# ECP_NISTZ256 module for ARMv4.
+#
+# October 2014.
+#
+# Original ECP_NISTZ256 submission targeting x86_64 is detailed in
+# http://eprint.iacr.org/2013/816. In the process of adaptation
+# original .c module was made 32-bit savvy in 

Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Andy Polyakov via RT
 thanks! Was away last week and so didn't have a chance to try fixing this.

 I'll patch that it and run the tests against it.
 I've run out of time tracking this down for today, but I got to the
 point where setting the Jacobian coordinates:

 X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
 Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
 Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

 and multiplying that point by
 2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
 results in the affine point:

 x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
 y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7

 However, I believe that the result should be:

 x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
 y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF
 I do get the latter...
 ... in master, and I get the former in 1.0.2. Looking into it...
 
 Attached patch produces correct result in 1.0.2. Looking further for
 explanation...

Oops! Wrong patch! Correct one attached. If you feel like testing the
wrong one, go ahead, but there are some later non-essential adjustments.


diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
index bf3fcc6..33b07ce 100644
--- a/crypto/ec/ecp_nistz256.c
+++ b/crypto/ec/ecp_nistz256.c
@@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * 
group,
 ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
 ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
 ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
-ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
+ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
 }
 
 index = 255;


Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Bodo Moeller

 2. When will RT2574 be integrated to protect our ECC keys in the
 inevitable presence of software defects like this?
 http://rt.openssl.org/Ticket/Display.html?id=2574user=guestpass=guest


Reportedly, Cryptography Research (i.e., Rambus) alleges to have broad
patents on techniques like this (and they might not be the only ones). I'm
not going to look for specific patents and can't assess the validity of
that rumor, the only thing I know for certain is that Cryptography Research
and Rambus are famous, above all else, for starting patent lawsuits (see,
e.g.,
http://www.sec.gov/Archives/edgar/data/1403161/000119312507270394/d10k.htm).

Unfortunately, this means that the OpenSSL project may not be willing to
incorporate coordinate-blinding techniques at this time.

Bodo


Re: [openssl.org #3607] nistz256 is broken.

2014-12-03 Thread Adam Langley via RT
On Wed, Dec 3, 2014 at 10:12 AM, Andy Polyakov via RT r...@openssl.org wrote:
 Oops! Wrong patch! Correct one attached. If you feel like testing the
 wrong one, go ahead, but there are some later non-essential adjustments.

 diff --git a/crypto/ec/ecp_nistz256.c b/crypto/ec/ecp_nistz256.c
 index bf3fcc6..33b07ce 100644
 --- a/crypto/ec/ecp_nistz256.c
 +++ b/crypto/ec/ecp_nistz256.c
 @@ -637,7 +637,7 @@ static void ecp_nistz256_windowed_mul(const EC_GROUP * 
 group,
  ecp_nistz256_point_double(row[10 - 1], row[ 5 - 1]);
  ecp_nistz256_point_add   (row[15 - 1], row[14 - 1], row[1 - 1]);
  ecp_nistz256_point_add   (row[11 - 1], row[10 - 1], row[1 - 1]);
 -ecp_nistz256_point_add   (row[16 - 1], row[15 - 1], row[1 - 1]);
 +ecp_nistz256_point_double(row[16 - 1], row[ 8 - 1]);
  }

  index = 255;

I can believe that this fixes the issue, but it's just masking it, no?
I'll see if I can track it down more precisely tomorrow.


Cheers

AGL


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-02 Thread Adam Langley via RT
On Mon, Dec 1, 2014 at 3:23 PM, Andy Polyakov via RT r...@openssl.org wrote:
 (Affects 1.0.2 only.)

 In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq,
 under Now the reduction there are a number of comments saying
 doesn't overflow. Unfortunately, they aren't correct.

 Got math wrong:-( Attached is not only fixed version, but even faster
 one.

 Please test attached one instead. Squaring didn't cover one case, and
 AD*X is optimized.

thanks! Was away last week and so didn't have a chance to try fixing this.

I'll patch that it and run the tests against it.


Cheers

AGL


 On related note. It's possible to improve server-side DSA by ~5% by
 switching [back] to scatter-gather. [Change from scatter-gather was
 caused by concern about timing dependency, but I argue that concern is
 not valid in most cases.] There also are x86 and and ARM versions pending:

 #   with/without -DECP_NISTZ256_ASM
 # Pentium   +66-168%
 # PIII  +73-175%
 # P4+68-140%
 # Core2 +90-215%
 # Sandy Bridge  +105-265% (contemporary i[57]-* are all close to this)
 # Atom  +66-160%
 # Opteron   +54-112%
 # Bulldozer +99-240%
 # VIA Nano  +93-300%

 #   with/without -DECP_NISTZ256_ASM
 # Cortex-A8 +53-173%
 # Cortex-A9 +76-205%
 # Cortex-A15+100-316%
 # Snapdragon S4 +66-187%

 No, bug in question is not there. Nor is AD*X code path is affected.





 diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl 
 b/crypto/ec/asm/ecp_nistz256-x86_64.pl
 index 4486a5e..56f6c2b 100755
 --- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
 +++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
 @@ -31,15 +31,15 @@
  # Further optimization by ap...@openssl.org:
  #
  #  this/original
 -# Opteron  +8-33%
 -# Bulldozer+10-30%
 -# P4   +14-38%
 -# Westmere +8-23%
 -# Sandy Bridge +8-24%
 -# Ivy Bridge   +7-25%
 -# Haswell  +5-25%
 -# Atom +10-32%
 -# VIA Nano +37-130%
 +# Opteron  +10-43%
 +# Bulldozer+14-43%
 +# P4   +18-50%
 +# Westmere +12-36%
 +# Sandy Bridge +9-36%
 +# Ivy Bridge   +9-36%
 +# Haswell  +8-37%
 +# Atom +15-50%
 +# VIA Nano +43-160%
  #
  # Ranges denote minimum and maximum improvement coefficients depending
  # on benchmark. Lower coefficients are for ECDSA sign, relatively
 @@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
 # and add the result to the acc.
 # Due to the special form of p256 we do some optimizations
 #
 -   # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
 -   # then we add acc[0] and get acc[0] x 2^64
 -
 -   mulq$poly1
 -   xor $t0, $t0
 -   add $acc0, $acc1# +=acc[0]*2^64
 -   adc \$0, %rdx
 -   add %rax, $acc1
 -   mov $acc0, %rax
 -
 -   # acc[0] x p256[2] = 0
 -   adc %rdx, $acc2
 -   adc \$0, $t0
 +   # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
 +   # then we add acc[0] and get acc[0] x 2^96

 +   mov $acc0, $t1
 +   shl \$32, $acc0
 mulq$poly3
 -   xor $acc0, $acc0
 -   add $t0, $acc3
 -   adc \$0, %rdx
 -   add %rax, $acc3
 +   shr \$32, $t1
 +   add $acc0, $acc1# +=acc[0]96
 +   adc $t1, $acc2
 +   adc %rax, $acc3
  mov8*1($b_ptr), %rax
 adc %rdx, $acc4
 adc \$0, $acc5
 +   xor $acc0, $acc0

 
 
 # Multiply by b[1]
 @@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:

 
 
 # Second reduction step
 -   mulq$poly1
 -   xor $t0, $t0
 -   add $acc1, $acc2
 -   adc \$0, %rdx
 -   add %rax, $acc2
 -   mov $acc1, %rax
 -   adc %rdx, $acc3
 -   adc \$0, $t0
 -
 +   mov $acc1, $t1
 +   shl \$32, $acc1
 mulq$poly3
 -   xor $acc1, $acc1
 -   add $t0, $acc4
 -   adc \$0, %rdx
 -   add %rax, $acc4
 +   shr \$32, $t1
 +   add $acc1, $acc2
 +   adc $t1, $acc3
 +   adc %rax, $acc4
  mov8*2($b_ptr), %rax
 adc %rdx, $acc5
 adc \$0, $acc0
 +   xor $acc1, $acc1

 
 
 # Multiply by b[2]
 @@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:

 
 
 # Third reduction step
 -   mulq$poly1
 -   xor $t0, $t0
 -   add $acc2, $acc3
 -   adc \$0, %rdx
 -   add %rax, $acc3
 -   mov $acc2, %rax
 -   adc %rdx, $acc4
 -   adc \$0, $t0
 -
 +   mov $acc2, $t1
 +   shl 

Re: [openssl.org #3607] nistz256 is broken.

2014-12-02 Thread Adam Langley via RT
On Tue, Dec 2, 2014 at 12:33 PM, Adam Langley a...@google.com wrote:
 thanks! Was away last week and so didn't have a chance to try fixing this.

 I'll patch that it and run the tests against it.

I've run out of time tracking this down for today, but I got to the
point where setting the Jacobian coordinates:

X: C4EB2994C09557B400FF6A543CFB257F945E86FE3DF1D32A8128F32927666A8F
Y: 3D5283F8F10F559AE5310005005F321B28D2D699F3E01F179F91AC6660013328
Z: F97FD7E6757991A2C7E0C2488FF3C54E58030BCACF3FB95954FD3EF211C24631

and multiplying that point by
2269520AFB46450398DE95AE59DDBDC1D42B8B7030F81BCFEF12D819C1D678DD
results in the affine point:

x: 4BBC2813F69EF6A4D3E69E2832E9A9E97FF59F8C136DCDBD9509BC685FF337FD
y: BDCB623715CE2D983CFC2776C6EED4375454BE2C88932D43856906C1DC7A0BD7

However, I believe that the result should be:

x: C2910AA0216D12DE30C5573CCFC4116546E3091DC1E9EC8604F634185CE40863
y: C9071E13D688C305CE179C6168DD9066657BC6CDC1639A44B68DF7F1E0A40EDF


Cheers

AGL


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-12-01 Thread Andy Polyakov via RT
 (Affects 1.0.2 only.)
 
 In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq,
 under Now the reduction there are a number of comments saying
 doesn't overflow. Unfortunately, they aren't correct.

Got math wrong:-( Attached is not only fixed version, but even faster
one. On related note. It's possible to improve server-side DSA by ~5% by
switching [back] to scatter-gather. [Change from scatter-gather was
caused by concern about timing dependency, but I argue that concern is
not valid in most cases.] There also are x86 and and ARM versions pending:

#   with/without -DECP_NISTZ256_ASM
# Pentium   +66-168%
# PIII  +73-175%
# P4+68-140%
# Core2 +90-215%
# Sandy Bridge  +105-265% (contemporary i[57]-* are all close to this)
# Atom  +66-160%
# Opteron   +54-112%
# Bulldozer +99-240%
# VIA Nano  +93-300%

#   with/without -DECP_NISTZ256_ASM
# Cortex-A8 +53-173%
# Cortex-A9 +76-205%
# Cortex-A15+100-316%
# Snapdragon S4 +66-187%

No, bug in question is not there. Nor is AD*X code path is affected.

diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl 
b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..f328b85 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,15 @@
 # Further optimization by ap...@openssl.org:
 #
 #  this/original
-# Opteron  +8-33%
-# Bulldozer+10-30%
-# P4   +14-38%
-# Westmere +8-23%
-# Sandy Bridge +8-24%
-# Ivy Bridge   +7-25%
-# Haswell  +5-25%
-# Atom +10-32%
-# VIA Nano +37-130%
+# Opteron  +10-43%
+# Bulldozer+14-43%
+# P4   +18-50%
+# Westmere +12-36%
+# Sandy Bridge +9-36%
+# Ivy Bridge   +9-36%
+# Haswell  +8-37%
+# Atom +15-50%
+# VIA Nano +43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
# and add the result to the acc.
# Due to the special form of p256 we do some optimizations
#
-   # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-   # then we add acc[0] and get acc[0] x 2^64
-
-   mulq$poly1
-   xor $t0, $t0
-   add $acc0, $acc1# +=acc[0]*2^64
-   adc \$0, %rdx
-   add %rax, $acc1
-   mov $acc0, %rax
-
-   # acc[0] x p256[2] = 0
-   adc %rdx, $acc2
-   adc \$0, $t0
+   # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+   # then we add acc[0] and get acc[0] x 2^96
 
+   mov $acc0, $t1
+   shl \$32, $acc0
mulq$poly3
-   xor $acc0, $acc0
-   add $t0, $acc3
-   adc \$0, %rdx
-   add %rax, $acc3
+   shr \$32, $t1
+   add $acc0, $acc1# +=acc[0]96
+   adc $t1, $acc2
+   adc %rax, $acc3
 mov8*1($b_ptr), %rax
adc %rdx, $acc4
adc \$0, $acc5
+   xor $acc0, $acc0
 

# Multiply by b[1]
@@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:
 

# Second reduction step 
-   mulq$poly1
-   xor $t0, $t0
-   add $acc1, $acc2
-   adc \$0, %rdx
-   add %rax, $acc2
-   mov $acc1, %rax
-   adc %rdx, $acc3
-   adc \$0, $t0
-
+   mov $acc1, $t1
+   shl \$32, $acc1
mulq$poly3
-   xor $acc1, $acc1
-   add $t0, $acc4
-   adc \$0, %rdx
-   add %rax, $acc4
+   shr \$32, $t1
+   add $acc1, $acc2
+   adc $t1, $acc3
+   adc %rax, $acc4
 mov8*2($b_ptr), %rax
adc %rdx, $acc5
adc \$0, $acc0
+   xor $acc1, $acc1
 

# Multiply by b[2]
@@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:
 

# Third reduction step  
-   mulq$poly1
-   xor $t0, $t0
-   add $acc2, $acc3
-   adc \$0, %rdx
-   add %rax, $acc3
-   mov $acc2, %rax
-   adc %rdx, $acc4
-   adc \$0, $t0
-
+   mov $acc2, $t1
+   shl \$32, $acc2
mulq$poly3
-   xor $acc2, $acc2
-   add $t0, $acc5
-   adc \$0, %rdx
-   add %rax, $acc5
+   shr \$32, $t1
+   add $acc2, $acc3
+   adc $t1, $acc4
+   adc %rax, $acc5
 mov8*3($b_ptr), %rax
adc %rdx, $acc0
adc \$0, $acc1
+   xor $acc2, $acc2
 

Re: [openssl.org #3607] nistz256 is broken.

2014-12-01 Thread Andy Polyakov via RT
 (Affects 1.0.2 only.)

 In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq,
 under Now the reduction there are a number of comments saying
 doesn't overflow. Unfortunately, they aren't correct.
 
 Got math wrong:-( Attached is not only fixed version, but even faster
 one.

Please test attached one instead. Squaring didn't cover one case, and
AD*X is optimized.

 On related note. It's possible to improve server-side DSA by ~5% by
 switching [back] to scatter-gather. [Change from scatter-gather was
 caused by concern about timing dependency, but I argue that concern is
 not valid in most cases.] There also are x86 and and ARM versions pending:
 
 #   with/without -DECP_NISTZ256_ASM
 # Pentium   +66-168%
 # PIII  +73-175%
 # P4+68-140%
 # Core2 +90-215%
 # Sandy Bridge  +105-265% (contemporary i[57]-* are all close to this)
 # Atom  +66-160%
 # Opteron   +54-112%
 # Bulldozer +99-240%
 # VIA Nano  +93-300%
 
 #   with/without -DECP_NISTZ256_ASM
 # Cortex-A8 +53-173%
 # Cortex-A9 +76-205%
 # Cortex-A15+100-316%
 # Snapdragon S4 +66-187%
 
 No, bug in question is not there. Nor is AD*X code path is affected.
 
 


diff --git a/crypto/ec/asm/ecp_nistz256-x86_64.pl 
b/crypto/ec/asm/ecp_nistz256-x86_64.pl
index 4486a5e..56f6c2b 100755
--- a/crypto/ec/asm/ecp_nistz256-x86_64.pl
+++ b/crypto/ec/asm/ecp_nistz256-x86_64.pl
@@ -31,15 +31,15 @@
 # Further optimization by ap...@openssl.org:
 #
 #  this/original
-# Opteron  +8-33%
-# Bulldozer+10-30%
-# P4   +14-38%
-# Westmere +8-23%
-# Sandy Bridge +8-24%
-# Ivy Bridge   +7-25%
-# Haswell  +5-25%
-# Atom +10-32%
-# VIA Nano +37-130%
+# Opteron  +10-43%
+# Bulldozer+14-43%
+# P4   +18-50%
+# Westmere +12-36%
+# Sandy Bridge +9-36%
+# Ivy Bridge   +9-36%
+# Haswell  +8-37%
+# Atom +15-50%
+# VIA Nano +43-160%
 #
 # Ranges denote minimum and maximum improvement coefficients depending
 # on benchmark. Lower coefficients are for ECDSA sign, relatively
@@ -550,28 +550,20 @@ __ecp_nistz256_mul_montq:
# and add the result to the acc.
# Due to the special form of p256 we do some optimizations
#
-   # acc[0] x p256[0] = acc[0] x 2^64 - acc[0]
-   # then we add acc[0] and get acc[0] x 2^64
-
-   mulq$poly1
-   xor $t0, $t0
-   add $acc0, $acc1# +=acc[0]*2^64
-   adc \$0, %rdx
-   add %rax, $acc1
-   mov $acc0, %rax
-
-   # acc[0] x p256[2] = 0
-   adc %rdx, $acc2
-   adc \$0, $t0
+   # acc[0] x p256[0..1] = acc[0] x 2^96 - acc[0]
+   # then we add acc[0] and get acc[0] x 2^96
 
+   mov $acc0, $t1
+   shl \$32, $acc0
mulq$poly3
-   xor $acc0, $acc0
-   add $t0, $acc3
-   adc \$0, %rdx
-   add %rax, $acc3
+   shr \$32, $t1
+   add $acc0, $acc1# +=acc[0]96
+   adc $t1, $acc2
+   adc %rax, $acc3
 mov8*1($b_ptr), %rax
adc %rdx, $acc4
adc \$0, $acc5
+   xor $acc0, $acc0
 

# Multiply by b[1]
@@ -608,23 +600,17 @@ __ecp_nistz256_mul_montq:
 

# Second reduction step 
-   mulq$poly1
-   xor $t0, $t0
-   add $acc1, $acc2
-   adc \$0, %rdx
-   add %rax, $acc2
-   mov $acc1, %rax
-   adc %rdx, $acc3
-   adc \$0, $t0
-
+   mov $acc1, $t1
+   shl \$32, $acc1
mulq$poly3
-   xor $acc1, $acc1
-   add $t0, $acc4
-   adc \$0, %rdx
-   add %rax, $acc4
+   shr \$32, $t1
+   add $acc1, $acc2
+   adc $t1, $acc3
+   adc %rax, $acc4
 mov8*2($b_ptr), %rax
adc %rdx, $acc5
adc \$0, $acc0
+   xor $acc1, $acc1
 

# Multiply by b[2]
@@ -661,23 +647,17 @@ __ecp_nistz256_mul_montq:
 

# Third reduction step  
-   mulq$poly1
-   xor $t0, $t0
-   add $acc2, $acc3
-   adc \$0, %rdx
-   add %rax, $acc3
-   mov $acc2, %rax
-   adc %rdx, $acc4
-   adc \$0, $t0
-
+   mov $acc2, $t1
+   shl \$32, $acc2
mulq$poly3
-   xor $acc2, $acc2
-   add $t0, $acc5
-   adc \$0, %rdx
-   add %rax, $acc5
+   shr \$32, $t1
+   add $acc2, $acc3
+   adc $t1, $acc4
+   adc %rax, $acc5
 mov8*3($b_ptr), %rax
adc %rdx, $acc0
adc 

RE: [openssl.org #3607] nistz256 is broken.

2014-11-25 Thread Salz, Rich
 2. When will RT2574 be integrated to protect our ECC keys in the inevitable
 presence of software defects like this?
 http://rt.openssl.org/Ticket/Display.html?id=2574user=guestpass=guest

Timing attacks on ECC isn't a very high priority right now, given all the other 
bigger easier to exploit issues with wider deployment :(

I wish it weren't so, but I do want to set your expectations properly.

(Now, of course, having said that, the constant-time folks will swoop in and 
submit this to master next week :)


Re: [openssl.org #3607] nistz256 is broken.

2014-11-25 Thread Billy Brumley
Thanks for the reply, Rich.

 2. When will RT2574 be integrated to protect our ECC keys in the inevitable
 presence of software defects like this?
 http://rt.openssl.org/Ticket/Display.html?id=2574user=guestpass=guest

 Timing attacks on ECC isn't a very high priority right now, given all the 
 other bigger easier to exploit issues with wider deployment :(

The way I see it, the main purpose of that patch is to prevent bug
attacks, not timing attacks. Hence the relation to the arithmetic bug.

BBB
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3607] nistz256 is broken.

2014-11-24 Thread Billy Brumley
1. Where's the security analysis? Does https://eprint.iacr.org/2011/633 apply?

2. When will RT2574 be integrated to protect our ECC keys in the
inevitable presence of software defects like this?
http://rt.openssl.org/Ticket/Display.html?id=2574user=guestpass=guest

These questions are not necessarily for Adam, but the OpenSSL team.

BBB

On Sun, Nov 23, 2014 at 8:09 PM, Adam Langley via RT r...@openssl.org wrote:
 (Affects 1.0.2 only.)

 In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq,
 under Now the reduction there are a number of comments saying
 doesn't overflow. Unfortunately, they aren't correct.

 Let f be a field element with value
 52998265219372519138277318009572834528257482223861497652862868020346603903843.

 In Montgomery form, it's represented in memory as f*2^256 mod p, which
 is 
 58733536287848456860684025065811053850702581988990452502702607007944524443511.

 When passed to ecp_nistz256_sqr_mont, this results in the intermediate
 value (before any reduction)
 0x41dd6e8bcf7e19f499c19d0f5f3bba78272201eee64c6a44ca8a4ff275b53fa93b41d5b7035af3e40a05dc36f424ab9438cdec4fa193faebf6ce951.

 r10 in this case contains 0x40a05dc3 and the high-word output
 of the multiplication after # First iteration is 0xfa193fad. The
 addition of r8 and r9 overflows into it leaving it as 0xfa193fae. The
 addition of rax and r9 also sets the carry flag thus the final
 add-with-carry of rdx into r10 easily overflows and leaves r10 as
 0x3ab99d72.

 Additionally, I'm not sure about any of the other cases in the same
 function that have been annotated the same way. There is also a
 similar annotation in ecp_nistz256_mul_mont that I've not
 investigated.


 Cheers

 AGL

 __
 OpenSSL Project http://www.openssl.org
 Development Mailing List   openssl-dev@openssl.org
 Automated List Manager   majord...@openssl.org
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


[openssl.org #3607] nistz256 is broken.

2014-11-23 Thread Adam Langley via RT
(Affects 1.0.2 only.)

In crypto/ec/asm/ecp_nistz256-x86_64.pl, __ecp_nistz256_sqr_montq,
under Now the reduction there are a number of comments saying
doesn't overflow. Unfortunately, they aren't correct.

Let f be a field element with value
52998265219372519138277318009572834528257482223861497652862868020346603903843.

In Montgomery form, it's represented in memory as f*2^256 mod p, which
is 
58733536287848456860684025065811053850702581988990452502702607007944524443511.

When passed to ecp_nistz256_sqr_mont, this results in the intermediate
value (before any reduction)
0x41dd6e8bcf7e19f499c19d0f5f3bba78272201eee64c6a44ca8a4ff275b53fa93b41d5b7035af3e40a05dc36f424ab9438cdec4fa193faebf6ce951.

r10 in this case contains 0x40a05dc3 and the high-word output
of the multiplication after # First iteration is 0xfa193fad. The
addition of r8 and r9 overflows into it leaving it as 0xfa193fae. The
addition of rax and r9 also sets the carry flag thus the final
add-with-carry of rdx into r10 easily overflows and leaves r10 as
0x3ab99d72.

Additionally, I'm not sure about any of the other cases in the same
function that have been annotated the same way. There is also a
similar annotation in ecp_nistz256_mul_mont that I've not
investigated.


Cheers

AGL

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org