from:"Huang Ying"

Re: [openssl.org #2175] [PATCH] Optimization for 1024 bit RSA on x86_64 platform

2010-03-10 Thread Huang Ying

Hi, All,

On Sat, 2010-02-20 at 14:17 +0100, Huang, Ying via RT wrote:
> Hi, All,
>
> The performance benchmark with openssl speed show about 50%
> performance gain for 1024 bit private RSA.
>
> The optimization is implemented as an engine named RSAX.
>
> Because x86_64 assembly is used in implementation, the optimization is
> only available on x86_64.
>
> More information about the algorithm used can be found in following URL.
>
> http://www.cse.buffalo.edu/srds2009/escs2009_submission_Gopal.pdf

It appears that nobody cares about this patch. Is there something
fundamentally wrong with it? Or should I send it to someone else as
well?

Best Regards,
Huang Ying

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org

A demo implementation of Intel PCLMULQDQ-NI accelerated AES-GCM

2010-02-20 Thread Huang Ying

Hi, All,

To accelerate AES-GCM, Intel has introduced a new instruction set named
PCLMULQDQ-NI, which will be integrated into upcoming Intel CPUs. This
patchset provides a demo implementation of Intel PCLMULQDQ-NI
accelerated AES-GCM. Because AES-GCM is used in TLS 1.2 only, a minimal
AES-GCM related TLS 1.2 implementation is provided in the patchset too.

This patchset may be combined with the general AES-GCM implementation
contributed by IBM, to provide a full stack.

More information about PCLMULQDQ-NI can be found at:
http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

Best Regards,
Huang Ying


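For readers unfamiliar with carry-less multiplication, the operation
that PCLMULQDQ performs on two 64-bit operands can be sketched in
portable C as below. This is only an illustration of the arithmetic
(polynomial multiplication over GF(2), where partial products are
combined with XOR instead of addition). The names clmul128 and clmul64
are hypothetical and are not taken from the patchset.

```c
#include <stdint.h>

/* 128-bit result of a 64x64 carry-less multiplication. */
typedef struct {
	uint64_t lo;
	uint64_t hi;
} clmul128;

/*
 * Bit-by-bit carry-less multiply: for every set bit i of b, XOR a copy
 * of a shifted left by i into the 128-bit result. Hypothetical helper,
 * written for clarity rather than speed.
 */
static clmul128 clmul64(uint64_t a, uint64_t b)
{
	clmul128 r = { 0, 0 };
	int i;

	for (i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i > 0)	/* avoid an undefined shift by 64 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, clmul64(3, 3) yields 5 rather than 9, because
(x + 1) * (x + 1) = x^2 + 1 when coefficients are reduced modulo 2.
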

aes_gcm_clmul_ni_patches.tar.gz
Description: application/compressed-tar

What can we do to push AES-NI acceleration patches into 1.0.0 and 0.9.8 branches

2009-10-13 Thread Huang Ying

Hi, All,

We are working on AES-NI acceleration in OpenSSL. With the help of Andy,
we have pushed the AES-NI acceleration patches into the OpenSSL CVS
development branch. But it seems that the patches have not been merged
into the 1.0.0 and/or 0.9.8 branches. So we have some questions:

- Are there any rules for moving a patch from the CVS development version
  to the stable branches (1.0.0 and/or 0.9.8)?

- What can we do to help make that move happen?

Although no machine on the market supports AES-NI yet, AES-NI support
will be available in the next generation of Intel platform, Westmere,
rather than the one after that, Sandy Bridge. And OSVs such as Red Hat
and Novell are already waiting for AES-NI support in OpenSSL.

Thanks,
Huang Ying

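Since AES-NI is only present on some CPUs, software normally checks for
it at run time before using the instructions. A minimal sketch of such
a check follows. It assumes a GCC or Clang toolchain, whose <cpuid.h>
header provides __get_cpuid(); the function names ecx_has_aesni and
cpu_has_aesni are hypothetical and are not the detection code used by
the patches discussed here.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper; an assumption about the toolchain */
#endif

/* CPUID leaf 1 reports AES-NI support in ECX bit 25. */
#define CPUID1_ECX_AESNI (1u << 25)

/* Pure helper so that the bit test can be checked without real hardware. */
static int ecx_has_aesni(uint32_t ecx)
{
	return (ecx & CPUID1_ECX_AESNI) != 0;
}

/* Returns 1 if the running CPU advertises AES-NI, 0 otherwise. */
static int cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;	/* CPUID leaf 1 not available */
	return ecx_has_aesni(ecx);
#else
	return 0;		/* not an x86 CPU */
#endif
}
```

An engine could call a check like cpu_has_aesni() once at load time and
decline to register its ciphers when the result is 0.
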


Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-04-01 Thread Huang Ying

On Wed, 2009-04-01 at 03:45 +0800, Andy Polyakov wrote:
Hi,

> > This patch adds support to Intel AES-NI instruction set for x86_64
> > platform.
>
> I apologize for delay.

That's all right.

> Promised to comment on submission in question.
> Well, after some consideration I reckoned that it would take longer to
> discuss it than to implement own version of assembler module. Having own
> code also makes it easier for me to maintain it:-) The module is
> available for preview at
> http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt. Major points are,
> all addressed in the new code:

Thank you very much for your work.

> - why full unroll?

Just because the unrolled code is not too long.

> - why 4x interleave when aesenc latency is [anticipated to be] 6?

Yes. It should be 6; I overlooked this important information in the
white paper.

> - why post-4x processing is done with non-interleaved routine, when
> interleaved can be used?

Yes. Post-4x processing can be done in interleaved mode. That is faster.

> - why not encode all aes instructions with .byte?

I just wanted to encode all aes instructions after some review. Now I
think maybe we can define the aes instructions as perl functions and do
the encoding via perl.

> - instruction scheduling in key setup can be [much] better;
>
> See code and comments in code for further details. I'd appreciate if you
> could review and cross-test the code. [Counter-]comments and suggestions
> are naturally welcomed. The code will be committed to repository as soon
> as remaining issues are resolved. Remaining are build issue (as you
> pointed out yourself) and actual tests on Win64. Note that I suggest to
> name module eng_aesni-x86_64.pl instead of _asm. This implies that
> eventually there will be 32-bit version too.

I will test your code on a real machine. In the meantime, you can at
least test the code with an emulator, SDE, which can be downloaded from
the following URL:

http://linux.softpedia.com/progDownload/Intel-Software-Development-Emulator-Download-44635.html

Both Linux and Windows are supported.

BTW: do you want me to prepare the patch, or will you prepare it yourself?

Best Regards,
Huang Ying

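The interleaving question above can be illustrated with a small sketch.
When one round instruction has a latency of several cycles, keeping
several independent blocks in flight gives the pipeline something to do
while each result is pending. The code below assumes a toy round
function (toy_round) as a stand-in for aesenc; it is not a real cipher,
and all names are hypothetical. In C the compiler is free to reschedule
this anyway, so the sketch shows only the structure, not the speedup.

```c
#include <stddef.h>
#include <stdint.h>

#define ROUNDS 10

/* Toy stand-in for one aesenc round; NOT a real cipher. */
static uint32_t toy_round(uint32_t state, uint32_t round_key)
{
	state ^= round_key;
	return (state << 7) | (state >> 25);	/* rotate left by 7 */
}

/* One block at a time: every round waits for the previous round. */
static void encrypt_1x(uint32_t *blk, size_t n, const uint32_t rk[ROUNDS])
{
	size_t i;
	int r;

	for (i = 0; i < n; i++)
		for (r = 0; r < ROUNDS; r++)
			blk[i] = toy_round(blk[i], rk[r]);
}

/*
 * Four independent blocks per iteration: the four round operations in
 * the inner loop do not depend on each other, so they can overlap.
 * The tail (n % 4 blocks) falls back to the one-block routine.
 */
static void encrypt_4x(uint32_t *blk, size_t n, const uint32_t rk[ROUNDS])
{
	size_t i;
	int r;

	for (i = 0; i + 4 <= n; i += 4) {
		uint32_t a = blk[i], b = blk[i + 1];
		uint32_t c = blk[i + 2], d = blk[i + 3];

		for (r = 0; r < ROUNDS; r++) {
			a = toy_round(a, rk[r]);
			b = toy_round(b, rk[r]);
			c = toy_round(c, rk[r]);
			d = toy_round(d, rk[r]);
		}
		blk[i] = a;
		blk[i + 1] = b;
		blk[i + 2] = c;
		blk[i + 3] = d;
	}
	encrypt_1x(blk + i, n - i, rk);
}
```

Both routines produce identical output for the same input. The tail is
handled by the simple routine here only for brevity; as noted in the
mail, the remaining blocks could be processed in interleaved fashion
as well.
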

Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-04-01 Thread Huang Ying

Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
> > Just because the unrolled code is not too long.
>
> As for non-interleaved loop. Reasoning is that folded loop can be
> inlined in several places to spare few cycles on call overhead. Of
> course this is under premise that it is as fast as unrolled one. Intel
> CPUs used to be very good at small loops, which is why I dared to fold
> the loop. Of course it doesn't have to be the case here and if unrolled
> loop will be proved to be faster, inline code will have to be replaced
> with calls.

Sounds reasonable.

> > - why not encode all aes instructions with .byte?
> >
> > Just want to encode all aes instructions after some review. Now I think
> > maybe we can define aes instructions as perl function and do encoding
> > via perl.
>
> It's done at the end of script.

Yes. Thanks.

> > I will test your code on real machine.
>
> There is real machine? Would you care to perform several tests, so that
> we can sort out what's optimal? I mean the folded vs. unrolled, then I
> wonder if my use of .aligns is excessive in *crypt1... I don't demand
> actual figures [in case you can't disclose them], only if/how
> performance is affected... If yes, we can proceed off-list if so desired.

OK. I will do these tests:

1. folded vs. unrolled
2. .align vs. no .align in *crypt1

Are there any other tests to add?

I will test with openssl speed and send you the results. I will do the
tests tomorrow.

> > BTW: you want me to prepare the patch or you prepare the patch yourself?
>
> I'll manage it myself. A.

Can you send me the full patch so that I can test it?

Best Regards,
Huang Ying

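The ".byte" question concerns assemblers that do not yet know the new
mnemonics: the instruction is emitted as raw opcode bytes instead. As a
sketch of what such an encoder has to compute, the C function below
builds the bytes for the register-to-register form of aesenc, whose
opcode is 66 0F 38 DC /r. The function name encode_aesenc is
hypothetical, and the sketch covers only xmm0 to xmm7 (higher registers
need a REX prefix); the thread describes doing this in perl, not in C.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "aesenc %xmm<src>, %xmm<dst>" (AT&T operand order) for
 * xmm0..xmm7. Returns the number of bytes written (5), or 0 if a
 * register number is out of range for this simplified encoder.
 */
static size_t encode_aesenc(unsigned src, unsigned dst, uint8_t out[5])
{
	if (src > 7 || dst > 7)
		return 0;
	out[0] = 0x66;		/* mandatory prefix */
	out[1] = 0x0f;		/* three-byte opcode 0F 38 DC */
	out[2] = 0x38;
	out[3] = 0xdc;
	/* ModR/M, register form: reg field = destination, r/m = source */
	out[4] = (uint8_t)(0xc0 | (dst << 3) | src);
	return 5;
}
```

With src = 1 and dst = 0 this produces 66 0f 38 dc c1, which is what a
line such as ".byte 0x66,0x0f,0x38,0xdc,0xc1" would stand for in place
of "aesenc %xmm1,%xmm0".
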



Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-02-09 Thread Huang Ying

Hi, All,

It seems that Andy has not been available since Christmas. Can anyone
tell me where I can find him, or what I can do to have this patch reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
 This patch adds support to Intel AES-NI instruction set for x86_64
 platform.
 

Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-23 Thread Huang Ying

On Tue, 2008-12-23 at 23:36 +0800, Geoff Thorpe wrote:
> On Tuesday 23 December 2008 02:01:38 Huang Ying wrote:
> > This patch adds support to Intel AES-NI instruction set for x86_64
> > platform.
>
> Cool. I'm relying on Andy to provide a more thorough review than my quick
> scan - I don't do perl-asm :-) In particular, I haven't tried patching
> and building this. (Andy, let me know if you need any off-platform
> testing - I presume not, but ...)
>
> Quick comment:
>
> > Signed-off-by: Huang Ying ying.hu...@intel.com
> >
> > ---
> >  crypto/engine/Makefile         |   11
> >  crypto/engine/eng_all.c        |    3
> >  crypto/engine/eng_intel.c      |  589 ++
> >  crypto/engine/eng_intel_asm.pl |  918 +
> >  4 files changed, 1519 insertions(+), 2 deletions(-)
>
> Are you using git to prepare this patch, and if so, which git repo+branch
> are you tracking?

I use OpenSSL CVS to track the upstream version, and I use quilt to
prepare the patch.

> > +#define INTEL_AES_MIN_ALIGN 16
> > +#define ALIGN(x,a) (((unsigned long)(x)+(a)-1)&(~((a)-1)))
> > +#define INTEL_AES_ALIGN(x) ALIGN(x,INTEL_AES_MIN_ALIGN)
>
> You don't seem to need the ALIGN() macro anywhere, just
> INTEL_AES_ALIGN(), so I'd personally prefer it if you didn't use ALIGN
> as this is tempting fate with respect to possible symbol conflicts.

OK. I will fix it.

> Also, if you have no philosophical objection, I think the file and symbol
> naming should be based on the interface rather than the manufacturer
> (particularly for intel, who provide lots of h/w and interfaces that
> have nothing to do with AES-NI). Perhaps eng_aesni.c rather than
> eng_intel.c. If it's absolutely certain no other manufacturer will
> support the same instructions in the future, we could live with
> eng_intel_aesni.c, but it still needs to be clear that the engine
> targets the AES-NI interface rather than (any) intel interface. (I
> don't want to handle support questions from x86 noobs who presume
> the eng_intel.c engine accelerates any intel cpu ...)

I will rename them to aesni.

> As your use of INTEL_AES_ALIGN() was always to cast it to a pointer,
> please also rephrase the macro to not need casting every time;
>
> #define AESNI_MIN_ALIGN 16
> #define AESNI_ALIGN(x) \
>   (void *)(((unsigned long)(x) + AESNI_MIN_ALIGN - 1) & \
>   (~(unsigned long)(AESNI_MIN_ALIGN - 1)))

I will do this.

> Finally - did you omit a patch to engine.h? Your changes to eng_all.c
> include a call to ENGINE_load_intel_aes_ni(), which is in eng_intel.c,
> but this doesn't appear to be declared in any header.

I will add it.

Best Regards,
Huang Ying

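The alignment macro discussed above rounds an address up to the next
16-byte boundary by adding 15 and masking off the low four bits. A
function-style sketch of the same computation is shown below. It uses
uintptr_t rather than unsigned long, on the assumption that the code
should also work where unsigned long is narrower than a pointer (as on
Win64); the name aesni_align is hypothetical and this is not the code
from the patch.

```c
#include <stdint.h>

#define AESNI_MIN_ALIGN 16

/*
 * Round a pointer up to the next AESNI_MIN_ALIGN boundary. A pointer
 * that is already aligned is returned unchanged. AESNI_MIN_ALIGN must
 * be a power of two for the mask to be correct.
 */
static void *aesni_align(void *p)
{
	uintptr_t v = (uintptr_t)p;

	v = (v + AESNI_MIN_ALIGN - 1) & ~(uintptr_t)(AESNI_MIN_ALIGN - 1);
	return (void *)v;
}
```

A caller that needs an aligned key schedule would typically allocate
AESNI_MIN_ALIGN - 1 extra bytes and pass the start of the buffer
through a helper like this one.
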



Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-23 Thread Huang Ying

On Wed, 2008-12-24 at 00:58 +0800, Andy Polyakov wrote:
> > > This patch adds support to Intel AES-NI instruction set for x86_64
> > > platform.
> >
> > Cool. I'm relying on Andy to provide a more thorough review
>
> Even after short glance I can tell there will be a lot of comments and
> even work to do, but I'm planning to take it later... ... ... ... ...

Looking forward to your further comments.

> > Also, if you have no philosophical objection, I think the file and symbol
> > naming should be based on the interface rather than the manufacturer
> > (particularly for intel, who provide lots of h/w and interfaces that
> > have nothing to do with AES-NI). Perhaps eng_aesni.c rather than
> > eng_intel.c.
>
> I second it. Ying, there is nothing preventing us from renaming files
> and functions (assuming that you have no philosophical objections), but
> *if* you choose to submit another patch with alternative naming, could
> you look into crypto/modes and use it? At earlier occasion you commented
> "hope that it can be merged quickly", but it was committed to OpenSSL
> CVS prior I mentioned it... Or is it that you might have failed to pull
> it to your repository, but then it's something we have no power to make
> quicker...

Sorry, I overlooked them. I will use them in the new patch.

> Out of curiosity, what does NI stand for anyway? Or is it just
> something the knights kept saying? But didn't they stop doing so? Cheers. A.

NI stands for New Instructions.

Best Regards,
Huang Ying




Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-23 Thread Huang Ying

On Wed, 2008-12-24 at 00:58 +0800, Andy Polyakov wrote:
> > > This patch adds support to Intel AES-NI instruction set for x86_64
> > > platform.
> >
> > Cool. I'm relying on Andy to provide a more thorough review
>
> Even after short glance I can tell there will be a lot of comments and
> even work to do, but I'm planning to take it later... ... ... ... ...
>
> > Also, if you have no philosophical objection, I think the file and symbol
> > naming should be based on the interface rather than the manufacturer
> > (particularly for intel, who provide lots of h/w and interfaces that
> > have nothing to do with AES-NI). Perhaps eng_aesni.c rather than
> > eng_intel.c.
>
> I second it. Ying, there is nothing preventing us from renaming files
> and functions (assuming that you have no philosophical objections), but
> *if* you choose to submit another patch with alternative naming, could
> you look into crypto/modes and use it? At earlier occasion you commented
> "hope that it can be merged quickly", but it was committed to OpenSSL
> CVS prior I mentioned it... Or is it that you might have failed to pull
> it to your repository, but then it's something we have no power to make
> quicker...

It seems that crypto/modes is not compiled into libcrypto by default. The
following patch makes it part of the build.

Should it be a separate patch, or should I merge it into the AES-NI patch?

---
 Makefile.org |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Makefile.org
+++ b/Makefile.org
@@ -119,7 +119,7 @@ SDIRS=  \
 	bn ec rsa dsa ecdsa dh ecdh dso engine \
 	buffer bio stack lhash rand err \
 	evp asn1 pem x509 x509v3 conf txt_db pkcs7 pkcs12 comp ocsp ui krb5 \
-	cms pqueue ts jpake
+	cms pqueue ts jpake modes
 # keep in mind that the above list is adjusted by ./Configure
 # according to no-xxx arguments...
 




[PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-23 Thread Huang Ying

This patch adds support to Intel AES-NI instruction set for x86_64
platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processor, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197.  The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf


AES-NI support is implemented as an engine in crypto/engine/.


ChangeLog:

v3:

- Rename INTEL or INTEL_AES stuff to AESNI

- Use cfb and ofb modes implementation of crypto/modes instead of copying.

v2:

- AES-NI support is implemented as an engine instead of branch.

- ECB and CBC modes are implemented in parallel style to take
  advantage of pipelined hardware implementation.

- AES key scheduling algorithm is re-implemented with higher performance.


Known issues:

- How to add conditional compilation for eng_intel_asm.pl? It can not
  be compiled on non-x86 platform.

- NID for CTR mode can not be found, how to support it in engine?

- CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
  to add AES-NI support for them, I can add them.


Signed-off-by: Huang Ying ying.hu...@intel.com

---
 crypto/engine/Makefile |   11 
 crypto/engine/eng_aesni.c  |  409 ++
 crypto/engine/eng_aesni_asm.pl |  918 +
 crypto/engine/eng_all.c|3 
 crypto/engine/engine.h |1 
 5 files changed, 1340 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/crypto/engine/eng_aesni.c
@@ -0,0 +1,409 @@
+/*
+ * Support for Intel AES-NI instruction set
+ *   Author: Huang Ying ying.hu...@intel.com
+ *
+ * Intel AES-NI is a new set of Single Instruction Multiple Data
+ * (SIMD) instructions that are going to be introduced in the next
+ * generation of Intel processor, as of 2009. These instructions
+ * enable fast and secure data encryption and decryption, using the
+ * Advanced Encryption Standard (AES), defined by FIPS Publication
+ * number 197.  The architecture introduces six instructions that
+ * offer full hardware support for AES. Four of them support high
+ * performance data encryption and decryption, and the other two
+ * instructions support the AES key expansion procedure.
+ *
+ * The white paper can be downloaded from:
+ *   http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
+ *
+ * This file is based on engines/e_padlock.c
+ */
+
+/* 
+ * Copyright (c) 1999-2001 The OpenSSL Project.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *
+ * 3. All advertising materials mentioning features or use of this
+ *software must display the following acknowledgment:
+ *This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)
+ *
+ * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to
+ *endorse or promote products derived from this software without
+ *prior written permission. For written permission, please contact
+ *licens...@openssl.org.
+ *
+ * 5. Products derived from this software may not be called OpenSSL
+ *nor may OpenSSL appear in their names without prior written
+ *permission of the OpenSSL Project.
+ *
+ * 6. Redistributions of any form whatsoever must retain the following
+ *acknowledgment:
+ *This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
+ * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE OpenSSL PROJECT OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS

Re: [openssl.org #1801] [BUGFIX] Segment fault when invoking AES_cbc_encrypt() on x86_64 with short input

2008-12-22 Thread Huang Ying

On Wed, 2008-12-17 at 22:30 +0800, Andy Polyakov via RT wrote:
> > Fix two bugs in .Lcbc_slow_enc_in_place.
> >
> > - At end of .Lcbc_slow_enc_in_place, %r10 instead of $_len should be
> >   set to 16.
> > - In .Lcbc_slow_enc_in_place, %rdi should be initialized before stosb.
>
> Thanks. The problem is addressed but in different way, see
> http://cvs.openssl.org/chngview?cn=17698.
>
> > Signed-off-by: Huang Ying ying.hu...@intel.com
> >
> > ---
> >  crypto/aes/asm/aes-x86_64.pl |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > --- a/crypto/aes/asm/aes-x86_64.pl
> > +++ b/crypto/aes/asm/aes-x86_64.pl
> > @@ -1994,10 +1994,12 @@ AES_cbc_encrypt:
>
> ??? What is it for version you have? In CVS .Lcbc_slow_enc_in_place
> resided at line #1974! A.

I use CVS. It is a patch-ordering issue: I have another personal patch
applied before this one.

Also, using the simple test program attached to this mail, I find that
the output of the CVS version differs from that of openssl-0.9.8g if the
specified input length is less than 16.

Best Regards,
Huang Ying

#include <openssl/aes.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void print_arr(unsigned char buf[], int sz, char *prefix)
{
	int i;
	if (prefix)
		printf("%s", prefix);
	for (i = 0; i < sz; i++)
		printf("%02x", buf[i]);
	printf("\n");
}

/* Encrypt and decrypt in_len bytes (at most one block) in CBC mode. */
void test_cbc1(int in_len)
{
	int ret;
	AES_KEY key;
	unsigned char user_key[16] = "123456";
	unsigned char iv1[16] = "9876543210987654";
	unsigned char iv2[16];
	unsigned char in[16] = "1234567890";
	unsigned char out[16];

	memcpy(iv2, iv1, sizeof(iv1));
	ret = AES_set_encrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
	print_arr(out, sizeof(out), "out: ");
	//AES_cbc_encrypt(in, in, in_len, &key, iv2, 1);
	//print_arr(in, sizeof(in), "ip_out: ");

	ret = AES_set_decrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
	print_arr(in, sizeof(in), "out: ");
}

/* Same test with one full block in front of the in_len bytes. */
void test_cbc2(int in_len)
{
	int ret;
	AES_KEY key;
	unsigned char user_key[16] = "123456";
	unsigned char iv1[16] = "9876543210987654";
	unsigned char iv2[16];
	unsigned char in[32] = "12345678901234567890123456789012";
	unsigned char out[32];

	in_len += 16;
	memcpy(iv2, iv1, sizeof(iv1));
	ret = AES_set_encrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
	print_arr(out, sizeof(out), "out: ");

	ret = AES_set_decrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
	print_arr(in, sizeof(in), " in: ");
}

/* Same test with four full blocks in front of the in_len bytes. */
void test_cbc3(int in_len)
{
	int ret;
	AES_KEY key;
	unsigned char user_key[16] = "123456";
	unsigned char iv1[16] = "9876543210987654";
	unsigned char iv2[16];
	unsigned char in[80] = "1234567890123456789012345678901234567890"
		"1234567890123456789012345678901234567890";
	unsigned char out[80];

	in_len += 64;
	memcpy(iv2, iv1, sizeof(iv1));
	ret = AES_set_encrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
	print_arr(out, sizeof(out), "out: ");

	ret = AES_set_decrypt_key(user_key, 128, &key);
	assert(!ret);
	AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
	print_arr(in, sizeof(in), " in: ");
}

/* Usage: prog [in_len], where in_len is 0..16 and defaults to 16. */
int main(int argc, char *argv[])
{
	int in_len;

	in_len = argc > 1 ? atoi(argv[1]) : 16;
	test_cbc1(in_len);
	test_cbc2(in_len);
	test_cbc3(in_len);
	return 0;
}

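The test program above exercises input lengths that are not a multiple
of the 16-byte block size. As background, the sketch below shows CBC
chaining in which a short final block is padded with zeros before it is
encrypted. That tail handling is an assumption about what a generic
implementation does, not a statement about the assembler code under
discussion. toy_block is a hypothetical stand-in for the block cipher
(a plain XOR with the key, which is not secure), used so that the
sketch builds without OpenSSL.

```c
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy stand-in for a block cipher: XOR with the key. NOT secure. */
static void toy_block(unsigned char b[BLK], const unsigned char key[BLK])
{
	int i;

	for (i = 0; i < BLK; i++)
		b[i] ^= key[i];
}

/*
 * CBC encryption with a zero-padded short final block. "out" must have
 * room for len rounded up to a multiple of BLK bytes; iv is updated to
 * the last ciphertext block so that a later call can continue the chain.
 */
static void toy_cbc_encrypt(const unsigned char *in, unsigned char *out,
			    size_t len, const unsigned char key[BLK],
			    unsigned char iv[BLK])
{
	unsigned char tmp[BLK];
	size_t n, i;

	while (len > 0) {
		n = len < BLK ? len : BLK;
		memset(tmp, 0, BLK);		/* zero padding for the tail */
		memcpy(tmp, in, n);
		for (i = 0; i < BLK; i++)
			tmp[i] ^= iv[i];	/* chain with previous block */
		toy_block(tmp, key);
		memcpy(out, tmp, BLK);
		memcpy(iv, tmp, BLK);
		in += n;
		out += BLK;
		len -= n;
	}
}
```

Note that a full block is written even when fewer than 16 input bytes
remain, which is why the output buffers in the test program are sized
in whole blocks.
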


[PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-22 Thread Huang Ying

This patch adds support to Intel AES-NI instruction set for x86_64
platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processor, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197.  The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf


AES-NI support is implemented as an engine in crypto/engine/.


ChangeLog:

v2:

- AES-NI support is implemented as an engine instead of branch.

- ECB and CBC modes are implemented in parallel style to take
  advantage of pipelined hardware implementation.

- AES key scheduling algorithm is re-implemented with higher performance.


Known issues:

- How to add conditional compilation for eng_intel_asm.pl? It can not
  be compiled on non-x86 platform.

- NID for CTR mode can not be found, how to support it in engine?

- CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
  to add AES-NI support for them, I can add them.


Signed-off-by: Huang Ying ying.hu...@intel.com

---
 crypto/engine/Makefile |   11 
 crypto/engine/eng_all.c|3 
 crypto/engine/eng_intel.c  |  589 ++
 crypto/engine/eng_intel_asm.pl |  918 +
 4 files changed, 1519 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/crypto/engine/eng_intel.c
@@ -0,0 +1,589 @@
+/*
+ * Support for Intel AES-NI instruction set
+ *   Author: Huang Ying ying.hu...@intel.com
+ *
+ * Some code is copied from engines/e_padlock.c
+ *
+ * cfb and ofb mode code is copied from crypto/aes/aes_cfb.c and
+ * crypto/aes/aes_ofb.c.
+ */
+
+/* 
+ * Copyright (c) 1999-2001 The OpenSSL Project.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *
+ * 3. All advertising materials mentioning features or use of this
+ *software must display the following acknowledgment:
+ *This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)
+ *
+ * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to
+ *endorse or promote products derived from this software without
+ *prior written permission. For written permission, please contact
+ *licens...@openssl.org.
+ *
+ * 5. Products derived from this software may not be called OpenSSL
+ *nor may OpenSSL appear in their names without prior written
+ *permission of the OpenSSL Project.
+ *
+ * 6. Redistributions of any form whatsoever must retain the following
+ *acknowledgment:
+ *This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
+ * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE OpenSSL PROJECT OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ * 
+ *
+ * This product includes cryptographic software written by Eric Young
+ * (e...@cryptsoft.com).  This product includes software written by Tim
+ * Hudson (t...@cryptsoft.com).
+ *
+ */
+
+
+#include <openssl/opensslconf.h>
+
+#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_INTEL_AES_NI) && !defined(OPENSSL_NO_AES)
+
+#define INTEL_AES_MIN_ALIGN 16
+#define ALIGN(x,a) (((unsigned long)(x)+(a)-1)&(~((a)-1)))
+#define INTEL_AES_ALIGN(x

[openssl.org #1801] [BUGFIX] Segment fault when invoking AES_cbc_encrypt() on x86_64 with short input

2008-12-17 Thread Huang, Ying via RT

Fix two bugs in .Lcbc_slow_enc_in_place.

- At end of .Lcbc_slow_enc_in_place, %r10 instead of $_len should be
  set to 16.
- In .Lcbc_slow_enc_in_place, %rdi should be initialized before stosb.
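
For readers who do not follow the assembler, the two fixes correspond roughly to the C sketch below. The helper name and the BLOCK constant are illustrative only and do not appear in aes-x86_64.pl; the sketch assumes the short tail has already been placed in the output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 16

/*
 * Hypothetical helper: zero-pad a short final block in place.
 * 'out' already holds 'len' (< 16) bytes of data; the remainder of
 * the 16-byte block is zeroed and the length left to encrypt is 16.
 */
static size_t pad_tail_in_place(unsigned char *out, size_t len)
{
    assert(len < BLOCK);
    /* second fix: the store pointer must start at out + len (%rdi) */
    memset(out + len, 0, BLOCK - len);
    /* first fix: the live length counter (%r10) becomes 16 */
    return BLOCK;
}
```

The returned value plays the role of the register that the encryption loop actually reads, which is why updating only the stack copy of the length was not enough.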

Signed-off-by: Huang Ying ying.hu...@intel.com

---
 crypto/aes/asm/aes-x86_64.pl |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/crypto/aes/asm/aes-x86_64.pl
+++ b/crypto/aes/asm/aes-x86_64.pl
@@ -1994,10 +1994,12 @@ AES_cbc_encrypt:
 .Lcbc_slow_enc_in_place:
        mov     \$16,%rcx               # zero tail
        sub     %r10,%rcx
+       mov     $out,%rdi
+       add     %r10,%rdi
        xor     %rax,%rax
        .long   0x9066AAF3              # rep stosb
        mov     $out,$inp               # this is not a mistake!
-       movq    \$16,$_len              # len=16
+       movq    \$16,%r10               # len=16
        jmp     .Lcbc_slow_enc_loop     # one more spin...
 #--- SLOW DECRYPT ---#
 .align 16




signature.asc
Description: PGP signature

Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-16 Thread Huang Ying

On Tue, 2008-12-16 at 21:30 +0800, Andy Polyakov wrote:
  Implementation aiming to complement interface exposed by crypto/aes/asm 
  should allow for non-16-byte-aligned key schedule. Period. One can use 
  movups, or check alignment and choose between movups and movaps code 
  paths, or copy key schedule to aligned location on stack.
  
  Should it be considered an unsafe behavior to copy key schedule to
  stack? The stack maybe swapped out to a swap file, so that the key
  schedule is leaked.
 
 Not that I'm answering the question with a question, but what's wrong 
 with movups? I mean I consider that the question about copying key 
 schedule was already discussed in enough detail, but I'm left with 
 feeling that other options are even less preferable. So I wonder why? Is 
 movups expected to so much worse? Even if input is moderately 
 misaligned, say at 64-bit boundary instead of 128? Can't one pipe-line 
 it, i.e. schedule movups longer before aes[enc|dec], to amortize' 
 additional latency, or will it be non-pipe-line-able? Even if so, it's 
 possible to compromise having movaps and palignr. If one chooses to 
 declare key schedule to ensure 64-bit alignment(*) it would really have 
 to be a single palignr case... A.

That sounds interesting. It at least deserves a try.

Best Regards,
Huang Ying



signature.asc
Description: This is a digitally signed message part

Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-16 Thread Huang Ying

On Tue, 2008-12-16 at 19:12 +0800, Andy Polyakov wrote:
  The cipher and digest support is at the granularity of nids, and
  these combine algorithm, key-length, and mode. So if you implement
  support for those cipher,length,mode combinations that can be
  accelerated by AES-NI, your engine will only be invoked for those
  combinations. You're not obliged to implement anything else, and
  indeed there is nothing to be gained by doing so.
  The situation is:
 
  - We implement cbc and ecb mode in engine
  - If we implement cfb and ofb in engine too, we will duplicate code of
  cfb and ofb mode itself.
 
 The plan is to consolidate mode implementations, so it doesn't have to 
 be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

Good! Hope that can be merged quickly.

  - If we do not implement cfb and ofb in engine, no code duplication,
  BUT we can NOT get AES-NI acceleration for AES core block algorithm
  (which benefit cfb and ofb too) until we have a branch version.
  
  OK, I (mis)understood from your original mail that you could only 
  accelerate a subset of modes.
 
 Just to clarify CBC situation. While it's absolutely correct that 
 *de*cryption is the one that can take full advantage of pipe-lining, 
 dedicated *en*cryption procedure should also be implemented in 
 assembler. Why? It doesn't come as surprise that CBC timing is sum of 
 time spent in block procedure and time spent performing the block 
 chaining. The latter can be underestimated and as block procedure gets 
 faster it actually becomes underestimated. I reckon that with 4x faster 
 block procedure, C timing for block chaining would be comparable  with 
 block procedure. This in turn means that overall performance would be 
 almost twice as low as if chaining was implemented in assembler. This 
 applies to x86_64, on x86 performance loss would be even higher...

OK, I will implement CBC encryption with ASM too.

  If you can accelerate them all, then 
  please do so by implementing an intel/aes-ni engine. But not by 
  branching in the vanilla implementation.
  
  So my suggestion is:
 
  - Accelerate AES core block algorithm with branch version. Which is
  used by cbc, cfb and ofb too.
 
  - Accelerate AES ecb and ctr? with engine version.
  
  And my suggestion is:
  
  - write an engine for your hardware.
 
 I second it. And additional note. As padlock engine was mentioned, I can 
 imagine that the idea of using inline assembler will pop up in the head. 
 Please don't! As already mentioned we support other compilers as well 
 and it's favorable if gcc-ims can be avoided. Well, in 32-bit case it 
 might be acceptable (both GNU and Microsoft compilers support inline 
 assembler), but not in 64-bit case (GNU is the only one supporting 
 inline assembler).

OK. I will use the same format as aes-x86_64.pl.

 As for FIPS. Given current precedent it should be noted that if branch 
 version is certified, then the branch becomes bound to be taken. In 
 other words branched version would be prohibited to reach certified 
 mode of operation on CPU that does not support the instruction set 
 extension in question. Then why does it have to be branched? Having this 
 in mind wouldn't it make as much sense to implement module that can be 
 used as *drop-in replacement* for aes-[586|x86_64].pl? So that those who 
 are willing to pursue certification for given hardware can do so with 
 not so much hassle(*)? Would it be effort duplication? Does not have to 
 be! Because the code can be used in engine context just as well...
 
 Now to practicalities. What I can do to help. I can put together perl 
 scripts for x86_64 and x86, which can be used as drop-in replacement for 
 aes-[586|x86_64].pl as well as in engine context. Note that drop-in 
 replacement implies presence of CBC procedure, though I'd be reluctant 
 to implement pipe-lined version. At least not without further 
 consideration, because it might turn out that pipe-lined version doesn't 
 have to monolithic. Most notably one can break decryption into 
 multi-block ECB and separate multi-block chaining to minimize developing 
 effort. A.

Thank you very much. I can convert the code to the perl format, but I
need your help to test it on 64-bit Windows and to fix some issues such
as SSE operands.

I think the AES-NI based pipelined implementation can be a starting
point for the general version.

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-14 Thread Huang Ying

On Fri, 2008-12-12 at 12:24 +0800, Geoff Thorpe wrote:
 On Thursday 11 December 2008 23:02:12 Huang Ying wrote:
  On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
   The cipher and digest support is at the granularity of nids, and
   these combine algorithm, key-length, and mode. So if you implement
   support for those cipher,length,mode combinations that can be
   accelerated by AES-NI, your engine will only be invoked for those
   combinations. You're not obliged to implement anything else, and
   indeed there is nothing to be gained by doing so.
 
  The situation is:
 
  - We implement cbc and ecb mode in engine
  - If we implement cfb and ofb in engine too, we will duplicate code of
  cfb and ofb mode itself.
  - If we do not implement cfb and ofb in engine, no code duplication,
  BUT we can NOT get AES-NI acceleration for AES core block algorithm
  (which benefit cfb and ofb too) until we have a branch version.
 
 OK, I (mis)understood from your original mail that you could only 
 accelerate a subset of modes. If you can accelerate them all, then 
 please do so by implementing an intel/aes-ni engine. But not by 
 branching in the vanilla implementation.
 
  So my suggestion is:
 
  - Accelerate AES core block algorithm with branch version. Which is
  used by cbc, cfb and ofb too.
 
  - Accelerate AES ecb and ctr? with engine version.
 
 And my suggestion is:
 
 - write an engine for your hardware.

OK. I will write an engine version first, at least for discussion.

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-11 Thread Huang Ying

On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
 On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote:
  Anything in memory could end up swapped out, but stack is the least
  likely since it's more often in use, the best you can do is zero the
  area ASAP.
 
  My other objection to putting all of this into an engine is that the
  engine code is unusable in quite a few cases. Export approvals, and
  certifications like FIPS and Common Criteria all pretty much insist
  that the crypto. isn't replaceable by some random chunk of code,
  that's not an OpenSSL issue as such, but it's going to be awkward for
  some subset of OpenSSL consumers. There are ways around those issues,
  but I doubt you really want to add the option of signature checking
  engine plugins ?.
 
 Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ 
 rather ./engines/) and the intention is that they should be the 
 implementation de base for those build targets to which they apply. 
 Cryptodev is the only one so far, but there could be others. In fact, 
 the padlock support for VIA chips (which is comparable to what's being 
 discussed here, with all due respect to the intel instruction-set 
 faithful) sits in ./engines like any other h/w support - a similar 
 argument could be made that, on chips that support it, it should provide 
 the default implementation(s), but right now they've been happy enough 
 to make it a non-default option.

The difference between Intel AES-NI and padlock is that padlock provides
direct support for different modes, including ecb, cbc, cfb and ofb,
while Intel AES-NI only provides instructions for the AES core block
algorithm, NOT for the modes themselves. At the same time, an AES-NI
pipelined implementation can benefit ecb encrypt, cbc decrypt and
counter mode.

If we implement AES-NI with a branch, we get the full power of AES-NI
except for ecb and ctr mode.

If we implement AES-NI with an engine, we get the full power of AES-NI
for all modes. But we must duplicate the mode implementations that
cannot benefit from AES-NI, such as cfb, ofb, etc.

Are you OK with code duplication?

So I think the best method is to implement AES-NI with both a branch
and an engine. With the branch version, we get the full power of AES-NI
for cbc, cfb and ofb mode. At the same time, the engine version can
provide further acceleration for ecb and ctr mode.
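
To illustrate why a faster core block function alone already helps modes such as ofb, here is a minimal sketch of OFB written against a block-function pointer. Everything in it is hypothetical: toy_encrypt is a stand-in that is neither AES nor secure, and the names are not OpenSSL's.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 16

/* Any single-block encrypt routine: table based, AES-NI based, ... */
typedef void (*block_fn)(const unsigned char in[BLOCK],
                         unsigned char out[BLOCK], const void *key);

/* Toy stand-in for the block cipher (NOT AES, NOT secure). */
static void toy_encrypt(const unsigned char in[BLOCK],
                        unsigned char out[BLOCK], const void *key)
{
    const unsigned char *k = key;
    for (int i = 0; i < BLOCK; i++)
        out[i] = (unsigned char)((in[i] ^ k[i]) + 1);
}

/* OFB: the mode code only ever calls the forward block function. */
static void ofb_crypt(const unsigned char *in, unsigned char *out, size_t len,
                      const void *key, unsigned char iv[BLOCK], block_fn enc)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (n == 0)
            enc(iv, iv, key);      /* produce the next keystream block */
        out[i] = in[i] ^ iv[n];
        n = (n + 1) % BLOCK;
    }
}
```

Swapping the function passed as enc changes the speed of the mode without touching ofb_crypt, which is the effect the branch version would have on cbc, cfb and ofb.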

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-11 Thread Huang Ying

On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote:
 Anything in memory could end up swapped out, but stack is the least likely
 since it's more often in use, the best you can do is zero the area ASAP.

At least on UNIX systems, mlock() can be used to prevent a specified
memory range from being swapped out. Maybe we should put all key
schedules into a memory area protected by mlock()? I think that is
safer than the stack.
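
A minimal sketch of that idea follows, assuming a POSIX system. The helper names are made up for illustration. mlock() can fail (for example when RLIMIT_MEMLOCK is low), and some systems may require a page-aligned address, so a caller has to cope with failure.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Zero memory through a volatile pointer so the stores are not elided. */
static void wipe(void *p, size_t len)
{
    volatile unsigned char *v = p;
    while (len--)
        *v++ = 0;
}

/*
 * Try to pin a key buffer in RAM.
 * Returns 0 on success and -1 if mlock() failed.
 */
static int key_lock(void *key, size_t len)
{
    return mlock(key, len);
}

/* Wipe the key material first, then release the lock. */
static void key_unlock(void *key, size_t len)
{
    wipe(key, len);
    (void)munlock(key, len);
}
```

Wiping before munlock() keeps the key material from ever sitting in an unlocked page.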

Best Regards,
Huang Ying

 My other objection to putting all of this into an engine is that the engine
 code is unusable in quite a few cases. Export approvals, and certifications
 like FIPS and Common Criteria all pretty much insist that the crypto. isn't
 replaceable by some random chunk of code, that's not an OpenSSL issue as
 such, but it's going to be awkward for some subset of OpenSSL consumers.
 There are ways around those issues, but I doubt you really want to add the
 option of signature checking engine plugins ?.
 
 Perhaps a compromise ?. Put the generic AES speedup into the core, and the
 extra modes where you gain the big performance boosts into an engine ?.
 
 Peter
 
 
   
  
 From:    Huang Ying ying.hu...@intel.com
 To:      openssl-dev@openssl.org
 Date:    12/11/2008 05:06 PM
 Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
   
  
 
 
 
 
 
 On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
  Implementation aiming to complement interface exposed by crypto/aes/asm
  should allow for non-16-byte-aligned key schedule. Period. One can use
  movups, or check alignment and choose between movups and movaps code
  paths, or copy key schedule to aligned location on stack.
 
 Should it be considered an unsafe behavior to copy key schedule to
 stack? The stack maybe swapped out to a swap file, so that the key
 schedule is leaked.
 
 Best Regards,
 Huang Ying
 
 [attachment signature.asc deleted by Peter Waltenberg/Australia/IBM]
 
 



Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-11 Thread Huang Ying

On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
 On Thursday 11 December 2008 20:39:41 Huang Ying wrote:
  On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
   Engines like eng_cryptodev.c *are* built in (they're in
   ./crypto/engine/ rather ./engines/) and the intention is that they
   should be the implementation de base for those build targets to
   which they apply. Cryptodev is the only one so far, but there could
   be others. In fact, the padlock support for VIA chips (which is
   comparable to what's being discussed here, with all due respect to
   the intel instruction-set faithful) sits in ./engines like any other
   h/w support - a similar argument could be made that, on chips that
   support it, it should provide the default implementation(s), but
   right now they've been happy enough to make it a non-default option.
 
  The difference between Intel AES-NI and padlock is that padlock
  provide support for different modes directly including ecb, cbc, cfb
  and ofb, while Intel AES-NI just provides instructions for AES core
  block algorithm NOT for modes directly. At the same time, AES-NI
  pipelining implementation can benefit ecb encrypt and cbc decrypt and
  counter mode.
 
  If we implement AES-NI with branch, we can get full power of AES-NI
  except ecb and ctr mode.
 
  If we implement AES-NI with engine, we can get full power of AES-NI
  for all modes. But we must duplicate mode implementations that can not
  benefit from AES-NI, such as cfb, ofb, etc.
 
  Do you OK with code duplication?
 
 The cipher and digest support is at the granularity of nids, and these 
 combine algorithm, key-length, and mode. So if you implement support for 
 those cipher,length,mode combinations that can be accelerated by AES-NI, 
 your engine will only be invoked for those combinations. You're not 
 obliged to implement anything else, and indeed there is nothing to be 
 gained by doing so.

The situation is:

- We implement cbc and ecb mode in the engine.
- If we implement cfb and ofb in the engine too, we duplicate the code
of the cfb and ofb modes themselves.
- If we do not implement cfb and ofb in the engine, there is no code
duplication, BUT we can NOT get AES-NI acceleration for the AES core
block algorithm (which benefits cfb and ofb too) until we have a branch
version.


So my suggestion is:

- Accelerate the AES core block algorithm with a branch version, which
is used by cbc, cfb and ofb too.

- Accelerate AES ecb and ctr? with an engine version.

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying

On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
  - and $-16, %rdx is unacceptable in this context. The relevant
  interface is exposed to end-user and we have to reserve for possibility
  that key schedule is memcpy-ed to location with alternative alignment;
  
  Does there any other mechanism to deal with alignment issue in OpenSSL?
 
 The answer is engine.

In an engine, can I simply re-align the expanded key address, since it
is not exposed to the user? Something like the following:

typedef struct
{
   AES_KEY ks;
   unsigned int _pad[3];
} INTEL_AES_KEY;

IMPLEMENT_BLOCK_CIPHER(intel_aes_128, ks, intel_AES, INTEL_AES_KEY,
   NID_aes_128, 16, 16, 16, 128,
   0, intel_aes_init_key, NULL,
   EVP_CIPHER_set_asn1_iv,
   EVP_CIPHER_get_asn1_iv,
   NULL)

BTW: The comment on AES_KEY in aes.h says:
/* This should be a hidden type, but EVP requires that the size be known */

Does this mean that AES_KEY is not a public interface and that users
should not rely on its internal layout?
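
The re-alignment itself could look roughly like the sketch below. The toy_ types are stand-ins, not the real AES_KEY, and the sketch assumes the wrapper is at least 4-byte aligned, so rounding up skips at most 12 bytes, which the 12 bytes of padding cover.

```c
#include <stdint.h>

/* Stand-in for AES_KEY: 60 32-bit round-key words plus a round count. */
typedef struct {
    uint32_t rd_key[60];
    int rounds;
} toy_aes_key;

/* Padded wrapper as proposed above: 12 spare bytes after the key. */
typedef struct {
    toy_aes_key ks;
    unsigned int _pad[3];
} toy_intel_aes_key;

/* Round the wrapper's address up to the next multiple of 16. */
static toy_aes_key *aligned_key(toy_intel_aes_key *k)
{
    uintptr_t p = (uintptr_t)k;
    p = (p + 15) & ~(uintptr_t)15;
    return (toy_aes_key *)p;
}
```

Because the aligned address depends on where the wrapper lives, the result must be recomputed after any copy of the wrapper, which is the memcpy concern raised earlier in the thread.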

  - implementation should allow for pipelining;
 
  As for the latter. I refer to possibility of scheduling of multiple
  AESENC/DEC with same key schedule element and multiple data chunks. It's
  possible in modes that allow for parallelization (e.g. ECB, CBC decrypt,
  CTR), and as far as I understand it is even recommended. So we are kind
  of obliged to reserve for this option.
 
  The answer is engine. I mean this preferably should be implemented as
  engine that will be able to take full advantage of architecture, not as
  patch to general purpose block function.
  
  But as Peter Waltenberg said, engine has its issue too.
 
 Yes, and the relevant question is if it worth it.

  At least we
  should have a branch based version (may be slower) to benefit most
  users, until we can make engine version usable by most users.
 
 There is no hardware in sight, so until is not really an argument. One 
 can reserve for branch version as back-up/exit plan, i.e. in case, 
 but not until.

ECB, CBC decrypt and CTR can benefit from AES-NI pipelining, but the
other modes cannot. So maybe we should have both a branch version and
an engine version: the branch version used for the other modes and CBC
decrypt, and the engine version used for ECB and CTR modes.
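
The difference between the two CBC directions can be seen in a small sketch. The toy cipher below is neither AES nor secure and all names are hypothetical; the decrypt routine also assumes that in and out do not overlap.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 16

/* Toy block cipher pair (NOT AES, NOT secure): bytewise add/subtract. */
static void toy_enc(unsigned char b[BLOCK], const unsigned char *k)
{
    for (int i = 0; i < BLOCK; i++)
        b[i] = (unsigned char)(b[i] + k[i]);
}

static void toy_dec(unsigned char b[BLOCK], const unsigned char *k)
{
    for (int i = 0; i < BLOCK; i++)
        b[i] = (unsigned char)(b[i] - k[i]);
}

/* CBC encrypt: block n needs ciphertext block n-1, so the loop is serial. */
static void cbc_encrypt(const unsigned char *in, unsigned char *out,
                        size_t nblocks, const unsigned char *k,
                        const unsigned char iv[BLOCK])
{
    const unsigned char *prev = iv;
    for (size_t n = 0; n < nblocks; n++) {
        for (int i = 0; i < BLOCK; i++)
            out[n * BLOCK + i] = in[n * BLOCK + i] ^ prev[i];
        toy_enc(out + n * BLOCK, k);
        prev = out + n * BLOCK;
    }
}

/*
 * CBC decrypt: each block depends only on ciphertext, so iterations are
 * independent. They run back to front here purely to make that visible.
 */
static void cbc_decrypt(const unsigned char *in, unsigned char *out,
                        size_t nblocks, const unsigned char *k,
                        const unsigned char iv[BLOCK])
{
    for (size_t n = nblocks; n-- > 0; ) {
        const unsigned char *prev = n ? in + (n - 1) * BLOCK : iv;
        memcpy(out + n * BLOCK, in + n * BLOCK, BLOCK);
        toy_dec(out + n * BLOCK, k);
        for (int i = 0; i < BLOCK; i++)
            out[n * BLOCK + i] ^= prev[i];
    }
}
```

Since the decrypt iterations do not depend on each other, several block decryptions can be in flight at once, which is what a pipelined implementation exploits.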

BTW: Is ECB widely used in practice?

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying

On Wed, 2008-12-10 at 16:01 +0800, Andy Polyakov wrote:
  I doubt the OS vendors would bother
  to enable an engine by default, testing of the possible configurations is
  expensive and the costs of support calls if they mess up makes
  autodetecting the engine to use a very unattractive proposition.
  One can discuss loading selected engines by default, i.e. you'd have to
  work to not load it:-) Then it wouldn't be any different, yet provide
  
  I am new to OpenSSL. Can you tell me how to do that? how to use the
  proper engine automatically?
 
 I said one can discuss it, there is no way currently, but as it's 
 *soft*ware there is hardly limit for what one can do. A.

What is your idea for that? It seems that EVP_CipherInit_ex() checks
the engines, and the AES-NI engine could register itself when the
appropriate CPUID bit is set.
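
A rough sketch of such a check is shown below. It assumes a 64-bit capability word with the CPUID leaf 1 EDX value in bits 0-31 and the ECX value in bits 32-63, so that bit 57 corresponds to ECX bit 25 (AES-NI). The function names are hypothetical, and cpuid.h is a GCC/Clang helper header that other compilers do not provide.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang only; an assumption of this sketch */
#endif

/* Bit 57 of the capability word = CPUID.1:ECX bit 25 under the layout above. */
#define CAP_AESNI_BIT 57

static int cap_has_aesni(uint64_t cap)
{
    return (int)((cap >> CAP_AESNI_BIT) & 1);
}

/* Build the capability word from CPUID leaf 1; 0 if CPUID is unavailable. */
static uint64_t read_cap(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ((uint64_t)ecx << 32) | edx;
#endif
    return 0;
}

/* Hypothetical engine hook: offer the ciphers only when the bit is set. */
static int aesni_engine_usable(void)
{
    return cap_has_aesni(read_cap());
}
```

An engine could call something like aesni_engine_usable() once at bind time and decline to register its ciphers when it returns 0.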

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying

On Wed, 2008-12-10 at 17:22 +0800, Andy Polyakov wrote:
  - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable
  for both Unix and Win64;
  
  OK. I will follow the way like that in aes-x86_64.pl to deal with ABI
  issue.
 
 Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so 
 some SIMD instructions can't be handled. Some adjustments for 
 crypto/perlasm are required...

I know only a little perl and I have no 64-bit Windows machine to test
on. Can you help me with that?

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying

On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
 Implementation aiming to complement interface exposed by crypto/aes/asm 
 should allow for non-16-byte-aligned key schedule. Period. One can use 
 movups, or check alignment and choose between movups and movaps code 
 paths, or copy key schedule to aligned location on stack.

Should copying the key schedule to the stack be considered unsafe? The
stack may be swapped out to a swap file, so that the key schedule is
leaked.

Best Regards,
Huang Ying




[PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-09 Thread Huang Ying

This patch adds support for the Intel AES-NI instruction set on the
x86_64 platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processors, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197.  The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf


- The AES implementation based on AES-NI is put in
  crypto/aes/asm/aes-intel.S.

- AES-NI operates on XMM registers, so the key structure needs to be
  128-bit aligned. A pad field is added to AES_KEY, and the key
  structure is aligned to a 128-bit boundary on entry to the AES-NI
  implementation.

- At the entry point of the AES algorithm in
  crypto/aes/asm/aes-x86_64.pl, OPENSSL_ia32cap_P is checked; if the
  corresponding bit (57) is set, the code branches into the AES-NI
  based implementation.

- The AES-NI based implementation cannot benefit from a specialized
  AES_cbc_encrypt, so the general C implementation is used. To resolve
  the name conflict, the original AES_cbc_encrypt is renamed to
  AES_cbc_encrypt_def and put in crypto/aes/aes_cbc_def.c.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Configure|   20 +-
 crypto/aes/Makefile  |9 -
 crypto/aes/aes.h |5 
 crypto/aes/aes_cbc.c |   66 ---
 crypto/aes/aes_cbc_def.c |  130 ++
 crypto/aes/asm/aes-intel.S   |  374 +++
 crypto/aes/asm/aes-x86_64.pl |   20 ++
 7 files changed, 546 insertions(+), 78 deletions(-)

--- /dev/null
+++ b/crypto/aes/asm/aes-intel.S
@@ -0,0 +1,374 @@
+/*
+ * 
+ * Written by Huang Ying [EMAIL PROTECTED] for the OpenSSL
+ * project to add support for Intel new AES instructions. Rights for
+ * redistribution and usage in source and binary forms are granted
+ * according to the OpenSSL license.
+ * 
+ */
+
+.align 16
+key_expansion_128:
+   movaps %xmm1, %xmm4
+   psrldq $12, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm1, %xmm0
+
+   movaps %xmm0, (%rcx)
+   add $0x10, %rcx
+   ret
+
+.align 16
+key_expansion_192:
+   pshufd $0b01010101, %xmm1, %xmm1
+   movaps %xmm1, %xmm4
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm1, %xmm0
+
+   pshufd $0b, %xmm0, %xmm3
+   pxor %xmm2, %xmm3
+   palignr $12, %xmm0, %xmm3
+   pxor %xmm2, %xmm3
+
+   test %r9, %r9
+   not %r9
+   jnz 1f
+
+   movaps %xmm0, %xmm1
+   pslldq $8, %xmm2
+   palignr $8, %xmm2, %xmm1
+   movaps %xmm1, (%rcx)
+   add $0x10, %rcx
+   movaps %xmm3, %xmm2
+   palignr $8, %xmm0, %xmm3
+   movaps %xmm3, (%rcx)
+   add $0x10, %rcx
+   ret
+1:
+   movaps %xmm0, (%rcx)
+   add $0x10, %rcx
+   movaps %xmm3, %xmm2
+   ret
+
+.align 16
+key_expansion_256:
+   movaps %xmm1, %xmm4
+   psrldq $12, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm0, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm1, %xmm0
+
+   movaps %xmm0, (%rcx)
+   add $0x10, %rcx
+
+   test %r9, %r9
+   jnz 1f
+
+   # aeskeygenassist $0x1, %xmm0, %xmm1
+   .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01
+
+   pshufd $0b10101010, %xmm1, %xmm1
+   movaps %xmm1, %xmm4
+   pxor %xmm2, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm2, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm2, %xmm1
+   palignr $12, %xmm4, %xmm1
+   pxor %xmm1, %xmm2
+
+   movaps %xmm2, (%rcx)
+   add $0x10, %rcx
+1:
+   ret
+
+.align 16
+.global intel_AES_set_encrypt_key
+intel_AES_set_encrypt_key:
+   test %rdi, %rdi
+   jz 3f
+   test %rdx, %rdx
+   jz 3f
+   add $0xf, %rdx  # make key struct 128-bit aligned
+   and $0xfff0, %rdx
+   movups (%rdi), %xmm0# user key (first 16 bytes)
+   movaps %xmm0, (%rdx)
+   lea 0x10(%rdx), %rcx# key addr
+   cmp $256, %esi
+   jnz 1f
+   mov $14, %esi
+   movl %esi, 240(%rdx)# 14 rounds for 256
+   movups 0x10(%rdi), %xmm2

Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-09 Thread Huang Ying

On Wed, 2008-12-10 at 03:40 +0800, Andy Polyakov wrote:
 As for RFC part. NO! This is NOT the way to do it. For several reasons
 (in ascending order of importance):
 
 - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable
 for both Unix and Win64;

OK. I will follow the approach used in aes-x86_64.pl to deal with the
ABI issue.

 - and $-16, %rdx is unacceptable in this context. The relevant
 interface is exposed to end-user and we have to reserve for possibility
 that key schedule is memcpy-ed to location with alternative alignment;

Is there any other mechanism to deal with the alignment issue in
OpenSSL? Would it be better to declare AES_KEY as follows:

struct aes_key_st {
unsigned int rd_key[4 *(AES_MAXNR + 1)];
int rounds;
} __attribute__ ((aligned (16)));

And how should memory allocated with malloc() be dealt with?
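
For heap memory, one option on POSIX systems is an aligned allocator. The sketch below is only an illustration under that assumption; toy_key is a stand-in for the structure above and toy_key_new is a made-up name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the key structure quoted above (not the real AES_KEY). */
struct toy_key {
    unsigned int rd_key[60];
    int rounds;
};

/*
 * Allocate a key on a 16-byte boundary.
 * Returns NULL on failure; release the result with free().
 */
static struct toy_key *toy_key_new(void)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, sizeof(struct toy_key)) != 0)
        return NULL;
    return p;
}
```

This only helps for keys the library allocates itself; a key that the caller copies to an arbitrary address can still end up misaligned.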

 - zero-copy CBC routine gives a fair performance improvement even in
 ordinary case, and driving ultra-fast block function from C would be
 just wasteful. In other words AESENC/DEC would benefit more from
 dedicated CBC routine (see even comment below);

I will do more investigation on that.

 - implementation should allow for pipelining;
 
 As for the latter. I refer to possibility of scheduling of multiple
 AESENC/DEC with same key schedule element and multiple data chunks. It's
 possible in modes that allow for parallelization (e.g. ECB, CBC decrypt,
 CTR), and as far as I understand it is even recommended. So we are kind
 of obliged to reserve for this option.
 
 The answer is engine. I mean this preferably should be implemented as
 engine that will be able to take full advantage of architecture, not as
 patch to general purpose block function.

But as Peter Waltenberg said, an engine has its issues too. At least we
should have a branch based version (which may be slower) to benefit most
users, until we can make the engine version usable by most users.

  This patch adds support to Intel AES-NI instruction set for x86_64
  platform.
  
  Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
  instructions that are going to be introduced in the next generation of
  Intel processor, as of 2009.
 
 Hardware however is not expected before 2010, right? A.

Maybe 2009 or 2010; I don't know exactly either.

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-09 Thread Huang Ying

On Wed, 2008-12-10 at 04:58 +0800, Peter Waltenberg wrote:
 If you want this in the mainstream code, you'll need to detect the
 capability at runtime and use your alternate code paths only if the
 hardware is present. It's not even to Intels advantage if OpenSSL crashes
 and burns on older Intel CPU's and most bulk users of OpenSSL (OS vendors)
 won't want to mess around installing different OpenSSL versions for
 different hardware.
 
 Autodetection is the best option if the detection overhead is reasonable -
 take a look at crypto/x86_64cpuid.pl for how to do the detection logic
 neatly.
 There are advantages in this being present all the time/dynamically enabled
 if it can be done, most users/OS vendors wouldn't bother to configure an
 engine backend anyway.

Auto-detection is already implemented in the patch.

- At the entry point of the AES routines in crypto/aes/asm/aes-x86_64.pl,
  OPENSSL_ia32cap_P is checked; if the corresponding bit (57) is set,
  the code branches into the AES-NI based implementation.
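
The check described above can be sketched in C. This is only an illustration, not the code from the patch, where the test is done in perlasm inside aes-x86_64.pl. It assumes the capability vector is a 64-bit value with CPUID.1:EDX in the low half and CPUID.1:ECX in the high half, so that the AES-NI flag (ECX bit 25) lands at bit 57. The names make_cap_vector, cap_test_bit, cap_has_aesni and select_aes_impl are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the AES-NI flag in the 64-bit capability vector, as given
 * in the message above (assumed layout: 32 + CPUID.1:ECX bit 25). */
#define CAP_BIT_AESNI 57

typedef enum { AES_IMPL_GENERIC, AES_IMPL_AESNI } aes_impl;

/* Hypothetical helper: pack CPUID.1 EDX (low half) and ECX (high half). */
uint64_t make_cap_vector(uint32_t edx, uint32_t ecx)
{
	return ((uint64_t)ecx << 32) | edx;
}

/* Return 1 if the given bit is set in the capability vector, else 0. */
int cap_test_bit(uint64_t cap, unsigned int bit)
{
	assert(bit < 64);
	return (int)((cap >> bit) & 1);
}

/* Return 1 if the vector advertises AES-NI. */
int cap_has_aesni(uint64_t cap)
{
	return cap_test_bit(cap, CAP_BIT_AESNI);
}

/* Hypothetical dispatcher: pick the code path at run time, falling back
 * to the generic implementation when the flag is clear. */
aes_impl select_aes_impl(uint64_t cap)
{
	return cap_has_aesni(cap) ? AES_IMPL_AESNI : AES_IMPL_GENERIC;
}
```

With this layout, a vector built from ECX = 1 << 25 selects the AES-NI path, and any vector with bit 57 clear falls back to the generic one, so the same binary runs on older CPUs.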

Best Regards,
Huang Ying




Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-09 Thread Huang Ying

On Wed, 2008-12-10 at 05:47 +0800, Andy Polyakov wrote:
  I doubt the OS vendors would bother
  to enable an engine by default, testing of the possible configurations is
  expensive and the costs of support calls if they mess up makes
  autodetecting the engine to use a very unattractive proposition.
 
 One can discuss loading selected engines by default, i.e. you'd have to
 work to not load it:-) Then it wouldn't be any different, yet provide

I am new to OpenSSL. Can you tell me how to do that? How can the proper
engine be used automatically?

Best Regards,
Huang Ying




[openssl.org #1690] BN_GF2m_mod_arr() infinite loop

2008-06-03 Thread Huang, Ying via RT

The following code sends BN_GF2m_mod_arr() into an infinite loop.

#include <openssl/bn.h>

int main(int argc, char *argv[])
{
	BIGNUM *bn3 = NULL, *res = NULL, *p = NULL;

	BN_hex2bn(&bn3, "448692853686179295b477565726f6e5d");
	BN_hex2bn(&p,   "10087");
	res = BN_new();
	BN_GF2m_mod(res, bn3, p);
	return 0;
}

This happens because, in the final round of reduction, d0 == 0 and
z[dN] != 0, so z[dN] is never changed and the loop never terminates.
The fix is to set z[dN] = 0 when d0 == 0.
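
The word-level effect of the fix can be sketched in isolation. The sketch assumes a 64-bit BN_ULONG (BN_BITS2 == 64, as on x86_64). The functions clear_top_bits_old and clear_top_bits_fixed are hypothetical names that mirror only the one statement the patch changes; they are not functions in OpenSSL.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64	/* assumed value of BN_BITS2 on x86_64 */

/* Old behaviour: keep the low d0 bits of the top word. The guard skips
 * the shift when d0 == 0 (a shift by WORD_BITS would be undefined in C),
 * which leaves the word untouched. */
uint64_t clear_top_bits_old(uint64_t top, unsigned int d0)
{
	unsigned int d1 = WORD_BITS - d0;

	assert(d0 < WORD_BITS);
	if (d0)
		top = (top << d1) >> d1;
	return top;
}

/* Patched behaviour: when d0 == 0 no bits of the word are kept,
 * so the whole word is cleared. */
uint64_t clear_top_bits_fixed(uint64_t top, unsigned int d0)
{
	unsigned int d1 = WORD_BITS - d0;

	assert(d0 < WORD_BITS);
	if (d0)
		top = (top << d1) >> d1;
	else
		top = 0;
	return top;
}
```

For d0 != 0 both versions agree. For d0 == 0 the old version returns the word unchanged, which is the state in which the loop described above cannot make progress; the patched version returns 0.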

This patch is based on openssl SNAPSHOT 20080519, and has been tested
on x86_64 with openssl/test/bntest.c and above program.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 crypto/bn/bn_gf2m.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/crypto/bn/bn_gf2m.c
+++ b/crypto/bn/bn_gf2m.c
@@ -322,7 +322,11 @@ int BN_GF2m_mod_arr(BIGNUM *r, const BIG
 		if (zz == 0) break;
 		d1 = BN_BITS2 - d0;
 		
-		if (d0) z[dN] = (z[dN] << d1) >> d1; /* clear up the top d1 bits */
+		/* clear up the top d1 bits */
+		if (d0)
+			z[dN] = (z[dN] << d1) >> d1;
+		else
+			z[dN] = 0;
 		z[0] ^= zz; /* reduction t^0 component */
 
 		for (k = 1; p[k] != 0; k++)


OpenSSL self-test report:

OpenSSL version:  0.9.9-dev
Last change:  To support arbitrarily-typed thread IDs, deprecate the ...
Options:   no-gmp no-krb5 no-mdc2 no-rc5 no-rfc3779 no-shared no-zlib no-zlib-dynamic static-engine
OS (uname):   Linux caritas-dev 2.6.24-1-amd64 #1 SMP Fri Apr 18 23:08:22 UTC 2008 x86_64 GNU/Linux
OS (config):  x86_64-whatever-linux2
Target (default): linux-x86_64
Target:   linux-x86_64
Compiler: Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-libmudflap --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Debian 4.2.4-1)

Test passed.

[BUGFIX] BN_GF2m_mod_arr() infinite loop

2008-05-28 Thread Huang, Ying

The following code sends BN_GF2m_mod_arr() into an infinite loop.

#include <openssl/bn.h>

int main(int argc, char *argv[])
{
	BIGNUM *bn3 = NULL, *res = NULL, *p = NULL;

	BN_hex2bn(&bn3, "448692853686179295b477565726f6e5d");
	BN_hex2bn(&p,   "10087");
	res = BN_new();
	BN_GF2m_mod(res, bn3, p);
	return 0;
}

This happens because, in the final round of reduction, d0 == 0 and
z[dN] != 0, so z[dN] is never changed and the loop never terminates.
The fix is to set z[dN] = 0 when d0 == 0.

This patch is based on openssl SNAPSHOT 20080519, and has been tested
on x86_64 with openssl/test/bntest.c and above program.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 crypto/bn/bn_gf2m.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/crypto/bn/bn_gf2m.c
+++ b/crypto/bn/bn_gf2m.c
@@ -322,7 +322,11 @@ int BN_GF2m_mod_arr(BIGNUM *r, const BIG
 		if (zz == 0) break;
 		d1 = BN_BITS2 - d0;
 		
-		if (d0) z[dN] = (z[dN] << d1) >> d1; /* clear up the top d1 bits */
+		/* clear up the top d1 bits */
+		if (d0)
+			z[dN] = (z[dN] << d1) >> d1;
+		else
+			z[dN] = 0;
 		z[0] ^= zz; /* reduction t^0 component */
 
 		for (k = 1; p[k] != 0; k++)


