Re: [openssl.org #2175] [PATCH] Optimization for 1024 bit RSA on x86_64 platform
Hi, All,

On Sat, 2010-02-20 at 14:17 +0100, Huang, Ying via RT wrote:
> Hi, All,
>
> Benchmarks with openssl speed show about a 50% performance gain for 1024-bit RSA private-key operations. The optimization is implemented as an engine named RSAX. Because the implementation uses x86_64 assembly, the optimization is only available on x86_64.
>
> More information about the algorithm can be found at the following URL:
> http://www.cse.buffalo.edu/srds2009/escs2009_submission_Gopal.pdf

It appears that nobody cares about this patch. Is there something fundamentally wrong with it? Or should I send it to someone else too?

Best Regards,
Huang Ying
A demo implementation of Intel PCLMULQDQ-NI accelerated AES-GCM
Hi, All,

To accelerate AES-GCM, Intel has introduced a new instruction set named PCLMULQDQ-NI, which will be integrated into upcoming Intel CPUs. This patchset provides a demo implementation of Intel PCLMULQDQ-NI accelerated AES-GCM. Because AES-GCM is used in TLS 1.2 only, the patchset also provides a minimal AES-GCM related TLS 1.2 implementation. This patchset may be combined with the general AES-GCM implementation contributed by IBM to provide a full stack.

More information about PCLMULQDQ-NI can be found at:
http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

Best Regards,
Huang Ying

aes_gcm_clmul_ni_patches.tar.gz Description: application/compressed-tar
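For readers unfamiliar with carry-less multiplication, the operation that PCLMULQDQ performs in hardware can be sketched in portable C. This is only an illustration, not code from the patchset: the type clmul128 and the functions clmul64 and clmul64_example are hypothetical names, and a bit-by-bit loop like this is far slower than the instruction it models.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit result of a 64x64 carry-less multiplication (hypothetical type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} clmul128;

/*
 * Software carry-less multiply: like schoolbook binary multiplication,
 * but the partial products are combined with XOR instead of ADD, i.e.
 * polynomial multiplication over GF(2).
 */
static clmul128 clmul64(uint64_t a, uint64_t b)
{
    clmul128 r = { 0, 0 };
    int i;

    for (i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)                 /* a shift by 64 would be undefined */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Small usage example with results that can be checked by hand. */
static void clmul64_example(void)
{
    /* (x + 1) * (x + 1) = x^2 + 1 over GF(2): 3 clmul 3 = 5, not 9 */
    clmul128 r = clmul64(3, 3);
    assert(r.lo == 5 && r.hi == 0);

    /* x^63 * x = x^64: the product spills into the high half */
    r = clmul64((uint64_t)1 << 63, 2);
    assert(r.lo == 0 && r.hi == 1);
}
```

GCM's GHASH multiplies 128-bit field elements, so a real implementation would build the 128x128 product from several such 64x64 products and then reduce it modulo the GCM polynomial; that part is not shown here.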
What can we do to push AES-NI acceleration patches into 1.0.0 and 0.9.8 branches
Hi, All,

We are working on AES-NI acceleration in OpenSSL. With Andy's help, we have pushed the AES-NI acceleration patches into the OpenSSL CVS development branch. But it seems that the patches have not been merged into the 1.0.0 and/or 0.9.8 branches. So we have some questions:

- Are there any rules for moving a patch from the CVS development version to the stable branches (1.0.0 and/or 0.9.8)?
- What can we do to help the move happen?

There is no machine on the market supporting AES-NI yet, but AES-NI support will be available in the next generation of Intel platform, Westmere, rather than in the one after that, Sandy Bridge. And OSVs such as Red Hat and Novell are waiting for AES-NI support in OpenSSL now.

Thanks,
Huang Ying
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2009-04-01 at 03:45 +0800, Andy Polyakov wrote:
> Hi,
>
>> This patch adds support to Intel AES-NI instruction set for x86_64 platform.
>
> I apologize for delay.

That's all right.

> Promised to comment on submission in question. Well, after some consideration I reckoned that it would take longer to discuss it than to implement own version of assembler module. Having own code also makes it easier for me to maintain it:-) The module is available for preview at http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt. Major points are, all addressed in the new code:

Thank you very much for your work.

> - why full unroll?

Just because the unrolled code is not too long.

> - why 4x interleave when aesenc latency is [anticipated to be] 6?

Yes. It should be 6; I overlooked this important information in the white paper.

> - why post-4x processing is done with non-interleaved routine, when interleaved can be used?

Yes. Post-4x processing can be done in interleaved mode. That is faster.

> - why not encode all aes instructions with .byte?

I just wanted to encode all the aes instructions after some review. Now I think maybe we can define the aes instructions as perl functions and do the encoding via perl.

> - instruction scheduling in key setup can be [much] better;
>
> See code and comments in code for further details. I'd appreciate if you could review and cross-test the code. [Counter-]comments and suggestions are naturally welcomed. The code will be committed to repository as soon as remaining issues are resolved. Remaining are build issue (as you pointed out yourself) and actual tests on Win64. Note that I suggest to name module eng_aesni-x86_64.pl instead of _asm. This implies that eventually there will be 32-bit version too.

I will test your code on a real machine. In the meantime, you can at least test the code with an emulator, SDE, which can be downloaded from the following URL:
http://linux.softpedia.com/progDownload/Intel-Software-Development-Emulator-Download-44635.html
Both Linux and Windows are supported.

BTW: do you want me to prepare the patch, or will you prepare it yourself?

Best Regards,
Huang Ying
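The ".byte" point above is about assemblers that do not yet know the AES-NI mnemonics: the instruction bytes are emitted directly instead. As a rough illustration of what such an encoding step computes, here is a minimal sketch in C (the thread itself proposes doing this in perl). The function names encode_aesenc and encode_aesenc_example are hypothetical, and the sketch covers only the register-to-register form of aesenc with xmm0..xmm7, so no REX prefix is involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "aesenc %xmm<src>,%xmm<dst>" (AT&T operand order) as a .byte
 * directive. The opcode is 66 0F 38 DC /r; the ModRM byte uses mod=11
 * (register direct), reg=dst and rm=src. Returns 0 on success and -1 on
 * bad arguments or a too-small buffer.
 */
static int encode_aesenc(unsigned src, unsigned dst, char *buf, size_t buflen)
{
    unsigned modrm;
    int n;

    if (src > 7 || dst > 7)
        return -1;                  /* xmm8..xmm15 would need a REX prefix */
    modrm = 0xC0u | (dst << 3) | src;
    n = snprintf(buf, buflen, ".byte 0x66,0x0f,0x38,0xdc,0x%02x", modrm);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Usage example: aesenc %xmm1,%xmm0 has ModRM 11 000 001 = 0xc1. */
static void encode_aesenc_example(void)
{
    char buf[64];

    assert(encode_aesenc(1, 0, buf, sizeof(buf)) == 0);
    assert(strcmp(buf, ".byte 0x66,0x0f,0x38,0xdc,0xc1") == 0);
    assert(encode_aesenc(8, 0, buf, sizeof(buf)) == -1);
}
```

The other AES-NI round instructions differ only in the final opcode byte (for example 0xdd for aesenclast), so a table-driven version of the same idea would cover them.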
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
>> Just because the unrolled code is not too long.
>
> As for non-interleaved loop. Reasoning is that folded loop can be inlined in several places to spare few cycles on call overhead. Of course this is under premise that it is as fast as unrolled one. Intel CPUs used to be very good at small loops, which is why I dared to fold the loop. Of course it doesn't have to be the case here and if unrolled loop will be proved to be faster, inline code will have to be replaced with calls.

Sounds reasonable.

>>> - why not encode all aes instructions with .byte?
>>
>> Just want to encode all aes instructions after some review. Now I think maybe we can define aes instructions as perl function and do encoding via perl.
>
> It's done at the end of script.

Yes. Thanks.

>> I will test your code on real machine.
>
> There is real machine? Would you care to perform several tests, so that we can sort out what's optimal? I mean the folded vs. unrolled, then I wonder if my use of .aligns is excessive in *crypt1... I don't demand actual figures [in case you can't disclose them], only if/how performance is affected... If yes, we can proceed off-list if so desired.

OK. I will do these tests:

1. folded vs. unrolled
2. .align vs. no .align in *crypt1

Are there any other tests to add? I will test with openssl speed and send you the results. I will do the tests tomorrow.

>> BTW: you want me to prepare the patch or you prepare the patch yourself?
>
> I'll manage it myself. A.

Can you send me the full patch, so that I can test it?

Best Regards,
Huang Ying
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi, All,

It seems that Andy has not been available since Christmas. Can anyone tell me where I can find him? Or what can I do to have this patch reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
> This patch adds support to Intel AES-NI instruction set for x86_64 platform.
>
> Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure.
>
> The white paper can be downloaded from:
> http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
>
> AES-NI support is implemented as an engine in crypto/engine/.
>
> ChangeLog:
>
> v3:
> - Rename INTEL or INTEL_AES stuff to AESNI
> - Use cfb and ofb modes implementation of crypto/modes instead of copying.
>
> v2:
> - AES-NI support is implemented as an engine instead of branch.
> - ECB and CBC modes are implemented in parallel style to take advantage of pipelined hardware implementation.
> - AES key scheduling algorithm is re-implemented with higher performance.
>
> Known issues:
> - How to add conditional compilation for eng_intel_asm.pl? It can not be compiled on non-x86 platform.
> - NID for CTR mode can not be found, how to support it in engine?
> - CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary to add AES-NI support for them, I can add them.
>
Signed-off-by: Huang Ying ying.hu...@intel.com --- crypto/engine/Makefile | 11 crypto/engine/eng_aesni.c | 409 ++ crypto/engine/eng_aesni_asm.pl | 918 + crypto/engine/eng_all.c|3 crypto/engine/engine.h |1 5 files changed, 1340 insertions(+), 2 deletions(-) --- /dev/null +++ b/crypto/engine/eng_aesni.c @@ -0,0 +1,409 @@ +/* + * Support for Intel AES-NI intruction set + * Author: Huang Ying ying.hu...@intel.com + * + * Intel AES-NI is a new set of Single Instruction Multiple Data + * (SIMD) instructions that are going to be introduced in the next + * generation of Intel processor, as of 2009. These instructions + * enable fast and secure data encryption and decryption, using the + * Advanced Encryption Standard (AES), defined by FIPS Publication + * number 197. The architecture introduces six instructions that + * offer full hardware support for AES. Four of them support high + * performance data encryption and decryption, and the other two + * instructions support the AES key expansion procedure. + * + * The white paper can be downloaded from: + * http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf + * + * This file is based on engines/e_padlock.c + */ + +/* + * Copyright (c) 1999-2001 The OpenSSL Project. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in + *the documentation and/or other materials provided with the + *distribution. + * + * 3. 
All advertising materials mentioning features or use of this + *software must display the following acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/) + * + * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to + *endorse or promote products derived from this software without + *prior written permission. For written permission, please contact + *licens...@openssl.org. + * + * 5. Products derived from this software may not be called OpenSSL + *nor may OpenSSL appear in their names without prior written + *permission of the OpenSSL Project. + * + * 6. Redistributions of any form whatsoever must retain the following + *acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit (http://www.OpenSSL.org/) + * + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES
Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform
On Tue, 2008-12-23 at 23:36 +0800, Geoff Thorpe wrote:
> On Tuesday 23 December 2008 02:01:38 Huang Ying wrote:
>> This patch adds support to Intel AES-NI instruction set for x86_64 platform.
>
> Cool. I'm relying on Andy to provide a more thorough review than my quick scan - I don't do perl-asm :-) In particular, I haven't tried patching and building this. (Andy, let me know if you need any off-platform testing - I presume not, but ...) Quick comment:
>
>> Signed-off-by: Huang Ying ying.hu...@intel.com
>> ---
>>  crypto/engine/Makefile         |   11
>>  crypto/engine/eng_all.c        |    3
>>  crypto/engine/eng_intel.c      |  589 ++
>>  crypto/engine/eng_intel_asm.pl |  918 +
>>  4 files changed, 1519 insertions(+), 2 deletions(-)
>
> Are you using git to prepare this patch, and if so, which git repo+branch are you tracking?

I use OpenSSL CVS to track the upstream version, and I use quilt to prepare the patch.

>> +#define INTEL_AES_MIN_ALIGN 16
>> +#define ALIGN(x,a) (((unsigned long)(x)+(a)-1)&(~((a)-1)))
>> +#define INTEL_AES_ALIGN(x) ALIGN(x,INTEL_AES_MIN_ALIGN)
>
> You don't seem to need the ALIGN() macro anywhere, just INTEL_AES_ALIGN(), so I'd personally prefer it if you didn't use ALIGN as this is tempting fate with respect to possible symbol conflicts.

OK. I will fix it.

> Also, if you have no philosophical objection, I think the file and symbol naming should be based on the interface rather than the manufacturer (particularly for intel, who provide lots of h/w and interfaces that have nothing to do with AES-NI). Perhaps eng_aesni.c rather than eng_intel.c. If it's absolutely certain no other manufacturer will support the same instructions in the future, we could live with eng_intel_aesni.c, but it still needs to be clear that the engine targets the AES-NI interface rather than (any) intel interface. (I don't want to handle support questions from x86 noobs who presume the eng_intel.c engine accelerates any intel cpu ...)

I will rename them to aesni.

> As your use of INTEL_AES_ALIGN() was always to cast it to a pointer, please also rephrase the macro to not need casting every time;
>
> #define AESNI_MIN_ALIGN 16
> #define AESNI_ALIGN(x) \
>     (void *)(((unsigned long)(x) + AESNI_MIN_ALIGN - 1) & \
>     (~(unsigned long)(AESNI_MIN_ALIGN - 1)))

I will do this.

> Finally - did you omit a patch to engine.h? Your changes to eng_all.c include a call to ENGINE_load_intel_aes_ni(), which is in eng_intel.c, but this doesn't appear to be declared in any header.

I will add it.

Best Regards,
Huang Ying
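The round-up-and-mask idiom behind the alignment macro discussed above can be written as a small function, which makes it easy to check. This is a sketch rather than the code that was committed: the names aesni_align and aesni_align_example are hypothetical, and uintptr_t is used instead of unsigned long on the assumption that the code should also work where long is narrower than a pointer (as on Win64).

```c
#include <assert.h>
#include <stdint.h>

#define AESNI_MIN_ALIGN 16

/*
 * Round a pointer up to the next AESNI_MIN_ALIGN boundary: add
 * (alignment - 1), then clear the low bits. This only works when the
 * alignment is a power of two.
 */
static void *aesni_align(const void *p)
{
    uintptr_t v = (uintptr_t)p;

    v = (v + AESNI_MIN_ALIGN - 1) & ~(uintptr_t)(AESNI_MIN_ALIGN - 1);
    return (void *)v;
}

/* Usage example: carve an aligned 16-byte block out of a raw buffer. */
static void aesni_align_example(void)
{
    /* over-allocate by AESNI_MIN_ALIGN - 1 so the aligned block always fits */
    unsigned char raw[16 + AESNI_MIN_ALIGN - 1];
    unsigned char *p = aesni_align(raw);

    assert(((uintptr_t)p & (AESNI_MIN_ALIGN - 1)) == 0);
    assert(p >= raw && p + 16 <= raw + sizeof(raw));
}
```

Returning void * keeps the property Geoff asked for: callers can assign the result to any object pointer type without a cast.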
Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-24 at 00:58 +0800, Andy Polyakov wrote:
>>> This patch adds support to Intel AES-NI instruction set for x86_64 platform.
>>
>> Cool. I'm relying on Andy to provide a more thorough review
>
> Even after short glance I can tell there will be a lot of comments and even work to do, but I'm planning to take it later... ... ... ... ...

I am looking forward to your further comments.

>> Also, if you have no philosophical objection, I think the file and symbol naming should be based on the interface rather than the manufacturer (particularly for intel, who provide lots of h/w and interfaces that have nothing to do with AES-NI). Perhaps eng_aesni.c rather than eng_intel.c.
>
> I second it. Ying, there is nothing preventing us from renaming files and functions (assuming that you have no philosophical objections), but *if* you choose to submit another patch with alternative naming, could you look into crypto/modes and use it? At earlier occasion you commented hope that it can be merged quickly, but it was committed to OpenSSL CVS prior I mentioned it... Or is it that you might have failed to pull it to your repository, but then it's something we have no power to make quicker...

Sorry, I overlooked them. I will use them in the new patch.

> Out of curiosity, what does NI stand for anyway? Or is it just something the knights kept saying? But didn't they stop doing so? Cheers. A.

NI stands for New Instruction.

Best Regards,
Huang Ying
Re: [PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-24 at 00:58 +0800, Andy Polyakov wrote:
>>> This patch adds support to Intel AES-NI instruction set for x86_64 platform.
>>
>> Cool. I'm relying on Andy to provide a more thorough review
>
> Even after short glance I can tell there will be a lot of comments and even work to do, but I'm planning to take it later... ... ... ... ...
>
>> Also, if you have no philosophical objection, I think the file and symbol naming should be based on the interface rather than the manufacturer (particularly for intel, who provide lots of h/w and interfaces that have nothing to do with AES-NI). Perhaps eng_aesni.c rather than eng_intel.c.
>
> I second it. Ying, there is nothing preventing us from renaming files and functions (assuming that you have no philosophical objections), but *if* you choose to submit another patch with alternative naming, could you look into crypto/modes and use it? At earlier occasion you commented hope that it can be merged quickly, but it was committed to OpenSSL CVS prior I mentioned it... Or is it that you might have failed to pull it to your repository, but then it's something we have no power to make quicker...

It seems that crypto/modes is not compiled into libcrypto by default. The following patch makes it compile. Should it be a separate patch, or should I just merge it into the AES-NI patch?

---
 Makefile.org |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Makefile.org
+++ b/Makefile.org
@@ -119,7 +119,7 @@ SDIRS= \
     bn ec rsa dsa ecdsa dh ecdh dso engine \
     buffer bio stack lhash rand err \
     evp asn1 pem x509 x509v3 conf txt_db pkcs7 pkcs12 comp ocsp ui krb5 \
-    cms pqueue ts jpake
+    cms pqueue ts jpake modes
 
 # keep in mind that the above list is adjusted by ./Configure
 # according to no-xxx arguments...
[PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
This patch adds support to Intel AES-NI instruction set for x86_64 platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure.

The white paper can be downloaded from:
http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES-NI support is implemented as an engine in crypto/engine/.

ChangeLog:

v3:
- Rename INTEL or INTEL_AES stuff to AESNI
- Use cfb and ofb modes implementation of crypto/modes instead of copying.

v2:
- AES-NI support is implemented as an engine instead of branch.
- ECB and CBC modes are implemented in parallel style to take advantage of pipelined hardware implementation.
- AES key scheduling algorithm is re-implemented with higher performance.

Known issues:
- How to add conditional compilation for eng_intel_asm.pl? It can not be compiled on non-x86 platform.
- NID for CTR mode can not be found, how to support it in engine?
- CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary to add AES-NI support for them, I can add them.

Signed-off-by: Huang Ying ying.hu...@intel.com --- crypto/engine/Makefile | 11 crypto/engine/eng_aesni.c | 409 ++ crypto/engine/eng_aesni_asm.pl | 918 + crypto/engine/eng_all.c|3 crypto/engine/engine.h |1 5 files changed, 1340 insertions(+), 2 deletions(-) --- /dev/null +++ b/crypto/engine/eng_aesni.c @@ -0,0 +1,409 @@ +/* + * Support for Intel AES-NI intruction set + * Author: Huang Ying ying.hu...@intel.com + * + * Intel AES-NI is a new set of Single Instruction Multiple Data + * (SIMD) instructions that are going to be introduced in the next + * generation of Intel processor, as of 2009. These instructions + * enable fast and secure data encryption and decryption, using the + * Advanced Encryption Standard (AES), defined by FIPS Publication + * number 197. The architecture introduces six instructions that + * offer full hardware support for AES. Four of them support high + * performance data encryption and decryption, and the other two + * instructions support the AES key expansion procedure. + * + * The white paper can be downloaded from: + * http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf + * + * This file is based on engines/e_padlock.c + */ + +/* + * Copyright (c) 1999-2001 The OpenSSL Project. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in + *the documentation and/or other materials provided with the + *distribution. + * + * 3. 
All advertising materials mentioning features or use of this + *software must display the following acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/) + * + * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to + *endorse or promote products derived from this software without + *prior written permission. For written permission, please contact + *licens...@openssl.org. + * + * 5. Products derived from this software may not be called OpenSSL + *nor may OpenSSL appear in their names without prior written + *permission of the OpenSSL Project. + * + * 6. Redistributions of any form whatsoever must retain the following + *acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit (http://www.OpenSSL.org/) + * + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR + * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS
Re: [openssl.org #1801] [BUGFIX] Segment fault when invoking AES_cbc_encrypt() on x86_64 with short input
On Wed, 2008-12-17 at 22:30 +0800, Andy Polyakov via RT wrote:
>> Fix two bugs in .Lcbc_slow_enc_in_place.
>>
>> - At end of .Lcbc_slow_enc_in_place, %r10 instead of $_len should be set to 16.
>> - In .Lcbc_slow_enc_in_place, %rdi should be initialized before stosb.
>
> Thanks. The problem is addressed but in different way, see http://cvs.openssl.org/chngview?cn=17698.
>
>> Signed-off-by: Huang Ying ying.hu...@intel.com
>> ---
>>  crypto/aes/asm/aes-x86_64.pl |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> --- a/crypto/aes/asm/aes-x86_64.pl
>> +++ b/crypto/aes/asm/aes-x86_64.pl
>> @@ -1994,10 +1994,12 @@ AES_cbc_encrypt:
>
> ??? What is it for version you have? In CVS .Lcbc_slow_enc_in_place resided at line #1974! A.

I use CVS. It is an issue of patch sequence: I had put another personal patch before this one.

Also, with the simple test program attached to this mail, I find that the output of CVS differs from that of openssl-0.9.8g if the specified input length is less than 16.

Best Regards,
Huang Ying

#include <openssl/aes.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Print sz bytes of buf as hex, preceded by an optional prefix. */
void print_arr(unsigned char buf[], int sz, char *prefix)
{
    int i;

    if (prefix)
        printf("%s", prefix);
    for (i = 0; i < sz; i++)
        printf("%02x", buf[i]);
    printf("\n");
}

/* Encrypt and decrypt in_len bytes: the short-input case (less than one block). */
void test_cbc1(int in_len)
{
    int ret;
    AES_KEY key;
    unsigned char user_key[16] = "123456";
    unsigned char iv1[16] = "9876543210987654";
    unsigned char iv2[16];
    unsigned char in[16] = "1234567890";
    unsigned char out[16];

    memcpy(iv2, iv1, sizeof(iv1));
    ret = AES_set_encrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
    print_arr(out, sizeof(out), "out: ");
    //AES_cbc_encrypt(in, in, in_len, &key, iv2, 1);
    //print_arr(in, sizeof(in), "ip_out: ");
    ret = AES_set_decrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
    print_arr(in, sizeof(in), "out: ");
}

/* Same round trip with one full block in front of the in_len tail bytes. */
void test_cbc2(int in_len)
{
    int ret;
    AES_KEY key;
    unsigned char user_key[16] = "123456";
    unsigned char iv1[16] = "9876543210987654";
    unsigned char iv2[16];
    unsigned char in[32] = "12345678901234567890123456789012";
    unsigned char out[32];

    in_len += 16;
    memcpy(iv2, iv1, sizeof(iv1));
    ret = AES_set_encrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
    print_arr(out, sizeof(out), "out: ");
    ret = AES_set_decrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
    print_arr(in, sizeof(in), " in: ");
}

/* Same round trip with four full blocks in front of the in_len tail bytes. */
void test_cbc3(int in_len)
{
    int ret;
    AES_KEY key;
    unsigned char user_key[16] = "123456";
    unsigned char iv1[16] = "9876543210987654";
    unsigned char iv2[16];
    unsigned char in[80] = "1234567890123456789012345678901234567890"
                           "1234567890123456789012345678901234567890";
    unsigned char out[80];

    in_len += 64;
    memcpy(iv2, iv1, sizeof(iv1));
    ret = AES_set_encrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(in, out, in_len, &key, iv1, 1);
    print_arr(out, sizeof(out), "out: ");
    ret = AES_set_decrypt_key(user_key, 128, &key);
    assert(!ret);
    AES_cbc_encrypt(out, in, in_len, &key, iv2, 0);
    print_arr(in, sizeof(in), " in: ");
}

int main(int argc, char *argv[])
{
    int in_len;

    /* the tail length comes from the first argument, default 16 */
    in_len = argc > 1 ? atoi(argv[1]) : 16;
    test_cbc1(in_len);
    test_cbc2(in_len);
    test_cbc3(in_len);
    return 0;
}
[PATCH RFC -v2] Add support to Intel AES-NI instruction set for x86_64 platform
This patch adds support to Intel AES-NI instruction set for x86_64 platform.

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure.

The white paper can be downloaded from:
http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES-NI support is implemented as an engine in crypto/engine/.

ChangeLog:

v2:
- AES-NI support is implemented as an engine instead of branch.
- ECB and CBC modes are implemented in parallel style to take advantage of pipelined hardware implementation.
- AES key scheduling algorithm is re-implemented with higher performance.

Known issues:
- How to add conditional compilation for eng_intel_asm.pl? It can not be compiled on non-x86 platform.
- NID for CTR mode can not be found, how to support it in engine?
- CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary to add AES-NI support for them, I can add them.

Signed-off-by: Huang Ying ying.hu...@intel.com
---
 crypto/engine/Makefile         |   11
 crypto/engine/eng_all.c        |    3
 crypto/engine/eng_intel.c      |  589 ++
 crypto/engine/eng_intel_asm.pl |  918 +
 4 files changed, 1519 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/crypto/engine/eng_intel.c
@@ -0,0 +1,589 @@
+/*
+ * Support for Intel AES-NI instruction set
+ * Author: Huang Ying ying.hu...@intel.com
+ *
+ * Some code is copied from engines/e_padlock.c
+ *
+ * cfb and ofb mode code is copied from crypto/aes/aes_cfb.c and
+ * crypto/aes/aes_ofb.c.
+ */
+
+/*
+ * Copyright (c) 1999-2001 The OpenSSL Project. 
All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in + *the documentation and/or other materials provided with the + *distribution. + * + * 3. All advertising materials mentioning features or use of this + *software must display the following acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/) + * + * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to + *endorse or promote products derived from this software without + *prior written permission. For written permission, please contact + *licens...@openssl.org. + * + * 5. Products derived from this software may not be called OpenSSL + *nor may OpenSSL appear in their names without prior written + *permission of the OpenSSL Project. + * + * 6. Redistributions of any form whatsoever must retain the following + *acknowledgment: + *This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit (http://www.OpenSSL.org/) + * + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE OpenSSL PROJECT OR + * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * + * This product includes cryptographic software written by Eric Young + * (e...@cryptsoft.com). This product includes software written by Tim + * Hudson (t...@cryptsoft.com). + * + */ + + +#include openssl/opensslconf.h + +#if !defined(OPENSSL_NO_HW) !defined(OPENSSL_NO_HW_INTEL_AES_NI) !defined(OPENSSL_NO_AES) + +#define INTEL_AES_MIN_ALIGN16 +#define ALIGN(x,a) (((unsigned long)(x)+(a)-1)(~((a)-1))) +#define INTEL_AES_ALIGN(x
[openssl.org #1801] [BUGFIX] Segment fault when invoking AES_cbc_encrypt() on x86_64 with short input
Fix two bugs in .Lcbc_slow_enc_in_place.

- At end of .Lcbc_slow_enc_in_place, %r10 instead of $_len should be
  set to 16.
- In .Lcbc_slow_enc_in_place, %rdi should be initialized before stosb.

Signed-off-by: Huang Ying ying.hu...@intel.com

---
 crypto/aes/asm/aes-x86_64.pl |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/crypto/aes/asm/aes-x86_64.pl
+++ b/crypto/aes/asm/aes-x86_64.pl
@@ -1994,10 +1994,12 @@ AES_cbc_encrypt:
 .Lcbc_slow_enc_in_place:
 	mov	\$16,%rcx		# zero tail
 	sub	%r10,%rcx
+	mov	$out,%rdi
+	add	%r10,%rdi
 	xor	%rax,%rax
 	.long	0x9066AAF3		# rep stosb
 	mov	$out,$inp		# this is not a mistake!
-	movq	\$16,$_len		# len=16
+	movq	\$16,%r10		# len=16
 	jmp	.Lcbc_slow_enc_loop	# one more spin...
 #--- SLOW DECRYPT ---#
 .align	16
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Tue, 2008-12-16 at 21:30 +0800, Andy Polyakov wrote:
>>> Implementation aiming to complement interface exposed by
>>> crypto/aes/asm should allow for non-16-byte-aligned key schedule.
>>> Period. One can use movups, or check alignment and choose between
>>> movups and movaps code paths, or copy key schedule to aligned
>>> location on stack.
>>
>> Should it be considered an unsafe behavior to copy key schedule to
>> stack? The stack may be swapped out to a swap file, so that the key
>> schedule is leaked.
>
> Not that I'm answering the question with a question, but what's wrong
> with movups? I mean I consider that the question about copying key
> schedule was already discussed in enough detail, but I'm left with
> feeling that other options are even less preferable. So I wonder why?
> Is movups expected to be so much worse? Even if input is moderately
> misaligned, say at 64-bit boundary instead of 128? Can't one pipe-line
> it, i.e. schedule movups longer before aes[enc|dec], to 'amortize'
> additional latency, or will it be non-pipe-line-able? Even if so, it's
> possible to compromise having movaps and palignr. If one chooses to
> declare key schedule to ensure 64-bit alignment(*) it would really
> have to be a single palignr case... A.

That sounds interesting. It at least deserves a try.

Best Regards,
Huang Ying
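
Two of the options named in the quoted text (check alignment and pick a path, or copy the key schedule to an aligned location) can be sketched in portable C. This is only an illustration, not code from any patch in this thread: `aligned_key_schedule`, `wipe` and `KS_BYTES` are hypothetical names, and a real implementation would make the choice in assembler around movaps/movups.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 15 round keys of 16 bytes each: the AES-256 maximum (assumed size). */
#define KS_BYTES 240

/* Nonzero when p sits on a 16-byte boundary, i.e. an aligned load is safe. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/*
 * Hypothetical helper: return a 16-byte-aligned view of the key schedule.
 * An already aligned schedule is used in place; otherwise it is copied
 * into the caller's aligned scratch buffer (typically on the stack).
 */
static const unsigned char *aligned_key_schedule(const unsigned char *ks,
                                                 unsigned char *scratch)
{
    assert(is_aligned16(scratch));
    if (is_aligned16(ks))
        return ks;
    memcpy(scratch, ks, KS_BYTES);
    return scratch;
}

/* Zero the scratch copy as soon as it is no longer needed. */
static void wipe(volatile unsigned char *p, size_t n)
{
    while (n--)
        *p++ = 0;
}
```

The copy path is the one that raises the swap-out concern; wiping the scratch buffer right after use narrows the window but does not remove it.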
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Tue, 2008-12-16 at 19:12 +0800, Andy Polyakov wrote:
>>>> The cipher and digest support is at the granularity of nids, and
>>>> these combine algorithm, key-length, and mode. So if you implement
>>>> support for those cipher,length,mode combinations that can be
>>>> accelerated by AES-NI, your engine will only be invoked for those
>>>> combinations. You're not obliged to implement anything else, and
>>>> indeed there is nothing to be gained by doing so.
>>>
>>> The situation is:
>>> - We implement cbc and ecb mode in engine
>>> - If we implement cfb and ofb in engine too, we will duplicate code
>>>   of cfb and ofb mode itself.
>
> The plan is to consolidate mode implementations, so it doesn't have to
> be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

Good! I hope that can be merged quickly.

>>> - If we do not implement cfb and ofb in engine, no code duplication,
>>>   BUT we can NOT get AES-NI acceleration for AES core block
>>>   algorithm (which benefit cfb and ofb too) until we have a branch
>>>   version.
>>
>> OK, I (mis)understood from your original mail that you could only
>> accelerate a subset of modes.
>
> Just to clarify CBC situation. While it's absolutely correct that
> *de*cryption is the one that can take full advantage of pipe-lining,
> dedicated *en*cryption procedure should also be implemented in
> assembler. Why? It doesn't come as surprise that CBC timing is sum of
> time spent in block procedure and time spent performing the block
> chaining. The latter can be underestimated and as block procedure gets
> faster it actually becomes underestimated. I reckon that with 4x
> faster block procedure, C timing for block chaining would be
> comparable with block procedure. This in turn means that overall
> performance would be almost twice as low as if chaining was
> implemented in assembler. This applies to x86_64, on x86 performance
> loss would be even higher...

OK, I will implement CBC encryption in assembler too.

>> If you can accelerate them all, then please do so by implementing an
>> intel/aes-ni engine. But not by branching in the vanilla
>> implementation.
>>>
>>> So my suggestion is:
>>> - Accelerate AES core block algorithm with branch version. Which is
>>>   used by cbc, cfb and ofb too.
>>> - Accelerate AES ecb and ctr? with engine version.
>>
>> And my suggestion is:
>> - write an engine for your hardware.
>
> I second it. And additional note. As padlock engine was mentioned, I
> can imagine that the idea of using inline assembler will pop up in the
> head. Please don't! As already mentioned we support other compilers as
> well and it's favorable if gcc-isms can be avoided. Well, in 32-bit
> case it might be acceptable (both GNU and Microsoft compilers support
> inline assembler), but not in 64-bit case (GNU is the only one
> supporting inline assembler).

OK. I will use the same format as aes-x86_64.pl.

> As for FIPS. Given current precedent it should be noted that if branch
> version is certified, then the branch becomes bound to be taken. In
> other words branched version would be prohibited to reach certified
> mode of operation on CPU that does not support the instruction set
> extension in question. Then why does it have to be branched? Having
> this in mind wouldn't it make as much sense to implement module that
> can be used as *drop-in replacement* for aes-[586|x86_64].pl? So that
> those who are willing to pursue certification for given hardware can
> do so with not so much hassle(*)? Would it be effort duplication? Does
> not have to be! Because the code can be used in engine context just as
> well... Now to practicalities. What I can do to help. I can put
> together perl scripts for x86_64 and x86, which can be used as drop-in
> replacement for aes-[586|x86_64].pl as well as in engine context. Note
> that drop-in replacement implies presence of CBC procedure, though I'd
> be reluctant to implement pipe-lined version. At least not without
> further consideration, because it might turn out that pipe-lined
> version doesn't have to be monolithic. Most notably one can break
> decryption into multi-block ECB and separate multi-block chaining to
> minimize developing effort. A.

Thank you very much. I can change the format to perl format, but I need
your help to test it on Windows 64 and to fix some issues such as SSE
operands. I think the AES-NI based pipelined implementation can be a
starting point for the general version.

Best Regards,
Huang Ying
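
The split suggested above, multi-block ECB decryption followed by a separate chaining pass, can be sketched in C. The block function here is a toy byte-wise transform standing in for AES (it is not secure), and all names are hypothetical; the point is only that pass 1 has no dependency between blocks, while CBC encryption is inherently serial.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

static unsigned char rotl8(unsigned char v, int n)
{
    return (unsigned char)((v << n) | (v >> (8 - n)));
}

static unsigned char rotr8(unsigned char v, int n)
{
    return (unsigned char)((v >> n) | (v << (8 - n)));
}

/* Toy stand-ins for the AES block function. NOT a real cipher. */
static void toy_encrypt_block(const unsigned char *in, unsigned char *out,
                              const unsigned char *key)
{
    for (int i = 0; i < BLK; i++)
        out[i] = rotl8((unsigned char)(in[i] ^ key[i]), 3);
}

static void toy_decrypt_block(const unsigned char *in, unsigned char *out,
                              const unsigned char *key)
{
    for (int i = 0; i < BLK; i++)
        out[i] = (unsigned char)(rotr8(in[i], 3) ^ key[i]);
}

/* CBC encryption: each block needs the previous ciphertext block, so the
 * loop is serial. in and out must not overlap in this sketch. */
static void cbc_encrypt(const unsigned char *in, unsigned char *out,
                        size_t nblocks, const unsigned char *key,
                        const unsigned char *iv)
{
    const unsigned char *prev = iv;
    unsigned char tmp[BLK];

    for (size_t b = 0; b < nblocks; b++) {
        for (int i = 0; i < BLK; i++)
            tmp[i] = (unsigned char)(in[b * BLK + i] ^ prev[i]);
        toy_encrypt_block(tmp, out + b * BLK, key);
        prev = out + b * BLK;
    }
}

/* CBC decryption in two passes. in and out must not overlap. */
static void cbc_decrypt_two_pass(const unsigned char *in, unsigned char *out,
                                 size_t nblocks, const unsigned char *key,
                                 const unsigned char *iv)
{
    /* Pass 1: plain ECB decryption; iterations are independent, so a
     * pipelined implementation could interleave several blocks. */
    for (size_t b = 0; b < nblocks; b++)
        toy_decrypt_block(in + b * BLK, out + b * BLK, key);

    /* Pass 2: chaining, XOR with the previous ciphertext block (or IV). */
    for (size_t b = 0; b < nblocks; b++) {
        const unsigned char *prev = b ? in + (b - 1) * BLK : iv;
        for (int i = 0; i < BLK; i++)
            out[b * BLK + i] ^= prev[i];
    }
}
```

Keeping the passes separate costs an extra trip over the output buffer, which is the trade-off against a monolithic pipelined routine.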
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Fri, 2008-12-12 at 12:24 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 23:02:12 Huang Ying wrote:
>> On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
>>> The cipher and digest support is at the granularity of nids, and
>>> these combine algorithm, key-length, and mode. So if you implement
>>> support for those cipher,length,mode combinations that can be
>>> accelerated by AES-NI, your engine will only be invoked for those
>>> combinations. You're not obliged to implement anything else, and
>>> indeed there is nothing to be gained by doing so.
>>
>> The situation is:
>> - We implement cbc and ecb mode in engine
>> - If we implement cfb and ofb in engine too, we will duplicate code
>>   of cfb and ofb mode itself.
>> - If we do not implement cfb and ofb in engine, no code duplication,
>>   BUT we can NOT get AES-NI acceleration for AES core block algorithm
>>   (which benefit cfb and ofb too) until we have a branch version.
>
> OK, I (mis)understood from your original mail that you could only
> accelerate a subset of modes. If you can accelerate them all, then
> please do so by implementing an intel/aes-ni engine. But not by
> branching in the vanilla implementation.
>
>> So my suggestion is:
>> - Accelerate AES core block algorithm with branch version. Which is
>>   used by cbc, cfb and ofb too.
>> - Accelerate AES ecb and ctr? with engine version.
>
> And my suggestion is:
> - write an engine for your hardware.

OK. I will write an engine version first, at least for discussion.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote:
>> Anything in memory could end up swapped out, but stack is the least
>> likely since it's more often in use, the best you can do is zero the
>> area ASAP. My other objection to putting all of this into an engine
>> is that the engine code is unusable in quite a few cases. Export
>> approvals, and certifications like FIPS and Common Criteria all
>> pretty much insist that the crypto. isn't replaceable by some random
>> chunk of code, that's not an OpenSSL issue as such, but it's going to
>> be awkward for some subset of OpenSSL consumers. There are ways
>> around those issues, but I doubt you really want to add the option of
>> signature checking engine plugins ?.
>
> Engines like eng_cryptodev.c *are* built in (they're in
> ./crypto/engine/ rather than ./engines/) and the intention is that
> they should be the implementation de base for those build targets to
> which they apply. Cryptodev is the only one so far, but there could be
> others. In fact, the padlock support for VIA chips (which is
> comparable to what's being discussed here, with all due respect to the
> intel instruction-set faithful) sits in ./engines like any other h/w
> support - a similar argument could be made that, on chips that support
> it, it should provide the default implementation(s), but right now
> they've been happy enough to make it a non-default option.

The difference between Intel AES-NI and padlock is that padlock
provides support for the different modes directly, including ecb, cbc,
cfb and ofb, while Intel AES-NI just provides instructions for the AES
core block algorithm, NOT for the modes directly. At the same time, an
AES-NI pipelining implementation can benefit ecb encrypt, cbc decrypt
and counter mode.

If we implement AES-NI with a branch, we can get the full power of
AES-NI except for ecb and ctr mode.

If we implement AES-NI with an engine, we can get the full power of
AES-NI for all modes. But we must duplicate the mode implementations
that can not benefit from AES-NI, such as cfb, ofb, etc. Are you OK with
code duplication?

So I think the best method is to implement AES-NI with both a branch and
an engine. With the branch version, we get the full power of AES-NI for
cbc, cfb and ofb mode. At the same time the engine version can provide
further acceleration for ecb and ctr mode.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote:
> Anything in memory could end up swapped out, but stack is the least
> likely since it's more often in use, the best you can do is zero the
> area ASAP.

At least on UNIX systems, mlock() can be used to prevent a specified
memory range from being swapped out. Maybe we should put all key
schedules into a memory area protected by mlock()? That is safer than
the stack, I think.

Best Regards,
Huang Ying

> My other objection to putting all of this into an engine is that the
> engine code is unusable in quite a few cases. Export approvals, and
> certifications like FIPS and Common Criteria all pretty much insist
> that the crypto. isn't replaceable by some random chunk of code,
> that's not an OpenSSL issue as such, but it's going to be awkward for
> some subset of OpenSSL consumers. There are ways around those issues,
> but I doubt you really want to add the option of signature checking
> engine plugins ?.
>
> Perhaps a compromise ?. Put the generic AES speedup into the core, and
> the extra modes where you gain the big performance boosts into an
> engine ?.
>
> Peter
>
> From: Huang Ying ying.hu...@intel.com
> To: openssl-dev@openssl.org openssl-dev@openssl.org
> Date: 12/11/2008 05:06 PM
> Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set
>          for x86_64 platform
>
> On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
>> Implementation aiming to complement interface exposed by
>> crypto/aes/asm should allow for non-16-byte-aligned key schedule.
>> Period. One can use movups, or check alignment and choose between
>> movups and movaps code paths, or copy key schedule to aligned
>> location on stack.
>
> Should it be considered an unsafe behavior to copy key schedule to
> stack? The stack maybe swapped out to a swap file, so that the key
> schedule is leaked.
>
> Best Regards,
> Huang Ying
>
> [attachment signature.asc deleted by Peter Waltenberg/Australia/IBM]
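
The mlock() idea raised in this message, combined with the advice to zero the area as soon as possible, can be sketched as follows. This is a minimal POSIX illustration, not an OpenSSL API: `locked_buf` and its two helpers are hypothetical names, and the sketch treats a failing mlock() (for example because of RLIMIT_MEMLOCK) as non-fatal and simply records it.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical container for key material that should stay out of swap. */
typedef struct {
    void  *base;    /* page-aligned allocation */
    size_t len;     /* one page in this sketch */
    int    locked;  /* nonzero if mlock() succeeded */
} locked_buf;

/* Allocate one page and try to pin it in RAM. Returns 0 on success. */
static int locked_buf_alloc(locked_buf *b)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;
    b->len = (size_t)page;
    if (posix_memalign(&b->base, b->len, b->len) != 0)
        return -1;
    /* mlock() may be refused by resource limits; remember the outcome. */
    b->locked = (mlock(b->base, b->len) == 0);
    return 0;
}

/* Zero the contents first, then unlock and release the page. */
static void locked_buf_free(locked_buf *b)
{
    volatile unsigned char *p = b->base;

    assert(b->base != NULL);
    for (size_t i = 0; i < b->len; i++)
        p[i] = 0;
    if (b->locked)
        munlock(b->base, b->len);
    free(b->base);
    b->base = NULL;
}
```

A caller that must not run with swappable keys would check `locked` and refuse to continue when it is zero.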
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 20:39:41 Huang Ying wrote:
>> On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
>>> Engines like eng_cryptodev.c *are* built in (they're in
>>> ./crypto/engine/ rather than ./engines/) and the intention is that
>>> they should be the implementation de base for those build targets
>>> to which they apply. Cryptodev is the only one so far, but there
>>> could be others. In fact, the padlock support for VIA chips (which
>>> is comparable to what's being discussed here, with all due respect
>>> to the intel instruction-set faithful) sits in ./engines like any
>>> other h/w support - a similar argument could be made that, on chips
>>> that support it, it should provide the default implementation(s),
>>> but right now they've been happy enough to make it a non-default
>>> option.
>>
>> The difference between Intel AES-NI and padlock is that padlock
>> provide support for different modes directly including ecb, cbc, cfb
>> and ofb, while Intel AES-NI just provides instructions for AES core
>> block algorithm NOT for modes directly. At the same time, AES-NI
>> pipelining implementation can benefit ecb encrypt and cbc decrypt
>> and counter mode.
>>
>> If we implement AES-NI with branch, we can get full power of AES-NI
>> except ecb and ctr mode.
>>
>> If we implement AES-NI with engine, we can get full power of AES-NI
>> for all modes. But we must duplicate mode implementations that can
>> not benefit from AES-NI, such as cfb, ofb, etc. Do you OK with code
>> duplication?
>
> The cipher and digest support is at the granularity of nids, and these
> combine algorithm, key-length, and mode. So if you implement support
> for those cipher,length,mode combinations that can be accelerated by
> AES-NI, your engine will only be invoked for those combinations.
> You're not obliged to implement anything else, and indeed there is
> nothing to be gained by doing so.

The situation is:

- We implement cbc and ecb mode in the engine.
- If we implement cfb and ofb in the engine too, we will duplicate the
  code of the cfb and ofb modes themselves.
- If we do not implement cfb and ofb in the engine, there is no code
  duplication, BUT we can NOT get AES-NI acceleration for the AES core
  block algorithm (which benefits cfb and ofb too) until we have a
  branch version.

So my suggestion is:

- Accelerate the AES core block algorithm with the branch version, which
  is used by cbc, cfb and ofb too.
- Accelerate AES ecb and ctr? with the engine version.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
>>> - and $-16, %rdx is unacceptable in this context. The relevant
>>>   interface is exposed to end-user and we have to reserve for
>>>   possibility that key schedule is memcpy-ed to location with
>>>   alternative alignment;
>>
>> Does there any other mechanism to deal with alignment issue in
>> OpenSSL?
>
> The answer is engine.

In the engine, can I just re-align the expanded key address, because it
is not exposed to the user? Something as follows:

typedef struct {
	AES_KEY ks;
	unsigned int _pad[3];
} INTEL_AES_KEY;

IMPLEMENT_BLOCK_CIPHER(intel_aes_128, ks, intel_AES, INTEL_AES_KEY,
		       NID_aes_128, 16, 16, 16, 128,
		       0, intel_aes_init_key, NULL,
		       EVP_CIPHER_set_asn1_iv,
		       EVP_CIPHER_get_asn1_iv,
		       NULL)

BTW: The comment on AES_KEY in aes.h says:

/* This should be a hidden type, but EVP requires that the size be known */

Does this mean that AES_KEY is not a public interface and that users
should not rely on its internal implementation?

>>> - implementation should allow for pipelining;
>>>
>>> As for the latter. I refer to possibility of scheduling of multiple
>>> AESENC/DEC with same key schedule element and multiple data chunks.
>>> It's possible in modes that allow for parallelization (e.g. ECB,
>>> CBC decrypt, CTR), and as far as I understand it is even
>>> recommended. So we are kind of obliged to reserve for this option.
>>> The answer is engine. I mean this preferably should be implemented
>>> as engine that will be able to take full advantage of architecture,
>>> not as patch to general purpose block function.
>>
>> But as Peter Waltenberg said, engine has its issue too.
>
> Yes, and the relevant question is if it worth it.
>
>> At least we should have a branch based version (may be slower) to
>> benefit most users, until we can make engine version usable by most
>> users.
>
> There is no hardware in sight, so "until" is not really an argument.
> One can reserve for branch version as back-up/exit plan, i.e. "in
> case", but not "until".

ECB, CBC decrypt and CTR can benefit from AES-NI pipelining, but other
modes can not. So maybe we should have both a branch version and an
engine version: the branch version used for other modes and CBC decrypt,
and the engine version used for ECB and CTR modes.

BTW: Is ECB used widely in practice?

Best Regards,
Huang Ying
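
The padding trick proposed in this message can be checked with a little arithmetic: the key structure is 4-byte aligned, so rounding its address up to a 16-byte boundary moves it by at most 12 bytes, which is exactly what three `unsigned int` pad words provide. The sketch below assumes a 4-byte `unsigned int` and uses local stand-in types (`toy_aes_key`, `toy_intel_aes_key`) so that it compiles without OpenSSL headers; it illustrates the idea and is not the engine's code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_AES_MAXNR 14

/* Local stand-in mirroring the AES_KEY layout quoted in this thread. */
typedef struct {
    unsigned int rd_key[4 * (TOY_AES_MAXNR + 1)];
    int rounds;
} toy_aes_key;

/* Stand-in for the proposed INTEL_AES_KEY: key plus 12 bytes of slack. */
typedef struct {
    toy_aes_key ks;
    unsigned int _pad[3];
} toy_intel_aes_key;

/* Round an address up to the next multiple of a (a power of two). */
#define ALIGN_UP(x, a) (((uintptr_t)(x) + (a) - 1) & ~((uintptr_t)(a) - 1))

/*
 * Return the 16-byte-aligned location inside *k where the engine would
 * keep the expanded key. The assert checks that the shifted key still
 * fits inside the padded structure.
 */
static toy_aes_key *aligned_ks(toy_intel_aes_key *k)
{
    toy_aes_key *p = (toy_aes_key *)ALIGN_UP(k, 16);

    assert((unsigned char *)p + sizeof(*p) <=
           (unsigned char *)k + sizeof(*k));
    return p;
}
```

Because the offset depends on where the structure happens to live, the contents stop being valid if the structure is memcpy-ed to an address with a different alignment. That is acceptable only where the engine owns the storage, which is the distinction drawn in the exchange above.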
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 16:01 +0800, Andy Polyakov wrote:
>>>> I doubt the OS vendors would bother to enable an engine by
>>>> default, testing of the possible configurations is expensive and
>>>> the costs of support calls if they mess up makes autodetecting the
>>>> engine to use a very unattractive proposition.
>>>
>>> One can discuss loading selected engines by default, i.e. you'd
>>> have to work to not load it:-) Then it wouldn't be any different,
>>> yet provide
>>
>> I am new to OpenSSL. Can you tell me how to do that? how to use the
>> proper engine automatically?
>
> I said one can discuss it, there is no way currently, but as it's
> *soft*ware there is hardly limit for what one can do. A.

What's your idea about that? It seems that EVP_CipherInit_ex() will
check engines, and the AES-NI engine could register itself when the
appropriate CPUID bit is set.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 17:22 +0800, Andy Polyakov wrote:
>>> - OpenSSL assembler modules are maintained as dual-ABI, i.e.
>>>   suitable for both Unix and Win64;
>>
>> OK. I will follow the way like that in aes-x86_64.pl to deal with ABI
>> issue.
>
> Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions,
> so some SIMD instructions can't be handled. Some adjustments for
> crypto/perlasm are required...

I know only a little about perl and I have no Windows 64 machine to
test on. Can you help me to do that?

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
> Implementation aiming to complement interface exposed by
> crypto/aes/asm should allow for non-16-byte-aligned key schedule.
> Period. One can use movups, or check alignment and choose between
> movups and movaps code paths, or copy key schedule to aligned location
> on stack.

Should it be considered unsafe behavior to copy the key schedule to the
stack? The stack may be swapped out to a swap file, so that the key
schedule is leaked.

Best Regards,
Huang Ying
[PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
This patch adds support to Intel AES-NI instruction set for x86_64 platform. Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009. These instructions enable fast and secure data encryption and decryption, using the Advanced Encryption Standard (AES), defined by FIPS Publication number 197. The architecture introduces six instructions that offer full hardware support for AES. Four of them support high performance data encryption and decryption, and the other two instructions support the AES key expansion procedure. The white paper can be downloaded from: http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf - AES implementation based on AES-NI is put in crypto/aes/asm/aes-intel.S - AES-NI operates on XMM registers, so the key structure need to be 128-bit aligned. A pad field is added to AES_KEY and key structure is aligned to 128-bit boundary in entry of AES-NI implementation. - In entry point of AES algorithm in crypto/aes/asm/aes-x86_64.pl, OPENSSL_ia32cap_P is checked, if corresponding bit (57) is set, branch into AES-NI based implementation. - AES-NI based implementation can not benefit from a specialized AES_cbc_encrypt, so its general C implementation is used. To resolve the name conflict, original AES_cbc_encrypt is renamed to AES_cbc_encrypt_def and put in crypto/aes/aes_cbc_def.c. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- Configure| 20 +- crypto/aes/Makefile |9 - crypto/aes/aes.h |5 crypto/aes/aes_cbc.c | 66 --- crypto/aes/aes_cbc_def.c | 130 ++ crypto/aes/asm/aes-intel.S | 374 +++ crypto/aes/asm/aes-x86_64.pl | 20 ++ 7 files changed, 546 insertions(+), 78 deletions(-) --- /dev/null +++ b/crypto/aes/asm/aes-intel.S @@ -0,0 +1,374 @@ +/* + * + * Written by Huang Ying [EMAIL PROTECTED] for the OpenSSL + * project to add support for Intel new AES instructions. 
Rights for + * redistribution and usage in source and binary forms are granted + * according to the OpenSSL license. + * + */ + +.align 16 +key_expansion_128: + movaps %xmm1, %xmm4 + psrldq $12, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm1, %xmm0 + + movaps %xmm0, (%rcx) + add $0x10, %rcx + ret + +.align 16 +key_expansion_192: + pshufd $0b01010101, %xmm1, %xmm1 + movaps %xmm1, %xmm4 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm1, %xmm0 + + pshufd $0b, %xmm0, %xmm3 + pxor %xmm2, %xmm3 + palignr $12, %xmm0, %xmm3 + pxor %xmm2, %xmm3 + + test %r9, %r9 + not %r9 + jnz 1f + + movaps %xmm0, %xmm1 + pslldq $8, %xmm2 + palignr $8, %xmm2, %xmm1 + movaps %xmm1, (%rcx) + add $0x10, %rcx + movaps %xmm3, %xmm2 + palignr $8, %xmm0, %xmm3 + movaps %xmm3, (%rcx) + add $0x10, %rcx + ret +1: + movaps %xmm0, (%rcx) + add $0x10, %rcx + movaps %xmm3, %xmm2 + ret + +.align 16 +key_expansion_256: + movaps %xmm1, %xmm4 + psrldq $12, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm0, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm1, %xmm0 + + movaps %xmm0, (%rcx) + add $0x10, %rcx + + test %r9, %r9 + jnz 1f + + # aeskeygenassist $0x1, %xmm0, %xmm1 + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01 + + pshufd $0b10101010, %xmm1, %xmm1 + movaps %xmm1, %xmm4 + pxor %xmm2, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm2, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm2, %xmm1 + palignr $12, %xmm4, %xmm1 + pxor %xmm1, %xmm2 + + movaps %xmm2, (%rcx) + add $0x10, %rcx +1: + ret + +.align 16 +.global intel_AES_set_encrypt_key +intel_AES_set_encrypt_key: + test %rdi, %rdi + jz 3f + test %rdx, %rdx + jz 3f + add $0xf, %rdx # make key struct 128-bit aligned + and $0xfff0, %rdx + movups (%rdi), %xmm0# user key (first 16 bytes) 
+ movaps %xmm0, (%rdx) + lea 0x10(%rdx), %rcx# key addr + cmp $256, %esi + jnz 1f + mov $14, %esi + movl %esi, 240(%rdx)# 14 rounds for 256 + movups 0x10(%rdi), %xmm2
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 03:40 +0800, Andy Polyakov wrote:
> As for RFC part. NO! This is NOT the way to do it. For several reasons
> (in ascending order of importance):
>
> - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable
>   for both Unix and Win64;

OK. I will follow the approach used in aes-x86_64.pl to deal with the
ABI issue.

> - and $-16, %rdx is unacceptable in this context. The relevant
>   interface is exposed to end-user and we have to reserve for
>   possibility that key schedule is memcpy-ed to location with
>   alternative alignment;

Is there any other mechanism to deal with the alignment issue in
OpenSSL? Is it better to declare AES_KEY as follows:

struct aes_key_st {
	unsigned int rd_key[4 *(AES_MAXNR + 1)];
	int rounds;
} __attribute__ ((aligned (16)));

And how to deal with memory allocated with malloc()?

> - zero-copy CBC routine gives a fair performance improvement even in
>   ordinary case, and driving ultra-fast block function from C would be
>   just wasteful. In other words AESENC/DEC would benefit more from
>   dedicated CBC routine (see even comment below);

I will do more investigation on that.

> - implementation should allow for pipelining;
>
> As for the latter. I refer to possibility of scheduling of multiple
> AESENC/DEC with same key schedule element and multiple data chunks.
> It's possible in modes that allow for parallelization (e.g. ECB, CBC
> decrypt, CTR), and as far as I understand it is even recommended. So
> we are kind of obliged to reserve for this option. The answer is
> engine. I mean this preferably should be implemented as engine that
> will be able to take full advantage of architecture, not as patch to
> general purpose block function.

But as Peter Waltenberg said, an engine has its issues too. At least we
should have a branch based version (may be slower) to benefit most
users, until we can make the engine version usable by most users.

>> This patch adds support to Intel AES-NI instruction set for x86_64
>> platform. Intel AES-NI is a new set of Single Instruction Multiple
>> Data (SIMD) instructions that are going to be introduced in the next
>> generation of Intel processor, as of 2009.
>
> Hardware however is not expected before 2010, right? A.

Maybe 2009 or 2010; I don't know exactly either.

Best Regards,
Huang Ying
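
On the malloc() question raised in this message: an alignment attribute on the type affects static and automatic objects and `sizeof`, but a heap allocation only honours it if the allocator is asked to. A minimal C11 sketch, using a local stand-in type (`toy_aligned_key`) and a hypothetical `toy_key_new` helper rather than anything from OpenSSL:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_AES_MAXNR 14

/* Stand-in for an AES_KEY declared with 16-byte alignment. */
typedef struct {
    alignas(16) unsigned int rd_key[4 * (TOY_AES_MAXNR + 1)];
    int rounds;
} toy_aligned_key;

/*
 * Hypothetical allocator: request the type's alignment explicitly.
 * sizeof() is already a multiple of the alignment, as aligned_alloc()
 * requires in C11. Release the result with free().
 */
static toy_aligned_key *toy_key_new(void)
{
    return aligned_alloc(alignof(toy_aligned_key), sizeof(toy_aligned_key));
}
```

This still does not cover the case raised in the quoted objection: a caller can memcpy the bytes of the structure into any buffer, and the copy carries no alignment guarantee.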
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 04:58 +0800, Peter Waltenberg wrote:
> If you want this in the mainstream code, you'll need to detect the
> capability at runtime and use your alternate code paths only if the
> hardware is present. It's not even to Intels advantage if OpenSSL
> crashes and burns on older Intel CPU's and most bulk users of OpenSSL
> (OS vendors) won't want to mess around installing different OpenSSL
> versions for different hardware. Autodetection is the best option if
> the detection overhead is reasonable - take a look at
> crypto/x86_64cpuid.pl for how to do the detection logic neatly. There
> are advantages in this being present all the time/dynamically enabled
> if it can be done, most users/OS vendors wouldn't bother to configure
> an engine backend anyway.

Auto-detection has been implemented in the patch. From the patch
description:

  - In entry point of AES algorithm in crypto/aes/asm/aes-x86_64.pl,
    OPENSSL_ia32cap_P is checked, if corresponding bit (57) is set,
    branch into AES-NI based implementation.

Best Regards,
Huang Ying
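
The bit number mentioned here follows from simple arithmetic: CPUID leaf 1 reports AES-NI in ECX bit 25, and 32 + 25 = 57 if the capability vector is a 64-bit value holding EDX in its low half and ECX in its high half. That layout is an assumption of this sketch (it is what makes bit 57 line up), and `pack_cap` and `cap_has_bit` are hypothetical helper names, not OpenSSL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit index named in the patch description. */
#define CAP_BIT_AESNI 57

/*
 * Build a 64-bit capability vector from the two CPUID.1 feature words,
 * assuming EDX occupies bits 0..31 and ECX bits 32..63.
 */
static uint64_t pack_cap(uint32_t edx, uint32_t ecx)
{
    return ((uint64_t)ecx << 32) | edx;
}

/* Test a single capability bit. */
static int cap_has_bit(uint64_t cap, unsigned bit)
{
    assert(bit < 64);
    return (int)((cap >> bit) & 1u);
}
```

An entry point would then branch on `cap_has_bit(cap, CAP_BIT_AESNI)` once per call, which keeps the detection overhead to a load and a bit test.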
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 05:47 +0800, Andy Polyakov wrote:
>> I doubt the OS vendors would bother to enable an engine by default,
>> testing of the possible configurations is expensive and the costs of
>> support calls if they mess up makes autodetecting the engine to use a
>> very unattractive proposition.
>
> One can discuss loading selected engines by default, i.e. you'd have
> to work to not load it:-) Then it wouldn't be any different, yet
> provide

I am new to OpenSSL. Can you tell me how to do that? How can the proper
engine be used automatically?

Best Regards,
Huang Ying
[openssl.org #1690] BN_GF2m_mod_arr() infinite loop
The following code sends BN_GF2m_mod_arr() into an infinite loop.

#include <openssl/bn.h>

int main(int argc, char *argv[])
{
	BIGNUM *bn3 = NULL, *res = NULL, *p = NULL;

	BN_hex2bn(&bn3, "448692853686179295b477565726f6e5d");
	BN_hex2bn(&p, "10087");
	res = BN_new();
	BN_GF2m_mod(res, bn3, p);	/* never returns without the fix */
	return 0;
}

This happens because in the final round of reduction d0 == 0 and
z[dN] != 0, so z[dN] can never be changed. This is fixed by setting
z[dN] = 0 if d0 == 0.

This patch is based on openssl SNAPSHOT 20080519, and has been tested on
x86_64 with openssl/test/bntest.c and the above program.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 crypto/bn/bn_gf2m.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/crypto/bn/bn_gf2m.c
+++ b/crypto/bn/bn_gf2m.c
@@ -322,7 +322,11 @@ int BN_GF2m_mod_arr(BIGNUM *r, const BIG
 		if (zz == 0) break;
 		d1 = BN_BITS2 - d0;
 
-		if (d0) z[dN] = (z[dN] << d1) >> d1; /* clear up the top d1 bits */
+		/* clear up the top d1 bits */
+		if (d0)
+			z[dN] = (z[dN] << d1) >> d1;
+		else
+			z[dN] = 0;
 		z[0] ^= zz; /* reduction t^0 component */
 
 		for (k = 1; p[k] != 0; k++)

OpenSSL self-test report:

OpenSSL version:  0.9.9-dev
Last change:      To support arbitrarily-typed thread IDs, deprecate the ...
Options:          no-gmp no-krb5 no-mdc2 no-rc5 no-rfc3779 no-shared no-zlib no-zlib-dynamic static-engine
OS (uname):       Linux caritas-dev 2.6.24-1-amd64 #1 SMP Fri Apr 18 23:08:22 UTC 2008 x86_64 GNU/Linux
OS (config):      x86_64-whatever-linux2
Target (default): linux-x86_64
Target:           linux-x86_64
Compiler:         Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-libmudflap --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Debian 4.2.4-1)

Test passed.
[BUGFIX] BN_GF2m_mod_arr() infinite loop
The following code sends BN_GF2m_mod_arr() into an infinite loop.

#include <openssl/bn.h>

int main(int argc, char *argv[])
{
	BIGNUM *bn3 = NULL, *res = NULL, *p = NULL;

	BN_hex2bn(&bn3, "448692853686179295b477565726f6e5d");
	BN_hex2bn(&p, "10087");
	res = BN_new();
	BN_GF2m_mod(res, bn3, p);	/* never returns without the fix */
	return 0;
}

This happens because in the final round of reduction d0 == 0 and
z[dN] != 0, so z[dN] can never be changed. This is fixed by setting
z[dN] = 0 if d0 == 0.

This patch is based on openssl SNAPSHOT 20080519, and has been tested on
x86_64 with openssl/test/bntest.c and the above program.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 crypto/bn/bn_gf2m.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/crypto/bn/bn_gf2m.c
+++ b/crypto/bn/bn_gf2m.c
@@ -322,7 +322,11 @@ int BN_GF2m_mod_arr(BIGNUM *r, const BIG
 		if (zz == 0) break;
 		d1 = BN_BITS2 - d0;
 
-		if (d0) z[dN] = (z[dN] << d1) >> d1; /* clear up the top d1 bits */
+		/* clear up the top d1 bits */
+		if (d0)
+			z[dN] = (z[dN] << d1) >> d1;
+		else
+			z[dN] = 0;
 		z[0] ^= zz; /* reduction t^0 component */
 
 		for (k = 1; p[k] != 0; k++)
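
The masking step that the patch changes can be isolated in a few lines of C. The shift pair `(w << d1) >> d1` keeps the low d0 bits of a word, but for d0 == 0 the shift count d1 would equal the word width, which C leaves undefined, so the original code skipped the step and left the word untouched. The sketch uses a 64-bit word as a stand-in for BN_BITS2 on x86_64, and `keep_low_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for BN_BITS2 on x86_64. */
#define WORD_BITS 64

/*
 * Keep only the low d0 bits of w, clearing the top (WORD_BITS - d0).
 * d0 == 0 means "keep nothing"; it is handled explicitly because a
 * shift by WORD_BITS is undefined behaviour in C.
 */
static uint64_t keep_low_bits(uint64_t w, unsigned d0)
{
    unsigned d1;

    assert(d0 < WORD_BITS);
    if (d0 == 0)
        return 0;               /* the branch added by the patch */
    d1 = WORD_BITS - d0;
    return (w << d1) >> d1;
}
```

With the explicit zero branch the word being reduced is always cleared, so the reduction loop makes progress in the d0 == 0 case described in the report.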