Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2009-04-01 at 03:45 +0800, Andy Polyakov wrote:
> Hi,
>
> > This patch adds support to Intel AES-NI instruction set for x86_64
> > platform.
>
> I apologize for delay.

That's all right.

> Promised to comment on submission in question. Well, after some
> consideration I reckoned that it would take longer to discuss it than
> to implement own version of assembler module. Having own code also
> makes it easier for me to maintain it:-) The module is available for
> preview at http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt.
> Major points are, all addressed in the new code:

Thank you very much for your work.

> - why full unroll?

Just because the unrolled code is not too long.

> - why 4x interleave when aesenc latency is [anticipated to be] 6?

Yes. It should be 6; I neglected this important information in the
white paper.

> - why post-4x processing is done with non-interleaved routine, when
>   interleaved can be used?

Yes. Post-4x processing can be done in interleaved mode. That is faster.

> - why not encode all aes instructions with .byte?

Just want to encode all aes instructions after some review. Now I think
maybe we can define aes instructions as perl function and do encoding
via perl.

> - instruction scheduling in key setup can be [much] better;
>
> See code and comments in code for further details. I'd appreciate if
> you could review and cross-test the code. [Counter-]comments and
> suggestions are naturally welcomed. The code will be committed to
> repository as soon as remaining issues are resolved. Remaining are
> build issue (as you pointed out yourself) and actual tests on Win64.
> Note that I suggest to name module eng_aesni-x86_64.pl instead of
> _asm. This implies that eventually there will be 32-bit version too.

I will test your code on real machine.

And at least you can test the code with an emulator: SDE, which can be
downloaded from the following URL:

http://linux.softpedia.com/progDownload/Intel-Software-Development-Emulator-Download-44635.html

Both Linux and Windows are supported.

BTW: you want me to prepare the patch or you prepare the patch yourself?

Best Regards,
Huang Ying
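The interleave question above can be illustrated with a back-of-the-envelope model. This is only a sketch under stated assumptions, not a measurement and not code from either module: it takes the anticipated aesenc latency of 6 cycles from the thread, and it assumes (this is not stated anywhere in the thread) that independent aesenc instructions can be issued every `issue_interval` cycles. The function name `cycles_per_block_round` is hypothetical.

```c
#include <assert.h>

/*
 * Simplified steady-state model (assumption, not a measurement):
 * `blocks` independent blocks are processed in an interleaved fashion.
 * Within one AES round, the aesenc instructions for the different
 * blocks are issued `issue_interval` cycles apart, and the next round
 * for a given block cannot start until `latency` cycles have passed.
 * One round over all blocks therefore takes
 *     max(latency, blocks * issue_interval)
 * cycles, and the function returns that figure divided by `blocks`.
 */
static double cycles_per_block_round(unsigned latency,
                                     unsigned issue_interval,
                                     unsigned blocks)
{
    assert(blocks > 0);
    unsigned issue_time = blocks * issue_interval;
    unsigned round_time = issue_time > latency ? issue_time : latency;
    return (double)round_time / blocks;
}
```

Under these assumptions, with a latency of 6 and an issue interval of 1, a 4x interleave costs 1.5 cycles per block per round, because the unit sits idle for 2 of every 6 cycles, while a 6x interleave reaches 1 cycle per block per round. Whether real hardware behaves this way is exactly what the tests proposed in this thread would have to show.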
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi,

> > - why full unroll?
>
> Just because the unrolled code is not too long.

As for non-interleaved loop. Reasoning is that folded loop can be
inlined in several places to spare few cycles on call overhead. Of
course this is under premise that it is as fast as unrolled one. Intel
CPUs used to be very good at small loops, which is why I dared to fold
the loop. Of course it doesn't have to be the case here and if unrolled
loop will be proved to be faster, inline code will have to be replaced
with calls.

> > - why not encode all aes instructions with .byte?
>
> Just want to encode all aes instructions after some review. Now I
> think maybe we can define aes instructions as perl function and do
> encoding via perl.

It's done at the end of script.

> I will test your code on real machine.

There is real machine? Would you care to perform several tests, so that
we can sort out what's optimal? I mean the folded vs. unrolled, then I
wonder if my use of .aligns is excessive in *crypt1... I don't demand
actual figures [in case you can't disclose them], only if/how
performance is affected... If yes, we can proceed off-list if so
desired.

> And at least you can test the code with an emulator: SDE,

That's how the code was tested, every code branch was explicitly tested.

> BTW: you want me to prepare the patch or you prepare the patch
> yourself?

I'll manage it myself. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
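To make the ".byte" point concrete: an assembler that does not know the AES-NI mnemonics can still be fed the raw opcode bytes. The sketch below is written in C for illustration and is not the Perl code at the end of the script, which is not shown in this thread. It relies on the documented register-to-register encoding of aesenc (66 0F 38 DC followed by a ModRM byte) and, as a simplification, handles only xmm0 to xmm7, so no REX prefix is emitted. The function names `aes_modrm` and `aesenc_as_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * ModRM byte for a register-to-register form: mod = 11b, reg = the
 * destination register, rm = the source register. Registers xmm8 to
 * xmm15 would also need a REX prefix, which this sketch does not handle.
 */
static unsigned aes_modrm(unsigned src, unsigned dst)
{
    assert(src < 8 && dst < 8);
    return 0xC0u | (dst << 3) | src;
}

/*
 * Format the AT&T-syntax instruction "aesenc %xmm<src>,%xmm<dst>" as a
 * .byte directive. Returns the number of characters that snprintf
 * reports for the formatted string.
 */
static int aesenc_as_bytes(char *buf, size_t len, unsigned src, unsigned dst)
{
    return snprintf(buf, len, ".byte 0x66,0x0f,0x38,0xdc,0x%02x",
                    aes_modrm(src, dst));
}
```

For example, `aesenc %xmm1,%xmm0` has source 1 and destination 0, so the ModRM byte is 0xc1 and the directive reads `.byte 0x66,0x0f,0x38,0xdc,0xc1`.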
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
> > Just because the unrolled code is not too long.
>
> As for non-interleaved loop. Reasoning is that folded loop can be
> inlined in several places to spare few cycles on call overhead. Of
> course this is under premise that it is as fast as unrolled one. Intel
> CPUs used to be very good at small loops, which is why I dared to fold
> the loop. Of course it doesn't have to be the case here and if
> unrolled loop will be proved to be faster, inline code will have to be
> replaced with calls.

Sounds reasonable.

> > > - why not encode all aes instructions with .byte?
> >
> > Just want to encode all aes instructions after some review. Now I
> > think maybe we can define aes instructions as perl function and do
> > encoding via perl.
>
> It's done at the end of script.

Yes. Thanks.

> > I will test your code on real machine.
>
> There is real machine? Would you care to perform several tests, so
> that we can sort out what's optimal? I mean the folded vs. unrolled,
> then I wonder if my use of .aligns is excessive in *crypt1... I don't
> demand actual figures [in case you can't disclose them], only if/how
> performance is affected... If yes, we can proceed off-list if so
> desired.

OK. I will do these tests:

1. folded vs. unrolled
2. .align vs no .align in *crypt1

Any other test to be added? I will test with openssl speed and send you
the result. I will do the test tomorrow.

> > BTW: you want me to prepare the patch or you prepare the patch
> > yourself?
>
> I'll manage it myself. A.

Can you send me the full patch, so I can test it?

Best Regards,
Huang Ying
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi,

> This patch adds support to Intel AES-NI instruction set for x86_64
> platform.

I apologize for delay. Promised to comment on submission in question.
Well, after some consideration I reckoned that it would take longer to
discuss it than to implement own version of assembler module. Having own
code also makes it easier for me to maintain it:-) The module is
available for preview at
http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt. Major points are,
all addressed in the new code:

- why full unroll?
- why 4x interleave when aesenc latency is [anticipated to be] 6?
- why post-4x processing is done with non-interleaved routine, when
  interleaved can be used?
- why not encode all aes instructions with .byte?
- instruction scheduling in key setup can be [much] better;

See code and comments in code for further details. I'd appreciate if you
could review and cross-test the code. [Counter-]comments and suggestions
are naturally welcomed. The code will be committed to repository as soon
as remaining issues are resolved. Remaining are build issue (as you
pointed out yourself) and actual tests on Win64. Note that I suggest to
name module eng_aesni-x86_64.pl instead of _asm. This implies that
eventually there will be 32-bit version too.

Cheers. A.
Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform
Hi, All,

It seems that Andy has not been available since Christmas. Can anyone
tell me where I can find him, or what I can do to have this patch
reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
> This patch adds support to Intel AES-NI instruction set for x86_64
> platform.
>
> Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
> instructions that are going to be introduced in the next generation of
> Intel processor, as of 2009. These instructions enable fast and secure
> data encryption and decryption, using the Advanced Encryption Standard
> (AES), defined by FIPS Publication number 197. The architecture
> introduces six instructions that offer full hardware support for AES.
> Four of them support high performance data encryption and decryption,
> and the other two instructions support the AES key expansion
> procedure.
>
> The white paper can be downloaded from:
>
> http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
>
> AES-NI support is implemented as an engine in crypto/engine/.
>
> ChangeLog:
>
> v3:
> - Rename INTEL or INTEL_AES stuff to AESNI
> - Use cfb and ofb modes implementation of crypto/modes instead of
>   copying.
>
> v2:
> - AES-NI support is implemented as an engine instead of branch.
> - ECB and CBC modes are implemented in parallel style to take
>   advantage of pipelined hardware implementation.
> - AES key scheduling algorithm is re-implemented with higher
>   performance.
>
> Known issues:
> - How to add conditional compilation for eng_intel_asm.pl? It can not
>   be compiled on non-x86 platform.
> - NID for CTR mode can not be found, how to support it in engine?
> - CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
>   to add AES-NI support for them, I can add them.
>
> Signed-off-by: Huang Ying ying.hu...@intel.com
>
> ---
>  crypto/engine/Makefile         |   11
>  crypto/engine/eng_aesni.c      |  409 ++
>  crypto/engine/eng_aesni_asm.pl |  918 +
>  crypto/engine/eng_all.c        |    3
>  crypto/engine/engine.h         |    1
>  5 files changed, 1340 insertions(+), 2 deletions(-)
>
> --- /dev/null
> +++ b/crypto/engine/eng_aesni.c
> @@ -0,0 +1,409 @@
> +/*
> + * Support for Intel AES-NI instruction set
> + *   Author: Huang Ying ying.hu...@intel.com
> + *
> + * Intel AES-NI is a new set of Single Instruction Multiple Data
> + * (SIMD) instructions that are going to be introduced in the next
> + * generation of Intel processor, as of 2009. These instructions
> + * enable fast and secure data encryption and decryption, using the
> + * Advanced Encryption Standard (AES), defined by FIPS Publication
> + * number 197. The architecture introduces six instructions that
> + * offer full hardware support for AES. Four of them support high
> + * performance data encryption and decryption, and the other two
> + * instructions support the AES key expansion procedure.
> + *
> + * The white paper can be downloaded from:
> + *   http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
> + *
> + * This file is based on engines/e_padlock.c
> + */
> +
> +/*
> + * Copyright (c) 1999-2001 The OpenSSL Project. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in
> + *    the documentation and/or other materials provided with the
> + *    distribution.
> + *
> + * 3. All advertising materials mentioning features or use of this
> + *    software must display the following acknowledgment:
> + *    This product includes software developed by the OpenSSL Project
> + *    for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)
> + *
> + * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to
> + *    endorse or promote products derived from this software without
> + *    prior written permission. For written permission, please contact
> + *    licens...@openssl.org.
> + *
> + * 5. Products derived from this software may not be called OpenSSL
> + *    nor may OpenSSL appear in their names without prior written
> + *    permission of the OpenSSL Project.
> + *
> + * 6. Redistributions of any form whatsoever must retain the following
> + *    acknowledgment:
> + *    This product includes software developed by the OpenSSL Project
> + *    for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
> + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF