Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-04-01 Thread Huang Ying
On Wed, 2009-04-01 at 03:45 +0800, Andy Polyakov wrote:
 Hi,
 
  This patch adds support to Intel AES-NI instruction set for x86_64
  platform.
 
 I apologize for delay.

That's all right.

 Promised to comment on submission in question.
 Well, after some consideration I reckoned that it would take longer to
 discuss it than to implement own version of assembler module. Having own
 code also makes it easier for me to maintain it:-) The module is
 available for preview at
 http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt. Major points are,
 all addressed in the new code:

Thank you very much for your work.

 - why full unroll?

Just because the unrolled code is not too long.

 - why 4x interleave when aesenc latency is [anticipated to be] 6?

Yes, it should be 6. I overlooked this important information in the
white paper.
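
As a rough illustration of the latency argument, here is a simplified
pipeline model in C. It is a sketch under stated assumptions, not a
measurement: the function name cycles_per_block and the issue interval
parameter are hypothetical, and only the latency of 6 comes from the
discussion above.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not measured behaviour): within one
 * round, n independent aesenc instructions are issued `issue` cycles
 * apart, and the next round of the first block cannot start until its
 * previous result is ready, i.e. `latency` cycles after it was issued.
 * One round for a group of n blocks therefore costs
 * max(latency, n * issue) cycles. Pipeline fill/drain is ignored.
 */
double cycles_per_block(int rounds, int latency, int issue, int n)
{
    assert(rounds > 0 && latency > 0 && issue > 0 && n > 0);
    int per_round = (n * issue > latency) ? n * issue : latency;
    return (double)rounds * per_round / n;
}
```

With 10 rounds (AES-128), latency 6 and an assumed issue interval of 1,
this model gives 15 cycles per block at 4x interleave and 10 at 6x;
interleaving beyond the latency brings no further gain in the model.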

 - why post-4x processing is done with non-interleaved routine, when
 interleaved can be used?

Yes. Post-4x processing can be done in interleaved mode, which is faster.

 - why not encode all aes instructions with .byte?

I just wanted to encode all AES instructions after some review. Now I
think we could define the AES instructions as Perl functions and do the
encoding in Perl.
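
For illustration, a minimal sketch of what such an encoder has to
produce, written in C here rather than Perl; the actual module may do
this differently. Register-to-register AES-NI instructions have the form
66 [REX] 0F 38 <opcode> <ModRM>, per the Intel instruction set
reference. The names aes_encode and print_byte_directive are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Final opcode byte of the 66 0F 38 xx register-register forms. */
enum aes_op {
    AESIMC     = 0xDB,
    AESENC     = 0xDC,
    AESENCLAST = 0xDD,
    AESDEC     = 0xDE,
    AESDECLAST = 0xDF
};

/*
 * Encode "op %xmm<src>,%xmm<dst>" (AT&T operand order) into out[].
 * Returns the number of bytes written: 5, or 6 when a REX prefix is
 * needed for %xmm8..%xmm15.
 */
size_t aes_encode(uint8_t out[6], enum aes_op op, unsigned src, unsigned dst)
{
    assert(src < 16 && dst < 16);
    size_t n = 0;
    /* REX.R extends ModRM.reg (dst), REX.B extends ModRM.rm (src). */
    uint8_t rex = 0x40 | ((dst & 8) ? 0x04 : 0) | ((src & 8) ? 0x01 : 0);

    out[n++] = 0x66;                 /* mandatory prefix */
    if (rex != 0x40)
        out[n++] = rex;              /* REX goes between 66 and 0F */
    out[n++] = 0x0F;
    out[n++] = 0x38;
    out[n++] = (uint8_t)op;
    out[n++] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src & 7)); /* ModRM */
    return n;
}

/* Print the encoding as an assembler .byte directive. */
void print_byte_directive(FILE *fp, const uint8_t *bytes, size_t len)
{
    fputs("\t.byte\t", fp);
    for (size_t i = 0; i < len; i++)
        fprintf(fp, "%s0x%02x", i ? "," : "", bytes[i]);
    fputc('\n', fp);
}
```

For example, aesenc %xmm1,%xmm0 encodes to the five bytes
0x66,0x0f,0x38,0xdc,0xc1.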

 - instruction scheduling in key setup can be [much] better;
 
 See code and comments in code for further details. I'd appreciate if you
 could review and cross-test the code. [Counter-]comments and suggestions
 are naturally welcomed. The code will be committed to repository as soon
 as remaining issues are resolved. Remaining are build issue (as you
 pointed out yourself) and actual tests on Win64. Note that I suggest to
 name module eng_aesni-x86_64.pl instead of _asm. This implies that
 eventually there will be 32-bit version too.

I will test your code on real machine. And at least you can test the
code with an emulator: SDE, which can be downloaded from following URL:

http://linux.softpedia.com/progDownload/Intel-Software-Development-Emulator-Download-44635.html

Both Linux and Windows are supported.

BTW: you want me to prepare the patch or you prepare the patch yourself?

Best Regards,
Huang Ying





Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-04-01 Thread Andy Polyakov
Hi,

 - why full unroll?
 
 Just because the unrolled code is not too long.

As for non-interleaved loop. Reasoning is that folded loop can be
inlined in several places to spare few cycles on call overhead. Of
course this is under premise that it is as fast as unrolled one. Intel
CPUs used to be very good at small loops, which is why I dared to fold
the loop. Of course it doesn't have to be the case here and if unrolled
loop will be proved to be faster, inline code will have to be replaced
with calls.
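
To make the folded vs. unrolled distinction concrete, a minimal C
sketch. toy_round is a hypothetical stand-in and is NOT AES, and the
function names are made up for illustration; the real code is assembler
and its performance has to be measured, not inferred from this.

```c
#include <stdint.h>

/* Stand-in for one cipher round; deliberately not AES. */
static uint32_t toy_round(uint32_t state, uint32_t round_key)
{
    state ^= round_key;
    return (state << 5) | (state >> 27);   /* rotate left by 5 */
}

/* Folded: one round in the loop body. Small enough to be inlined at
 * each call site, which avoids the call overhead. */
uint32_t encrypt_folded(uint32_t block, const uint32_t *keys, int rounds)
{
    for (int i = 0; i < rounds; i++)
        block = toy_round(block, keys[i]);
    return block;
}

/* Fully unrolled for a fixed 10 rounds. No loop overhead, but large
 * enough that it would normally be shared via a call. */
uint32_t encrypt_unrolled10(uint32_t block, const uint32_t *keys)
{
    block = toy_round(block, keys[0]);
    block = toy_round(block, keys[1]);
    block = toy_round(block, keys[2]);
    block = toy_round(block, keys[3]);
    block = toy_round(block, keys[4]);
    block = toy_round(block, keys[5]);
    block = toy_round(block, keys[6]);
    block = toy_round(block, keys[7]);
    block = toy_round(block, keys[8]);
    block = toy_round(block, keys[9]);
    return block;
}
```

Both variants compute the same result; the open question in this thread
is only which shape runs faster on the real hardware.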

 - why not encode all aes instructions with .byte?
 
 I just wanted to encode all AES instructions after some review. Now I
 think we could define the AES instructions as Perl functions and do the
 encoding in Perl.

It's done at the end of script.

 I will test your code on real machine.

There is real machine? Would you care to perform several tests, so that
we can sort out what's optimal? I mean the folded vs. unrolled, then I
wonder if my use of .aligns is excessive in *crypt1... I don't demand
actual figures [in case you can't disclose them], only if/how
performance is affected... If yes, we can proceed off-list if so desired.

 And at least you can test the
 code with an emulator: SDE,

That's how the code was tested, every code branch was explicitly tested.

 BTW: you want me to prepare the patch or you prepare the patch yourself?

I'll manage it myself. A.
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-04-01 Thread Huang Ying
Hi,

On Wed, 2009-04-01 at 16:02 +0800, Andy Polyakov wrote:
  Just because the unrolled code is not too long.
 
 As for non-interleaved loop. Reasoning is that folded loop can be
 inlined in several places to spare few cycles on call overhead. Of
 course this is under premise that it is as fast as unrolled one. Intel
 CPUs used to be very good at small loops, which is why I dared to fold
 the loop. Of course it doesn't have to be the case here and if unrolled
 loop will be proved to be faster, inline code will have to be replaced
 with calls.

Sounds reasonable.

  - why not encode all aes instructions with .byte?
  
  I just wanted to encode all AES instructions after some review. Now I
  think we could define the AES instructions as Perl functions and do the
  encoding in Perl.
 
 It's done at the end of script.

Yes. Thanks.

  I will test your code on real machine.
 
 There is real machine? Would you care to perform several tests, so that
 we can sort out what's optimal? I mean the folded vs. unrolled, then I
 wonder if my use of .aligns is excessive in *crypt1... I don't demand
 actual figures [in case you can't disclose them], only if/how
 performance is affected... If yes, we can proceed off-list if so desired.

OK. I will do these tests.

1. folded vs. unrolled
2. .align vs no .align in *crypt1

Any other tests to add?

I will test with openssl speed and send you the results. I will run the
tests tomorrow.

  BTW: you want me to prepare the patch or you prepare the patch yourself?
 
 I'll manage it myself. A.

Can you send me the full patch so I can test it?

Best Regards,
Huang Ying





Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-03-31 Thread Andy Polyakov
Hi,

 This patch adds support to Intel AES-NI instruction set for x86_64
 platform.

I apologize for delay. Promised to comment on submission in question.
Well, after some consideration I reckoned that it would take longer to
discuss it than to implement own version of assembler module. Having own
code also makes it easier for me to maintain it:-) The module is
available for preview at
http://www.openssl.org/~appro/eng_aesni-x86_64.pl.txt. Major points are,
all addressed in the new code:

- why full unroll?
- why 4x interleave when aesenc latency is [anticipated to be] 6?
- why post-4x processing is done with non-interleaved routine, when
interleaved can be used?
- why not encode all aes instructions with .byte?
- instruction scheduling in key setup can be [much] better;

See code and comments in code for further details. I'd appreciate if you
could review and cross-test the code. [Counter-]comments and suggestions
are naturally welcomed. The code will be committed to repository as soon
as remaining issues are resolved. Remaining are build issue (as you
pointed out yourself) and actual tests on Win64. Note that I suggest to
name module eng_aesni-x86_64.pl instead of _asm. This implies that
eventually there will be 32-bit version too. Cheers. A.


Re: [PATCH RFC -v3] Add support to Intel AES-NI instruction set for x86_64 platform

2009-02-09 Thread Huang Ying
Hi, All,

It seems that Andy has not been available since Christmas. Can anyone
tell me where I can find him, or what I can do to have this patch
reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
 This patch adds support to Intel AES-NI instruction set for x86_64
 platform.
 
 Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
 instructions that are going to be introduced in the next generation of
 Intel processor, as of 2009. These instructions enable fast and secure
 data encryption and decryption, using the Advanced Encryption Standard
 (AES), defined by FIPS Publication number 197.  The architecture
 introduces six instructions that offer full hardware support for
 AES. Four of them support high performance data encryption and
 decryption, and the other two instructions support the AES key
 expansion procedure.
 
 The white paper can be downloaded from:
 
 http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
 
 
 AES-NI support is implemented as an engine in crypto/engine/.
 
 
 ChangeLog:
 
 v3:
 
 - Rename INTEL or INTEL_AES stuff to AESNI
 
 - Use cfb and ofb modes implementation of crypto/modes instead of copying.
 
 v2:
 
 - AES-NI support is implemented as an engine instead of branch.
 
 - ECB and CBC modes are implemented in parallel style to take
   advantage of pipelined hardware implementation.
 
 - AES key scheduling algorithm is re-implemented with higher performance.
 
 
 Known issues:
 
 - How to add conditional compilation for eng_intel_asm.pl? It can not
   be compiled on non-x86 platform.
 
 - NID for CTR mode can not be found, how to support it in engine?
 
 - CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
   to add AES-NI support for them, I can add them.
 
 
 Signed-off-by: Huang Ying ying.hu...@intel.com
 
 ---
  crypto/engine/Makefile         |   11
  crypto/engine/eng_aesni.c      |  409
  crypto/engine/eng_aesni_asm.pl |  918
  crypto/engine/eng_all.c        |    3
  crypto/engine/engine.h         |    1
  5 files changed, 1340 insertions(+), 2 deletions(-)
 
 --- /dev/null
 +++ b/crypto/engine/eng_aesni.c
 @@ -0,0 +1,409 @@
 +/*
 + * Support for Intel AES-NI instruction set
 + *   Author: Huang Ying ying.hu...@intel.com
 + *
 + * Intel AES-NI is a new set of Single Instruction Multiple Data
 + * (SIMD) instructions that are going to be introduced in the next
 + * generation of Intel processor, as of 2009. These instructions
 + * enable fast and secure data encryption and decryption, using the
 + * Advanced Encryption Standard (AES), defined by FIPS Publication
 + * number 197.  The architecture introduces six instructions that
 + * offer full hardware support for AES. Four of them support high
 + * performance data encryption and decryption, and the other two
 + * instructions support the AES key expansion procedure.
 + *
 + * The white paper can be downloaded from:
 + *   http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
 + *
 + * This file is based on engines/e_padlock.c
 + */
 +
 +/* 
 + * Copyright (c) 1999-2001 The OpenSSL Project.  All rights reserved.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + *
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + *
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in
 + *    the documentation and/or other materials provided with the
 + *    distribution.
 + *
 + * 3. All advertising materials mentioning features or use of this
 + *    software must display the following acknowledgment:
 + *    This product includes software developed by the OpenSSL Project
 + *    for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)
 + *
 + * 4. The names OpenSSL Toolkit and OpenSSL Project must not be used to
 + *    endorse or promote products derived from this software without
 + *    prior written permission. For written permission, please contact
 + *    licens...@openssl.org.
 + *
 + * 5. Products derived from this software may not be called OpenSSL
 + *    nor may OpenSSL appear in their names without prior written
 + *    permission of the OpenSSL Project.
 + *
 + * 6. Redistributions of any form whatsoever must retain the following
 + *    acknowledgment:
 + *    This product includes software developed by the OpenSSL Project
 + *    for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
 + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF