Hi, All,

It seems that Andy has not been available since Christmas. Can anyone
tell me how to reach him, or what I should do to get this patch reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
> This patch adds support for the Intel AES-NI instruction set on the
> x86_64 platform.
> 
> Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
> instructions that will be introduced in the next generation of Intel
> processors, as of 2009. These instructions enable fast and secure
> data encryption and decryption, using the Advanced Encryption Standard
> (AES), defined by FIPS Publication number 197.  The architecture
> introduces six instructions that offer full hardware support for
> AES. Four of them support high performance data encryption and
> decryption, and the other two instructions support the AES key
> expansion procedure.
> 
> The white paper can be downloaded from:
> 
> http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
> 
> 
> AES-NI support is implemented as an engine in crypto/engine/.
> 
> 
> ChangeLog:
> 
> v3:
> 
> - Rename INTEL or INTEL_AES stuff to AESNI
> 
> - Use cfb and ofb modes implementation of crypto/modes instead of copying.
> 
> v2:
> 
> - AES-NI support is implemented as an engine instead of "branch".
> 
> - ECB and CBC modes are implemented in parallel style to take
>   advantage of pipelined hardware implementation.
> 
> - AES key scheduling algorithm is re-implemented with higher performance.
> 
> 
> Known issues:
> 
> - How should eng_aesni_asm.pl be compiled conditionally? It cannot
>   be assembled on non-x86 platforms.
> 
> - No NID for CTR mode can be found; how can the engine support it?
> 
> - CFB1, CFB8, OFB1 and OFB8 modes are not supported. I can add AES-NI
>   support for them if it is needed.
> 
> 
> Signed-off-by: Huang Ying <ying.hu...@intel.com>
> 
> ---
>  crypto/engine/Makefile         |   11 
>  crypto/engine/eng_aesni.c      |  409 ++++++++++++++++++
>  crypto/engine/eng_aesni_asm.pl |  918 +++++++++++++++++++++++++++++++++++++++++
>  crypto/engine/eng_all.c        |    3 
>  crypto/engine/engine.h         |    1 
>  5 files changed, 1340 insertions(+), 2 deletions(-)
> 
> --- /dev/null
> +++ b/crypto/engine/eng_aesni.c
> @@ -0,0 +1,409 @@
> +/*
> + * Support for Intel AES-NI instruction set
> + *   Author: Huang Ying <ying.hu...@intel.com>
> + *
> + * Intel AES-NI is a new set of Single Instruction Multiple Data
> + * (SIMD) instructions that are going to be introduced in the next
> + * generation of Intel processors, as of 2009. These instructions
> + * enable fast and secure data encryption and decryption, using the
> + * Advanced Encryption Standard (AES), defined by FIPS Publication
> + * number 197.  The architecture introduces six instructions that
> + * offer full hardware support for AES. Four of them support high
> + * performance data encryption and decryption, and the other two
> + * instructions support the AES key expansion procedure.
> + *
> + * The white paper can be downloaded from:
> + *   http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
> + *
> + * This file is based on engines/e_padlock.c
> + */
> +
> +/* ====================================================================
> + * Copyright (c) 1999-2001 The OpenSSL Project.  All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in
> + *    the documentation and/or other materials provided with the
> + *    distribution.
> + *
> + * 3. All advertising materials mentioning features or use of this
> + *    software must display the following acknowledgment:
> + *    "This product includes software developed by the OpenSSL Project
> + *    for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)"
> + *
> + * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
> + *    endorse or promote products derived from this software without
> + *    prior written permission. For written permission, please contact
> + *    licens...@openssl.org.
> + *
> + * 5. Products derived from this software may not be called "OpenSSL"
> + *    nor may "OpenSSL" appear in their names without prior written
> + *    permission of the OpenSSL Project.
> + *
> + * 6. Redistributions of any form whatsoever must retain the following
> + *    acknowledgment:
> + *    "This product includes software developed by the OpenSSL Project
> + *    for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)"
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
> + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> + * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE OpenSSL PROJECT OR
> + * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + * ====================================================================
> + *
> + * This product includes cryptographic software written by Eric Young
> + * (e...@cryptsoft.com).  This product includes software written by Tim
> + * Hudson (t...@cryptsoft.com).
> + *
> + */
> +
> +
> +#include <openssl/opensslconf.h>
> +
> +#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) && !defined(OPENSSL_NO_AES)
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <assert.h>
> +#include <openssl/crypto.h>
> +#include <openssl/dso.h>
> +#include <openssl/engine.h>
> +#include <openssl/evp.h>
> +#include <openssl/aes.h>
> +#include <openssl/err.h>
> +#include <cryptlib.h>
> +#include "crypto/modes/modes.h"
> +
> +/* AES-NI is available *ONLY* on some x86 CPUs.  Not only does it not
> +   exist elsewhere, the code cannot even be compiled on other
> +   platforms! */
> +#undef COMPILE_HW_AESNI
> +#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_AMD64)) && !defined(I386_ONLY)
> +#define COMPILE_HW_AESNI
> +static ENGINE *ENGINE_aesni (void);
> +#endif
> +
> +void ENGINE_load_aesni (void)
> +{
> +/* On non-x86 CPUs it just returns. */
> +#ifdef COMPILE_HW_AESNI
> +     ENGINE *toadd = ENGINE_aesni();
> +     if (!toadd)
> +             return;
> +     ENGINE_add (toadd);
> +     ENGINE_free (toadd);
> +     ERR_clear_error ();
> +#endif
> +}
> +
> +#ifdef COMPILE_HW_AESNI
> +int aesni_set_encrypt_key(const unsigned char *userKey, const int bits,
> +                           AES_KEY *key);
> +int aesni_set_decrypt_key(const unsigned char *userKey, const int bits,
> +                           AES_KEY *key);
> +
> +void aesni_encrypt(const unsigned char *in, unsigned char *out,
> +                    const AES_KEY *key);
> +void aesni_decrypt(const unsigned char *in, unsigned char *out,
> +                    const AES_KEY *key);
> +
> +void aesni_ecb_encrypt(const unsigned char *in,
> +                        unsigned char *out,
> +                        const unsigned long length,
> +                        const AES_KEY *key,
> +                        const int enc);
> +void aesni_cbc_encrypt(const unsigned char *in,
> +                        unsigned char *out,
> +                        const unsigned long length,
> +                        const AES_KEY *key,
> +                        unsigned char *ivec, const int enc);
> +
> +/* Function for ENGINE detection and control */
> +static int aesni_init(ENGINE *e);
> +
> +/* Cipher Stuff */
> +static int aesni_ciphers(ENGINE *e, const EVP_CIPHER **cipher,
> +                             const int **nids, int nid);
> +
> +#define AESNI_MIN_ALIGN      16
> +#define AESNI_ALIGN(x) \
> +     ((void *)(((unsigned long)(x)+AESNI_MIN_ALIGN-1)&~(AESNI_MIN_ALIGN-1)))
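The AESNI_ALIGN macro above rounds a pointer up to the next 16-byte boundary so that movaps can be used on the key schedule. Below is a minimal standalone sketch of the same computation. It uses uintptr_t instead of unsigned long, which is my assumption for staying correct where long is 32 bits; align_up16 and align_slack are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_ALIGN 16 /* same value as AESNI_MIN_ALIGN in the patch */

/* Round p up to the next multiple of MIN_ALIGN (p itself if already aligned). */
static void *align_up16(void *p)
{
    uintptr_t v = (uintptr_t)p;
    v = (v + MIN_ALIGN - 1) & ~(uintptr_t)(MIN_ALIGN - 1);
    return (void *)v;
}

/* Number of bytes skipped by the round-up; always in the range 0..15. */
static unsigned align_slack(void *p)
{
    unsigned slack = (unsigned)((uintptr_t)align_up16(p) - (uintptr_t)p);
    assert(slack < MIN_ALIGN);
    return slack;
}
```

Note that the 12 bytes of _pad1 in AESNI_KEY cover this slack only when cipher_data is at least 4-byte aligned, which allocators normally guarantee.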
> +
> +/* Engine names */
> +static const char *aesni_id = "AESNI";
> +static const char *aesni_name = "AESNI";
> +
> +/* ===== Engine "management" functions ===== */
> +
> +/* Prepare the ENGINE structure for registration */
> +static int
> +aesni_bind_helper(ENGINE *e)
> +{
> +     if (!(OPENSSL_ia32cap_P & (1UL << 57)))
> +             return 0;
> +
> +     /* Register everything or return with an error */
> +     if (!ENGINE_set_id(e, aesni_id) ||
> +         !ENGINE_set_name(e, aesni_name) ||
> +
> +         !ENGINE_set_init_function(e, aesni_init) ||
> +         !ENGINE_set_ciphers (e, aesni_ciphers))
> +             return 0;
> +
> +     /* Everything looks good */
> +     return 1;
> +}
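For reviewers wondering about the magic number 57 in aesni_bind_helper: AES-NI is reported in CPUID.01H:ECX bit 25, and 32 + 25 = 57. The sketch below spells that out. It assumes the capability word holds EDX in bits 0..31 and ECX in bits 32..63, which is consistent with the bit tested in the patch; pack_caps and cap_has_aesni are hypothetical helpers, not OpenSSL functions.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_ECX_AESNI 25                    /* CPUID.01H:ECX bit 25 = AES-NI */
#define CAP_BIT_AESNI (32 + CPUID1_ECX_AESNI)  /* = 57, as tested in the patch */

/* Pack the two CPUID leaf-1 feature words the way the check assumes:
   EDX in bits 0..31, ECX in bits 32..63. */
static uint64_t pack_caps(uint32_t edx, uint32_t ecx)
{
    return (uint64_t)edx | ((uint64_t)ecx << 32);
}

/* Return 1 if the packed capability word advertises AES-NI, else 0. */
static int cap_has_aesni(uint64_t cap)
{
    assert(CAP_BIT_AESNI == 57);
    return (int)((cap >> CAP_BIT_AESNI) & 1);
}
```

The sketch uses a 64-bit type so that the shift stays defined where unsigned long is 32 bits. That may matter for the _M_AMD64 case admitted by the COMPILE_HW_AESNI test.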
> +
> +/* Constructor */
> +static ENGINE *
> +ENGINE_aesni(void)
> +{
> +     ENGINE *eng = ENGINE_new();
> +
> +     if (!eng) {
> +             return NULL;
> +     }
> +
> +     if (!aesni_bind_helper(eng)) {
> +             ENGINE_free(eng);
> +             return NULL;
> +     }
> +
> +     return eng;
> +}
> +
> +/* Check availability of the engine */
> +static int
> +aesni_init(ENGINE *e)
> +{
> +     return 1;
> +}
> +
> +#if defined(NID_aes_128_cfb128) && ! defined (NID_aes_128_cfb)
> +#define NID_aes_128_cfb      NID_aes_128_cfb128
> +#endif
> +
> +#if defined(NID_aes_128_ofb128) && ! defined (NID_aes_128_ofb)
> +#define NID_aes_128_ofb      NID_aes_128_ofb128
> +#endif
> +
> +#if defined(NID_aes_192_cfb128) && ! defined (NID_aes_192_cfb)
> +#define NID_aes_192_cfb      NID_aes_192_cfb128
> +#endif
> +
> +#if defined(NID_aes_192_ofb128) && ! defined (NID_aes_192_ofb)
> +#define NID_aes_192_ofb      NID_aes_192_ofb128
> +#endif
> +
> +#if defined(NID_aes_256_cfb128) && ! defined (NID_aes_256_cfb)
> +#define NID_aes_256_cfb      NID_aes_256_cfb128
> +#endif
> +
> +#if defined(NID_aes_256_ofb128) && ! defined (NID_aes_256_ofb)
> +#define NID_aes_256_ofb      NID_aes_256_ofb128
> +#endif
> +
> +/* List of supported ciphers. */
> +static int aesni_cipher_nids[] = {
> +     NID_aes_128_ecb,
> +     NID_aes_128_cbc,
> +     NID_aes_128_cfb,
> +     NID_aes_128_ofb,
> +
> +     NID_aes_192_ecb,
> +     NID_aes_192_cbc,
> +     NID_aes_192_cfb,
> +     NID_aes_192_ofb,
> +
> +     NID_aes_256_ecb,
> +     NID_aes_256_cbc,
> +     NID_aes_256_cfb,
> +     NID_aes_256_ofb,
> +};
> +static int aesni_cipher_nids_num =
> +     (sizeof(aesni_cipher_nids)/sizeof(aesni_cipher_nids[0]));
> +
> +/* Function prototypes ... */
> +static int aesni_init_key(EVP_CIPHER_CTX *ctx, const unsigned char *key,
> +                           const unsigned char *iv, int enc);
> +static int aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,
> +                         const unsigned char *in, size_t inl);
> +
> +typedef struct
> +{
> +     AES_KEY ks;
> +     unsigned int _pad1[3];
> +} AESNI_KEY;
> +
> +#define AES_BLOCK_SIZE               16
> +
> +#define EVP_CIPHER_block_size_ECB    AES_BLOCK_SIZE
> +#define EVP_CIPHER_block_size_CBC    AES_BLOCK_SIZE
> +#define EVP_CIPHER_block_size_OFB    1
> +#define EVP_CIPHER_block_size_CFB    1
> +
> +/* Declaring so many ciphers by hand would be a pain.
> +   Instead introduce a bit of preprocessor magic :-) */
> +#define      DECLARE_AES_EVP(ksize,lmode,umode)      \
> +static const EVP_CIPHER aesni_##ksize##_##lmode = {  \
> +     NID_aes_##ksize##_##lmode,                      \
> +     EVP_CIPHER_block_size_##umode,                  \
> +     ksize / 8,                                      \
> +     AES_BLOCK_SIZE,                                 \
> +     0 | EVP_CIPH_##umode##_MODE,                    \
> +     aesni_init_key,                         \
> +     aesni_cipher,                           \
> +     NULL,                                           \
> +     sizeof(AESNI_KEY),                              \
> +     EVP_CIPHER_set_asn1_iv,                         \
> +     EVP_CIPHER_get_asn1_iv,                         \
> +     NULL,                                           \
> +     NULL                                            \
> +}
> +
> +DECLARE_AES_EVP(128,ecb,ECB);
> +DECLARE_AES_EVP(128,cbc,CBC);
> +DECLARE_AES_EVP(128,cfb,CFB);
> +DECLARE_AES_EVP(128,ofb,OFB);
> +
> +DECLARE_AES_EVP(192,ecb,ECB);
> +DECLARE_AES_EVP(192,cbc,CBC);
> +DECLARE_AES_EVP(192,cfb,CFB);
> +DECLARE_AES_EVP(192,ofb,OFB);
> +
> +DECLARE_AES_EVP(256,ecb,ECB);
> +DECLARE_AES_EVP(256,cbc,CBC);
> +DECLARE_AES_EVP(256,cfb,CFB);
> +DECLARE_AES_EVP(256,ofb,OFB);
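The DECLARE_AES_EVP macro relies on preprocessor token pasting to stamp out one EVP_CIPHER per key size and mode. Here is a self-contained sketch of the same technique on a reduced, hypothetical structure (toy_cipher, DECLARE_TOY and the BLOCK_SIZE_* names are illustrative only and do not exist in OpenSSL):

```c
#include <assert.h>
#include <string.h>

struct toy_cipher {
    const char *name;   /* e.g. "aes-128-cbc" */
    int block_size;     /* 16 for ECB/CBC, 1 for the stream-like modes */
    int key_len;        /* key length in bytes */
};

#define BLOCK_SIZE_ECB 16
#define BLOCK_SIZE_CBC 16
#define BLOCK_SIZE_CFB 1
#define BLOCK_SIZE_OFB 1

/* ## pastes tokens to build identifiers; # turns a token into a string. */
#define DECLARE_TOY(ksize, lmode, umode)                    \
static const struct toy_cipher toy_##ksize##_##lmode = {    \
    "aes-" #ksize "-" #lmode,                               \
    BLOCK_SIZE_##umode,                                     \
    ksize / 8                                               \
}

DECLARE_TOY(128, ecb, ECB);
DECLARE_TOY(192, cfb, CFB);
DECLARE_TOY(256, cbc, CBC);

/* Sanity check that the macro expanded to the values we expect. */
static int toy_selfcheck(void)
{
    assert(strcmp(toy_128_ecb.name, "aes-128-ecb") == 0);
    return toy_256_cbc.key_len;
}
```

The lower-case mode selects the identifier and the upper-case mode selects the per-mode constant, which is why the patch passes both spellings.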
> +
> +static int
> +aesni_ciphers (ENGINE *e, const EVP_CIPHER **cipher,
> +                   const int **nids, int nid)
> +{
> +     /* No specific cipher => return a list of supported nids ... */
> +     if (!cipher) {
> +             *nids = aesni_cipher_nids;
> +             return aesni_cipher_nids_num;
> +     }
> +
> +     /* ... or the requested "cipher" otherwise */
> +     switch (nid) {
> +     case NID_aes_128_ecb:
> +             *cipher = &aesni_128_ecb;
> +             break;
> +     case NID_aes_128_cbc:
> +             *cipher = &aesni_128_cbc;
> +             break;
> +     case NID_aes_128_cfb:
> +             *cipher = &aesni_128_cfb;
> +             break;
> +     case NID_aes_128_ofb:
> +             *cipher = &aesni_128_ofb;
> +             break;
> +
> +     case NID_aes_192_ecb:
> +             *cipher = &aesni_192_ecb;
> +             break;
> +     case NID_aes_192_cbc:
> +             *cipher = &aesni_192_cbc;
> +             break;
> +     case NID_aes_192_cfb:
> +             *cipher = &aesni_192_cfb;
> +             break;
> +     case NID_aes_192_ofb:
> +             *cipher = &aesni_192_ofb;
> +             break;
> +
> +     case NID_aes_256_ecb:
> +             *cipher = &aesni_256_ecb;
> +             break;
> +     case NID_aes_256_cbc:
> +             *cipher = &aesni_256_cbc;
> +             break;
> +     case NID_aes_256_cfb:
> +             *cipher = &aesni_256_cfb;
> +             break;
> +     case NID_aes_256_ofb:
> +             *cipher = &aesni_256_ofb;
> +             break;
> +
> +     default:
> +             /* Sorry, we don't support this NID */
> +             *cipher = NULL;
> +             return 0;
> +     }
> +
> +     return 1;
> +}
> +
> +/* Prepare the encryption key for AES-NI usage */
> +static int
> +aesni_init_key (EVP_CIPHER_CTX *ctx, const unsigned char *user_key,
> +                 const unsigned char *iv, int enc)
> +{
> +     int ret;
> +     AES_KEY *key = AESNI_ALIGN(ctx->cipher_data);
> +
> +     if ((ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_CFB_MODE
> +         || (ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_OFB_MODE
> +         || enc)
> +             ret=aesni_set_encrypt_key(user_key, ctx->key_len * 8, key);
> +     else
> +             ret=aesni_set_decrypt_key(user_key, ctx->key_len * 8, key);
> +
> +     if(ret < 0) {
> +             EVPerr(EVP_F_AES_INIT_KEY,EVP_R_AES_KEY_SETUP_FAILED);
> +             return 0;
> +     }
> +
> +     return 1;
> +}
> +
> +static int
> +aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,
> +              const unsigned char *in, size_t inl)
> +{
> +     AES_KEY *key = AESNI_ALIGN(ctx->cipher_data);
> +
> +     switch (EVP_CIPHER_CTX_mode(ctx)) {
> +     case EVP_CIPH_ECB_MODE:
> +             aesni_ecb_encrypt(in, out, inl, key, ctx->encrypt);
> +             break;
> +     case EVP_CIPH_CBC_MODE:
> +             aesni_cbc_encrypt(in, out, inl, key,
> +                                   ctx->iv, ctx->encrypt);
> +             break;
> +     case EVP_CIPH_CFB_MODE:
> +             CRYPTO_cfb128_encrypt(in, out, inl, key, ctx->iv,
> +                                   &ctx->num, ctx->encrypt,
> +                                   aesni_encrypt);
> +             break;
> +     case EVP_CIPH_OFB_MODE:
> +             CRYPTO_ofb128_encrypt(in, out, inl, key,
> +                                   ctx->iv, &ctx->num,
> +                                   aesni_encrypt);
> +             break;
> +     default:
> +             return 0;
> +     }
> +
> +     return 1;
> +}
> +
> +#endif /* COMPILE_HW_AESNI */
> +#endif /* !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) && !defined(OPENSSL_NO_AES) */
> --- a/crypto/engine/eng_all.c
> +++ b/crypto/engine/eng_all.c
> @@ -71,6 +71,9 @@ void ENGINE_load_builtin_engines(void)
>  #if defined(__OpenBSD__) || defined(__FreeBSD__)
>       ENGINE_load_cryptodev();
>  #endif
> +#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI)
> +     ENGINE_load_aesni();
> +#endif
>       ENGINE_load_dynamic();
>  #ifndef OPENSSL_NO_STATIC_ENGINE
>  #ifndef OPENSSL_NO_HW
> --- /dev/null
> +++ b/crypto/engine/eng_aesni_asm.pl
> @@ -0,0 +1,918 @@
> +#
> +# ====================================================================
> +# Written by Intel Corporation for the OpenSSL project to add support
> +# for Intel AES-NI instructions. Rights for redistribution and usage
> +# in source and binary forms are granted according to the OpenSSL
> +# license.
> +#
> +#   Author: Huang Ying <ying.hu...@intel.com>
> +#           Vinodh Gopal <vinodh.go...@intel.com>
> +#           Kahraman Akdemir
> +# ====================================================================
> +#
> +
> +$output=shift;
> +
> +$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
> +( $xlate="${dir}x86_64-xlate.pl" and -f $xlate ) or
> +( $xlate="${dir}../perlasm/x86_64-xlate.pl" and -f $xlate) or
> +die "can't locate x86_64-xlate.pl";
> +
> +open STDOUT,"| $^X $xlate $output";
> +
> +$code=".text\n";
> +
> +$state="%xmm0";
> +$state1="%xmm0";
> +$key="%xmm1";
> +$in="%xmm2";
> +$in1="%xmm2";
> +$iv="%xmm3";
> +$state2="%xmm4";
> +$state3="%xmm5";
> +$state4="%xmm6";
> +$in2="%xmm7";
> +$in3="%xmm8";
> +$in4="%xmm9";
> +
> +$inp="%r11";
> +$len="%rdx";
> +$outp="%r10";
> +$keyp="%r9";
> +$ivp="%r8";
> +$rnds="%esi";
> +$t1="%rdi";
> +$t1d="%edi";
> +$tkeyp=$t1;
> +$t2="%rcx";
> +$t3="%rax";
> +
> +$code.=<<___;
> +.type        _key_expansion_128,\...@abi-omnipotent
> +.align       16
> +_key_expansion_128:
> +_key_expansion_256a:
> +     pshufd \$0b11111111, %xmm1, %xmm1
> +     shufps \$0b00010000, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     shufps \$0b10001100, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     pxor %xmm1, %xmm0
> +     movaps %xmm0, (%rcx)
> +     add \$0x10, %rcx
> +     ret
> +.size        _key_expansion_128, . - _key_expansion_128
> +___
> +
> +$code.=<<___;
> +.type        _key_expansion_192a,\...@abi-omnipotent
> +.align 16
> +_key_expansion_192a:
> +     pshufd \$0b01010101, %xmm1, %xmm1
> +     shufps \$0b00010000, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     shufps \$0b10001100, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     pxor %xmm1, %xmm0
> +
> +     movaps %xmm2, %xmm5
> +     movaps %xmm2, %xmm6
> +     pslldq \$4, %xmm5
> +     pshufd \$0b11111111, %xmm0, %xmm3
> +     pxor %xmm3, %xmm2
> +     pxor %xmm5, %xmm2
> +
> +     movaps %xmm0, %xmm1
> +     shufps \$0b01000100, %xmm0, %xmm6
> +     movaps %xmm6, (%rcx)
> +     shufps \$0b01001110, %xmm2, %xmm1
> +     movaps %xmm1, 0x10(%rcx)
> +     add \$0x20, %rcx
> +     ret
> +.size        _key_expansion_192a, . - _key_expansion_192a
> +___
> +
> +$code.=<<___;
> +.type        _key_expansion_192b,\...@abi-omnipotent
> +.align 16
> +_key_expansion_192b:
> +     pshufd \$0b01010101, %xmm1, %xmm1
> +     shufps \$0b00010000, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     shufps \$0b10001100, %xmm0, %xmm4
> +     pxor %xmm4, %xmm0
> +     pxor %xmm1, %xmm0
> +
> +     movaps %xmm2, %xmm5
> +     pslldq \$4, %xmm5
> +     pshufd \$0b11111111, %xmm0, %xmm3
> +     pxor %xmm3, %xmm2
> +     pxor %xmm5, %xmm2
> +
> +     movaps %xmm0, (%rcx)
> +     add \$0x10, %rcx
> +     ret
> +.size        _key_expansion_192b, . - _key_expansion_192b
> +___
> +
> +$code.=<<___;
> +.type        _key_expansion_256b,\...@abi-omnipotent
> +.align 16
> +_key_expansion_256b:
> +     pshufd \$0b10101010, %xmm1, %xmm1
> +     shufps \$0b00010000, %xmm2, %xmm4
> +     pxor %xmm4, %xmm2
> +     shufps \$0b10001100, %xmm2, %xmm4
> +     pxor %xmm4, %xmm2
> +     pxor %xmm1, %xmm2
> +     movaps %xmm2, (%rcx)
> +     add \$0x10, %rcx
> +     ret
> +.size        _key_expansion_256b, . - _key_expansion_256b
> +___
> +
> +# int aesni_set_encrypt_key(const unsigned char *userKey, const int bits,
> +#                               AES_KEY *key)
> +$code.=<<___;
> +.globl       aesni_set_encrypt_key
> +.type        aesni_set_encrypt_key,\...@function,3
> +.align       16
> +aesni_set_encrypt_key:
> +     call _aesni_set_encrypt_key
> +     ret
> +.size        aesni_set_encrypt_key, . - aesni_set_encrypt_key
> +
> +.type        _aesni_set_encrypt_key,\...@abi-omnipotent
> +.align       16
> +_aesni_set_encrypt_key:
> +     test %rdi, %rdi
> +     jz .Lenc_key_invalid_param
> +     test %rdx, %rdx
> +     jz .Lenc_key_invalid_param
> +     movups (%rdi), %xmm0            # user key (first 16 bytes)
> +     movaps %xmm0, (%rdx)
> +     lea 0x10(%rdx), %rcx            # key addr
> +     pxor %xmm4, %xmm4               # xmm4 is assumed 0 in _key_expansion_x
> +     cmp \$256, %esi
> +     jnz .Lenc_key192
> +     mov \$14, %esi
> +     movl %esi, 240(%rdx)            # 14 rounds for 256
> +     movups 0x10(%rdi), %xmm2        # other user key
> +     movaps %xmm2, (%rcx)
> +     add \$0x10, %rcx
> +     # aeskeygenassist \$0x1, %xmm2, %xmm1   # round 1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x1, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x2, %xmm2, %xmm1   # round 2
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x2, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x4, %xmm2, %xmm1   # round 3
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x4, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x8, %xmm2, %xmm1   # round 4
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x8, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x08
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x10, %xmm2, %xmm1  # round 5
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x10, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x20, %xmm2, %xmm1  # round 6
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20
> +     call _key_expansion_256a
> +     # aeskeygenassist \$0x20, %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x20
> +     call _key_expansion_256b
> +     # aeskeygenassist \$0x40, %xmm2, %xmm1  # round 7
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40
> +     call _key_expansion_256a
> +     xor %rax, %rax
> +     ret
> +.Lenc_key192:
> +     cmp \$192, %esi
> +     jnz .Lenc_key128
> +     mov \$12, %esi
> +     movl %esi, 240(%rdx)            # 12 rounds for 192
> +     movq 0x10(%rdi), %xmm2          # other user key
> +     # aeskeygenassist \$0x1, %xmm2, %xmm1   # round 1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01
> +     call _key_expansion_192a
> +     # aeskeygenassist \$0x2, %xmm2, %xmm1   # round 2
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02
> +     call _key_expansion_192b
> +     # aeskeygenassist \$0x4, %xmm2, %xmm1   # round 3
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04
> +     call _key_expansion_192a
> +     # aeskeygenassist \$0x8, %xmm2, %xmm1   # round 4
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08
> +     call _key_expansion_192b
> +     # aeskeygenassist \$0x10, %xmm2, %xmm1  # round 5
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10
> +     call _key_expansion_192a
> +     # aeskeygenassist \$0x20, %xmm2, %xmm1  # round 6
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20
> +     call _key_expansion_192b
> +     # aeskeygenassist \$0x40, %xmm2, %xmm1  # round 7
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40
> +     call _key_expansion_192a
> +     # aeskeygenassist \$0x80, %xmm2, %xmm1  # round 8
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x80
> +     call _key_expansion_192b
> +     xor %rax, %rax
> +     ret
> +.Lenc_key128:
> +     cmp \$128, %esi
> +     jnz .Lenc_key_invalid_key_bits
> +     mov \$10, %esi
> +     movl %esi, 240(%rdx)            # 10 rounds for 128
> +     # aeskeygenassist \$0x1, %xmm0, %xmm1   # round 1
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x2, %xmm0, %xmm1   # round 2
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x4, %xmm0, %xmm1   # round 3
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x8, %xmm0, %xmm1   # round 4
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x08
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x10, %xmm0, %xmm1  # round 5
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x20, %xmm0, %xmm1  # round 6
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x20
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x40, %xmm0, %xmm1  # round 7
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x40
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x80, %xmm0, %xmm1  # round 8
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x80
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x1b, %xmm0, %xmm1  # round 9
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x1b
> +     call _key_expansion_128
> +     # aeskeygenassist \$0x36, %xmm0, %xmm1  # round 10
> +     .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x36
> +     call _key_expansion_128
> +     xor %eax, %eax
> +     ret
> +.Lenc_key_invalid_param:
> +     mov \$-1, %rax
> +     ret
> +.Lenc_key_invalid_key_bits:
> +     mov \$-2, %rax
> +     ret
> +.size        _aesni_set_encrypt_key, . - _aesni_set_encrypt_key
> +___
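As a reading aid for the assembly above, here is a small C model of the argument checks and round counts in _aesni_set_encrypt_key. It is only a sketch: the real routine returns 0 on success and stores the round count at offset 240 of the key structure, whereas this hypothetical model_rounds returns the round count directly.

```c
#include <assert.h>
#include <stddef.h>

/* C model of the checks in _aesni_set_encrypt_key: returns the round
   count on success, -1 for a NULL pointer, -2 for an unsupported size. */
static int model_rounds(const unsigned char *user_key, int bits, void *key)
{
    if (user_key == NULL || key == NULL)
        return -1;
    switch (bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return -2;
    }
}

/* Bytes of round-key material written: (rounds + 1) keys of 16 bytes. */
static size_t model_schedule_bytes(int rounds)
{
    assert(rounds == 10 || rounds == 12 || rounds == 14);
    return (size_t)(rounds + 1) * 16;
}
```

The largest schedule (14 rounds) occupies 15 * 16 = 240 bytes, which matches the offset at which the round count is stored.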
> +
> +
> +# int aesni_set_decrypt_key(const unsigned char *userKey, const int bits,
> +#                               AES_KEY *key)
> +$code.=<<___;
> +.globl       aesni_set_decrypt_key
> +.type        aesni_set_decrypt_key,\...@function,3
> +.align       16
> +aesni_set_decrypt_key:
> +     call _aesni_set_encrypt_key
> +     test %rax, %rax
> +     jnz .Ldec_key_exit
> +     lea 0x10(%rdx), %rcx
> +     shl \$4, %esi
> +     add %rdx, %rsi
> +     mov %rsi, %rdi
> +.align 4
> +.Ldec_key_reorder_loop:
> +     movaps (%rdx), %xmm0
> +     movaps (%rsi), %xmm1
> +     movaps %xmm0, (%rsi)
> +     movaps %xmm1, (%rdx)
> +     lea 0x10(%rdx), %rdx
> +     lea -0x10(%rsi), %rsi
> +     cmp %rdx, %rsi
> +     ja .Ldec_key_reorder_loop
> +.align 4
> +.Ldec_key_inv_loop:
> +     movaps (%rcx), %xmm0
> +     # aesimc %xmm0, %xmm1
> +     .byte 0x66, 0x0f, 0x38, 0xdb, 0xc8
> +     movaps %xmm1, (%rcx)
> +     lea 0x10(%rcx), %rcx
> +     cmp %rdi, %rcx
> +     jnz .Ldec_key_inv_loop
> +.Ldec_key_exit:
> +     ret
> +.size        aesni_set_decrypt_key, . - aesni_set_decrypt_key
> +___
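The first loop of aesni_set_decrypt_key reverses the order of the round keys with two pointers walking inward. A minimal C model of that step is sketched below. reverse_round_keys is a hypothetical name, and the second loop, which applies aesimc to round keys 1 through rounds - 1, is deliberately not modeled here.

```c
#include <assert.h>
#include <string.h>

#define RK_BYTES 16 /* one AES round key */

/* Reverse the order of the (rounds + 1) round keys in place, as the
   .Ldec_key_reorder_loop does with two pointers walking inward. */
static void reverse_round_keys(unsigned char *rk, int rounds)
{
    unsigned char tmp[RK_BYTES];
    unsigned char *lo = rk;
    unsigned char *hi = rk + (size_t)rounds * RK_BYTES;

    assert(rounds == 10 || rounds == 12 || rounds == 14);
    while (hi > lo) {
        memcpy(tmp, lo, RK_BYTES);
        memcpy(lo, hi, RK_BYTES);
        memcpy(hi, tmp, RK_BYTES);
        lo += RK_BYTES;
        hi -= RK_BYTES;
    }
}
```

Since the number of round keys is always odd, the middle key stays in place and the loop stops when the two pointers meet.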
> +
> +# void aesni_encrypt (const void *inp,void *out,const AES_KEY *key);
> +$code.=<<___;
> +.globl       aesni_encrypt
> +.type        aesni_encrypt,\...@function,3
> +.align       16
> +aesni_encrypt:
> +     mov %rdi, $inp
> +     mov %rsi, $outp
> +     mov %rdx, $keyp
> +     mov 240($keyp), $rnds           # round count
> +     movups ($inp), $state           # input
> +     call _aesni_encrypt1
> +     movups $state, ($outp)          # output
> +     ret
> +.size        aesni_encrypt, . - aesni_encrypt
> +___
> +
> +# _aesni_encrypt1:   internal ABI
> +# input:
> +#    $keyp:          key struct pointer
> +#    $rnds:          round count
> +#    $state:         initial state (input)
> +# output:
> +#    $state:         final state (output)
> +# changed:
> +#    $key
> +#    $tkeyp ($t1)
> +$code.=<<___;
> +.type        _aesni_encrypt1,\...@abi-omnipotent
> +.align       16
> +_aesni_encrypt1:
> +     movaps ($keyp), $key            # key
> +     mov $keyp, $tkeyp
> +     pxor $key, $state               # round 0
> +     lea 0x30($tkeyp), $tkeyp
> +     cmp \$12, $rnds
> +     jb .Lenc128
> +     lea 0x20($tkeyp), $tkeyp
> +     je .Lenc192
> +     lea 0x20($tkeyp), $tkeyp
> +     movaps -0x60($tkeyp), $key
> +     aesenc $key, $state
> +     movaps -0x50($tkeyp), $key
> +     aesenc $key, $state
> +.align 4
> +.Lenc192:
> +     movaps -0x40($tkeyp), $key
> +     aesenc $key, $state
> +     movaps -0x30($tkeyp), $key
> +     aesenc $key, $state
> +.align 4
> +.Lenc128:
> +     movaps -0x20($tkeyp), $key
> +     aesenc $key, $state
> +     movaps -0x10($tkeyp), $key
> +     aesenc $key, $state
> +     movaps ($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x10($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x20($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x30($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x40($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x50($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x60($tkeyp), $key
> +     aesenc $key, $state
> +     movaps 0x70($tkeyp), $key
> +     aesenclast $key, $state # last round
> +     ret
> +.size        _aesni_encrypt1, . - _aesni_encrypt1
> +___
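_aesni_encrypt1 biases the key pointer by a different amount per key size so that all three sizes can share the tail of the round sequence, with the last round key always at displacement 0x70. The arithmetic can be checked with the following sketch (tkeyp_bias and round_key_offset are hypothetical helpers written only for this explanation):

```c
#include <assert.h>

/* Offset added to the key pointer before the shared round sequence:
   0x30 for 10 rounds, 0x50 for 12, 0x70 for 14. */
static int tkeyp_bias(int rounds)
{
    assert(rounds == 10 || rounds == 12 || rounds == 14);
    return 0x30 + (rounds - 10) * 0x10;
}

/* Byte offset, from the start of the schedule, of the key for round r
   (1 <= r <= rounds). The displacement from the biased pointer is
   negative for early rounds and reaches +0x70 on the last round. */
static int round_key_offset(int rounds, int r)
{
    int disp = (r - rounds + 7) * 0x10; /* displacement used in the asm */
    return tkeyp_bias(rounds) + disp;   /* simplifies to r * 16 */
}
```

For example, with 14 rounds the first aesenc reads displacement -0x60 from a pointer biased by 0x70, which is offset 0x10, the round 1 key.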
> +
> +# _aesni_encrypt4:   internal ABI
> +# input:
> +#    $keyp:          key struct pointer
> +#    $rnds:          round count
> +#    $state1:        initial state (input)
> +#    $state2
> +#    $state3
> +#    $state4
> +# output:
> +#    $state1:        final state (output)
> +#    $state2
> +#    $state3
> +#    $state4
> +# changed:
> +#    $key
> +#    $tkeyp ($t1)
> +$code.=<<___;
> +.type        _aesni_encrypt4,\...@abi-omnipotent
> +.align       16
> +_aesni_encrypt4:
> +     movaps ($keyp), $key            # key
> +     mov $keyp, $tkeyp
> +     pxor $key, $state1              # round 0
> +     pxor $key, $state2
> +     pxor $key, $state3
> +     pxor $key, $state4
> +     lea 0x30($tkeyp), $tkeyp
> +     cmp \$12, $rnds
> +     jb .L4enc128
> +     lea 0x20($tkeyp), $tkeyp
> +     je .L4enc192
> +     lea 0x20($tkeyp), $tkeyp
> +     movaps -0x60($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps -0x50($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +.align 4
> +.L4enc192:
> +     movaps -0x40($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps -0x30($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +.align 4
> +.L4enc128:
> +     movaps -0x20($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps -0x10($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps ($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x10($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x20($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x30($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x40($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x50($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x60($tkeyp), $key
> +     aesenc $key, $state1
> +     aesenc $key, $state2
> +     aesenc $key, $state3
> +     aesenc $key, $state4
> +     movaps 0x70($tkeyp), $key
> +     aesenclast $key, $state1        # last round
> +     aesenclast $key, $state2
> +     aesenclast $key, $state3
> +     aesenclast $key, $state4
> +     ret
> +.size        _aesni_encrypt4, . - _aesni_encrypt4
> +___
> +
> +# void aesni_decrypt (const void *inp,void *out,const AES_KEY *key);
> +$code.=<<___;
> +.globl       aesni_decrypt
> +.type        aesni_decrypt,\...@function,3
> +.align       16
> +aesni_decrypt:
> +     mov %rdi, $inp
> +     mov %rsi, $outp
> +     mov %rdx, $keyp
> +     mov 240($keyp), $rnds           # round count
> +     movups ($inp), $state           # input
> +     call _aesni_decrypt1
> +     movups $state, ($outp)          # output
> +     ret
> +.size        aesni_decrypt, . - aesni_decrypt
> +___
> +
> +# _aesni_decrypt1:   internal ABI
> +# input:
> +#    $keyp:          key struct pointer
> +#    $rnds:          round count
> +#    $state:         initial state (input)
> +# output:
> +#    $state:         final state (output)
> +# changed:
> +#    $key
> +#    $tkeyp ($t1)
> +$code.=<<___;
> +.type        _aesni_decrypt1,\...@abi-omnipotent
> +.align       16
> +_aesni_decrypt1:
> +     movaps ($keyp), $key            # key
> +     mov $keyp, $tkeyp
> +     pxor $key, $state               # round 0
> +     lea 0x30($tkeyp), $tkeyp
> +     cmp \$12, $rnds
> +     jb .Ldec128
> +     lea 0x20($tkeyp), $tkeyp
> +     je .Ldec192
> +     lea 0x20($tkeyp), $tkeyp
> +     movaps -0x60($tkeyp), $key
> +     aesdec $key, $state
> +     movaps -0x50($tkeyp), $key
> +     aesdec $key, $state
> +.align 4
> +.Ldec192:
> +     movaps -0x40($tkeyp), $key
> +     aesdec $key, $state
> +     movaps -0x30($tkeyp), $key
> +     aesdec $key, $state
> +.align 4
> +.Ldec128:
> +     movaps -0x20($tkeyp), $key
> +     aesdec $key, $state
> +     movaps -0x10($tkeyp), $key
> +     aesdec $key, $state
> +     movaps ($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x10($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x20($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x30($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x40($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x50($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x60($tkeyp), $key
> +     aesdec $key, $state
> +     movaps 0x70($tkeyp), $key
> +     aesdeclast $key, $state         # last round
> +     ret
> +.size        _aesni_decrypt1, . - _aesni_decrypt1
> +___
> +
> +# _aesni_decrypt4:   internal ABI
> +# input:
> +#    $keyp:          key struct pointer
> +#    $rnds:          round count
> +#    $state1:        initial state (input)
> +#    $state2
> +#    $state3
> +#    $state4
> +# output:
> +#    $state1:        final state (output)
> +#    $state2
> +#    $state3
> +#    $state4
> +# changed:
> +#    $key
> +#    $tkeyp ($t1)
> +$code.=<<___;
> +.type        _aesni_decrypt4,\...@abi-omnipotent
> +.align       16
> +_aesni_decrypt4:
> +     movaps ($keyp), $key            # key
> +     mov $keyp, $tkeyp
> +     pxor $key, $state1              # round 0
> +     pxor $key, $state2
> +     pxor $key, $state3
> +     pxor $key, $state4
> +     lea 0x30($tkeyp), $tkeyp
> +     cmp \$12, $rnds
> +     jb .L4dec128
> +     lea 0x20($tkeyp), $tkeyp
> +     je .L4dec192
> +     lea 0x20($tkeyp), $tkeyp
> +     movaps -0x60($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps -0x50($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +.align 4
> +.L4dec192:
> +     movaps -0x40($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps -0x30($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +.align 4
> +.L4dec128:
> +     movaps -0x20($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps -0x10($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps ($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x10($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x20($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x30($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x40($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x50($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x60($tkeyp), $key
> +     aesdec $key, $state1
> +     aesdec $key, $state2
> +     aesdec $key, $state3
> +     aesdec $key, $state4
> +     movaps 0x70($tkeyp), $key
> +     aesdeclast $key, $state1        # last round
> +     aesdeclast $key, $state2
> +     aesdeclast $key, $state3
> +     aesdeclast $key, $state4
> +     ret
> +.size        _aesni_decrypt4, . - _aesni_decrypt4
> +___
> +
> +# void aesni_ecb_encrypt(const unsigned char *in, unsigned char *out,
> +#                         size_t length, const AES_KEY *key,
> +#                         const int enc);
> +$code.=<<___;
> +.globl       aesni_ecb_encrypt
> +.type        aesni_ecb_encrypt,\...@function,5
> +.align       16
> +aesni_ecb_encrypt:
> +     test $len, $len         # check length
> +     jz .Lecb_just_ret
> +     mov %rdi, $inp
> +     mov %rsi, $outp
> +     mov %r8d, $t1d          # clear upper half of enc
> +     mov %rcx, $keyp
> +     mov 240($keyp), $rnds
> +     test $t1, $t1
> +     jz .Lecb_decrypt
> +#--------------------------- ENCRYPT ------------------------------#
> +     cmp \$16, $len
> +     jb .Lecb_just_ret
> +     cmp \$64, $len
> +     jb .Lecb_enc_loop1
> +.align 4
> +.Lecb_enc_loop4:
> +     movups ($inp), $state1
> +     movups 0x10($inp), $state2
> +     movups 0x20($inp), $state3
> +     movups 0x30($inp), $state4
> +     call _aesni_encrypt4
> +     movups $state1, ($outp)
> +     movups $state2, 0x10($outp)
> +     movups $state3, 0x20($outp)
> +     movups $state4, 0x30($outp)
> +     sub \$64, $len
> +     add \$64, $inp
> +     add \$64, $outp
> +     cmp \$64, $len
> +     jae .Lecb_enc_loop4
> +     cmp \$16, $len
> +     jb .Lecb_just_ret
> +.align 4
> +.Lecb_enc_loop1:
> +     movups ($inp), $state1
> +     call _aesni_encrypt1
> +     movups $state1, ($outp)
> +     sub \$16, $len
> +     add \$16, $inp
> +     add \$16, $outp
> +     cmp \$16, $len
> +     jae .Lecb_enc_loop1
> +     jmp .Lecb_just_ret
> +#--------------------------- DECRYPT ------------------------------#
> +.Lecb_decrypt:
> +     cmp \$16, $len
> +     jb .Lecb_just_ret
> +     cmp \$64, $len
> +     jb .Lecb_dec_loop1
> +.align 4
> +.Lecb_dec_loop4:
> +     movups ($inp), $state1
> +     movups 0x10($inp), $state2
> +     movups 0x20($inp), $state3
> +     movups 0x30($inp), $state4
> +     call _aesni_decrypt4
> +     movups $state1, ($outp)
> +     movups $state2, 0x10($outp)
> +     movups $state3, 0x20($outp)
> +     movups $state4, 0x30($outp)
> +     sub \$64, $len
> +     add \$64, $inp
> +     add \$64, $outp
> +     cmp \$64, $len
> +     jae .Lecb_dec_loop4
> +     cmp \$16, $len
> +     jb .Lecb_just_ret
> +.align 4
> +.Lecb_dec_loop1:
> +     movups ($inp), $state1
> +     call _aesni_decrypt1
> +     movups $state1, ($outp)
> +     sub \$16, $len
> +     add \$16, $inp
> +     add \$16, $outp
> +     cmp \$16, $len
> +     jae .Lecb_dec_loop1
> +.Lecb_just_ret:
> +     ret
> +.size        aesni_ecb_encrypt, . - aesni_ecb_encrypt
> +___
> +
> +# void aesni_cbc_encrypt (const unsigned char *inp, unsigned char *out,
> +#                          size_t length, const AES_KEY *key,
> +#                          unsigned char *ivp,const int enc);
> +$code.=<<___;
> +.globl       aesni_cbc_encrypt
> +.type        aesni_cbc_encrypt,\...@function,6
> +.align       16
> +aesni_cbc_encrypt:
> +     test $len, $len         # check length
> +     jz .Lcbc_just_ret
> +     mov %rdi, $inp
> +     mov %rsi, $outp
> +     mov %r9d, $t1d          # clear upper half of enc
> +     mov %rcx, $keyp
> +     mov 240($keyp), $rnds
> +     test $t1, $t1
> +     jz .Lcbc_decrypt
> +#--------------------------- ENCRYPT ------------------------------#
> +     movups ($ivp), $state   # load iv as initial state
> +     cmp \$16, $len
> +     jb .Lcbc_enc_tail
> +.align 4
> +.Lcbc_enc_loop:
> +     movups ($inp), $in      # load input
> +     pxor $in, $state
> +     call _aesni_encrypt1
> +     movups $state, ($outp)  # store output
> +     sub \$16, $len
> +     add \$16, $inp
> +     add \$16, $outp
> +     cmp \$16, $len
> +     jae .Lcbc_enc_loop
> +     test \$0xf, $len
> +     jnz .Lcbc_enc_tail
> +     movups $state, ($ivp)
> +     jmp .Lcbc_just_ret
> +.Lcbc_enc_tail:
> +     mov $len, %rcx
> +     mov $inp, %rsi
> +     mov $outp, %rdi
> +     .long 0x9066A4F3        # rep movsb
> +     mov 240($keyp), $rnds   # restore $rnds (%esi)
> +     mov \$16, %rcx          # zero tail
> +     sub $len, %rcx
> +     xor %rax, %rax
> +     .long 0x9066AAF3        # rep stosb
> +     mov $outp, $inp         # this is not a mistake!
> +     movq \$16, $len         # len=16
> +     jmp .Lcbc_enc_loop      # one more spin
> +#--------------------------- DECRYPT ------------------------------#
> +.Lcbc_decrypt:
> +     movups ($ivp), $iv
> +     cmp \$16, $len
> +     jb .Lcbc_dec_tail
> +     cmp \$64, $len
> +     jb .Lcbc_dec_loop1
> +.align 4
> +.Lcbc_dec_loop4:
> +     movups ($inp), $in1
> +     movaps $in1, $state1
> +     movups 0x10($inp), $in2
> +     movaps $in2, $state2
> +     movups 0x20($inp), $in3
> +     movaps $in3, $state3
> +     movups 0x30($inp), $in4
> +     movaps $in4, $state4
> +     call _aesni_decrypt4
> +     pxor $iv, $state1
> +     pxor $in1, $state2
> +     pxor $in2, $state3
> +     pxor $in3, $state4
> +     movaps $in4, $iv
> +     movups $state1, ($outp)
> +     movups $state2, 0x10($outp)
> +     movups $state3, 0x20($outp)
> +     movups $state4, 0x30($outp)
> +     sub \$64, $len
> +     add \$64, $inp
> +     add \$64, $outp
> +     cmp \$64, $len
> +     jae .Lcbc_dec_loop4
> +     cmp \$0, $len
> +     jz .Lcbc_dec_ret
> +     cmp \$16, $len
> +     jb .Lcbc_dec_tail
> +.align 4
> +.Lcbc_dec_loop1:
> +     movups ($inp), $in
> +     movaps $in, $state
> +     call _aesni_decrypt1
> +     pxor $iv, $state
> +     movups $state, ($outp)
> +     movaps $in, $iv
> +     sub \$16, $len
> +     add \$16, $inp
> +     add \$16, $outp
> +     cmp \$16, $len
> +     jae .Lcbc_dec_loop1
> +     test \$0xf, $len
> +     jz .Lcbc_dec_ret
> +.Lcbc_dec_tail:
> +     movups ($inp), $in
> +     movaps $in, $state
> +     call _aesni_decrypt1
> +     pxor $iv, $state
> +     movaps $in, $iv
> +     sub \$16, %rsp          # alloc temporary space
> +     movups $state, (%rsp)
> +     mov $outp, %rdi
> +     mov %rsp, %rsi
> +     mov $len, %rcx
> +     .long 0x9066A4F3        # rep movsb
> +     mov %rsp, %rdi          # clear stack
> +     mov \$16, %rcx
> +     xor %rax, %rax
> +     .long 0x9066AAF3        # rep stosb
> +     add \$16, %rsp
> +.Lcbc_dec_ret:
> +     movups $iv, ($ivp)
> +.Lcbc_just_ret:
> +     ret
> +.size        aesni_cbc_encrypt, . - aesni_cbc_encrypt
> +___
> +
> +$code.=<<___;
> +     .long   0x80808080, 0x80808080, 0xfefefefe, 0xfefefefe
> +     .long   0x1b1b1b1b, 0x1b1b1b1b, 0, 0
> +.asciz  "AES for Intel AESNI, CRYPTOGAMS by <ying.hua...@intel.com>"
> +.align       64
> +___
> +
> +$code =~ s/\`([^\`]*)\`/eval($1)/gem;
> +
> +print $code;
> +
> +close STDOUT;
> --- a/crypto/engine/Makefile
> +++ b/crypto/engine/Makefile
> @@ -11,6 +11,8 @@ MAKEFILE=   Makefile
>  AR=          ar r
>  
>  CFLAGS= $(INCLUDES) $(CFLAG)
> +ASFLAGS= $(INCLUDES) $(ASFLAG)
> +AFLAGS= $(ASFLAGS)
>  
>  GENERAL=Makefile
>  TEST= enginetest.c
> @@ -21,12 +23,14 @@ LIBSRC= eng_err.c eng_lib.c eng_list.c e
>       eng_table.c eng_pkey.c eng_fat.c eng_all.c \
>       tb_rsa.c tb_dsa.c tb_ecdsa.c tb_dh.c tb_ecdh.c tb_rand.c tb_store.c \
>       tb_cipher.c tb_digest.c tb_pkmeth.c tb_asnmth.c \
> -     eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c
> +     eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c \
> +     eng_aesni.c eng_aesni_asm.pl
>  LIBOBJ= eng_err.o eng_lib.o eng_list.o eng_init.o eng_ctrl.o \
>       eng_table.o eng_pkey.o eng_fat.o eng_all.o \
>       tb_rsa.o tb_dsa.o tb_ecdsa.o tb_dh.o tb_ecdh.o tb_rand.o tb_store.o \
>       tb_cipher.o tb_digest.o tb_pkmeth.o tb_asnmth.o \
> -     eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o
> +     eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o \
> +     eng_aesni.o eng_aesni_asm.o
>  
>  SRC= $(LIBSRC)
>  
> @@ -45,6 +49,9 @@ lib:        $(LIBOBJ)
>       $(RANLIB) $(LIB) || echo Never mind.
>       @touch lib
>  
> +eng_aesni_asm.s: eng_aesni_asm.pl
> +     $(PERL) eng_aesni_asm.pl $(PERLASM_SCHEME) > $@
> +
>  files:
>       $(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
>  
> --- a/crypto/engine/engine.h
> +++ b/crypto/engine/engine.h
> @@ -346,6 +346,7 @@ void ENGINE_load_gost(void);
>  #endif
>  #endif
>  void ENGINE_load_cryptodev(void);
> +void ENGINE_load_aesni(void);
>  void ENGINE_load_builtin_engines(void);
>  
>  /* Get and set global flags (ENGINE_TABLE_FLAG_***) for the implementation
> 
