Hi all,

It seems that Andy has not been available since Christmas. Can anyone tell me how to reach him, or what else I can do to get this patch reviewed?

Best Regards,
Huang Ying

On Wed, 2008-12-24 at 11:12 +0800, Huang Ying wrote:
> This patch adds support to Intel AES-NI instruction set for x86_64
> platform.
>
> Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
> instructions that are going to be introduced in the next generation of
> Intel processor, as of 2009. These instructions enable fast and secure
> data encryption and decryption, using the Advanced Encryption Standard
> (AES), defined by FIPS Publication number 197. The architecture
> introduces six instructions that offer full hardware support for
> AES. Four of them support high performance data encryption and
> decryption, and the other two instructions support the AES key
> expansion procedure.
>
> The white paper can be downloaded from:
>
> http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf
>
>
> AES-NI support is implemented as an engine in crypto/engine/.
>
>
> ChangeLog:
>
> v3:
>
> - Rename INTEL or INTEL_AES stuff to AESNI
>
> - Use cfb and ofb modes implementation of crypto/modes instead of copying.
>
> v2:
>
> - AES-NI support is implemented as an engine instead of "branch".
>
> - ECB and CBC modes are implemented in parallel style to take
>   advantage of pipelined hardware implementation.
>
> - AES key scheduling algorithm is re-implemented with higher performance.
>
>
> Known issues:
>
> - How to add conditional compilation for eng_intel_asm.pl? It can not
>   be compiled on non-x86 platform.
>
> - NID for CTR mode can not be found, how to support it in engine?
>
> - CFB1, CFB8, OFB1, OFB8 modes are not supported. If it is necessary
>   to add AES-NI support for them, I can add them.
> > > Signed-off-by: Huang Ying <ying.hu...@intel.com> > > --- > crypto/engine/Makefile | 11 > crypto/engine/eng_aesni.c | 409 ++++++++++++++++++ > crypto/engine/eng_aesni_asm.pl | 918 > +++++++++++++++++++++++++++++++++++++++++ > crypto/engine/eng_all.c | 3 > crypto/engine/engine.h | 1 > 5 files changed, 1340 insertions(+), 2 deletions(-) > > --- /dev/null > +++ b/crypto/engine/eng_aesni.c > @@ -0,0 +1,409 @@ > +/* > + * Support for Intel AES-NI intruction set > + * Author: Huang Ying <ying.hu...@intel.com> > + * > + * Intel AES-NI is a new set of Single Instruction Multiple Data > + * (SIMD) instructions that are going to be introduced in the next > + * generation of Intel processor, as of 2009. These instructions > + * enable fast and secure data encryption and decryption, using the > + * Advanced Encryption Standard (AES), defined by FIPS Publication > + * number 197. The architecture introduces six instructions that > + * offer full hardware support for AES. Four of them support high > + * performance data encryption and decryption, and the other two > + * instructions support the AES key expansion procedure. > + * > + * The white paper can be downloaded from: > + * > http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf > + * > + * This file is based on engines/e_padlock.c > + */ > + > +/* ==================================================================== > + * Copyright (c) 1999-2001 The OpenSSL Project. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * > + * 2. 
Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * > + * 3. All advertising materials mentioning features or use of this > + * software must display the following acknowledgment: > + * "This product includes software developed by the OpenSSL Project > + * for use in the OpenSSL Toolkit. (http://www.OpenSSL.org/)" > + * > + * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to > + * endorse or promote products derived from this software without > + * prior written permission. For written permission, please contact > + * licens...@openssl.org. > + * > + * 5. Products derived from this software may not be called "OpenSSL" > + * nor may "OpenSSL" appear in their names without prior written > + * permission of the OpenSSL Project. > + * > + * 6. Redistributions of any form whatsoever must retain the following > + * acknowledgment: > + * "This product includes software developed by the OpenSSL Project > + * for use in the OpenSSL Toolkit (http://www.OpenSSL.org/)" > + * > + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY > + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR > + * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT > + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; > + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED > + * OF THE POSSIBILITY OF SUCH DAMAGE. 
> + * ==================================================================== > + * > + * This product includes cryptographic software written by Eric Young > + * (e...@cryptsoft.com). This product includes software written by Tim > + * Hudson (t...@cryptsoft.com). > + * > + */ > + > + > +#include <openssl/opensslconf.h> > + > +#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AES_NI) && > !defined(OPENSSL_NO_AES) > + > +#include <stdio.h> > +#include <string.h> > +#include <assert.h> > +#include <openssl/crypto.h> > +#include <openssl/dso.h> > +#include <openssl/engine.h> > +#include <openssl/evp.h> > +#include <openssl/aes.h> > +#include <openssl/err.h> > +#include <cryptlib.h> > +#include "crypto/modes/modes.h" > + > +/* AES-NI is available *ONLY* on some x86 CPUs. Not only that it > + doesn't exist elsewhere, but it even can't be compiled on other > + platforms! */ > +#undef COMPILE_HW_AESNI > +#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_AMD64)) && > !defined(I386_ONLY) > +#define COMPILE_HW_AESNI > +static ENGINE *ENGINE_aesni (void); > +#endif > + > +void ENGINE_load_aesni (void) > +{ > +/* On non-x86 CPUs it just returns. 
*/ > +#ifdef COMPILE_HW_AESNI > + ENGINE *toadd = ENGINE_aesni(); > + if (!toadd) > + return; > + ENGINE_add (toadd); > + ENGINE_free (toadd); > + ERR_clear_error (); > +#endif > +} > + > +#ifdef COMPILE_HW_AESNI > +int aesni_set_encrypt_key(const unsigned char *userKey, const int bits, > + AES_KEY *key); > +int aesni_set_decrypt_key(const unsigned char *userKey, const int bits, > + AES_KEY *key); > + > +void aesni_encrypt(const unsigned char *in, unsigned char *out, > + const AES_KEY *key); > +void aesni_decrypt(const unsigned char *in, unsigned char *out, > + const AES_KEY *key); > + > +void aesni_ecb_encrypt(const unsigned char *in, > + unsigned char *out, > + const unsigned long length, > + const AES_KEY *key, > + const int enc); > +void aesni_cbc_encrypt(const unsigned char *in, > + unsigned char *out, > + const unsigned long length, > + const AES_KEY *key, > + unsigned char *ivec, const int enc); > + > +/* Function for ENGINE detection and control */ > +static int aesni_init(ENGINE *e); > + > +/* Cipher Stuff */ > +static int aesni_ciphers(ENGINE *e, const EVP_CIPHER **cipher, > + const int **nids, int nid); > + > +#define AESNI_MIN_ALIGN 16 > +#define AESNI_ALIGN(x) \ > + ((void *)(((unsigned long)(x)+AESNI_MIN_ALIGN-1)&~(AESNI_MIN_ALIGN-1))) > + > +/* Engine names */ > +static const char *aesni_id = "AESNI"; > +static char *aesni_name = "AESNI"; > + > +/* ===== Engine "management" functions ===== */ > + > +/* Prepare the ENGINE structure for registration */ > +static int > +aesni_bind_helper(ENGINE *e) > +{ > + if (!(OPENSSL_ia32cap_P & (1UL << 57))) > + return 0; > + > + /* Register everything or return with an error */ > + if (!ENGINE_set_id(e, aesni_id) || > + !ENGINE_set_name(e, aesni_name) || > + > + !ENGINE_set_init_function(e, aesni_init) || > + !ENGINE_set_ciphers (e, aesni_ciphers)) > + return 0; > + > + /* Everything looks good */ > + return 1; > +} > + > +/* Constructor */ > +static ENGINE * > +ENGINE_aesni(void) > +{ > + ENGINE *eng = 
ENGINE_new(); > + > + if (!eng) { > + return NULL; > + } > + > + if (!aesni_bind_helper(eng)) { > + ENGINE_free(eng); > + return NULL; > + } > + > + return eng; > +} > + > +/* Check availability of the engine */ > +static int > +aesni_init(ENGINE *e) > +{ > + return 1; > +} > + > +#if defined(NID_aes_128_cfb128) && ! defined (NID_aes_128_cfb) > +#define NID_aes_128_cfb NID_aes_128_cfb128 > +#endif > + > +#if defined(NID_aes_128_ofb128) && ! defined (NID_aes_128_ofb) > +#define NID_aes_128_ofb NID_aes_128_ofb128 > +#endif > + > +#if defined(NID_aes_192_cfb128) && ! defined (NID_aes_192_cfb) > +#define NID_aes_192_cfb NID_aes_192_cfb128 > +#endif > + > +#if defined(NID_aes_192_ofb128) && ! defined (NID_aes_192_ofb) > +#define NID_aes_192_ofb NID_aes_192_ofb128 > +#endif > + > +#if defined(NID_aes_256_cfb128) && ! defined (NID_aes_256_cfb) > +#define NID_aes_256_cfb NID_aes_256_cfb128 > +#endif > + > +#if defined(NID_aes_256_ofb128) && ! defined (NID_aes_256_ofb) > +#define NID_aes_256_ofb NID_aes_256_ofb128 > +#endif > + > +/* List of supported ciphers. */ > +static int aesni_cipher_nids[] = { > + NID_aes_128_ecb, > + NID_aes_128_cbc, > + NID_aes_128_cfb, > + NID_aes_128_ofb, > + > + NID_aes_192_ecb, > + NID_aes_192_cbc, > + NID_aes_192_cfb, > + NID_aes_192_ofb, > + > + NID_aes_256_ecb, > + NID_aes_256_cbc, > + NID_aes_256_cfb, > + NID_aes_256_ofb, > +}; > +static int aesni_cipher_nids_num = > + (sizeof(aesni_cipher_nids)/sizeof(aesni_cipher_nids[0])); > + > +/* Function prototypes ... 
*/ > +static int aesni_init_key(EVP_CIPHER_CTX *ctx, const unsigned char *key, > + const unsigned char *iv, int enc); > +static int aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out, > + const unsigned char *in, size_t inl); > + > +typedef struct > +{ > + AES_KEY ks; > + unsigned int _pad1[3]; > +} AESNI_KEY; > + > +#define AES_BLOCK_SIZE 16 > + > +#define EVP_CIPHER_block_size_ECB AES_BLOCK_SIZE > +#define EVP_CIPHER_block_size_CBC AES_BLOCK_SIZE > +#define EVP_CIPHER_block_size_OFB 1 > +#define EVP_CIPHER_block_size_CFB 1 > + > +/* Declaring so many ciphers by hand would be a pain. > + Instead introduce a bit of preprocessor magic :-) */ > +#define DECLARE_AES_EVP(ksize,lmode,umode) \ > +static const EVP_CIPHER aesni_##ksize##_##lmode = { \ > + NID_aes_##ksize##_##lmode, \ > + EVP_CIPHER_block_size_##umode, \ > + ksize / 8, \ > + AES_BLOCK_SIZE, \ > + 0 | EVP_CIPH_##umode##_MODE, \ > + aesni_init_key, \ > + aesni_cipher, \ > + NULL, \ > + sizeof(AESNI_KEY), \ > + EVP_CIPHER_set_asn1_iv, \ > + EVP_CIPHER_get_asn1_iv, \ > + NULL, \ > + NULL \ > +} > + > +DECLARE_AES_EVP(128,ecb,ECB); > +DECLARE_AES_EVP(128,cbc,CBC); > +DECLARE_AES_EVP(128,cfb,CFB); > +DECLARE_AES_EVP(128,ofb,OFB); > + > +DECLARE_AES_EVP(192,ecb,ECB); > +DECLARE_AES_EVP(192,cbc,CBC); > +DECLARE_AES_EVP(192,cfb,CFB); > +DECLARE_AES_EVP(192,ofb,OFB); > + > +DECLARE_AES_EVP(256,ecb,ECB); > +DECLARE_AES_EVP(256,cbc,CBC); > +DECLARE_AES_EVP(256,cfb,CFB); > +DECLARE_AES_EVP(256,ofb,OFB); > + > +static int > +aesni_ciphers (ENGINE *e, const EVP_CIPHER **cipher, > + const int **nids, int nid) > +{ > + /* No specific cipher => return a list of supported nids ... */ > + if (!cipher) { > + *nids = aesni_cipher_nids; > + return aesni_cipher_nids_num; > + } > + > + /* ... 
or the requested "cipher" otherwise */ > + switch (nid) { > + case NID_aes_128_ecb: > + *cipher = &aesni_128_ecb; > + break; > + case NID_aes_128_cbc: > + *cipher = &aesni_128_cbc; > + break; > + case NID_aes_128_cfb: > + *cipher = &aesni_128_cfb; > + break; > + case NID_aes_128_ofb: > + *cipher = &aesni_128_ofb; > + break; > + > + case NID_aes_192_ecb: > + *cipher = &aesni_192_ecb; > + break; > + case NID_aes_192_cbc: > + *cipher = &aesni_192_cbc; > + break; > + case NID_aes_192_cfb: > + *cipher = &aesni_192_cfb; > + break; > + case NID_aes_192_ofb: > + *cipher = &aesni_192_ofb; > + break; > + > + case NID_aes_256_ecb: > + *cipher = &aesni_256_ecb; > + break; > + case NID_aes_256_cbc: > + *cipher = &aesni_256_cbc; > + break; > + case NID_aes_256_cfb: > + *cipher = &aesni_256_cfb; > + break; > + case NID_aes_256_ofb: > + *cipher = &aesni_256_ofb; > + break; > + > + default: > + /* Sorry, we don't support this NID */ > + *cipher = NULL; > + return 0; > + } > + > + return 1; > +} > + > +/* Prepare the encryption key for AES NI usage */ > +static int > +aesni_init_key (EVP_CIPHER_CTX *ctx, const unsigned char *user_key, > + const unsigned char *iv, int enc) > +{ > + int ret; > + AES_KEY *key = AESNI_ALIGN(ctx->cipher_data); > + > + if ((ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_CFB_MODE > + || (ctx->cipher->flags & EVP_CIPH_MODE) == EVP_CIPH_OFB_MODE > + || enc) > + ret=aesni_set_encrypt_key(user_key, ctx->key_len * 8, key); > + else > + ret=aesni_set_decrypt_key(user_key, ctx->key_len * 8, key); > + > + if(ret < 0) { > + EVPerr(EVP_F_AES_INIT_KEY,EVP_R_AES_KEY_SETUP_FAILED); > + return 0; > + } > + > + return 1; > +} > + > +static int > +aesni_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out, > + const unsigned char *in, size_t inl) > +{ > + AES_KEY *key = AESNI_ALIGN(ctx->cipher_data); > + > + switch (EVP_CIPHER_CTX_mode(ctx)) { > + case EVP_CIPH_ECB_MODE: > + aesni_ecb_encrypt(in, out, inl, key, ctx->encrypt); > + break; > + case EVP_CIPH_CBC_MODE: > + 
aesni_cbc_encrypt(in, out, inl, key, > + ctx->iv, ctx->encrypt); > + break; > + case EVP_CIPH_CFB_MODE: > + CRYPTO_cfb128_encrypt(in, out, inl, key, ctx->iv, > + &ctx->num, ctx->encrypt, > + aesni_encrypt); > + break; > + case EVP_CIPH_OFB_MODE: > + CRYPTO_ofb128_encrypt(in, out, inl, key, > + ctx->iv, &ctx->num, > + aesni_encrypt); > + break; > + default: > + return 0; > + } > + > + return 1; > +} > + > +#endif /* COMPILE_HW_AESNI */ > +#endif /* !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) && > !defined(OPENSSL_NO_AES) */ > --- a/crypto/engine/eng_all.c > +++ b/crypto/engine/eng_all.c > @@ -71,6 +71,9 @@ void ENGINE_load_builtin_engines(void) > #if defined(__OpenBSD__) || defined(__FreeBSD__) > ENGINE_load_cryptodev(); > #endif > +#if !defined(OPENSSL_NO_HW) && !defined(OPENSSL_NO_HW_AESNI) > + ENGINE_load_aesni(); > +#endif > ENGINE_load_dynamic(); > #ifndef OPENSSL_NO_STATIC_ENGINE > #ifndef OPENSSL_NO_HW > --- /dev/null > +++ b/crypto/engine/eng_aesni_asm.pl > @@ -0,0 +1,918 @@ > +# > +# ==================================================================== > +# Written by Intel Corporation for the OpenSSL project to add support > +# for Intel AES-NI instructions. Rights for redistribution and usage > +# in source and binary forms are granted according to the OpenSSL > +# license. 
> +# > +# Author: Huang Ying <ying.hu...@intel.com> > +# Vinodh Gopal <vinodh.go...@intel.com> > +# Kahraman Akdemir > +# ==================================================================== > +# > + > +$output=shift; > + > +$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1; > +( $xlate="${dir}x86_64-xlate.pl" and -f $xlate ) or > +( $xlate="${dir}../perlasm/x86_64-xlate.pl" and -f $xlate) or > +die "can't locate x86_64-xlate.pl"; > + > +open STDOUT,"| $^X $xlate $output"; > + > +$code=".text\n"; > + > +$state="%xmm0"; > +$state1="%xmm0"; > +$key="%xmm1"; > +$in="%xmm2"; > +$in1="%xmm2"; > +$iv="%xmm3"; > +$state2="%xmm4"; > +$state3="%xmm5"; > +$state4="%xmm6"; > +$in2="%xmm7"; > +$in3="%xmm8"; > +$in4="%xmm9"; > + > +$inp="%r11"; > +$len="%rdx"; > +$outp="%r10"; > +$keyp="%r9"; > +$ivp="%r8"; > +$rnds="%esi"; > +$t1="%rdi"; > +$t1d="%edi"; > +$tkeyp=$t1; > +$t2="%rcx"; > +$t3="%rax"; > + > +$code.=<<___; > +.type _key_expansion_128,\...@abi-omnipotent > +.align 16 > +_key_expansion_128: > +_key_expansion_256a: > + pshufd \$0b11111111, %xmm1, %xmm1 > + shufps \$0b00010000, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + shufps \$0b10001100, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + pxor %xmm1, %xmm0 > + movaps %xmm0, (%rcx) > + add \$0x10, %rcx > + ret > +.size _key_expansion_128, . - _key_expansion_128 > +___ > + > +$code.=<<___; > +.type _key_expansion_192a,\...@abi-omnipotent > +.align 16 > +_key_expansion_192a: > + pshufd \$0b01010101, %xmm1, %xmm1 > + shufps \$0b00010000, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + shufps \$0b10001100, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + pxor %xmm1, %xmm0 > + > + movaps %xmm2, %xmm5 > + movaps %xmm2, %xmm6 > + pslldq \$4, %xmm5 > + pshufd \$0b11111111, %xmm0, %xmm3 > + pxor %xmm3, %xmm2 > + pxor %xmm5, %xmm2 > + > + movaps %xmm0, %xmm1 > + shufps \$0b01000100, %xmm0, %xmm6 > + movaps %xmm6, (%rcx) > + shufps \$0b01001110, %xmm2, %xmm1 > + movaps %xmm1, 0x10(%rcx) > + add \$0x20, %rcx > + ret > +.size _key_expansion_192a, . 
- _key_expansion_192a > +___ > + > +$code.=<<___; > +.type _key_expansion_192b,\...@abi-omnipotent > +.align 16 > +_key_expansion_192b: > + pshufd \$0b01010101, %xmm1, %xmm1 > + shufps \$0b00010000, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + shufps \$0b10001100, %xmm0, %xmm4 > + pxor %xmm4, %xmm0 > + pxor %xmm1, %xmm0 > + > + movaps %xmm2, %xmm5 > + pslldq \$4, %xmm5 > + pshufd \$0b11111111, %xmm0, %xmm3 > + pxor %xmm3, %xmm2 > + pxor %xmm5, %xmm2 > + > + movaps %xmm0, (%rcx) > + add \$0x10, %rcx > + ret > +.size _key_expansion_192b, . - _key_expansion_192b > +___ > + > +$code.=<<___; > +.type _key_expansion_256b,\...@abi-omnipotent > +.align 16 > +_key_expansion_256b: > + pshufd \$0b10101010, %xmm1, %xmm1 > + shufps \$0b00010000, %xmm2, %xmm4 > + pxor %xmm4, %xmm2 > + shufps \$0b10001100, %xmm2, %xmm4 > + pxor %xmm4, %xmm2 > + pxor %xmm1, %xmm2 > + movaps %xmm2, (%rcx) > + add \$0x10, %rcx > + ret > +.size _key_expansion_256b, . - _key_expansion_256b > +___ > + > +# int aesni_set_encrypt_key(const unsigned char *userKey, const int bits, > +# AES_KEY *key) > +$code.=<<___; > +.globl aesni_set_encrypt_key > +.type aesni_set_encrypt_key,\...@function,3 > +.align 16 > +aesni_set_encrypt_key: > + call _aesni_set_encrypt_key > + ret > +.size aesni_set_encrypt_key, . 
- aesni_set_encrypt_key > + > +.type _aesni_set_encrypt_key,\...@abi-omnipotent > +.align 16 > +_aesni_set_encrypt_key: > + test %rdi, %rdi > + jz .Lenc_key_invalid_param > + test %rdx, %rdx > + jz .Lenc_key_invalid_param > + movups (%rdi), %xmm0 # user key (first 16 bytes) > + movaps %xmm0, (%rdx) > + lea 0x10(%rdx), %rcx # key addr > + pxor %xmm4, %xmm4 # xmm4 is assumed 0 in _key_expansion_x > + cmp \$256, %esi > + jnz .Lenc_key192 > + mov \$14, %esi > + movl %esi, 240(%rdx) # 14 rounds for 256 > + movups 0x10(%rdi), %xmm2 # other user key > + movaps %xmm2, (%rcx) > + add \$0x10, %rcx > + # aeskeygenassist \$0x1, %xmm2, %xmm1 # round 1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01 > + call _key_expansion_256a > + # aeskeygenassist \$0x1, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01 > + call _key_expansion_256b > + # aeskeygenassist \$0x2, %xmm2, %xmm1 # round 2 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02 > + call _key_expansion_256a > + # aeskeygenassist \$0x2, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02 > + call _key_expansion_256b > + # aeskeygenassist \$0x4, %xmm2, %xmm1 # round 3 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04 > + call _key_expansion_256a > + # aeskeygenassist \$0x4, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04 > + call _key_expansion_256b > + # aeskeygenassist \$0x8, %xmm2, %xmm1 # round 4 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08 > + call _key_expansion_256a > + # aeskeygenassist \$0x8, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x08 > + call _key_expansion_256b > + # aeskeygenassist \$0x10, %xmm2, %xmm1 # round 5 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10 > + call _key_expansion_256a > + # aeskeygenassist \$0x10, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10 > + call _key_expansion_256b > + # aeskeygenassist \$0x20, %xmm2, %xmm1 # round 6 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20 > + call _key_expansion_256a > + # aeskeygenassist \$0x20, %xmm0, %xmm1 > + .byte 0x66, 0x0f, 
0x3a, 0xdf, 0xc8, 0x20 > + call _key_expansion_256b > + # aeskeygenassist \$0x40, %xmm2, %xmm1 # round 7 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40 > + call _key_expansion_256a > + xor %rax, %rax > + ret > +.Lenc_key192: > + cmp \$192, %esi > + jnz .Lenc_key128 > + mov \$12, %esi > + movl %esi, 240(%rdx) # 12 rounds for 192 > + movq 0x10(%rdi), %xmm2 # other user key > + # aeskeygenassist \$0x1, %xmm2, %xmm1 # round 1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x01 > + call _key_expansion_192a > + # aeskeygenassist \$0x2, %xmm2, %xmm1 # round 2 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x02 > + call _key_expansion_192b > + # aeskeygenassist \$0x4, %xmm2, %xmm1 # round 3 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x04 > + call _key_expansion_192a > + # aeskeygenassist \$0x8, %xmm2, %xmm1 # round 4 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x08 > + call _key_expansion_192b > + # aeskeygenassist \$0x10, %xmm2, %xmm1 # round 5 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x10 > + call _key_expansion_192a > + # aeskeygenassist \$0x20, %xmm2, %xmm1 # round 6 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x20 > + call _key_expansion_192b > + # aeskeygenassist \$0x40, %xmm2, %xmm1 # round 7 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x40 > + call _key_expansion_192a > + # aeskeygenassist \$0x80, %xmm2, %xmm1 # round 8 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xca, 0x80 > + call _key_expansion_192b > + xor %rax, %rax > + ret > +.Lenc_key128: > + cmp \$128, %esi > + jnz .Lenc_key_invalid_key_bits > + mov \$10, %esi > + movl %esi, 240(%rdx) # 10 rounds for 128 > + # aeskeygenassist \$0x1, %xmm0, %xmm1 # round 1 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x01 > + call _key_expansion_128 > + # aeskeygenassist \$0x2, %xmm0, %xmm1 # round 2 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x02 > + call _key_expansion_128 > + # aeskeygenassist \$0x4, %xmm0, %xmm1 # round 3 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x04 > + call _key_expansion_128 > + # aeskeygenassist \$0x8, %xmm0, %xmm1 # round 4 > + .byte 0x66, 0x0f, 
0x3a, 0xdf, 0xc8, 0x08 > + call _key_expansion_128 > + # aeskeygenassist \$0x10, %xmm0, %xmm1 # round 5 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x10 > + call _key_expansion_128 > + # aeskeygenassist \$0x20, %xmm0, %xmm1 # round 6 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x20 > + call _key_expansion_128 > + # aeskeygenassist \$0x40, %xmm0, %xmm1 # round 7 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x40 > + call _key_expansion_128 > + # aeskeygenassist \$0x80, %xmm0, %xmm1 # round 8 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x80 > + call _key_expansion_128 > + # aeskeygenassist \$0x1b, %xmm0, %xmm1 # round 9 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x1b > + call _key_expansion_128 > + # aeskeygenassist \$0x36, %xmm0, %xmm1 # round 10 > + .byte 0x66, 0x0f, 0x3a, 0xdf, 0xc8, 0x36 > + call _key_expansion_128 > + xor %eax, %eax > + ret > +.Lenc_key_invalid_param: > + mov \$-1, %rax > + ret > +.Lenc_key_invalid_key_bits: > + mov \$-2, %rax > + ret > +.size _aesni_set_encrypt_key, . - _aesni_set_encrypt_key > +___ > + > + > +# int aesni_set_decrypt_key(const unsigned char *userKey, const int bits, > +# AES_KEY *key) > +$code.=<<___; > +.globl aesni_set_decrypt_key > +.type aesni_set_decrypt_key,\...@function,3 > +.align 16 > +aesni_set_decrypt_key: > + call _aesni_set_encrypt_key > + test %rax, %rax > + jnz .Ldec_key_exit > + lea 0x10(%rdx), %rcx > + shl \$4, %esi > + add %rdx, %rsi > + mov %rsi, %rdi > +.align 4 > +.Ldec_key_reorder_loop: > + movaps (%rdx), %xmm0 > + movaps (%rsi), %xmm1 > + movaps %xmm0, (%rsi) > + movaps %xmm1, (%rdx) > + lea 0x10(%rdx), %rdx > + lea -0x10(%rsi), %rsi > + cmp %rdx, %rsi > + ja .Ldec_key_reorder_loop > +.align 4 > +.Ldec_key_inv_loop: > + movaps (%rcx), %xmm0 > + # aesimc %xmm0, %xmm1 > + .byte 0x66, 0x0f, 0x38, 0xdb, 0xc8 > + movaps %xmm1, (%rcx) > + lea 0x10(%rcx), %rcx > + cmp %rdi, %rcx > + jnz .Ldec_key_inv_loop > +.Ldec_key_exit: > + ret > +.size aesni_set_encrypt_key, . 
- aesni_set_encrypt_key > +___ > + > +# void aesni_encrypt (const void *inp,void *out,const AES_KEY *key); > +$code.=<<___; > +.globl aesni_encrypt > +.type aesni_encrypt,\...@function,3 > +.align 16 > +aesni_encrypt: > + mov %rdi, $inp > + mov %rsi, $outp > + mov %rdx, $keyp > + mov 240($keyp), $rnds # round count > + movups ($inp), $state # input > + call _aesni_encrypt1 > + movups $state, ($outp) # output > + ret > +.size aesni_encrypt, . - aesni_encrypt > +___ > + > +# _aesni_encrypt1: internal ABI > +# input: > +# $keyp: key struct pointer > +# $rnds: round count > +# $state: initial state (input) > +# output: > +# $state: finial state (output) > +# changed: > +# $key > +# $tkeyp ($t1) > +$code.=<<___; > +.type _aesni_encrypt1,\...@abi-omnipotent > +.align 16 > +_aesni_encrypt1: > + movaps ($keyp), $key # key > + mov $keyp, $tkeyp > + pxor $key, $state # round 0 > + lea 0x30($tkeyp), $tkeyp > + cmp \$12, $rnds > + jb .Lenc128 > + lea 0x20($tkeyp), $tkeyp > + je .Lenc192 > + lea 0x20($tkeyp), $tkeyp > + movaps -0x60($tkeyp), $key > + aesenc $key, $state > + movaps -0x50($tkeyp), $key > + aesenc $key, $state > +.align 4 > +.Lenc192: > + movaps -0x40($tkeyp), $key > + aesenc $key, $state > + movaps -0x30($tkeyp), $key > + aesenc $key, $state > +.align 4 > +.Lenc128: > + movaps -0x20($tkeyp), $key > + aesenc $key, $state > + movaps -0x10($tkeyp), $key > + aesenc $key, $state > + movaps ($tkeyp), $key > + aesenc $key, $state > + movaps 0x10($tkeyp), $key > + aesenc $key, $state > + movaps 0x20($tkeyp), $key > + aesenc $key, $state > + movaps 0x30($tkeyp), $key > + aesenc $key, $state > + movaps 0x40($tkeyp), $key > + aesenc $key, $state > + movaps 0x50($tkeyp), $key > + aesenc $key, $state > + movaps 0x60($tkeyp), $key > + aesenc $key, $state > + movaps 0x70($tkeyp), $key > + aesenclast $key, $state # last round > + ret > +.size _aesni_encrypt1, . 
- _aesni_encrypt1 > +___ > + > +# _aesni_encrypt4: internal ABI > +# input: > +# $keyp: key struct pointer > +# $rnds: round count > +# $state1: initial state (input) > +# $state2 > +# $state3 > +# $state4 > +# output: > +# $state1: finial state (output) > +# $state2 > +# $state3 > +# $state4 > +# changed: > +# $key > +# $tkeyp ($t1) > +$code.=<<___; > +.type _aesni_encrypt4,\...@abi-omnipotent > +.align 16 > +_aesni_encrypt4: > + movaps ($keyp), $key # key > + mov $keyp, $tkeyp > + pxor $key, $state1 # round 0 > + pxor $key, $state2 > + pxor $key, $state3 > + pxor $key, $state4 > + lea 0x30($tkeyp), $tkeyp > + cmp \$12, $rnds > + jb .L4enc128 > + lea 0x20($tkeyp), $tkeyp > + je .L4enc192 > + lea 0x20($tkeyp), $tkeyp > + movaps -0x60($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps -0x50($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > +.align 4 > +.L4enc192: > + movaps -0x40($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps -0x30($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > +.align 4 > +.L4enc128: > + movaps -0x20($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps -0x10($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps ($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x10($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x20($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x30($tkeyp), $key > + aesenc $key, 
$state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x40($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x50($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x60($tkeyp), $key > + aesenc $key, $state1 > + aesenc $key, $state2 > + aesenc $key, $state3 > + aesenc $key, $state4 > + movaps 0x70($tkeyp), $key > + aesenclast $key, $state1 # last round > + aesenclast $key, $state2 > + aesenclast $key, $state3 > + aesenclast $key, $state4 > + ret > +.size _aesni_encrypt4, . - _aesni_encrypt4 > +___ > + > +# void aesni_decrypt (const void *inp,void *out,const AES_KEY *key); > +$code.=<<___; > +.globl aesni_decrypt > +.type aesni_decrypt,\...@function,3 > +.align 16 > +aesni_decrypt: > + mov %rdi, $inp > + mov %rsi, $outp > + mov %rdx, $keyp > + mov 240($keyp), $rnds # round count > + movups ($inp), $state # input > + call _aesni_decrypt1 > + movups $state, ($outp) #output > + ret > +.size aesni_encrypt, . 
- aesni_encrypt
> +___
> +
> +# _aesni_decrypt1: internal ABI
> +# input:
> +# $keyp: key struct pointer
> +# $rnds: round count
> +# $state: initial state (input)
> +# output:
> +# $state: final state (output)
> +# changed:
> +# $key
> +# $tkeyp ($t1)
> +$code.=<<___;
> +.type _aesni_decrypt1,\...@abi-omnipotent
> +.align 16
> +_aesni_decrypt1:
> + movaps ($keyp), $key # key
> + mov $keyp, $tkeyp
> + pxor $key, $state # round 0
> + lea 0x30($tkeyp), $tkeyp
> + cmp \$12, $rnds
> + jb .Ldec128
> + lea 0x20($tkeyp), $tkeyp
> + je .Ldec192
> + lea 0x20($tkeyp), $tkeyp
> + movaps -0x60($tkeyp), $key
> + aesdec $key, $state
> + movaps -0x50($tkeyp), $key
> + aesdec $key, $state
> +.align 4
> +.Ldec192:
> + movaps -0x40($tkeyp), $key
> + aesdec $key, $state
> + movaps -0x30($tkeyp), $key
> + aesdec $key, $state
> +.align 4
> +.Ldec128:
> + movaps -0x20($tkeyp), $key
> + aesdec $key, $state
> + movaps -0x10($tkeyp), $key
> + aesdec $key, $state
> + movaps ($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x10($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x20($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x30($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x40($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x50($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x60($tkeyp), $key
> + aesdec $key, $state
> + movaps 0x70($tkeyp), $key
> + aesdeclast $key, $state # last round
> + ret
> +.size _aesni_decrypt1, . - _aesni_decrypt1
> +___
> +
> +# _aesni_decrypt4: internal ABI
> +# input:
> +# $keyp: key struct pointer
> +# $rnds: round count
> +# $state1: initial state (input)
> +# $state2
> +# $state3
> +# $state4
> +# output:
> +# $state1: final state (output)
> +# $state2
> +# $state3
> +# $state4
> +# changed:
> +# $key
> +# $tkeyp ($t1)
> +$code.=<<___;
> +.type _aesni_decrypt4,\...@abi-omnipotent
> +.align 16
> +_aesni_decrypt4:
> + movaps ($keyp), $key # key
> + mov $keyp, $tkeyp
> + pxor $key, $state1 # round 0
> + pxor $key, $state2
> + pxor $key, $state3
> + pxor $key, $state4
> + lea 0x30($tkeyp), $tkeyp
> + cmp \$12, $rnds
> + jb .L4dec128
> + lea 0x20($tkeyp), $tkeyp
> + je .L4dec192
> + lea 0x20($tkeyp), $tkeyp
> + movaps -0x60($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps -0x50($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> +.align 4
> +.L4dec192:
> + movaps -0x40($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps -0x30($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> +.align 4
> +.L4dec128:
> + movaps -0x20($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps -0x10($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps ($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x10($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x20($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x30($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x40($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x50($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x60($tkeyp), $key
> + aesdec $key, $state1
> + aesdec $key, $state2
> + aesdec $key, $state3
> + aesdec $key, $state4
> + movaps 0x70($tkeyp), $key
> + aesdeclast $key, $state1 # last round
> + aesdeclast $key, $state2
> + aesdeclast $key, $state3
> + aesdeclast $key, $state4
> + ret
> +.size _aesni_decrypt4, . - _aesni_decrypt4
> +___
> +
> +# void aesni_ecb_encrypt(const unsigned char *in, unsigned char *out,
> +# size_t length, const AES_KEY *key,
> +# const int enc);
> +$code.=<<___;
> +.globl aesni_ecb_encrypt
> +.type aesni_ecb_encrypt,\...@function,5
> +.align 16
> +aesni_ecb_encrypt:
> + test $len, $len # check length
> + jz .Lecb_just_ret
> + mov %rdi, $inp
> + mov %rsi, $outp
> + mov %r8d, $t1d # clear upper half of enc
> + mov %rcx, $keyp
> + mov 240($keyp), $rnds
> + test $t1, $t1
> + jz .Lecb_decrypt
> +#--------------------------- ENCRYPT ------------------------------#
> + cmp \$16, $len
> + jb .Lecb_just_ret
> + cmp \$64, $len
> + jb .Lecb_enc_loop1
> +.align 4
> +.Lecb_enc_loop4:
> + movups ($inp), $state1
> + movups 0x10($inp), $state2
> + movups 0x20($inp), $state3
> + movups 0x30($inp), $state4
> + call _aesni_encrypt4
> + movups $state1, ($outp)
> + movups $state2, 0x10($outp)
> + movups $state3, 0x20($outp)
> + movups $state4, 0x30($outp)
> + sub \$64, $len
> + add \$64, $inp
> + add \$64, $outp
> + cmp \$64, $len
> + jge .Lecb_enc_loop4
> + cmp \$16, $len
> + jb .Lecb_just_ret
> +.align 4
> +.Lecb_enc_loop1:
> + movups ($inp), $state1
> + call _aesni_encrypt1
> + movups $state1, ($outp)
> + sub \$16, $len
> + add \$16, $inp
> + add \$16, $outp
> + cmp \$16, $len
> + jge .Lecb_enc_loop1
> + jmp .Lecb_just_ret
> +#--------------------------- DECRYPT ------------------------------#
> +.Lecb_decrypt:
> + cmp \$16, $len
> + jb .Lecb_just_ret
> + cmp \$64, $len
> + jb .Lecb_dec_loop1
> +.align 4
> +.Lecb_dec_loop4:
> + movups ($inp), $state1
> + movups 0x10($inp), $state2
> + movups 0x20($inp), $state3
> + movups 0x30($inp), $state4
> + call _aesni_decrypt4
> + movups $state1, ($outp)
> + movups $state2, 0x10($outp)
> + movups $state3, 0x20($outp)
> + movups $state4, 0x30($outp)
> + sub \$64, $len
> + add \$64, $inp
> + add \$64, $outp
> + cmp \$64, $len
> + jge .Lecb_dec_loop4
> + cmp \$16, $len
> + jb .Lecb_just_ret
> +.align 4
> +.Lecb_dec_loop1:
> + movups ($inp), $state1
> + call _aesni_decrypt1
> + movups $state1, ($outp)
> + sub \$16, $len
> + add \$16, $inp
> + add \$16, $outp
> + cmp \$16, $len
> + jge .Lecb_dec_loop1
> +.Lecb_just_ret:
> + ret
> +.size aesni_ecb_encrypt, . - aesni_ecb_encrypt
> +___
> +
> +# void aesni_cbc_encrypt (const void char *inp, unsigned char *out,
> +# size_t length, const AES_KEY *key,
> +# unsigned char *ivp,const int enc);
> +$code.=<<___;
> +.globl aesni_cbc_encrypt
> +.type aesni_cbc_encrypt,\...@function,6
> +.align 16
> +aesni_cbc_encrypt:
> + test $len, $len # check length
> + jz .Lcbc_just_ret
> + mov %rdi, $inp
> + mov %rsi, $outp
> + mov %r9d, $t1d # clear upper half of enc
> + mov %rcx, $keyp
> + mov 240($keyp), $rnds
> + test $t1, $t1
> + jz .Lcbc_decrypt
> +#--------------------------- ENCRYPT ------------------------------#
> + movups ($ivp), $state # load iv as initial state
> + cmp \$16, $len
> + jb .Lcbc_enc_tail
> +.align 4
> +.Lcbc_enc_loop:
> + movups ($inp), $in # load input
> + pxor $in, $state
> + call _aesni_encrypt1
> + movups $state, ($outp) # store output
> + sub \$16, $len
> + add \$16, $inp
> + add \$16, $outp
> + cmp \$16, $len
> + jge .Lcbc_enc_loop
> + test \$0xf, $len
> + jnz .Lcbc_enc_tail
> + movups $state, ($ivp)
> + jmp .Lcbc_just_ret
> +.Lcbc_enc_tail:
> + mov $len, %rcx
> + mov $inp, %rsi
> + mov $outp, %rdi
> + .long 0x9066A4F3 # rep movsb
> + mov 240($keyp), $rnds # restore $rnds (%esi)
> + mov \$16, %rcx # zero tail
> + sub $len, %rcx
> + xor %rax, %rax
> + .long 0x9066AAF3 # rep stosb
> + mov $outp, $inp # this is not a mistake!
> + movq \$16, $len # len=16
> + jmp .Lcbc_enc_loop # one more spin
> +#--------------------------- DECRYPT ------------------------------#
> +.Lcbc_decrypt:
> + movups ($ivp), $iv
> + cmp \$16, $len
> + jb .Lcbc_dec_tail
> + cmp \$64, $len
> + jb .Lcbc_dec_loop1
> +.align 4
> +.Lcbc_dec_loop4:
> + movups ($inp), $in1
> + movaps $in1, $state1
> + movups 0x10($inp), $in2
> + movaps $in2, $state2
> + movups 0x20($inp), $in3
> + movaps $in3, $state3
> + movups 0x30($inp), $in4
> + movaps $in4, $state4
> + call _aesni_decrypt4
> + pxor $iv, $state1
> + pxor $in1, $state2
> + pxor $in2, $state3
> + pxor $in3, $state4
> + movaps $in4, $iv
> + movups $state1, ($outp)
> + movups $state2, 0x10($outp)
> + movups $state3, 0x20($outp)
> + movups $state4, 0x30($outp)
> + sub \$64, $len
> + add \$64, $inp
> + add \$64, $outp
> + cmp \$64, $len
> + jge .Lcbc_dec_loop4
> + cmp \$0, $len
> + jz .Lcbc_dec_ret
> + cmp \$16, $len
> + jb .Lcbc_dec_tail
> +.align 4
> +.Lcbc_dec_loop1:
> + movups ($inp), $in
> + movaps $in, $state
> + call _aesni_decrypt1
> + pxor $iv, $state
> + movups $state, ($outp)
> + movaps $in, $iv
> + sub \$16, $len
> + add \$16, $inp
> + add \$16, $outp
> + cmp \$16, $len
> + jge .Lcbc_dec_loop1
> + test \$0xf, $len
> + jz .Lcbc_dec_ret
> +.Lcbc_dec_tail:
> + movups ($inp), $in
> + movaps $in, $state
> + call _aesni_decrypt1
> + pxor $iv, $state
> + movaps $in, $iv
> + sub \$16, %rsp # alloc temporary space
> + movups $state, (%rsp)
> + mov $outp, %rdi
> + mov %rsp, %rsi
> + mov $len, %rcx
> + .long 0x9066A4F3 # rep movsb
> + mov %rsp, %rdi # clear stack
> + mov \$16, %rcx
> + xor %rax, %rax
> + .long 0x9066AAF3 # rep stosb
> + add \$16, %rsp
> +.Lcbc_dec_ret:
> + movups $iv, ($ivp)
> +.Lcbc_just_ret:
> + ret
> +.size aesni_cbc_encrypt, . - aesni_cbc_encrypt
> +___
> +
> +$code.=<<___;
> + .long 0x80808080, 0x80808080, 0xfefefefe, 0xfefefefe
> + .long 0x1b1b1b1b, 0x1b1b1b1b, 0, 0
> +.asciz "AES for Intel AESNI, CRYPTOGAMS by <ying.hua...@intel.com>"
> +.align 64
> +___
> +
> +$code =~ s/\`([^\`]*)\`/eval($1)/gem;
> +
> +print $code;
> +
> +close STDOUT;
> --- a/crypto/engine/Makefile
> +++ b/crypto/engine/Makefile
> @@ -11,6 +11,8 @@ MAKEFILE= Makefile
> AR= ar r
>
> CFLAGS= $(INCLUDES) $(CFLAG)
> +ASFLAGS= $(INCLUDES) $(ASFLAG)
> +AFLAGS= $(ASFLAGS)
>
> GENERAL=Makefile
> TEST= enginetest.c
> @@ -21,12 +23,14 @@ LIBSRC= eng_err.c eng_lib.c eng_list.c e
> eng_table.c eng_pkey.c eng_fat.c eng_all.c \
> tb_rsa.c tb_dsa.c tb_ecdsa.c tb_dh.c tb_ecdh.c tb_rand.c tb_store.c \
> tb_cipher.c tb_digest.c tb_pkmeth.c tb_asnmth.c \
> - eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c
> + eng_openssl.c eng_cnf.c eng_dyn.c eng_cryptodev.c \
> + eng_aesni.c eng_aesni_asm.pl
> LIBOBJ= eng_err.o eng_lib.o eng_list.o eng_init.o eng_ctrl.o \
> eng_table.o eng_pkey.o eng_fat.o eng_all.o \
> tb_rsa.o tb_dsa.o tb_ecdsa.o tb_dh.o tb_ecdh.o tb_rand.o tb_store.o \
> tb_cipher.o tb_digest.o tb_pkmeth.o tb_asnmth.o \
> - eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o
> + eng_openssl.o eng_cnf.o eng_dyn.o eng_cryptodev.o \
> + eng_aesni.o eng_aesni_asm.o
>
> SRC= $(LIBSRC)
>
> @@ -45,6 +49,9 @@ lib: $(LIBOBJ)
> $(RANLIB) $(LIB) || echo Never mind.
> @touch lib
>
> +eng_aesni_asm.s: eng_aesni_asm.pl
> + $(PERL) eng_aesni_asm.pl $(PERLASM_SCHEME) > $@
> +
> files:
> $(PERL) $(TOP)/util/files.pl Makefile >> $(TOP)/MINFO
>
> --- a/crypto/engine/engine.h
> +++ b/crypto/engine/engine.h
> @@ -346,6 +346,7 @@ void ENGINE_load_gost(void);
> #endif
> #endif
> void ENGINE_load_cryptodev(void);
> +void ENGINE_load_aesni(void);
> void ENGINE_load_builtin_engines(void);
>
> /* Get and set global flags (ENGINE_TABLE_FLAG_***) for the implementation
>
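For anyone reviewing the CBC decrypt path in the quoted patch, here is a rough C sketch of the data flow in .Lcbc_dec_loop4 and .Lcbc_dec_loop1. It is not part of the patch. toy_decrypt_block is a hypothetical stand-in for _aesni_decrypt1 (a fixed XOR, not AES), and xor_block and cbc_decrypt_sketch are made-up names. It assumes whole blocks only and leaves out the partial-tail handling of .Lcbc_dec_tail.

```c
#include <stddef.h>
#include <string.h>

#define BLK 16  /* AES block size in bytes */

/* Hypothetical stand-in for _aesni_decrypt1. NOT AES: a fixed XOR,
 * used only so the chaining logic can be exercised. */
static void toy_decrypt_block(unsigned char b[BLK])
{
    for (int i = 0; i < BLK; i++)
        b[i] ^= 0x5a;
}

/* XOR one block into another: dst ^= src. */
static void xor_block(unsigned char dst[BLK], const unsigned char src[BLK])
{
    for (int i = 0; i < BLK; i++)
        dst[i] ^= src[i];
}

/* Whole-block CBC decrypt: P[i] = D(C[i]) ^ C[i-1], with C[-1] = iv.
 * On return iv holds the last ciphertext block processed.
 * Any trailing partial block is ignored in this sketch. */
static void cbc_decrypt_sketch(const unsigned char *in, unsigned char *out,
                               size_t len, unsigned char iv[BLK])
{
    unsigned char c[4][BLK];  /* saved ciphertext, like $in1..$in4 */
    unsigned char s[4][BLK];  /* working state, like $state1..$state4 */

    /* Four blocks per iteration: the four decryptions do not depend
     * on each other, only the final XORs use neighbouring ciphertext. */
    while (len >= 4 * BLK) {
        memcpy(c, in, sizeof c);
        memcpy(s, in, sizeof s);
        for (int j = 0; j < 4; j++)
            toy_decrypt_block(s[j]);
        xor_block(s[0], iv);
        for (int j = 1; j < 4; j++)
            xor_block(s[j], c[j - 1]);
        memcpy(iv, c[3], BLK);      /* last ciphertext becomes next iv */
        memcpy(out, s, sizeof s);
        in += 4 * BLK;
        out += 4 * BLK;
        len -= 4 * BLK;
    }

    /* Remaining whole blocks, one at a time. */
    while (len >= BLK) {
        memcpy(c[0], in, BLK);
        memcpy(s[0], in, BLK);
        toy_decrypt_block(s[0]);
        xor_block(s[0], iv);
        memcpy(iv, c[0], BLK);
        memcpy(out, s[0], BLK);
        in += BLK;
        out += BLK;
        len -= BLK;
    }
}
```

The ciphertext is copied before anything is written to out, so the sketch also works when in and out point to the same buffer. CBC encryption cannot be batched this way, because each block's input depends on the previous block's output, which would explain why the encrypt loop in the patch goes one block at a time.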