Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization
On Tue, Jun 10, 2014 at 10:34:47PM +0200, Mathias Krause wrote:
> On 10 June 2014 18:22, chandramouli narayanan wrote:
>
> Patch is
> Reviewed-by: Mathias Krause

Patch applied.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization
On Tue, 2014-06-10 at 22:34 +0200, Mathias Krause wrote:
> On 10 June 2014 18:22, chandramouli narayanan wrote:
> > [...]
Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization
On 10 June 2014 18:22, chandramouli narayanan wrote:
> [...]
[PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization
This patch introduces a "by8" AES CTR mode AVX optimization inspired by the
Intel Optimized IPSEC Cryptographic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972

The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from the
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled on a platform, the glue code in the AESNI module overrides the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.

On a Haswell desktop, with turbo disabled and all CPUs running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data and key sizes
as measured by tcrypt.

The average performance improvement of the "by8" version over the "by4"
version is as follows:

For 128-bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192-bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256-bit key and data sizes >= 256 bytes, there is a 20-25% improvement.
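To illustrate the batching idea behind the description above, here is a minimal sketch in plain C. It assumes that "by8" means eight 16-byte counter blocks are processed per loop iteration, with a one-block-at-a-time path for the tail. It is not the patch's assembly: toy_block_encrypt() is a hypothetical, insecure placeholder standing in for AES, and the names ctr_crypt_by1() and ctr_crypt_by8() are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical placeholder for a block cipher. NOT AES and NOT secure;
 * it only gives the CTR loops below something deterministic to call. */
static void toy_block_encrypt(const uint8_t key[BLOCK],
                              const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + 13u);
}

/* Increment a 128-bit big-endian counter by one (wraps on overflow). */
static void ctr_inc(uint8_t ctr[BLOCK])
{
    for (int i = BLOCK - 1; i >= 0; i--)
        if (++ctr[i] != 0)
            break;
}

/* Reference CTR loop: one keystream block per iteration.
 * Also handles a final partial block. */
static void ctr_crypt_by1(const uint8_t key[BLOCK], uint8_t ctr[BLOCK],
                          const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t ks[BLOCK];

    while (len > 0) {
        size_t n = len < BLOCK ? len : BLOCK;

        toy_block_encrypt(key, ctr, ks);
        ctr_inc(ctr);
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] ^ ks[i];
        in += n;
        out += n;
        len -= n;
    }
}

/* Batched CTR loop: eight keystream blocks per iteration, then the
 * by1 path for whatever is left. The eight block encryptions are
 * independent of each other, which is what lets a real implementation
 * interleave them. */
static void ctr_crypt_by8(const uint8_t key[BLOCK], uint8_t ctr[BLOCK],
                          const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t ks[8 * BLOCK];

    while (len >= 8 * BLOCK) {
        for (int b = 0; b < 8; b++) {
            toy_block_encrypt(key, ctr, ks + b * BLOCK);
            ctr_inc(ctr);
        }
        for (size_t i = 0; i < 8 * BLOCK; i++)
            out[i] = in[i] ^ ks[i];
        in += 8 * BLOCK;
        out += 8 * BLOCK;
        len -= 8 * BLOCK;
    }
    ctr_crypt_by1(key, ctr, in, out, len);
}
```

Both functions produce the same output for the same key and starting counter, and because CTR mode only XORs a keystream into the data, calling either one again with the original counter on the ciphertext recovers the plaintext. The sketch shows only the loop structure; any speedup in a real implementation would come from overlapping the independent block encryptions, which plain C like this does not demonstrate.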
A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:

tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes)

tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13