Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-20 Thread Herbert Xu
On Tue, Jun 10, 2014 at 10:34:47PM +0200, Mathias Krause wrote:
> On 10 June 2014 18:22, chandramouli narayanan  wrote:
>
> Patch is
> Reviewed-by: Mathias Krause 

Patch applied.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-10 Thread chandramouli narayanan
On Tue, 2014-06-10 at 22:34 +0200, Mathias Krause wrote:
> On 10 June 2014 18:22, chandramouli narayanan  wrote:
> > This patch introduces "by8" AES CTR mode AVX optimization inspired by
> > Intel Optimized IPSEC Cryptograhpic library. For additional information,
> > please see:
> > http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972
> >
> > The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
> > aes_ctr_enc_256_avx_by8() are adapted from
> > Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
> > are enabled in a platform, the glue code in AESNI module overrieds the
> > existing "by4" CTR mode en/decryption with the "by8"
> > AES CTR mode en/decryption.
> >
> > On a Haswell desktop, with turbo disabled and all cpus running
> > at maximum frequency, the "by8" CTR mode optimization
> > shows better performance results across data & key sizes
> > as measured by tcrypt.
> >
> > The average performance improvement of the "by8" version over the "by4"
> > version is as follows:
> >
> > For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
> > For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
> > For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.
> >
> > A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
> > optimization shows the following results:
> >
> > tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
> > ---
> >
> > testing speed of __ctr-aes-aesni encryption
> > test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
> > test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
> > test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
> > test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 
> > bytes)
> > test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 
> > bytes)
> > test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
> > test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
> > test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
> > test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 
> > bytes)
> > test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 
> > bytes)
> > test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
> > test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
> > test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 
> > bytes)
> > test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 
> > bytes)
> > test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 
> > bytes)
> >
> > testing speed of __ctr-aes-aesni decryption
> > test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
> > test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
> > test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
> > test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 
> > bytes)
> > test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 
> > bytes)
> > test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
> > test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
> > test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
> > test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 
> > bytes)
> > test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 
> > bytes)
> > test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
> > test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
> > test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 
> > bytes)
> > test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 
> > bytes)
> > test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 
> > bytes)
> >
> > tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
> > ---
> >
> > testing speed of __ctr-aes-aesni encryption
> > test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
> > test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
> > test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
> > test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 
> > bytes)
> > test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 
> > bytes)
> > test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
> > test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 

Re: [PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-10 Thread Mathias Krause
On 10 June 2014 18:22, chandramouli narayanan  wrote:
> This patch introduces "by8" AES CTR mode AVX optimization inspired by
> Intel Optimized IPSEC Cryptograhpic library. For additional information,
> please see:
> http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972
>
> The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
> aes_ctr_enc_256_avx_by8() are adapted from
> Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
> are enabled in a platform, the glue code in AESNI module overrieds the
> existing "by4" CTR mode en/decryption with the "by8"
> AES CTR mode en/decryption.
>
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, the "by8" CTR mode optimization
> shows better performance results across data & key sizes
> as measured by tcrypt.
>
> The average performance improvement of the "by8" version over the "by4"
> version is as follows:
>
> For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
> For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
> For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.
>
> A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
> optimization shows the following results:
>
> tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
> ---
>
> testing speed of __ctr-aes-aesni encryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 
> bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 
> bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 
> bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 
> bytes)
>
> testing speed of __ctr-aes-aesni decryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 
> bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 
> bytes)
> test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
> test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
> test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
> test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 
> bytes)
> test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 
> bytes)
>
> tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
> ---
>
> testing speed of __ctr-aes-aesni encryption
> test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
> test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
> test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
> test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 
> bytes)
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 
> bytes)
> test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
> test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
> test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
> test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 
> bytes)
> test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 

[PATCH v5 1/1] crypto: AES CTR x86_64 "by8" AVX optimization

2014-06-10 Thread chandramouli narayanan
This patch introduces "by8" AES CTR mode AVX optimization inspired by
Intel Optimized IPSEC Cryptograhpic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972

The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled in a platform, the glue code in AESNI module overrieds the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data & key sizes
as measured by tcrypt.

The average performance improvement of the "by8" version over the "by4"
version is as follows:

For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:

tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 
bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 
bytes)

tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13