Re: [PATCH v2 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-17 Thread Herbert Xu
On Thu, Jul 16, 2015 at 07:13:58PM +0200, Martin Willi wrote:
> This patch series adds x86_64-specific ChaCha20 and Poly1305 ciphers
> using SSE2/SSSE3 and AVX2 instructions. The idea is to have a drop-in
> replacement for AESNI/CLMUL-accelerated AES-GCM that provides at least
> somewhat comparable performance; refer to RFC 7539 for details. It is based
> on cryptodev, including the ChaCha20/Poly1305 AEAD interface conversion patch.
> 
> The first patch adds some speed tests to tcrypt. The second patch exports
> some functionality from chacha20-generic to use it as fallback. Patch 3
> adds a single-block SSSE3 driver for ChaCha20, while patches 4 and 5 extend it
> with an optimized four-block SSSE3 and an eight-block AVX2 variant. Patch 6
> adds an additional test vector for ChaCha20 to actually test the AVX2
> eight-block variant, which processes 512 bytes at once.
> 
> Patch 7 exports some poly1305-generic functionality to use it as fallback.
> Patch 8 introduces a single-block SSE2 driver for Poly1305, while patches 9
> and 10 add an optimized two-block SSE2 and a four-block AVX2 variant.
> 
> Overall speedup for the ChaCha20/Poly1305 AEAD for typical IPsec payloads
> is ~50-150% with SSE2/SSSE3 and ~100-200% with AVX2, or even more for larger
> payloads:

All applied.  Thanks Martin!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH v2 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-16 Thread Martin Willi
This patch series adds x86_64-specific ChaCha20 and Poly1305 ciphers
using SSE2/SSSE3 and AVX2 instructions. The idea is to have a drop-in
replacement for AESNI/CLMUL-accelerated AES-GCM that provides at least
somewhat comparable performance; refer to RFC 7539 for details. It is based
on cryptodev, including the ChaCha20/Poly1305 AEAD interface conversion patch.

The first patch adds some speed tests to tcrypt. The second patch exports
some functionality from chacha20-generic to use it as fallback. Patch 3
adds a single-block SSSE3 driver for ChaCha20, while patches 4 and 5 extend it
with an optimized four-block SSSE3 and an eight-block AVX2 variant. Patch 6
adds an additional test vector for ChaCha20 to actually test the AVX2
eight-block variant, which processes 512 bytes at once.
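
As a rough illustration of how single-block, four-block and eight-block
variants can share one request, the sketch below splits a byte count
widest-first. It is not the code from this series: the struct, the function
name chacha20_plan() and the have_avx2 flag are hypothetical, and the sketch
only counts calls instead of encrypting anything.

```c
#include <stddef.h>

#define CHACHA20_BLOCK_SIZE 64

/* Hypothetical tally of how many calls each code path would receive. */
struct chacha20_path_count {
	unsigned int avx2_8block;	/* 512-byte chunks */
	unsigned int ssse3_4block;	/* 256-byte chunks */
	unsigned int ssse3_1block;	/* 64-byte chunks */
	unsigned int partial;		/* trailing partial block, 0 or 1 */
};

/*
 * Split a request of 'bytes' bytes, using the widest variant first.
 * 'have_avx2' stands in for a runtime CPU feature check (assumption).
 */
struct chacha20_path_count chacha20_plan(size_t bytes, int have_avx2)
{
	struct chacha20_path_count n = { 0, 0, 0, 0 };

	if (have_avx2) {
		while (bytes >= 8 * CHACHA20_BLOCK_SIZE) {
			n.avx2_8block++;
			bytes -= 8 * CHACHA20_BLOCK_SIZE;
		}
	}
	while (bytes >= 4 * CHACHA20_BLOCK_SIZE) {
		n.ssse3_4block++;
		bytes -= 4 * CHACHA20_BLOCK_SIZE;
	}
	while (bytes >= CHACHA20_BLOCK_SIZE) {
		n.ssse3_1block++;
		bytes -= CHACHA20_BLOCK_SIZE;
	}
	if (bytes)
		n.partial = 1;

	return n;
}
```

For a 1500-byte payload with AVX2 available, this plan uses two eight-block
calls, one four-block call, three single-block calls and one partial block.
Only inputs of at least 512 bytes reach the eight-block path, which is why a
longer test vector is needed to exercise it.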

Patch 7 exports some poly1305-generic functionality to use it as fallback.
Patch 8 introduces a single-block SSE2 driver for Poly1305, while patches 9
and 10 add an optimized two-block SSE2 and a four-block AVX2 variant.
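
To show why a polynomial MAC can consume several blocks per step at all, here
is a toy sketch of the underlying identity
((h + m1) * r + m2) * r = (h + m1) * r^2 + m2 * r. It is only an
illustration under loud assumptions: it uses a small modulus (2^31 - 1)
instead of Poly1305's 2^130 - 5, skips key clamping, padding and the final
addition of s, and all function names are hypothetical. It is not the
arithmetic used by the drivers in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy modulus (2^31 - 1); real Poly1305 works modulo 2^130 - 5. */
#define TOY_P 2147483647ULL

/* Both reduced operands are below 2^31, so the product fits in 64 bits. */
static uint64_t toy_mulmod(uint64_t a, uint64_t b)
{
	return (a % TOY_P) * (b % TOY_P) % TOY_P;
}

/* One block per step: h = (h + m[i]) * r */
uint64_t toy_poly_1block(const uint32_t *m, size_t n, uint64_t r)
{
	uint64_t h = 0;
	size_t i;

	for (i = 0; i < n; i++)
		h = toy_mulmod(h + m[i], r);
	return h;
}

/* Two blocks per step: h = (h + m[i]) * r^2 + m[i + 1] * r */
uint64_t toy_poly_2block(const uint32_t *m, size_t n, uint64_t r)
{
	uint64_t r2 = toy_mulmod(r, r);	/* precomputed once per key */
	uint64_t h = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2)
		h = (toy_mulmod(h + m[i], r2) + toy_mulmod(m[i + 1], r)) % TOY_P;
	if (i < n)			/* odd block count: one single step */
		h = toy_mulmod(h + m[i], r);
	return h;
}
```

Both functions return the same value for any input. The two products inside
the two-block step do not depend on each other, which is the property a SIMD
implementation can exploit.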

Overall speedup for the ChaCha20/Poly1305 AEAD for typical IPsec payloads
is ~50-150% with SSE2/SSSE3 and ~100-200% with AVX2, or even more for larger
payloads:

generic:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-generic,poly1305-generic)) encryption
test 0 (288 bit key, 16 byte blocks): 10456041 operations in 10 seconds (167296656 bytes)
test 1 (288 bit key, 64 byte blocks): 9999411 operations in 10 seconds (639962304 bytes)
test 2 (288 bit key, 256 byte blocks): 5793012 operations in 10 seconds (1483011072 bytes)
test 3 (288 bit key, 512 byte blocks): 3743676 operations in 10 seconds (1916762112 bytes)
test 4 (288 bit key, 1024 byte blocks): 2190023 operations in 10 seconds (2242583552 bytes)
test 5 (288 bit key, 2048 byte blocks): 1195864 operations in 10 seconds (2449129472 bytes)
test 6 (288 bit key, 4096 byte blocks): 627625 operations in 10 seconds (2570752000 bytes)
test 7 (288 bit key, 8192 byte blocks): 319844 operations in 10 seconds (2620162048 bytes)

SSE2/SSSE3:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 10077910 operations in 10 seconds (161246560 bytes)
test 1 (288 bit key, 64 byte blocks): 9990400 operations in 10 seconds (639385600 bytes)
test 2 (288 bit key, 256 byte blocks): 7953774 operations in 10 seconds (2036166144 bytes)
test 3 (288 bit key, 512 byte blocks): 6351059 operations in 10 seconds (3251742208 bytes)
test 4 (288 bit key, 1024 byte blocks): 4593059 operations in 10 seconds (4703292416 bytes)
test 5 (288 bit key, 2048 byte blocks): 2956300 operations in 10 seconds (6054502400 bytes)
test 6 (288 bit key, 4096 byte blocks): 1724958 operations in 10 seconds (7065427968 bytes)
test 7 (288 bit key, 8192 byte blocks): 925156 operations in 10 seconds (7578877952 bytes)

AVX2:
testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 10006774 operations in 10 seconds (160108384 bytes)
test 1 (288 bit key, 64 byte blocks): 9896498 operations in 10 seconds (633375872 bytes)
test 2 (288 bit key, 256 byte blocks): 7922198 operations in 10 seconds (2028082688 bytes)
test 3 (288 bit key, 512 byte blocks): 7261666 operations in 10 seconds (3717972992 bytes)
test 4 (288 bit key, 1024 byte blocks): 5835006 operations in 10 seconds (5975046144 bytes)
test 5 (288 bit key, 2048 byte blocks): 4172937 operations in 10 seconds (8546174976 bytes)
test 6 (288 bit key, 4096 byte blocks): 2670484 operations in 10 seconds (10938302464 bytes)
test 7 (288 bit key, 8192 byte blocks): 1504684 operations in 10 seconds (12326371328 bytes)

All benchmark results from a Core i5-4670T.
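
For anyone who wants to turn the tcrypt byte counts above into throughput and
relative speedup, a minimal sketch follows. The helper names are hypothetical
and not part of tcrypt; the inputs are the "bytes" figures and the 10 second
duration printed in the listings.

```c
#include <stdint.h>

/* Throughput in MB/s (10^6 bytes per second) for one tcrypt test line. */
double throughput_mb_per_s(uint64_t bytes, unsigned int secs)
{
	return (double)bytes / 1e6 / (double)secs;
}

/* Speedup of 'fast' over 'base' in percent; 100.0 means twice as fast. */
double speedup_percent(uint64_t base_bytes, uint64_t fast_bytes)
{
	return ((double)fast_bytes / (double)base_bytes - 1.0) * 100.0;
}
```

Taking test 4 (1024 byte blocks) as an example, speedup_percent(2242583552,
4703292416) gives about 110% for SSE2/SSSE3, and speedup_percent(2242583552,
5975046144) gives about 166% for AVX2, both relative to the generic run.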

The ChaCha20/Poly1305 AEAD on Haswell with AVX2 has about half the raw
AESNI/CLMUL-accelerated AES-GCM (rfc4106-gcm-aesni) performance for typical
IPsec MTUs. On Ivy Bridge using SSE2/SSSE3 the numbers compared to AES-GCM
are very similar due to the less efficient CLMUL instructions.

Changes in v2:
- No code changes
- Use sec=10 for more reliable benchmark results

Martin Willi (10):
  crypto: tcrypt - Add ChaCha20/Poly1305 speed tests
  crypto: chacha20 - Export common ChaCha20 helpers
  crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64
  crypto: chacha20 - Add a four block SSSE3 variant for x86_64
  crypto: chacha20 - Add an eight block AVX2 variant for x86_64
  crypto: testmgr - Add a longer ChaCha20 test vector
  crypto: poly1305 - Export common Poly1305 helpers
  crypto: poly1305 - Add a SSE2 SIMD variant for x86_64
  crypto: poly1305 - Add a two block SSE2 variant for x86_64
  crypto: poly1305 - Add a four block AVX2 variant for x86_64

 arch/x86/crypto/Makefile                |   6 +
 arch/x86/crypto/chacha20-avx2-x86_64.S  | 443 ++
 arch/x86/crypto/chacha20-ssse3-x86_64.S | 625
 arch/x86/crypto/chacha20_glue.c         | 150
 arch/x86/crypto/