[PATCH 1/3] crypto: aesni - fix counter overflow handling in "by8" variant
The "by8" CTR AVX implementation fails to propperly handle counter overflows. That was the reason it got disabled in commit 7da4b29d496b ("crypto: aesni - disable "by8" AVX CTR optimization"). Fix the overflow handling by incrementing the counter block as a double quad word, i.e. a 128 bit, and testing for overflows afterwards. We need to use VPTEST to do so as VPADD* does not set the flags itself and silently drops the carry bit. As this change adds branches to the hot path, minor performance regressions might be a side effect. But, OTOH, we now have a conforming implementation -- the preferable goal. A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical numbers for the old and this version with differences within the noise range. A dm-crypt test with the fixed version gave even slightly better results for this version. So the performance impact might not be as big as expected. Tested-by: Romain Francoise Signed-off-by: Mathias Krause Cc: Chandramouli Narayanan --- arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index f091f122ed24..a029bc744244 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -108,6 +108,10 @@ byteswap_const: .octa 0x000102030405060708090A0B0C0D0E0F +ddq_low_msk: + .octa 0x +ddq_high_add_1: + .octa 0x0001 ddq_add_1: .octa 0x0001 ddq_add_2: @@ -169,7 +173,12 @@ ddq_add_8: .rept (by - 1) club DDQ_DATA, i club XDATA, i - vpaddd var_ddq_add(%rip), xcounter, var_xdata + vpaddq var_ddq_add(%rip), xcounter, var_xdata + vptest ddq_low_msk(%rip), var_xdata + jnz 1f + vpaddq ddq_high_add_1(%rip), var_xdata, var_xdata + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: vpshufb xbyteswap, var_xdata, var_xdata .set i, (i +1) .endr @@ -178,7 +187,11 @@ ddq_add_8: vpxor xkey0, xdata0, xdata0 club DDQ_DATA, by - vpaddd var_ddq_add(%rip), xcounter, xcounter + vpaddq var_ddq_add(%rip), xcounter, xcounter + vptest ddq_low_msk(%rip), xcounter + jnz 1f + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: .set i, 1 .rept (by - 1) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Revert "crypto: aesni - disable "by8" AVX CTR optimization"
This reverts commit 7da4b29d496b1389d3a29b55d3668efecaa08ebd. Now, that the issue is fixed, we can re-enable the code. Signed-off-by: Mathias Krause Cc: Chandramouli Narayanan --- arch/x86/crypto/aesni-intel_glue.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index a7ccd57f19e4..888950f29fd9 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx, crypto_inc(ctrblk, AES_BLOCK_SIZE); } -#if 0 /* temporary disabled due to failing crypto tests */ +#ifdef CONFIG_AS_AVX static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv) { @@ -1522,7 +1522,7 @@ static int __init aesni_init(void) aesni_gcm_dec_tfm = aesni_gcm_dec; } aesni_ctr_enc_tfm = aesni_ctr_enc; -#if 0 /* temporary disabled due to failing crypto tests */ +#ifdef CONFIG_AS_AVX if (cpu_has_avx) { /* optimize performance of ctr mode encryption transform */ aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] crypto: aesni - remove unused defines in "by8" variant
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're probably left overs from merging the three source files for 128, 192 and 256 bit AES. They can safely be removed. Signed-off-by: Mathias Krause Cc: Chandramouli Narayanan --- arch/x86/crypto/aes_ctrby8_avx-x86_64.S |3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index a029bc744244..2df2a0298f5a 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -79,9 +79,6 @@ #define xcounter %xmm8 #define xbyteswap %xmm9 #define xkey0 %xmm10 -#define xkey3 %xmm11 -#define xkey6 %xmm12 -#define xkey9 %xmm13 #define xkey4 %xmm11 #define xkey8 %xmm12 #define xkey12 %xmm13 -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] crypto: aesni - fix and re-enable "by8" CTR variant
This series fixes the counter overflow handling of the "by8" CTR variant which lead to failing cryptomgr tests and, in turn, disabling this optimization with commit 7da4b29d496b. Patch 1 fixes the bug, patch 2 removes some unused defines (left overs from the unification of the initial source files) and patch 3 re-enables the code. The fix was tested by me, doing tcrypt and dm-crypt tests. It was also tested by Romain who initially reported the issue. The patches should go on top of crypto-2.6.git. In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed to stable to propagate the fix. Please apply! Thanks, Mathias Mathias Krause (3): crypto: aesni - fix counter overflow handling in "by8" variant crypto: aesni - remove unused defines in "by8" variant Revert "crypto: aesni - disable "by8" AVX CTR optimization" arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 20 +++- arch/x86/crypto/aesni-intel_glue.c |4 ++-- 2 files changed, 17 insertions(+), 7 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html