Only perform the NEON yield check once every 8 blocks of input, to
avoid a considerable performance hit on cores with very fast crypto
instructions and comparatively slow memory accesses, such as the
Cortex-A53.
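
For illustration, the mask test added in the hunk can be modelled in C
roughly as follows. This is a simplified sketch rather than the kernel
code: it assumes w21 holds the remaining input length in bytes, and the
names should_check_yield() and count_yield_checks() are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE 16u
/* Same constant as the tst immediate: bits 4..6 of the byte count. */
#define YIELD_MASK (0x7u * BLOCK_SIZE)

/*
 * Nonzero when the yield check would run. Bits 4..6 are all clear only
 * when the number of whole blocks remaining is a multiple of 8.
 */
static int should_check_yield(uint32_t remaining)
{
	return (remaining & YIELD_MASK) == 0;
}

/*
 * Count the yield checks performed while consuming nbytes of input one
 * block at a time. The loop exits once nothing is left, which is a
 * simplification of the surrounding assembly loop.
 */
static unsigned int count_yield_checks(uint32_t nbytes)
{
	unsigned int checks = 0;

	while (nbytes >= BLOCK_SIZE) {
		nbytes -= BLOCK_SIZE;
		if (nbytes == 0)
			break;
		if (should_check_yield(nbytes))
			checks++;
	}
	return checks;
}
```

Under these assumptions, a 64-block input runs the check 7 times
instead of 63.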

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 88f5aef7934c..627710cdc220 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -208,6 +208,9 @@ CPU_LE(     rev     x26, x26                )       /* keep swabbed ctr in reg */
        st1     {v1.16b}, [x19], #16            /* write output block */
        beq     5f
 
+       tst     w21, #(0x7 * 16)                /* yield every 8 blocks */
+       b.ne    0b
+
        if_will_cond_yield_neon
        st1     {v0.16b}, [x24]                 /* store mac */
        do_cond_yield_neon
-- 
2.11.0
