[PATCH 01/10] crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode

2017-01-17 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 25 +--- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index 8d0cdaa2768d..2ada12dd768e 100644 --- a/arch/arm64/crypto

[PATCH 02/10] crypto: arm/aes-ce - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aes-ce-core.S | 84 ++-- arch/arm

[PATCH 03/10] crypto: arm/chacha20 - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1 deletion

[PATCH 00/10] crypto - AES for ARM/arm64 updates for v4.11 (round #2)

2017-01-17 Thread Ard Biesheuvel
end cores such as the Cortex-A53 that can be found in the Raspberry Pi3 Ard Biesheuvel (10): crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode crypto: arm/aes-ce - remove cra_alignmask crypto: arm/chacha20 - remove cra_alignmask crypto: arm64/aes-ce-ccm - remove cra_ali

[PATCH] crypto: arm/aes-neonbs - fix issue with v2.22 and older assembler

2017-01-19 Thread Ard Biesheuvel
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7' Fix this by setting the element size explicitly, by replacin

[PATCH v2 03/10] crypto: arm/chacha20 - remove cra_alignmask

2017-01-23 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1 deletion

[PATCH v2 00/10] crypto - AES for ARM/arm64 updates for v4.11 (round #2)

2017-01-23 Thread Ard Biesheuvel
end cores such as the Cortex-A53 that can be found in the Raspberry Pi3 Changes since v1: - shave off another few cycles from the sequential AES NEON code (patch #9) Ard Biesheuvel (10): crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode crypto: arm/aes-ce - remove cra_alignmas

[PATCH v2 01/10] crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode

2017-01-23 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 25 +--- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index 8d0cdaa2768d..2ada12dd768e 100644 --- a/arch/arm64/crypto

[PATCH v2 09/10] crypto: arm64/aes-neon-blk - tweak performance for low end cores

2017-01-23 Thread Ard Biesheuvel
constants from memory in every round. To allow the ECB and CBC encrypt routines to be reused by the bitsliced NEON code in a subsequent patch, export them from the module. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 2 + arch/arm64/crypto/aes-neon.S | 210

[PATCH v2 10/10] crypto: arm64/aes - replace scalar fallback with plain NEON fallback

2017-01-23 Thread Ard Biesheuvel
sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 2 +- arch/arm64/crypto/aes-neonbs-glue.c | 38 ++-- 2 files changed, 29 insertions(+), 11 deletions(-) diff

[PATCH v2 08/10] crypto: arm64/aes - performance tweak

2017-01-23 Thread Ard Biesheuvel
Shuffle some instructions around in the __hround macro to shave off 0.1 cycles per byte on Cortex-A57. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 52 +++- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/arch/arm64/crypto/aes-cipher

[PATCH v2 04/10] crypto: arm64/aes-ce-ccm - remove cra_alignmask

2017-01-23 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 1 - 1 file changed, 1 deletion

[PATCH v2 02/10] crypto: arm/aes-ce - remove cra_alignmask

2017-01-23 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aes-ce-core.S | 84 ++-- arch/arm

[PATCH v2 05/10] crypto: arm64/aes-blk - remove cra_alignmask

2017-01-23 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 16 ++-- arch/arm64/crypto

[PATCH v2 06/10] crypto: arm64/chacha20 - remove cra_alignmask

2017-01-23 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1

[PATCH v2 07/10] crypto: arm64/aes - avoid literals for cross-module symbol references

2017-01-23 Thread Ard Biesheuvel
KASLR"), which is why the AES code used literals instead. So now we can get rid of the literals, and switch to the adr_l macro. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/crypto/aes-cip

[PATCH 0/4] crypto: time invariant AES for CCM (and GCM/CTR)

2017-01-26 Thread Ard Biesheuvel
each block. It is 50% slower than generic AES, but this may be acceptable in many cases. Ard Biesheuvel (4): crypto: testmgr - add test cases for cbcmac(aes) crypto: ccm - switch to separate cbcmac driver crypto: arm64/aes - add NEON and Crypto Extension CBC-MAC driver crypto: aes - add

[PATCH 1/4] crypto: testmgr - add test cases for cbcmac(aes)

2017-01-26 Thread Ard Biesheuvel
In preparation of splitting off the CBC-MAC transform in the CCM driver into a separate algorithm, define some test cases for the AES incarnation of cbcmac. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 7 +++ crypto/testmgr.h | 58 2 files changed, 65 insertions

[PATCH 3/4] crypto: arm64/aes - add NEON and Crypto Extension CBC-MAC driver

2017-01-26 Thread Ard Biesheuvel
code between NEON AES and Crypto Extensions AES, so that it can be used instead now that the CCM driver has been updated to look for CBCMAC implementations other than the one it supplies itself. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 102 arch/arm64

[RFC PATCH 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-01-26 Thread Ard Biesheuvel
d CBC-MAC. AES decryption can easily be implemented in a similar way, but is significantly more costly. This code runs at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Signed-off-by: Ard Biesheuvel --- crypto/Kconfig

[PATCH 2/4] crypto: ccm - switch to separate cbcmac driver

2017-01-26 Thread Ard Biesheuvel
alternative CBC-MAC implementations that don't suffer from performance degradation due to significant setup time (e.g., the NEON based AES code needs to load the entire S-box into SIMD registers, which cannot be amortized over the entire input when using the AES cipher directly) Signed-off-by

Re: [RFC PATCH 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-01-26 Thread Ard Biesheuvel
nificantly more costly. >> >> This code runs at ~25 cycles per byte on ARM Cortex-A57 (while the >> ordinary generic AES driver manages 18 cycles per byte on this >> hardware). >> >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/Kconfig | 14 +

Re: [PATCH 2/4] crypto: ccm - switch to separate cbcmac driver

2017-01-27 Thread Ard Biesheuvel
On 26 January 2017 at 17:17, Ard Biesheuvel wrote: > Update the generic CCM driver to defer CBC-MAC processing to a > dedicated CBC-MAC ahash transform rather than open coding this > transform (and much of the associated scatterwalk plumbing) in > the CCM driver itself. > >

Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-01-27 Thread Ard Biesheuvel
Hi Matthias, On 27 January 2017 at 10:40, Matthias Brugger wrote: > Older compilers may not be able to detect the crc32 extended cpu type. What do you mean 'detect'? Could you describe the failure in more detail please? > Anyway only inline assembler code is used, which gets passed to the > asse

[PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-01-28 Thread Ard Biesheuvel
her instead. Signed-off-by: Ard Biesheuvel --- crypto/ccm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/crypto/ccm.c b/crypto/ccm.c index b388ac6edfb9..8976ef9bc2e7 100644 --- a/crypto/ccm.c +++ b/crypto/ccm.c @@ -362,7 +362,7 @@ static int crypto_ccm_decrypt(str

[PATCH v3 02/10] crypto: arm/aes-ce - remove cra_alignmask

2017-01-28 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aes-ce-core.S | 84 ++-- arch/arm

[PATCH v3 03/10] crypto: arm/chacha20 - remove cra_alignmask

2017-01-28 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1 deletion

[PATCH v3 01/10] crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode

2017-01-28 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 25 +--- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index 8d0cdaa2768d..2ada12dd768e 100644 --- a/arch/arm64/crypto

[PATCH v3 00/10] crypto - AES for ARM/arm64 updates for v4.11 (round #2)

2017-01-28 Thread Ard Biesheuvel
code (patch #9) Ard Biesheuvel (10): crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode crypto: arm/aes-ce - remove cra_alignmask crypto: arm/chacha20 - remove cra_alignmask crypto: arm64/aes-ce-ccm - remove cra_alignmask crypto: arm64/aes-blk - remove cra_alignmask

[PATCH v3 10/10] crypto: arm64/aes - replace scalar fallback with plain NEON fallback

2017-01-28 Thread Ard Biesheuvel
sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 2 +- arch/arm64/crypto/aes-neonbs-glue.c | 38 ++-- 2 files changed, 29 insertions(+), 11 deletions(-) diff

[PATCH v3 05/10] crypto: arm64/aes-blk - remove cra_alignmask

2017-01-28 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 16 ++-- arch/arm64/crypto

[PATCH v3 08/10] crypto: arm64/aes - performance tweak

2017-01-28 Thread Ard Biesheuvel
Shuffle some instructions around in the __hround macro to shave off 0.1 cycles per byte on Cortex-A57. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 52 +++- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/arch/arm64/crypto/aes-cipher

[PATCH v3 06/10] crypto: arm64/chacha20 - remove cra_alignmask

2017-01-28 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha20-neon-glue.c | 1 - 1 file changed, 1

[PATCH v3 04/10] crypto: arm64/aes-ce-ccm - remove cra_alignmask

2017-01-28 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 1 - 1 file changed, 1 deletion

[PATCH v3 09/10] crypto: arm64/aes-neon-blk - tweak performance for low end cores

2017-01-28 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 2 + arch/arm64/crypto/aes-neon.S | 235 +--- 2 files changed, 102 insertions(+), 135 deletions(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 8ee1fb7aaa4f..055bc3f61138 100644

[PATCH v3 07/10] crypto: arm64/aes - avoid literals for cross-module symbol references

2017-01-28 Thread Ard Biesheuvel
KASLR"), which is why the AES code used literals instead. So now we can get rid of the literals, and switch to the adr_l macro. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-cipher-core.S | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/crypto/aes-cip

[PATCH v2 0/4] crypto: time invariant AES for CCM (and GCM/CTR)

2017-01-28 Thread Ard Biesheuvel
Ard Biesheuvel (4): crypto: testmgr - add test cases for cbcmac(aes) crypto: ccm - switch to separate cbcmac driver crypto: arm64/aes - add NEON and Crypto Extension CBC-MAC driver crypto: aes - add generic time invariant AES for CTR/CCM/GCM arch/arm64/crypto/aes-glue.c | 107 ++ arch

[PATCH v2 1/4] crypto: testmgr - add test cases for cbcmac(aes)

2017-01-28 Thread Ard Biesheuvel
In preparation of splitting off the CBC-MAC transform in the CCM driver into a separate algorithm, define some test cases for the AES incarnation of cbcmac. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 7 +++ crypto/testmgr.h | 58 2 files changed, 65 insertions

[PATCH v2 3/4] crypto: arm64/aes - add NEON and Crypto Extension CBC-MAC driver

2017-01-28 Thread Ard Biesheuvel
code between NEON AES and Crypto Extensions AES, so that it can be used instead now that the CCM driver has been updated to look for CBCMAC implementations other than the one it supplies itself. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 107 arch/arm64

[PATCH v2 2/4] crypto: ccm - switch to separate cbcmac driver

2017-01-28 Thread Ard Biesheuvel
alternative CBC-MAC implementations that don't suffer from performance degradation due to significant setup time (e.g., the NEON based AES code needs to load the entire S-box into SIMD registers, which cannot be amortized over the entire input when using the AES cipher directly) Signed-off-by

[RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-01-28 Thread Ard Biesheuvel
d CBC-MAC. AES decryption can easily be implemented in a similar way, but is significantly more costly. This code runs at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Signed-off-by: Ard Biesheuvel --- crypto/Kconfig

[RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-01-30 Thread Ard Biesheuvel
eliminated by the compiler if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. Signed-off-by: Ard Biesheuvel --- crypto/algapi.c | 102 crypto/cbc.c| 3 - crypto/cmac.c | 3 +- crypto/ctr.c| 2 +- crypto/cts.c| 3 - crypto/pcbc.c | 3 - crypto/seqiv.c | 2 - 7

Re: [PATCH v2 0/4] crypto: time invariant AES for CCM (and GCM/CTR)

2017-01-31 Thread Ard Biesheuvel
On 28 January 2017 at 23:33, Ard Biesheuvel wrote: > This series is primarily directed at improving the performance and security > of CCM on the Raspberry Pi 3. This involves splitting the MAC handling of > CCM into a separate driver so that we can efficiently replace it by something >

Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 27 January 2017 at 10:52, Will Deacon wrote: > On Fri, Jan 27, 2017 at 10:43:16AM +0000, Ard Biesheuvel wrote: >> On 27 January 2017 at 10:40, Matthias Brugger wrote: >> > Older compilers may not be able to detect the crc32 extended cpu type. >> >> What d

Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 1 February 2017 at 09:07, Ard Biesheuvel wrote: > On 27 January 2017 at 10:52, Will Deacon wrote: >> On Fri, Jan 27, 2017 at 10:43:16AM +0000, Ard Biesheuvel wrote: >>> On 27 January 2017 at 10:40, Matthias Brugger wrote: >>> > Older compilers may not be able to

Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 1 February 2017 at 13:58, Alexander Graf wrote: > On 02/01/2017 10:43 AM, Ard Biesheuvel wrote: >> >> On 1 February 2017 at 09:07, Ard Biesheuvel >> wrote: >>> >>> On 27 January 2017 at 10:52, Will Deacon wrote: >>>> >>>>

[PATCH] crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers

2017-02-01 Thread Ard Biesheuvel
makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel --- This is a meaningful patch by

Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-02-01 Thread Ard Biesheuvel
On 28 January 2017 at 20:40, Ard Biesheuvel wrote: > The skcipher API mandates that chaining modes involving IVs calculate > an outgoing IV value that is suitable for encrypting additional blocks > of data. This means the CCM driver cannot assume that req->iv points to > the or

Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 07:38, Eric Biggers wrote: > Hi Ard, > > On Sat, Jan 28, 2017 at 11:33:33PM +0000, Ard Biesheuvel wrote: >> >> Note that this only implements AES encryption, which is all we need >> for CTR and CBC-MAC. AES decryption can easily be implemented

Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 06:47, Eric Biggers wrote: > On Mon, Jan 30, 2017 at 02:11:29PM +0000, Ard Biesheuvel wrote: >> Instead of unconditionally forcing 4 byte alignment for all generic >> chaining modes that rely on crypto_xor() or crypto_inc() (which may >> result in unnece

Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 07:48, Ard Biesheuvel wrote: > On 2 February 2017 at 07:38, Eric Biggers wrote: >> Hi Ard, >> >> On Sat, Jan 28, 2017 at 11:33:33PM +0000, Ard Biesheuvel wrote: >>> >>> Note that this only implements AES encryption, which is a

Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-02-02 Thread Ard Biesheuvel
On 2 February 2017 at 05:13, Herbert Xu wrote: > On Wed, Feb 01, 2017 at 08:08:09PM +0000, Ard Biesheuvel wrote: >> >> Could you please forward this patch to Linus as well? I noticed that the >> patch > > Sure, I will do that. > >> crypto: arm64/aes-blk - hono

[PATCH 1/2] crypto: arm64/aes - don't use IV buffer to return final keystream block

2017-02-02 Thread Ard Biesheuvel
, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel --- Note that this patch includes the fix crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode which I sent

[PATCH 2/2] crypto: arm/aes - don't use IV buffer to return final keystream block

2017-02-02 Thread Ard Biesheuvel
may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/aes-neonbs-core.S | 16 +--- arch/arm/crypto/aes-neonbs-glue.c | 9 + 2 files

Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-02-02 Thread Ard Biesheuvel
On 2 February 2017 at 09:53, Herbert Xu wrote: > On Thu, Feb 02, 2017 at 08:01:47AM +0000, Ard Biesheuvel wrote: >> >> You are right: due to its construction, the CCM mode does not care >> about the incremented counter because it clears the counter part of >> the IV be
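The reply above notes that CCM clears the counter part of the IV before use. A minimal user-space sketch of that step follows, assuming the RFC 3610 layout in which iv[0] holds L - 1 and the last L bytes of the 16-byte IV are the counter field. The name ccm_clear_counter() and the CCM_IV_SIZE constant are hypothetical and are not taken from crypto/ccm.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CCM_IV_SIZE 16 /* hypothetical constant: one AES block */

/*
 * Hypothetical helper: zero the counter field of a CCM IV.
 * Assumes iv[0] encodes L - 1, where L is the counter length in bytes,
 * and that L must lie in the range 2..8 (RFC 3610).
 * Returns 0 on success, -1 if iv[0] is out of range.
 */
static int ccm_clear_counter(uint8_t iv[CCM_IV_SIZE])
{
    assert(iv != NULL);

    unsigned int l = iv[0] + 1u; /* counter field length L */

    if (l < 2 || l > 8)
        return -1;

    /* The last L bytes are the counter; whatever the caller put there is ignored. */
    memset(iv + CCM_IV_SIZE - l, 0, l);
    return 0;
}
```

Because the trailing L bytes are overwritten, their incoming value does not affect the result, but the caller still has to supply a full 16-byte buffer.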

[PATCH v2] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-02 Thread Ard Biesheuvel
compiler when HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. Signed-off-by: Ard Biesheuvel --- I have greatly simplified the code, but it should still emit an optimal sequence of loads and stores depending on the misalignment. crypto/algapi.c | 65 +++- crypto/cbc.c| 3 - crypto

[PATCH] crypto: generic/aes - drop alignment requirement

2017-02-02 Thread Ard Biesheuvel
align mask, and fix the code to use get_unaligned_le32() where appropriate, which will resolve to whatever is optimal for the architecture. Signed-off-by: Ard Biesheuvel --- crypto/aes_generic.c | 64 ++-- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/c

[PATCH v3] crypto: aes - add generic time invariant AES cipher

2017-02-02 Thread Ard Biesheuvel
data dependent latencies. This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Decryption is substantially slower. Signed-off-by: Ard Biesheuvel --- Sending this out as a separate patch since the CCM/CBCMAC s

Re: [PATCH v2 2/4] crypto: ccm - switch to separate cbcmac driver

2017-02-02 Thread Ard Biesheuvel
On 28 January 2017 at 23:33, Ard Biesheuvel wrote: > Update the generic CCM driver to defer CBC-MAC processing to a > dedicated CBC-MAC ahash transform rather than open coding this > transform (and much of the associated scatterwalk plumbing) in > the CCM driver itself. > >

[PATCH v3 2/3] crypto: ccm - switch to separate cbcmac driver

2017-02-03 Thread Ard Biesheuvel
rface) Signed-off-by: Ard Biesheuvel --- crypto/Kconfig | 1 + crypto/ccm.c | 381 +--- 2 files changed, 245 insertions(+), 137 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index 160f08e721cc..e8269d1b0282 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -263,6 +

[PATCH v3 0/3] crypto: time invariant AES for CCM (and CMAC/XCBC)

2017-02-03 Thread Ard Biesheuvel
with zero cryptlen (#2) - use correctly sized dg[] array in desc ctx (#3, #4) - fix bug in update routine (#3) - various other tweaks Ard Biesheuvel (3): crypto: testmgr - add test cases for cbcmac(aes) crypto: ccm - switch to separate cbcmac driver crypto: arm64/aes - add NEON/Crypto

[PATCH v3 1/3] crypto: testmgr - add test cases for cbcmac(aes)

2017-02-03 Thread Ard Biesheuvel
In preparation of splitting off the CBC-MAC transform in the CCM driver into a separate algorithm, define some test cases for the AES incarnation of cbcmac. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 7 +++ crypto/testmgr.h | 60 2 files changed, 67 insertions

[PATCH v3 3/3] crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver

2017-02-03 Thread Ard Biesheuvel
are implemented, expose CMAC and XCBC algorithms as well based on the same core update code. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 240 +++- arch/arm64/crypto/aes-modes.S | 29 ++- 2 files changed, 267 insertions(+), 2 deletions(-) diff --git a/arch

Re: [PATCH v2] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-04 Thread Ard Biesheuvel
On 4 February 2017 at 21:20, Eric Biggers wrote: > Hi Ard, > > On Thu, Feb 02, 2017 at 03:56:28PM +0000, Ard Biesheuvel wrote: >> + const int size = sizeof(unsigned long); >> + int delta = ((unsigned long)dst ^ (unsigned long)src) & (size - 1); >> + int

[PATCH v3] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-05 Thread Ard Biesheuvel
compiler when HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. Signed-off-by: Ard Biesheuvel --- v3: fix thinko in processing of unaligned leading chunk inline common case where the input size is a constant multiple of the word size on architectures with h/w handling of unaligned accesses crypto
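To illustrate the idea of an alignment-agnostic XOR, here is a minimal user-space sketch. It assumes only what the quoted v2 snippet shows, namely comparing the low address bits of dst and src to decide whether both can be brought to word alignment together. The function name xor_bytes() is hypothetical, and this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: dst[i] ^= src[i] for len bytes, at any alignment.
 * If dst and src share the same misalignment, process a byte-wise head
 * until both are word aligned, then whole words. Otherwise fall back to
 * bytes for the entire buffer.
 */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    const size_t size = sizeof(unsigned long);
    size_t delta = ((uintptr_t)dst ^ (uintptr_t)src) & (size - 1);

    if (delta == 0) {
        /* Leading bytes up to the next word boundary. */
        while (len > 0 && ((uintptr_t)dst & (size - 1)) != 0) {
            *dst++ ^= *src++;
            len--;
        }
        /* Word-at-a-time body; memcpy keeps the accesses well defined. */
        while (len >= size) {
            unsigned long a, b;

            memcpy(&a, dst, size);
            memcpy(&b, src, size);
            a ^= b;
            memcpy(dst, &a, size);
            dst += size;
            src += size;
            len -= size;
        }
    }

    /* Trailing bytes, or the whole buffer when the alignments differ. */
    while (len > 0) {
        *dst++ ^= *src++;
        len--;
    }
}
```

A real implementation could also use narrower word sizes when the relative misalignment is 2 or 4 bytes; this sketch keeps only the two extremes for brevity.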

Re: [PATCH v3 2/3] crypto: ccm - switch to separate cbcmac driver

2017-02-06 Thread Ard Biesheuvel
On 3 February 2017 at 14:49, Ard Biesheuvel wrote: > Update the generic CCM driver to defer CBC-MAC processing to a > dedicated CBC-MAC ahash transform rather than open coding this > transform (and much of the associated scatterwalk plumbing) in > the CCM driver itself. > >

Re: [PATCH v3 0/3] crypto: time invariant AES for CCM (and CMAC/XCBC)

2017-02-11 Thread Ard Biesheuvel
On 11 February 2017 at 10:53, Herbert Xu wrote: > On Fri, Feb 03, 2017 at 02:49:34PM +0000, Ard Biesheuvel wrote: >> This series is primarily directed at improving the performance and security >> of CCM on the Raspberry Pi 3. This involves splitting the MAC handling of >>

[PATCH 1/2] crypto: ccm - honour alignmask of subordinate MAC cipher

2017-02-11 Thread Ard Biesheuvel
. Signed-off-by: Ard Biesheuvel --- crypto/ccm.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/crypto/ccm.c b/crypto/ccm.c index 52e307807ff6..24c26ab052ca 100644 --- a/crypto/ccm.c +++ b/crypto/ccm.c @@ -58,7 +58,6 @@ struct cbcmac_tfm_ctx { struct

[PATCH 2/2] crypto: ccm - drop unnecessary minimum 32-bit alignment

2017-02-11 Thread Ard Biesheuvel
The CCM driver forces 32-bit alignment even if the underlying ciphers don't care about alignment. This is because crypto_xor() used to require this, but since this is no longer the case, drop the hardcoded minimum of 32 bits. Signed-off-by: Ard Biesheuvel --- crypto/ccm.c | 3 +-- 1

Re: [PATCH v3] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-14 Thread Ard Biesheuvel
On 13 February 2017 at 21:55, Jason A. Donenfeld wrote: > On Sun, Feb 5, 2017 at 11:06 AM, Ard Biesheuvel > wrote: >> + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || >> + !((unsigned long)b & (__alignof__(*b) - 1))) > > Why not simply use t

[PATCH 1/2] crypto: arm/aes-neonbs - resolve fallback cipher at runtime

2017-02-14 Thread Ard Biesheuvel
er is guaranteed to be available when the builtin test is performed at registration time. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 2 +- arch/arm/crypto/aes-neonbs-glue.c | 65 ++- 2 files changed, 51 insertions(+), 16 deletions(-) diff --

[PATCH 2/2] crypto: algapi - annotate expected branch behavior in crypto_inc()

2017-02-14 Thread Ard Biesheuvel
ted as non-taken, resulting in optimal execution in the vast majority of cases. Also, replace the open coded alignment test with IS_ALIGNED(). Cc: Jason A. Donenfeld Signed-off-by: Ard Biesheuvel --- crypto/algapi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cry

Re: [PATCH 1/2] crypto: arm/aes-neonbs - resolve fallback cipher at runtime

2017-02-14 Thread Ard Biesheuvel
On 14 February 2017 at 10:03, Ard Biesheuvel wrote: > Currently, the bit sliced NEON AES code for ARM has a link time > dependency on the scalar ARM asm implementation, which it uses as a > fallback to perform CBC encryption and the encryption of the initial > XTS tweak. > > T

[PATCH v2 1/2] crypto: arm/aes-neonbs - resolve fallback cipher at runtime

2017-02-14 Thread Ard Biesheuvel
er is guaranteed to be available when the builtin test is performed at registration time. Signed-off-by: Ard Biesheuvel --- v2: remove spurious change from aesbs_xts_setkey() arch/arm/crypto/Kconfig | 2 +- arch/arm/crypto/aes-neonbs-glue.c | 60 +++- 2 files changed, 46 inser

[PATCH v2 2/2] crypto: algapi - annotate expected branch behavior in crypto_inc()

2017-02-14 Thread Ard Biesheuvel
ted as non-taken, resulting in optimal execution in the vast majority of cases. Also, replace the open coded alignment test with IS_ALIGNED(). Cc: Jason A. Donenfeld Signed-off-by: Ard Biesheuvel --- v2: no change crypto/algapi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --
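For context on why the early-exit branch matters, here is a minimal user-space sketch of a big-endian counter increment of the kind crypto_inc() performs. The name ctr_inc_be() is hypothetical and the kernel's word-wise fast path is omitted; the point is only that a carry out of the lowest byte is rare, so the loop almost always returns on its first iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: treat ctr[0..size-1] as a big-endian integer and
 * add one. The increment starts at the last (least significant) byte and
 * stops as soon as a byte does not wrap to zero, i.e. when no carry remains.
 */
static void ctr_inc_be(uint8_t *ctr, size_t size)
{
    assert(ctr != NULL);

    while (size > 0) {
        size--;
        if (++ctr[size] != 0)
            return; /* common case: no carry into the next byte */
    }
    /* Falling out of the loop means the counter wrapped around to zero. */
}
```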

Re: [PATCH] crypto: Fix next IV issue for CTS template

2017-02-16 Thread Ard Biesheuvel
Hello Libo, On 17 February 2017 at 03:47, wrote: > From: Libo Wang > > CTS template assumes underlying CBC algorithm will carry out next IV for > further process. But some implementations of CBC algorithm in kernel break > this assumption, for example, some hardware crypto drivers ignore next IV

Re: [PATCH] crypto: Fix next IV issue for CTS template

2017-02-17 Thread Ard Biesheuvel
> On 17 Feb 2017, at 09:17, Dennis Chen wrote: > > Hello Ard, > Morning! >> On Fri, Feb 17, 2017 at 07:12:46AM +0000, Ard Biesheuvel wrote: >> Hello Libo, >> >>> On 17 February 2017 at 03:47, wrote: >>> From: Libo Wang >>> >>

Re: [PATCH] crypto: Fix next IV issue for CTS template

2017-02-17 Thread Ard Biesheuvel
> On 17 Feb 2017, at 10:06, Dennis Chen wrote: > >> On Fri, Feb 17, 2017 at 09:23:00AM +0000, Ard Biesheuvel wrote: >> >>> On 17 Feb 2017, at 09:17, Dennis Chen wrote: >>> >>> Hello Ard, >>> Morning! >>>> On Fri, Feb 1

[PATCH] crypto: ccm - move cbcmac input off the stack

2017-02-27 Thread Ard Biesheuvel
onstraints when the stack is virtually mapped. So move idata/odata back to the request ctx struct, of which we can reasonably expect that it has been allocated using kmalloc() et al. Reported-by: Johannes Berg Fixes: f15f05b0a5de ("crypto: ccm - switch to separate cbcmac driver") Signed-of

[PATCH 1/2] crypto: arm/crc32 - fix build error with outdated binutils

2017-02-28 Thread Ard Biesheuvel
Annotate a vmov instruction with an explicit element size of 32 bits. This is inferred by recent toolchains, but apparently, older versions need some help figuring this out. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/crc32-ce-core.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[PATCH 2/2] crypto: arm - add build time test for CRC instruction support

2017-02-28 Thread Ard Biesheuvel
binutils exist that support the vmull.p64 instruction but not the crc32 instructions. So refactor the Makefile logic so that this module only gets built if binutils has support for both. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Makefile | 12 +++- 1 file changed, 11 insertions

Re: KASAN errors after 21c8e72037fb ("crypto: testmgr - use calculated count for number of test vectors")

2017-02-28 Thread Ard Biesheuvel
"\x7b\x72\x8a\xf7", > .rlen = 44, > }, > > > If I pad iv with extra NULL bytes the KASAN error goes away. > > Thoughts? > CCM IVs are 16 bytes, but due to the way they are constructed internally, the final couple of bytes of input IV are dont-cares. Apparently, we do read all 16 bytes, which triggers the KASAN errors. So adding any kind of padding bytes to pad to length 16 should fix the issue, and so your proposed fix is correct. Acked-by: Ard Biesheuvel

Re: [PATCH] crypto: scompress - eliminate percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
On 21 July 2017 at 13:42, Herbert Xu wrote: > On Thu, Jul 20, 2017 at 12:40:00PM +0100, Ard Biesheuvel wrote: >> The scompress code unconditionally allocates 2 per-CPU scratch buffers >> of 128 KB each, in order to avoid allocation overhead in the async >> wrapper

Re: [PATCH] crypto: scompress - eliminate percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
On 21 July 2017 at 14:11, Herbert Xu wrote: > On Fri, Jul 21, 2017 at 02:09:39PM +0100, Ard Biesheuvel wrote: >> >> Right. And is req->dst guaranteed to be assigned in that case? Because >> crypto_scomp_sg_alloc() happily allocates pages and kmalloc()s the >> s

Re: [PATCH] crypto: scompress - eliminate percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
On 21 July 2017 at 14:24, Ard Biesheuvel wrote: > On 21 July 2017 at 14:11, Herbert Xu wrote: >> On Fri, Jul 21, 2017 at 02:09:39PM +0100, Ard Biesheuvel wrote: >>> >>> Right. And is req->dst guaranteed to be assigned in that case? Because >>> crypto_scomp

Re: [PATCH] crypto: scompress - eliminate percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
On 21 July 2017 at 14:31, Herbert Xu wrote: > On Fri, Jul 21, 2017 at 02:24:02PM +0100, Ard Biesheuvel wrote: >> >> OK, but that doesn't really answer any of my questions: >> - Should we enforce that CRYPTO_ACOMP_ALLOC_OUTPUT is mutually >> exclusive with CRY

Re: [PATCH] crypto: scompress - eliminate percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
On 21 July 2017 at 14:44, Herbert Xu wrote: > On Fri, Jul 21, 2017 at 02:42:20PM +0100, Ard Biesheuvel wrote: >> >> >> - Would you mind a patch that makes the code only use the per-CPU >> >> buffers if we are running atomically to begin with? >> > >>

[PATCH v2 1/3] crypto: scompress - don't sleep with preemption disabled

2017-07-21 Thread Ard Biesheuvel
Due to the use of per-CPU buffers, scomp_acomp_comp_decomp() executes with preemption disabled, and so whether the CRYPTO_TFM_REQ_MAY_SLEEP flag is set is irrelevant, since we cannot sleep anyway. So disregard the flag, and use GFP_ATOMIC unconditionally. Cc: # v4.10+ Signed-off-by: Ard

[PATCH v2 3/3] crypto: scompress - defer allocation of scratch buffer to first use

2017-07-21 Thread Ard Biesheuvel
n. Signed-off-by: Ard Biesheuvel --- crypto/scompress.c | 46 1 file changed, 17 insertions(+), 29 deletions(-) diff --git a/crypto/scompress.c b/crypto/scompress.c index 2c07648305ad..2075e2c4e7df 100644 --- a/crypto/scompress.c +++ b/crypto/scompress.c @@ -65,11 +65,6 @@ s

[PATCH v2 2/3] crypto: scompress - free partially allocated scratch buffers on failure

2017-07-21 Thread Ard Biesheuvel
When allocating the per-CPU scratch buffers, we allocate the source and destination buffers separately, but bail immediately if the second allocation fails, without freeing the first one. Fix that. Signed-off-by: Ard Biesheuvel --- crypto/scompress.c | 5 - 1 file changed, 4 insertions
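The failure path described in this patch can be sketched in user space roughly as follows. This is an illustration only, not the code from crypto/scompress.c: `struct scratch`, `scratch_alloc()`, and the injectable allocator are hypothetical names, and `malloc()`/`free()` stand in for the kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pair of scratch buffers (source and destination). */
struct scratch {
    void *src;
    void *dst;
};

typedef void *(*alloc_fn)(size_t);

/*
 * Allocate both buffers. If the second allocation fails, release the
 * first one before returning so that nothing is leaked.
 * Returns 0 on success, -1 on failure (both pointers are NULL then).
 */
int scratch_alloc(struct scratch *s, size_t size, alloc_fn alloc)
{
    s->src = alloc(size);
    if (!s->src)
        goto err;

    s->dst = alloc(size);
    if (!s->dst)
        goto err_free_src;

    return 0;

err_free_src:
    free(s->src);
    s->src = NULL;
err:
    s->dst = NULL;
    return -1;
}

/* Demonstration allocator that fails on its second call only. */
static int alloc_calls;

void *fail_on_second(size_t size)
{
    return (++alloc_calls == 2) ? NULL : malloc(size);
}
```

Passing `fail_on_second` as the allocator exercises the path the patch fixes: the first buffer is obtained, the second request fails, and the first buffer is freed before the error is returned.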

[PATCH v2 0/3] crypto: scompress - defer allocation of percpu scratch buffers

2017-07-21 Thread Ard Biesheuvel
exit hooks, so that we only have the per-CPU buffers allocated if there are any acomp ciphers of the right kind (i.e., ones that encapsulate a synchronous implementation) in use (#3) Patches #1 and #2 are fixes for issues I spotted when working on this code. Ard Biesheuvel (3): crypto: scompre

Re: [PATCH v4 0/8] crypto: aes - retire table based generic AES

2017-07-24 Thread Ard Biesheuvel
On 18 July 2017 at 13:06, Ard Biesheuvel wrote: > The generic AES driver uses 16 lookup tables of 1 KB each, and has > encryption and decryption routines that are fully unrolled. Given how > the dependencies between this code and other drivers are declared in > Kconfig files, this co

[PATCH resend 13/18] crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR

2017-07-24 Thread Ard Biesheuvel
-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 1 + arch/arm64/crypto/aes-neonbs-glue.c | 48 ++-- 2 files changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index a068dcbe2518..f9e264b83366 100644 --- a

[PATCH resend 01/18] crypto/algapi - use separate dst and src operands for __crypto_xor()

2017-07-24 Thread Ard Biesheuvel
In preparation of introducing crypto_xor_cpy(), which will use separate operands for input and output, modify the __crypto_xor() implementation, which it will share with the existing crypto_xor(), which provides the actual functionality when not using the inline version. Signed-off-by: Ard
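The relationship between the in-place XOR and the variant with a separate output can be sketched as below. This is a byte-wise user-space illustration under assumed names (`xor_cpy`, `xor_inplace`); it is not the kernel's `__crypto_xor()`, whose actual implementation is not shown in this snippet.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Three-operand form: dst = src1 ^ src2. dst may alias src1 or src2,
 * since each output byte depends only on the input bytes at the same
 * index.
 */
void xor_cpy(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
             size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src1[i] ^ src2[i];
}

/* Two-operand form: dst ^= src, expressed via the shared routine. */
void xor_inplace(uint8_t *dst, const uint8_t *src, size_t len)
{
    xor_cpy(dst, dst, src, len);
}
```

Routing both forms through one routine means callers that previously copied a buffer and then XORed into the copy can do both in a single pass.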

[PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14

2017-07-24 Thread Ard Biesheuvel
ter. This supersedes all other crypto patches I have outstanding, including the AES refactor ones which I will rework later. Ard Biesheuvel (18): crypto/algapi - use separate dst and src operands for __crypto_xor() crypto/algapi - make crypto_xor() take separate dst and src arguments cry

[PATCH resend 10/18] crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback

2017-07-24 Thread Ard Biesheuvel
The arm64 kernel will shortly disallow nested kernel mode NEON. So honour this in the ARMv8 Crypto Extensions implementation of CCM-AES, and fall back to a scalar implementation using the generic crypto helpers for AES, XOR and incrementing the CTR counter. Signed-off-by: Ard Biesheuvel

[PATCH resend 03/18] crypto: arm64/ghash-ce - add non-SIMD scalar fallback

2017-07-24 Thread Ard Biesheuvel
The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 3 +- arch/arm64/crypto/ghash-ce-glue.c | 49 2 files changed, 43
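The fallback pattern shared by the patches in this series can be sketched in user space as a dispatcher that picks a scalar path whenever the accelerated one is not allowed. Everything here is a stand-in: `simd_allowed` plays the role of the kernel's run-time check (to my understanding `may_use_simd()`), and the "wide" routine merely imitates an accelerated path. The only requirement illustrated is that both paths return the same result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's "may SIMD be used now?" check. */
bool simd_allowed = true;

/* Scalar fallback: XOR of all bytes, one byte per step. */
uint8_t xor_sum_scalar(const uint8_t *p, size_t len)
{
    uint8_t acc = 0;

    for (size_t i = 0; i < len; i++)
        acc ^= p[i];
    return acc;
}

/* Stand-in for the accelerated path: eight bytes per step. */
uint8_t xor_sum_wide(const uint8_t *p, size_t len)
{
    uint64_t acc = 0;
    size_t i = 0;
    uint8_t r;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;

        memcpy(&w, p + i, sizeof(w));  /* safe unaligned load */
        acc ^= w;
    }

    /* Fold the eight byte lanes of acc into one byte. */
    acc ^= acc >> 32;
    acc ^= acc >> 16;
    acc ^= acc >> 8;
    r = (uint8_t)acc;

    for (; i < len; i++)               /* remaining tail bytes */
        r ^= p[i];
    return r;
}

/* Dispatcher: take the scalar path when the fast path is not allowed. */
uint8_t xor_sum(const uint8_t *p, size_t len)
{
    if (!simd_allowed)
        return xor_sum_scalar(p, len);
    return xor_sum_wide(p, len);
}
```

Callers only ever see `xor_sum()`, so the choice of path stays an internal detail, which is the property the glue-code changes in these patches appear to aim for.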

[PATCH resend 09/18] crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback

2017-07-24 Thread Ard Biesheuvel
The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar code that can be invoked in that case. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 1 + arch/arm64/crypto/aes-ce-cipher.c | 20 +--- 2 files changed, 18

[PATCH resend 06/18] crypto: arm64/sha1-ce - add non-SIMD generic fallback

2017-07-24 Thread Ard Biesheuvel
The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig| 3 ++- arch/arm64/crypto/sha1-ce-glue.c | 18 ++ 2 files changed, 16

[PATCH resend 04/18] crypto: arm64/crct10dif - add non-SIMD generic fallback

2017-07-24 Thread Ard Biesheuvel
The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crct10dif-ce-glue.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch
