[PATCH v2 0/3] crypto: arm64/chacha - performance improvements

2018-12-04 Thread Ard Biesheuvel
(370278656 bytes) tcrypt: test 5 (256 bit key, 8192 byte blocks): 47650 operations in 1 seconds (390348800 bytes) Cc: Eric Biggers Cc: Martin Willi Ard Biesheuvel (3): crypto: tcrypt - add block size of 1472 to skcipher template crypto: arm64/chacha - optimize for arbitrary length inputs

[PATCH v2 1/3] crypto: tcrypt - add block size of 1472 to skcipher template

2018-12-04 Thread Ard Biesheuvel
In order to have better coverage of algorithms operating on block sizes that are in the ballpark of a VPN packet, add 1472 to the block_sizes array. Signed-off-by: Ard Biesheuvel --- crypto/tcrypt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/tcrypt.c b/crypto

[PATCH v2 3/3] crypto: arm64/chacha - use combined SIMD/ALU routine for more speed

2018-12-04 Thread Ard Biesheuvel
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha-neon-core.S | 235 ++-- arch/arm64/crypto/chacha-neon-glue.c | 39 ++-- 2 files changed, 239 insertions(+), 35 deletions(-) diff --git a/arch/arm64/crypto/chacha-neon-core.S b/arch/arm64/crypto/chacha-neon-core.S index

[PATCH v2 2/3] crypto: arm64/chacha - optimize for arbitrary length inputs

2018-12-04 Thread Ard Biesheuvel
For input lengths that are a multiple of 256 bytes (and thus in tcrypt benchmarks), performance drops by around 1% on Cortex-A57, while performance for inputs drawn randomly from the range [64, 1024) increases by around 30%. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha-neon-core.S | 183 ++-- arch/ar

Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-09 Thread Ard Biesheuvel
On 9 November 2018 at 10:45, Herbert Xu wrote: > On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote: >> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote: >> > >> > This should be >> > >> > reqs

Re: .S_shipped unnecessary?

2018-11-08 Thread Ard Biesheuvel
(+ Masahiro, kbuild ml) On 8 November 2018 at 21:37, Jason A. Donenfeld wrote: > Hi Ard, Eric, and others, > > As promised, the next Zinc patchset will have less generated code! After a > bit of work with Andy and Samuel, I'll be bundling the perlasm. Wonderful! Any problems doing that for

Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-08 Thread Ard Biesheuvel
On 8 November 2018 at 23:55, Ard Biesheuvel wrote: > The simd wrapper's skcipher request context structure consists > of a single subrequest whose size is taken from the subordinate > skcipher. However, in simd_skcipher_init(), the reqsize that is > retrieved is not from the subordin

[PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-08 Thread Ard Biesheuvel
is completely unrelated to the actual wrapped skcipher. Reported-by: Qian Cai Signed-off-by: Ard Biesheuvel --- crypto/simd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/simd.c b/crypto/simd.c index ea7240be3001..2f3d6e897afc 100644 --- a/crypto/simd.c +++ b/crypto/simd.c

Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling

2018-10-23 Thread Ard Biesheuvel
On 23 October 2018 at 04:01, James Bottomley wrote: > On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: > [...] >> > +static void hmac_init(struct shash_desc *desc, u8 *key, int >> > keylen) >> > +{ >> > + u8 pad[SHA256_BLOCK_SIZE]; >>

Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling

2018-10-22 Thread Ard Biesheuvel
Hi James, Some comments below on how you are using the crypto API. On 22 October 2018 at 04:36, James Bottomley wrote: > This code adds true session based HMAC authentication plus parameter > decryption and response encryption using AES. > > The basic design of this code is to segregate all the

Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-10-21 Thread Ard Biesheuvel
On 21 October 2018 at 11:00, James Bottomley wrote: > On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel > wrote: >>On 21 October 2018 at 10:07, James Bottomley >> wrote: >>> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >>>> (+ James) >

Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-10-21 Thread Ard Biesheuvel
On 21 October 2018 at 10:07, James Bottomley wrote: > On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >> (+ James) > > Thanks! > >> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov >> wrote: >> > crypto_cfb_decrypt_segment() incorrectly XOR'e

Re: [PATCH 2/2] crypto: testmgr: add AES-CFB tests

2018-10-21 Thread Ard Biesheuvel
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > Add AES128/192/256-CFB testvectors from NIST SP800-38A. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > Signed-off-by: Dmitry Eremin-Solenikov > --- > crypto/tcrypt.c | 5 >

Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-10-21 Thread Ard Biesheuvel
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with > IV, rather than with data stream, resulting in incorrect decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry

Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks

2018-10-19 Thread Ard Biesheuvel
On 20 October 2018 at 04:39, Eric Biggers wrote: > On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote: >> On 19 October 2018 at 13:41, Ard Biesheuvel >> wrote: >> > On 18 October 2018 at 12:37, Eric Biggers wrote: >> >> From: Eric Biggers

Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks

2018-10-19 Thread Ard Biesheuvel
On 19 October 2018 at 13:41, Ard Biesheuvel wrote: > On 18 October 2018 at 12:37, Eric Biggers wrote: >> From: Eric Biggers >> >> Make the ARM scalar AES implementation closer to constant-time by >> disabling interrupts and prefetching the tables into L1 cache. This

Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks

2018-10-18 Thread Ard Biesheuvel
constant-time AES > software. But it's valuable to make such attacks more difficult. > > Much of this patch is based on patches suggested by Ard Biesheuvel. > > Suggested-by: Ard Biesheuvel > Signed-off-by: Eric Biggers Reviewed-by: Ard Biesheuvel > --- > arch/ar

Re: [PATCH v2 1/2] crypto: aes_ti - disable interrupts while accessing S-box

2018-10-17 Thread Ard Biesheuvel
involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Signed-off-by: Eric Biggers Thanks for taking a look. Could we add something to the Kconfig blurb that mentions that it runs the algorithm with interrupts disabled? In any

Re: [PATCH v2 2/2] crypto: arm/aes - add some hardening against cache-timing attacks

2018-10-17 Thread Ard Biesheuvel
After these changes, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such a

Re: [PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path

2018-10-09 Thread Ard Biesheuvel
On 9 October 2018 at 05:47, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >>

Re: [PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path

2018-10-09 Thread Ard Biesheuvel
On 9 October 2018 at 05:34, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >>

[PATCH 0/3] crypto: use unaligned accessors in aligned fast paths

2018-10-08 Thread Ard Biesheuvel
for architectures other than ARM that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, let's switch to them in a couple of places in the crypto code. Note that all patches are against code that has been observed to be emitted with ldm or ldrd instructions when building ARM's multi_v7_defconfig. Ard Biesheuvel (3

[PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path

2018-10-08 Thread Ard Biesheuvel
of misalignment, and generate code for ARMv6+ that avoids load/store instructions that trigger alignment faults. Signed-off-by: Ard Biesheuvel --- crypto/memneq.c | 24 ++-- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/crypto/memneq.c b/crypto/memneq.c index afed1bd16aee

[PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path

2018-10-08 Thread Ard Biesheuvel
of misalignment, and generate code for ARMv6+ that avoids load/store instructions that trigger alignment faults. Signed-off-by: Ard Biesheuvel --- crypto/algapi.c | 7 +++ include/crypto/algapi.h | 11 +-- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/crypto/algapi.c

[PATCH 3/3] crypto: siphash - drop _aligned variants

2018-10-08 Thread Ard Biesheuvel
on the aligned code paths. Given the above, this either produces the same code, or better in the ARMv6+ case. However, since that removes the only difference between the aligned and unaligned variants, we can drop the aligned variant entirely. Signed-off-by: Ard Biesheuvel --- include/linux/siphash.h

[PATCH] crypto: arm64/aes-blk - ensure XTS mask is always loaded

2018-10-08 Thread Ard Biesheuvel
different buffers. So let's ensure that the first load occurs unconditionally, and move the reload to the end so it doesn't occur needlessly. Fixes: 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 8

Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal

2018-10-05 Thread Ard Biesheuvel
On 5 October 2018 at 19:13, Eric Biggers wrote: > From: Eric Biggers > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > to register/unregister the "fpu" template. But these functions don't > exist anymore, causing a build error. Remove the calls to them. > > Fixes:

Re: [PATCH] crypto: qat - move temp buffers off the stack

2018-10-05 Thread Ard Biesheuvel
On 5 October 2018 at 04:29, Herbert Xu wrote: > On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote: >> Arnd reports that with Kees's latest VLA patches applied, the HMAC >> handling in the QAT driver uses a worst case estimate of 160 bytes >> for the SH

Re: [PATCH] crypto: aes_ti - disable interrupts while accessing sbox

2018-10-04 Thread Ard Biesheuvel
Hi Eric, On 4 October 2018 at 06:07, Eric Biggers wrote: > From: Eric Biggers > > The generic constant-time AES implementation is supposed to preload the > AES S-box into the CPU's L1 data cache. But, an interrupt handler can > run on the CPU and muck with the cache. Worse, on preemptible

Re: [PATCH] crypto: arm64/aes - fix handling sub-block CTS-CBC inputs

2018-10-03 Thread Ard Biesheuvel
) to > indicate the minimum input size. > > Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") > Signed-off-by: Eric Biggers Thanks Eric Reviewed-by: Ard Biesheuvel > --- > arch/arm64/crypto/aes-glue.c | 13 + > 1 file cha

[PATCH v2 1/2] crypto: morus/generic - fix for big endian systems

2018-10-01 Thread Ard Biesheuvel
Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations") Cc: # v4.18+ Reviewed-by: Ondrej Mosnacek Signed-off-by: Ard Biesheuvel --- crypto/morus1280.c | 7 ++- crypto/morus640.c | 16 2 files changed, 6 insertions(+), 17 deletions(-) diff --gi

[PATCH v2 2/2] crypto: aegis/generic - fix for big endian systems

2018-10-01 Thread Ard Biesheuvel
Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") Cc: # v4.18+ Signed-off-by: Ard Biesheuvel --- crypto/aegis.h | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/crypto/aegis.h b/crypto/aegis.h index f1c6900ddb80..405e025fc906 100644 --- a/crypto/aegis.h +++ b/crypto/aegis.h @@ -

[PATCH v2 0/2] crypto - fix aegis/morus for big endian systems

2018-10-01 Thread Ard Biesheuvel
Some bug fixes for issues that I stumbled upon while working on other stuff. Changes since v1: - add Ondrej's ack to #1 - simplify #2 and drop unrelated performance tweak Ard Biesheuvel (2): crypto: morus/generic - fix for big endian systems crypto: aegis/generic - fix for big endian systems

Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems

2018-10-01 Thread Ard Biesheuvel
On 1 October 2018 at 10:00, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel > wrote: >> On 30 September 2018 at 10:58, Ard Biesheuvel >> wrote: >> > Use the correct __le32 annotation and accessors to perform the >> > single roun

Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems

2018-10-01 Thread Ard Biesheuvel
On 1 October 2018 at 09:50, Ondrej Mosnacek wrote: > Hi Ard, > > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Use the correct __le32 annotation and accessors to perform the >> single round of AES encryption performed inside the AEGIS transform. >

Re: [PATCH 1/2] crypto: morus/generic - fix for big endian systems

2018-10-01 Thread Ard Biesheuvel
On 1 October 2018 at 09:26, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Omit the endian swabbing when folding the lengths of the assoc and >> crypt input buffers into the state to finalize the tag. This is not >> necessa

[PATCH] crypto: lrw - fix rebase error after out of bounds fix

2018-09-30 Thread Ard Biesheuvel
6bf347 ("crypto: lrw - Optimize tweak computation") Signed-off-by: Ard Biesheuvel --- crypto/lrw.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/crypto/lrw.c b/crypto/lrw.c index 6fcf0d431185..0430ccd08728 100644 --- a/crypto/lrw.c +++ b/crypto/lrw.c @

Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems

2018-09-30 Thread Ard Biesheuvel
On 30 September 2018 at 10:58, Ard Biesheuvel wrote: > Use the correct __le32 annotation and accessors to perform the > single round of AES encryption performed inside the AEGIS transform. > Otherwise, tcrypt reports: > > alg: aead: Test 1 failed on encryption for aegis128-gener

[PATCH 2/2] crypto: aegis/generic - fix for big endian systems

2018-09-30 Thread Ard Biesheuvel
, and derive the other ones by rotation. This reduces the D-cache footprint by 75%, and shouldn't be too costly or free on load/store architectures (and X86 has its own AES-NI based implementation) Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") Cc: # v4.18+ Signed-o

[PATCH 1/2] crypto: morus/generic - fix for big endian systems

2018-09-30 Thread Ard Biesheuvel
Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations") Cc: # v4.18+ Signed-off-by: Ard Biesheuvel --- crypto/morus1280.c | 7 ++- crypto/morus640.c | 16 2 files changed, 6 insertions(+), 17 deletions(-) diff --git a/crypto/morus1280.c b/crypto/morus12

[PATCH 0/2] crypto - fix aegis/morus for big endian systems

2018-09-30 Thread Ard Biesheuvel
Some bug fixes for issues that I stumbled upon while working on other stuff. Ard Biesheuvel (2): crypto: morus/generic - fix for big endian systems crypto: aegis/generic - fix for big endian systems crypto/aegis.h | 23 +--- crypto/morus1280.c | 7 ++ crypto

Re: [PATCH] crypto: qat - move temp buffers off the stack

2018-09-26 Thread Ard Biesheuvel
On Wed, 26 Sep 2018 at 11:53, Ard Biesheuvel wrote: > > Arnd reports that with Kees's latest VLA patches applied, the HMAC > handling in the QAT driver uses a worst case estimate of 160 bytes > for the SHA blocksize, allowing the compiler to determine the size > of the stack f

[RFC PATCH] crypto: x86/aes-ni - remove special handling of AES in PCBC mode

2018-09-24 Thread Ard Biesheuvel
(as well as LRW), but currently, PCBC is the only remaining user. Since there are no known users of pcbc(aes) in the kernel, let's remove this special driver, and rely on the generic pcbc driver to encapsulate the AES-NI core cipher. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/Makefile

Re: [PATCH 0/4] crypto: arm64/aes-blk - cleanups and optimizations for XTS/CTS-CBC

2018-09-20 Thread Ard Biesheuvel
On 10 September 2018 at 07:41, Ard Biesheuvel wrote: > Some cleanups and optimizations for the arm64 AES skcipher routines. > > Patch #1 fixes the peculiar use of u8 arrays to refer to AES round keys, > which are natively arrays of u32. > > Patch #2 partially reverts the use o

Re: [PATCH 1/5] crypto: arm/aes-ce - enable module autoloading based on CPU feature bits

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 08:24, Stefan Agner wrote: > On 10.09.2018 00:01, Ard Biesheuvel wrote: >> On 10 September 2018 at 08:21, Stefan Agner wrote: >>> Hi Ard, >>> >>> On 21.05.2017 03:23, Ard Biesheuvel wrote: >>>> Make the module

Re: [PATCH] crypto: tcrypt - fix ghash-generic speed test

2018-09-12 Thread Ard Biesheuvel
blocks, 16 bytes per update, 1 updates): > tcrypt: hashing failed ret=-126 > > Cc: # 4.6+ > Fixes: 0660511c0bee ("crypto: tcrypt - Use ahash") > Tested-by: Franck Lenormand > Signed-off-by: Horia Geantă Acked-by: Ard Biesheuvel > --- > crypto/tcrypt.c |

[PATCH 1/4] crypto: arm64/aes-blk - remove pointless (u8 *) casts

2018-09-10 Thread Ard Biesheuvel
For some reason, the asmlinkage prototypes of the NEON routines take u8[] arguments for the round key arrays, while the actual round keys are arrays of u32, and so passing them into those routines requires u8* casts at each occurrence. Fix that. Signed-off-by: Ard Biesheuvel --- arch/arm64

[PATCH 2/4] crypto: arm64/aes-blk - revert NEON yield for skciphers

2018-09-10 Thread Ard Biesheuvel
additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 281 --

[PATCH 4/4] crypto: arm64/aes-blk - improve XTS mask handling

2018-09-10 Thread Ard Biesheuvel
in a ~4% speedup. Signed-off-by: Ard Biesheuvel --- Raw performance numbers after the patch. arch/arm64/crypto/aes-ce.S| 5 +++ arch/arm64/crypto/aes-modes.S | 40 ++-- arch/arm64/crypto/aes-neon.S | 6 +++ 3 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch

[PATCH 0/4] crypto: arm64/aes-blk - cleanups and optimizations for XTS/CTS-CBC

2018-09-10 Thread Ard Biesheuvel
for cts(cbc(aes)) in the NEON chaining mode handling. Patch #4 tweaks the XTS handling to remove a literal load from the inner loop. Cc: Eric Biggers Cc: Theodore Ts'o Cc: Steve Capper Ard Biesheuvel (4): crypto: arm64/aes-blk - remove pointless (u8 *) casts crypto: arm64/aes-blk - revert NEON

[PATCH 3/4] crypto: arm64/aes-blk - add support for CTS-CBC mode

2018-09-10 Thread Ard Biesheuvel
the CTS handling into the SIMD routines. On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel --- This patch supersedes '[RFC/RFT PATCH] crypto: arm64/aes-ce - add support for CTS-CBC mode' sent out last Saturday. Changes: - keep subreq

Re: [PATCH 1/5] crypto: arm/aes-ce - enable module autoloading based on CPU feature bits

2018-09-10 Thread Ard Biesheuvel
On 10 September 2018 at 08:21, Stefan Agner wrote: > Hi Ard, > > On 21.05.2017 03:23, Ard Biesheuvel wrote: >> Make the module autoloadable by tying it to the CPU feature bit that >> describes whether the optional instructions it relies on are implemented >> by the cur

[RFC/RFT PATCH] crypto: arm64/aes-ce - add support for CTS-CBC mode

2018-09-08 Thread Ard Biesheuvel
the CTS handling into the core algorithm. On Cortex-A53, this results in a ~50% speedup for smaller block sizes. Signed-off-by: Ard Biesheuvel --- Raw performance numbers after the patch. arch/arm64/crypto/aes-glue.c | 142 arch/arm64/crypto/aes-modes.S | 73 ++ 2

Re: [PATCH] fscrypt: remove CRYPTO_CTR dependency

2018-09-06 Thread Ard Biesheuvel
d. Or maybe it was actually a bug in a non-upstream crypto > driver. > > So, remove the dependency. If it turns out there's actually still a > bug, we'll fix it properly. > > Signed-off-by: Eric Biggers Acked-by: Ard Biesheuvel This may be related to 11e3b725cfc2 crypto: arm64/ae

Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-08-31 Thread Ard Biesheuvel
On 31 August 2018 at 17:56, Ard Biesheuvel wrote: > Hi Eric, > > On 31 August 2018 at 10:01, Eric Biggers wrote: >> From: Eric Biggers >> >> Optimize ChaCha20 NEON performance by: >> >> - Implementing the 8-bit rotations using the 'vtbl.8' instructio

Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-08-31 Thread Ard Biesheuvel
Hi Eric, On 31 August 2018 at 10:01, Eric Biggers wrote: > From: Eric Biggers > > Optimize ChaCha20 NEON performance by: > > - Implementing the 8-bit rotations using the 'vtbl.8' instruction. > - Streamlining the part that adds the original state and XORs the data. > - Making some other small

Re: [PATCH v2] crypto: arm64/aes-modes - get rid of literal load of addend vector

2018-08-23 Thread Ard Biesheuvel
On 23 August 2018 at 21:04, Nick Desaulniers wrote: > On Thu, Aug 23, 2018 at 9:48 AM Ard Biesheuvel > wrote: >> >> Replace the literal load of the addend vector with a sequence that >> performs each add individually. This sequence is only 2 instructions >> l

[PATCH v2] crypto: arm64/aes-modes - get rid of literal load of addend vector

2018-08-23 Thread Ard Biesheuvel
not implement the GNU ARM asm syntax completely, and does not support the =literal notation for FP registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642) Cc: Nick Desaulniers Signed-off-by: Ard Biesheuvel --- v2: replace convoluted code involving a SIMD add to increment four

[PATCH v2] crypto: arm/ghash-ce - implement support for 4-way aggregation

2018-08-23 Thread Ard Biesheuvel
to 3.0 cycles per byte. Signed-off-by: Ard Biesheuvel --- v2: modulo schedule the loads of the input add AES/GCM performance numbers to commit log arch/arm/crypto/Kconfig | 1 + arch/arm/crypto/ghash-ce-core.S | 108 +++- arch/arm/crypto/ghash-ce-glue.c | 38

[PATCH] crypto: arm/ghash-ce - implement support for 4-way aggregation

2018-08-22 Thread Ard Biesheuvel
Speed up the GHASH algorithm based on 64-bit polynomial multiplication by adding support for 4-way aggregation. This improves throughput by ~60% on Cortex-A53, from 1.70 cycles per byte to 1.05 cycles per byte. Signed-off-by: Ard Biesheuvel --- arch/arm/crypto/Kconfig | 1 + arch/arm

Re: [PATCH] crypto: arm64/aes-modes - get rid of literal load of addend vector

2018-08-21 Thread Ard Biesheuvel
On 21 August 2018 at 20:34, Nick Desaulniers wrote: > On Tue, Aug 21, 2018 at 11:19 AM Ard Biesheuvel > wrote: >> >> On 21 August 2018 at 20:04, Nick Desaulniers wrote: >> > On Tue, Aug 21, 2018 at 9:46 AM Ard Biesheuvel >> > wrote: >> >> >

Re: [PATCH] crypto: arm64/aes-modes - get rid of literal load of addend vector

2018-08-21 Thread Ard Biesheuvel
On 21 August 2018 at 20:04, Nick Desaulniers wrote: > On Tue, Aug 21, 2018 at 9:46 AM Ard Biesheuvel > wrote: >> >> Replace the literal load of the addend vector with a sequence that >> composes it using immediates. While at it, tweak the code that refers >>

[PATCH] crypto: arm64/aes-modes - get rid of literal load of addend vector

2018-08-21 Thread Ard Biesheuvel
issue, whose integrated assembler does not implement the GNU ARM asm syntax completely, and does not support the =literal notation for FP registers. Cc: Nick Desaulniers Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 18 -- 1 file changed, 12 insertions(+), 6

[PATCH] crypto: arm64/aes-gcm-ce - fix scatterwalk API violation

2018-08-20 Thread Ard Biesheuvel
rm64/aes-ce-gcm - operate on two ...") Reported-by: Vakul Garg Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-glue.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/g

[PATCH] crypto: arm64/sm4-ce - check for the right CPU feature bit

2018-08-07 Thread Ard Biesheuvel
fix that. Signed-off-by: Ard Biesheuvel --- It would be good to get this backported to -stable but there is no need to merge this as a fix at -rc8 arch/arm64/crypto/sm4-ce-glue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64

[PATCH 0/2] crypto: arm64/ghash-ce - performance improvements

2018-08-04 Thread Ard Biesheuvel
at 1.1 cycles per byte on Cortex-A53 (down from 2.4 cycles per byte) Ard Biesheuvel (2): crypto: arm64/ghash-ce - replace NEON yield check with block limit crypto: arm64/ghash-ce - implement 4-way aggregation arch/arm64/crypto/ghash-ce-core.S | 153 ++-- arch/arm64/crypto/ghash-ce

[PATCH 2/2] crypto: arm64/ghash-ce - implement 4-way aggregation

2018-08-04 Thread Ard Biesheuvel
Enhance the GHASH implementation that uses 64-bit polynomial multiplication by adding support for 4-way aggregation. This more than doubles the performance, from 2.4 cycles per byte to 1.1 cpb on Cortex-A53. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 122

[PATCH 1/2] crypto: arm64/ghash-ce - replace NEON yield check with block limit

2018-08-04 Thread Ard Biesheuvel
for excessive amounts of time. So let's simply cap the maximum input size that is processed in one go to 64 KB. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 39 ++-- arch/arm64/crypto/ghash-ce-glue.c | 16 ++-- 2 files changed, 23 insertions(+), 32

Re: [PATCH v2 0/3] crypto/arm64: aes-ce-gcm - switch to 2-way aggregation

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 17:47, Herbert Xu wrote: > On Mon, Jul 30, 2018 at 11:06:39PM +0200, Ard Biesheuvel wrote: >> Update the combined AES-GCM AEAD implementation to process two blocks >> at a time, allowing us to switch to a faster version of the GHASH >> implementation. >

Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 10:17, Herbert Xu wrote: > On Fri, Aug 03, 2018 at 09:10:08AM +0200, Ard Biesheuvel wrote: >> But I think it's too late now to take this into v4.18. Could you >> please queue this (and my other two pending arm64/aes-gcm patches, if >> possible) for v4.19

Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 08:14, Herbert Xu wrote: > On Sun, Jul 29, 2018 at 04:52:30PM +0200, Ard Biesheuvel wrote: >> As it turns out, checking the TIF_NEED_RESCHED flag after each >> iteration results in a significant performance regression (~10%) >> when running fast algorithms

Re: [PATCH] crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair

2018-07-31 Thread Ard Biesheuvel
(+ Catalin, Will) On 27 July 2018 at 14:59, Ard Biesheuvel wrote: > Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and > kernel_neon_end() to be used since the routine touches the NEON > register file. Add the missing calls. > > Also, since NEON re

[PATCH v2 1/3] crypto/arm64: aes-ce-gcm - operate on two input blocks at a time

2018-07-30 Thread Ard Biesheuvel
Update the core AES/GCM transform and the associated plumbing to operate on 2 AES/GHASH blocks at a time. By itself, this is not expected to result in a noticeable speedup, but it paves the way for reimplementing the GHASH component using 2-way aggregation. Signed-off-by: Ard Biesheuvel

[PATCH v2 3/3] crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable

2018-07-30 Thread Ard Biesheuvel
Squeeze out another 5% of performance by minimizing the number of invocations of kernel_neon_begin()/kernel_neon_end() on the common path, which also allows some reloads of the key schedule to be optimized away. The resulting code runs at 2.3 cycles per byte on a Cortex-A53. Signed-off-by: Ard

[PATCH v2 2/3] crypto/arm64: aes-ce-gcm - implement 2-way aggregation

2018-07-30 Thread Ard Biesheuvel
Implement a faster version of the GHASH transform which amortizes the reduction modulo the characteristic polynomial across two input blocks at a time. On a Cortex-A53, the gcm(aes) performance increases 24%, from 3.0 cycles per byte to 2.4 cpb for large input sizes. Signed-off-by: Ard

[PATCH v2 0/3] crypto/arm64: aes-ce-gcm - switch to 2-way aggregation

2018-07-30 Thread Ard Biesheuvel
the changes in patch 'crypto: arm64 - revert NEON yield for fast AEAD implementations' which I sent out on July 29th - add a patch to reduce the number of invocations of kernel_neon_begin() and kernel_neon_end() on the common path Ard Biesheuvel (3): crypto/arm64: aes-ce-gcm - operate on two

[PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-07-29 Thread Ard Biesheuvel
This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3. Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...") Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...") Signed

[PATCH 0/2] crypto/arm64: aes-ce-gcm - switch to 2-way aggregation

2018-07-28 Thread Ard Biesheuvel
to benefit substantially from aggregation, given that the multiplication phase is much more dominant in this case (and it is only the reduction phase that is amortized over multiple blocks) Performance numbers for Cortex-A53 can be found after patch #2. Ard Biesheuvel (2): crypto/arm64: aes-ce

[PATCH 1/2] crypto/arm64: aes-ce-gcm - operate on two input blocks at a time

2018-07-28 Thread Ard Biesheuvel
Update the core AES/GCM transform and the associated plumbing to operate on 2 AES/GHASH blocks at a time. By itself, this is not expected to result in a noticeable speedup, but it paves the way for reimplementing the GHASH component using 2-way aggregation. Signed-off-by: Ard Biesheuvel

[PATCH 2/2] crypto/arm64: aes-ce-gcm - implement 2-way aggregation

2018-07-28 Thread Ard Biesheuvel
On a Cortex-A53, the gcm(aes) performance increases 24%, from 3.0 cycles per byte to 2.4 cpb for large input sizes. Signed-off-by: Ard Biesheuvel --- Raw numbers after the patch arch/arm64/crypto/ghash-ce-core.S | 87 +++- arch/arm64/crypto/ghash-ce-glue.c | 33 ++-- 2 files changed, 54

[PATCH] crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair

2018-07-27 Thread Ard Biesheuvel
. Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...") Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-glue.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-gl

Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks

2018-07-26 Thread Ard Biesheuvel
On 25 July 2018 at 18:50, bige...@linutronix.de wrote: > On 2018-07-25 11:54:53 [+0200], Ard Biesheuvel wrote: >> Indeed. OTOH, if the -rt people (Sebastian?) turn up and say that a >> 1000 cycle limit to the quantum of work performed with preemption >> disabled is unr

Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 11:45, Dave Martin wrote: > On Wed, Jul 25, 2018 at 10:23:00AM +0100, Ard Biesheuvel wrote: >> On 25 July 2018 at 11:09, Dave Martin wrote: >> > On Tue, Jul 24, 2018 at 06:12:20PM +0100, Ard Biesheuvel wrote: >> >> Vakul reports a considerabl

Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 11:48, Dave Martin wrote: > On Wed, Jul 25, 2018 at 10:11:42AM +0100, Ard Biesheuvel wrote: >> On 25 July 2018 at 11:05, Dave Martin wrote: >> > On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote: >> >> As reported by Vakul, checking t

Re: [PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 11:09, Dave Martin wrote: > On Tue, Jul 24, 2018 at 06:12:20PM +0100, Ard Biesheuvel wrote: >> Vakul reports a considerable performance hit when running the accelerated >> arm64 crypto routines with CONFIG_PREEMPT=y configured, now that they have >>

Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 11:05, Dave Martin wrote: > On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote: >> As reported by Vakul, checking the TIF_NEED_RESCHED flag after every >> iteration of the GHASH and AES-GCM core routines is having a considerable >> perfor

Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
On 25 July 2018 at 09:27, Ard Biesheuvel wrote: > (+ Mark) > > On 25 July 2018 at 08:57, Vakul Garg wrote: >> >> >>> -Original Message- >>> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org] >>> Sent: Tuesday, July 24, 2018 10:42 PM &

Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

2018-07-25 Thread Ard Biesheuvel
(+ Mark) On 25 July 2018 at 08:57, Vakul Garg wrote: > > >> -Original Message----- >> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org] >> Sent: Tuesday, July 24, 2018 10:42 PM >> To: linux-crypto@vger.kernel.org >> Cc: herb...@gondor.apana.org.

Re: [PATCH] crypto: arm/chacha20 - always use vrev for 16-bit rotates

2018-07-25 Thread Ard Biesheuvel
too. > > Signed-off-by: Eric Biggers Acked-by: Ard Biesheuvel > --- > arch/arm/crypto/chacha20-neon-core.S | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/arm/crypto/chacha20-neon-core.S > b/arch/arm/crypto/chacha20-neon-co

[PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

2018-07-24 Thread Ard Biesheuvel
due to disabling preemption to ~1000 cycles. Cc: Vakul Garg Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index dcffb9e77589

[PATCH 0/4] crypto/arm64: reduce impact of NEON yield checks

2018-07-24 Thread Ard Biesheuvel
between throughput and worst case scheduling latency. Ard Biesheuvel (4): crypto/arm64: ghash - reduce performance impact of NEON yield checks crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks crypto/arm64: sha1 - reduce performance impact of NEON yield checks crypto

[PATCH 3/4] crypto/arm64: sha1 - reduce performance impact of NEON yield checks

2018-07-24 Thread Ard Biesheuvel
Only perform the NEON yield check for every 4 blocks of input, to prevent taking a considerable performance hit on cores with very fast crypto instructions and comparatively slow memory accesses, such as the Cortex-A53. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha1-ce-core.S | 3
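The amortization pattern this series applies can be sketched as follows. This is an illustrative Python model, not kernel code: `process_block`, `should_yield`, and `do_yield` are hypothetical stand-ins for the real NEON block routine, the TIF_NEED_RESCHED test, and the `kernel_neon_end(); cond_resched(); kernel_neon_begin()` sequence.

```python
# Sketch of the amortized NEON yield check: test the reschedule flag only
# once every CHECK_INTERVAL blocks, so cores with fast crypto instructions
# and slow memory (e.g. Cortex-A53) don't pay a per-block flag-load penalty.

CHECK_INTERVAL = 4  # 4 blocks for sha1/sha2 in this series, 8 for aes-ccm

def process_all(blocks, process_block, should_yield, do_yield):
    """Process all blocks, yielding at most once per CHECK_INTERVAL blocks."""
    yields = 0
    for i, blk in enumerate(blocks, start=1):
        process_block(blk)
        # Amortized check: at most one flag test per CHECK_INTERVAL blocks.
        if i % CHECK_INTERVAL == 0 and should_yield():
            do_yield()
            yields += 1
    return yields

# With the flag permanently set, 16 blocks yield 16/4 = 4 times rather
# than 16 times under a per-block check.
assert process_all(range(16), lambda b: None, lambda: True, lambda: None) == 4
```

The trade-off discussed in the thread follows directly: a larger interval means less overhead but a longer preemption-disabled window, which is why the cover letter frames it as throughput versus worst-case scheduling latency.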

[PATCH 4/4] crypto/arm64: sha2 - reduce performance impact of NEON yield checks

2018-07-24 Thread Ard Biesheuvel
Only perform the NEON yield check for every 4 blocks of input, to prevent taking a considerable performance hit on cores with very fast crypto instructions and comparatively slow memory accesses, such as the Cortex-A53. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha2-ce-core.S | 3

[PATCH 2/4] crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks

2018-07-24 Thread Ard Biesheuvel
Only perform the NEON yield check for every 8 blocks of input, to prevent taking a considerable performance hit on cores with very fast crypto instructions and comparatively slow memory accesses, such as the Cortex-A53. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-core.S | 3

Re: [PATCH] crypto: arm/speck - fix building in Thumb2 mode

2018-06-19 Thread Ard Biesheuvel
n 'sp'. > */ > - sub sp, #128 > - bic sp, #0xf > + sub r12, sp, #128 > + bic r12, #0xf > + mov sp, r12 > > .if \n == 64 > // Load first tweak Acked-by: Ard Biesheuvel

Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-06-18 Thread Ard Biesheuvel
On 18 June 2018 at 23:56, Eric Biggers wrote: > On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote: >> >>>>> + >> >>>>> + // One-time XTS preparation >> >>>>> + >> >>>>

Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-06-17 Thread Ard Biesheuvel
On 17 June 2018 at 12:41, Stefan Agner wrote: > On 17.06.2018 11:40, Ard Biesheuvel wrote: >> On 17 June 2018 at 11:30, Ard Biesheuvel wrote: >>> On 17 June 2018 at 00:40, Stefan Agner wrote: >>>> Hi Eric, >>>> >>>> On 14.02.2018 19:42,

Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-06-17 Thread Ard Biesheuvel
On 17 June 2018 at 11:30, Ard Biesheuvel wrote: > On 17 June 2018 at 00:40, Stefan Agner wrote: >> Hi Eric, >> >> On 14.02.2018 19:42, Eric Biggers wrote: >>> Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on >>> 128-byte chunks at a

Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-06-17 Thread Ard Biesheuvel
On 17 June 2018 at 00:40, Stefan Agner wrote: > Hi Eric, > > On 14.02.2018 19:42, Eric Biggers wrote: >> Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on >> 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for >> Speck64. Each 128-byte chunk goes through
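The per-chunk tweak schedule behind those 128-byte chunks can be sketched as below. This is a hedged model, not the patch's NEON code: tweaks are treated as plain 128-bit integers and the little-endian byte-order details of real XTS are omitted; `tweaks_for_chunk` is a hypothetical helper name.

```python
# Sketch of the XTS tweak schedule: each 128-byte chunk of 8 Speck128
# blocks consumes 8 consecutive tweaks, each derived from the previous one
# by multiplying by x in GF(2^128) with the XTS polynomial
# x^128 + x^7 + x^2 + x + 1 (low bits 0x87).

MASK128 = (1 << 128) - 1

def xts_double(t):
    """Multiply a tweak by x in GF(2^128), folding the carry into 0x87."""
    carry = t >> 127
    t = (t << 1) & MASK128
    if carry:
        t ^= 0x87
    return t

def tweaks_for_chunk(t0, nblocks=8):
    """Precompute one chunk's tweaks up front, returning the carry-over tweak."""
    out = []
    for _ in range(nblocks):
        out.append(t0)
        t0 = xts_double(t0)
    return out, t0  # t0 feeds the next chunk

ts, nxt = tweaks_for_chunk(1)
assert ts == [1 << i for i in range(8)]  # small tweaks just shift left
assert nxt == 1 << 8
```

Precomputing the whole chunk's tweaks is what makes processing 8 (or 16, for Speck64) blocks in parallel with NEON possible, since the serial tweak dependency is resolved before the block cipher runs.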

Re: [PATCH] crypto: don't optimize keccakf()

2018-06-08 Thread Ard Biesheuvel
rm to help > the compiler optimize") > Reported-by: syzbot+37035ccfa9a0a017f...@syzkaller.appspotmail.com > Reported-by: syzbot+e073e4740cfbb3ae2...@syzkaller.appspotmail.com > Cc: linux-crypto@vger.kernel.org > Cc: "David S. Miller" > Cc: Herbert Xu > Cc: Ard
