tcrypt: test 5 (256 bit key, 8192 byte blocks): 47650 operations in 1 seconds (390348800 bytes)
Cc: Eric Biggers
Cc: Martin Willi
Ard Biesheuvel (3):
crypto: tcrypt - add block size of 1472 to skcipher template
crypto: arm64/chacha - optimize for arbitrary length inputs
In order to have better coverage of algorithms operating on block
sizes that are in the ballpark of a VPN packet, add 1472 to the
block_sizes array.
Signed-off-by: Ard Biesheuvel
---
crypto/tcrypt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
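The hunk itself is cut off in this listing; reconstructed from the
description above, the change would look roughly like this (the existing
contents of block_sizes[] are an assumption on my part):

-	static const int block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
+	static const int block_sizes[] = { 16, 64, 256, 1024, 1472, 8192, 0 };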
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/chacha-neon-core.S | 235 ++--
arch/arm64/crypto/chacha-neon-glue.c | 39 ++--
2 files changed, 239 insertions(+), 35 deletions(-)
diff --git a/arch/arm64/crypto/chacha-neon-core.S b/arch/arm64/crypto/chacha-neon-core.S
index
For inputs whose length is a multiple of 256 bytes (and thus in tcrypt
benchmarks), performance drops by around 1% on Cortex-A57, while
performance for inputs drawn randomly from the range [64, 1024)
increases by around 30%.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/chacha-neon-core.S | 183 ++--
arch/ar
On 9 November 2018 at 10:45, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
>> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>> >
>> > This should be
>> >
>> > reqs
(+ Masahiro, kbuild ml)
On 8 November 2018 at 21:37, Jason A. Donenfeld wrote:
> Hi Ard, Eric, and others,
>
> As promised, the next Zinc patchset will have less generated code! After a
> bit of work with Andy and Samuel, I'll be bundling the perlasm.
>
Wonderful! Any problems doing that for
On 8 November 2018 at 23:55, Ard Biesheuvel wrote:
> The simd wrapper's skcipher request context structure consists
> of a single subrequest whose size is taken from the subordinate
> skcipher. However, in simd_skcipher_init(), the reqsize that is
> retrieved is not from the subordinate
is completely unrelated to
the actual wrapped skcipher.
Reported-by: Qian Cai
Signed-off-by: Ard Biesheuvel
---
crypto/simd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/simd.c b/crypto/simd.c
index ea7240be3001..2f3d6e897afc 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
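The hunk body is truncated in this listing; given the one-line diffstat,
the fix presumably boils down to a reconstruction like the following (not
necessarily the verbatim patch), taking the reqsize from the wrapped
skcipher via cryptd_skcipher_child() rather than from the cryptd request
structure:

-	reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
+	reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));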
On 23 October 2018 at 04:01, James Bottomley wrote:
> On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote:
> [...]
>> > +static void hmac_init(struct shash_desc *desc, u8 *key, int
>> > keylen)
>> > +{
>> > + u8 pad[SHA256_BLOCK_SIZE];
>>
Hi James,
Some comments below on how you are using the crypto API.
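For readers following along: the idiomatic alternative to open-coding the
ipad/opad construction is to let the crypto API instantiate the HMAC
template itself. A minimal sketch (error paths trimmed; hmac_sha256() is
an illustrative name, not an existing kernel function):

	#include <crypto/hash.h>

	static int hmac_sha256(const u8 *key, unsigned int keylen,
			       const u8 *msg, unsigned int msglen, u8 *out)
	{
		struct crypto_shash *tfm;
		int err;

		tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		err = crypto_shash_setkey(tfm, key, keylen);
		if (!err) {
			SHASH_DESC_ON_STACK(desc, tfm);

			desc->tfm = tfm;
			err = crypto_shash_digest(desc, msg, msglen, out);
			shash_desc_zero(desc);
		}
		crypto_free_shash(tfm);
		return err;
	}

This way the ipad/opad handling, key hashing for overlong keys, and any
accelerated hmac(sha256) providers all come for free.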
On 22 October 2018 at 04:36, James Bottomley wrote:
> This code adds true session based HMAC authentication plus parameter
> decryption and response encryption using AES.
>
> The basic design of this code is to segregate all the
On 21 October 2018 at 11:00, James Bottomley wrote:
> On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel wrote:
>>On 21 October 2018 at 10:07, James Bottomley wrote:
>>> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote:
>>>> (+ James)
>
On 21 October 2018 at 10:07, James Bottomley wrote:
> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote:
>> (+ James)
>
> Thanks!
>
>> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote:
>> > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
>> > IV, rather than with data stream, resulting in incorrect decryption.
(+ James)
On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote:
> Add AES128/192/256-CFB testvectors from NIST SP800-38A.
>
> Signed-off-by: Dmitry Eremin-Solenikov
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Eremin-Solenikov
> ---
> crypto/tcrypt.c | 5
>
(+ James)
On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote:
> crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
> IV, rather than with data stream, resulting in incorrect decryption.
> Test vectors will be added in the next patch.
>
> Signed-off-by: Dmitry
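For reference, in CFB the keystream for block i is the block cipher
applied to ciphertext block i-1 (the IV for i = 0), and decryption XORs
that keystream into the ciphertext. A sketch of the correct flow
(aes_encrypt_block() and struct aes_ctx are placeholders, not the
kernel's names):

	/* pt[i] = E_k(prev_ct) ^ ct[i]; prev_ct = ct[i] */
	static void cfb_decrypt_sketch(const struct aes_ctx *ctx, u8 *iv,
				       u8 *dst, const u8 *src, int nblocks)
	{
		u8 stream[16];

		while (nblocks--) {
			aes_encrypt_block(ctx, stream, iv); /* keystream */
			memcpy(iv, src, 16);    /* save ct before the XOR */
			crypto_xor_cpy(dst, stream, src, 16); /* XOR the data,
							       * not the IV */
			src += 16;
			dst += 16;
		}
	}

Saving the ciphertext block before the XOR also keeps in-place
(dst == src) operation correct.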
On 20 October 2018 at 04:39, Eric Biggers wrote:
> On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote:
>> On 19 October 2018 at 13:41, Ard Biesheuvel wrote:
>> > On 18 October 2018 at 12:37, Eric Biggers wrote:
>> >> From: Eric Biggers
On 19 October 2018 at 13:41, Ard Biesheuvel wrote:
> On 18 October 2018 at 12:37, Eric Biggers wrote:
>> From: Eric Biggers
>>
>> Make the ARM scalar AES implementation closer to constant-time by
>> disabling interrupts and prefetching the tables into L1 cache. This
> of the many difficulties involved in writing truly constant-time AES
> software. But it's valuable to make such attacks more difficult.
>
> Much of this patch is based on patches suggested by Ard Biesheuvel.
>
> Suggested-by: Ard Biesheuvel
> Signed-off-by: Eric Biggers
Reviewed-by: Ard Biesheuvel
> ---
> arch/ar
> of the many difficulties involved in writing truly constant-time AES
> software. But it's valuable to make such attacks more difficult.
>
> Signed-off-by: Eric Biggers
Thanks for taking a look. Could we add something to the Kconfig blurb
that mentions that it runs the algorithm with interrupts disabled? In
any
> After these changes, the implementation still isn't
> necessarily guaranteed to be constant-time; see
> https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
> of the many difficulties involved in writing truly constant-time AES
> software. But it's valuable to make such attacks more difficult.
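The shape of the mitigation being discussed is to keep interrupts off
between the table prefetch and the last table lookup, so nothing can
evict the S-box mid-computation. Schematically (placeholder helper
names, not the merged patch):

	unsigned long flags;

	local_irq_save(flags);	/* nothing may perturb L1D from here on */
	preload_sbox();		/* touch every cache line of the table */
	aes_encrypt_rounds(ctx, dst, src); /* lookups now hit the cache */
	local_irq_restore(flags);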
On 9 October 2018 at 05:47, Eric Biggers wrote:
> Hi Ard,
>
> On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote:
>> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> because the ordinary load/store instructions (ldr, ldrh, ldrb) can
>>
On 9 October 2018 at 05:34, Eric Biggers wrote:
> Hi Ard,
>
> On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote:
>> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> because the ordinary load/store instructions (ldr, ldrh, ldrb) can
>>
for architectures other than ARM that define
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, let's switch to them in a couple
of places in the crypto code.
Note that all patches are against code that has been observed to be emitted
with ldm or ldrd instructions when building ARM's multi_v7_defconfig.
Ard Biesheuvel (3
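As a concrete example of the pattern the series targets, the open-coded
byte assembly and its accessor-based equivalent (helper names here are
mine, for illustration only):

	#include <asm/unaligned.h>

	/* open-coded form: byte loads unless alignment is provable */
	static inline u32 le32_bytewise(const u8 *p)
	{
		return p[0] | p[1] << 8 | p[2] << 16 | p[3] << 24;
	}

	/* accessor form: a single 32-bit load when
	 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set */
	static inline u32 le32_accessor(const u8 *p)
	{
		return get_unaligned_le32(p);
	}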
of misalignment, and generate code for ARMv6+ that
avoids load/store instructions that trigger alignment faults.
Signed-off-by: Ard Biesheuvel
---
crypto/memneq.c | 24 ++--
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/crypto/memneq.c b/crypto/memneq.c
index afed1bd16aee
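The point of crypto_memneq(), which this patch touches, is that it
accumulates the XOR of all byte pairs rather than returning at the first
mismatch, so its runtime reveals nothing about where the buffers differ.
The core of the technique, simplified (the real crypto/memneq.c also has
word-at-a-time fast paths):

	static unsigned long memneq_sketch(const void *a, const void *b,
					   size_t size)
	{
		const unsigned char *pa = a, *pb = b;
		unsigned long neq = 0;

		while (size--)
			neq |= *pa++ ^ *pb++; /* no data-dependent branch */
		return neq; /* zero iff the buffers are equal */
	}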
of misalignment, and generate code for ARMv6+ that
avoids load/store instructions that trigger alignment faults.
Signed-off-by: Ard Biesheuvel
---
crypto/algapi.c | 7 +++
include/crypto/algapi.h | 11 +--
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/crypto/algapi.c b/crypto/algapi.c
on the aligned code paths. Given the above, this
either produces the same code, or better in the ARMv6+ case. However,
since that removes the only difference between the aligned and unaligned
variants, we can drop the aligned variant entirely.
Signed-off-by: Ard Biesheuvel
---
include/linux/siphash.h
different buffers. So let's ensure that the first load occurs
unconditionally, and move the reload to the end so it doesn't occur
needlessly.
Fixes: 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling")
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-modes.S | 8
On 5 October 2018 at 19:13, Eric Biggers wrote:
> From: Eric Biggers
>
> aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit()
> to register/unregister the "fpu" template. But these functions don't
> exist anymore, causing a build error. Remove the calls to them.
>
> Fixes:
On 5 October 2018 at 04:29, Herbert Xu wrote:
> On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote:
>> Arnd reports that with Kees's latest VLA patches applied, the HMAC
>> handling in the QAT driver uses a worst case estimate of 160 bytes
>> for the SH
Hi Eric,
On 4 October 2018 at 06:07, Eric Biggers wrote:
> From: Eric Biggers
>
> The generic constant-time AES implementation is supposed to preload the
> AES S-box into the CPU's L1 data cache. But, an interrupt handler can
> run on the CPU and muck with the cache. Worse, on preemptible
) to
> indicate the minimum input size.
>
> Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode")
> Signed-off-by: Eric Biggers
Thanks Eric
Reviewed-by: Ard Biesheuvel
> ---
> arch/arm64/crypto/aes-glue.c | 13 +
> 1 file cha
Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations")
Cc: sta...@vger.kernel.org # v4.18+
Reviewed-by: Ondrej Mosnacek
Signed-off-by: Ard Biesheuvel
---
crypto/morus1280.c | 7 ++-
crypto/morus640.c | 16
2 files changed, 6 insertions(+), 17 deletions(-)
diff --git a/crypto/morus1280.c b/crypto/morus1280.c
Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations")
Cc: sta...@vger.kernel.org # v4.18+
Signed-off-by: Ard Biesheuvel
---
crypto/aegis.h | 20 +---
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/crypto/aegis.h b/crypto/aegis.h
index f1c6900ddb80..405e025fc906 100644
--- a/crypto/aegis.h
+++ b/crypto/aegis.h
@@ -
Some bug fixes for issues that I stumbled upon while working on other
stuff.
Changes since v1:
- add Ondrej's ack to #1
- simplify #2 and drop unrelated performance tweak
Ard Biesheuvel (2):
crypto: morus/generic - fix for big endian systems
crypto: aegis/generic - fix for big endian systems
On 1 October 2018 at 10:00, Ondrej Mosnacek wrote:
> On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel wrote:
>> On 30 September 2018 at 10:58, Ard Biesheuvel wrote:
>> > Use the correct __le32 annotation and accessors to perform the
>> > single roun
On 1 October 2018 at 09:50, Ondrej Mosnacek wrote:
> Hi Ard,
>
> On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel wrote:
>> Use the correct __le32 annotation and accessors to perform the
>> single round of AES encryption performed inside the AEGIS transform.
>
On 1 October 2018 at 09:26, Ondrej Mosnacek wrote:
> On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel wrote:
>> Omit the endian swabbing when folding the lengths of the assoc and
>> crypt input buffers into the state to finalize the tag. This is not
>> necessa
Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation")
Signed-off-by: Ard Biesheuvel
---
crypto/lrw.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/crypto/lrw.c b/crypto/lrw.c
index 6fcf0d431185..0430ccd08728 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@
On 30 September 2018 at 10:58, Ard Biesheuvel wrote:
> Use the correct __le32 annotation and accessors to perform the
> single round of AES encryption performed inside the AEGIS transform.
> Otherwise, tcrypt reports:
>
> alg: aead: Test 1 failed on encryption for aegis128-gener
, and
derive the other ones by rotation. This reduces the D-cache footprint
by 75%, and shouldn't be too costly, and may even be free, on load/store architectures
(and X86 has its own AES-NI based implementation)
Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations")
Cc: sta...@vger.kernel.org # v4.18+
Signed-o
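The trick relied on here: each of the four classic AES lookup tables is
a byte rotation of the first, so only one 1 KB table needs to stay
cache-hot and the others can be recomputed on the fly. Schematically
(the exact rotation direction depends on the table layout in use):

	#include <linux/bitops.h>

	static u32 column_lookup(const u32 t0[256],
				 u8 b0, u8 b1, u8 b2, u8 b3)
	{
		return t0[b0] ^
		       rol32(t0[b1], 8) ^	/* stands in for table 1 */
		       rol32(t0[b2], 16) ^	/* stands in for table 2 */
		       rol32(t0[b3], 24);	/* stands in for table 3 */
	}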
Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations")
Cc: sta...@vger.kernel.org # v4.18+
Signed-off-by: Ard Biesheuvel
---
crypto/morus1280.c | 7 ++-
crypto/morus640.c | 16
2 files changed, 6 insertions(+), 17 deletions(-)
diff --git a/crypto/morus1280.c b/crypto/morus1280.c
Some bug fixes for issues that I stumbled upon while working on other
stuff.
Ard Biesheuvel (2):
crypto: morus/generic - fix for big endian systems
crypto: aegis/generic - fix for big endian systems
crypto/aegis.h | 23 +---
crypto/morus1280.c | 7 ++
crypto
On Wed, 26 Sep 2018 at 11:53, Ard Biesheuvel wrote:
>
> Arnd reports that with Kees's latest VLA patches applied, the HMAC
> handling in the QAT driver uses a worst case estimate of 160 bytes
> for the SHA blocksize, allowing the compiler to determine the size
> of the stack f
(as well as LRW), but currently,
PCBC is the only remaining user.
Since there are no known users of pcbc(aes) in the kernel, let's remove
this special driver, and rely on the generic pcbc driver to encapsulate
the AES-NI core cipher.
Signed-off-by: Ard Biesheuvel
---
arch/x86/crypto/Makefile
On 10 September 2018 at 07:41, Ard Biesheuvel wrote:
> Some cleanups and optimizations for the arm64 AES skcipher routines.
>
> Patch #1 fixes the peculiar use of u8 arrays to refer to AES round keys,
> which are natively arrays of u32.
>
> Patch #2 partially reverts the use o
On 13 September 2018 at 08:24, Stefan Agner wrote:
> On 10.09.2018 00:01, Ard Biesheuvel wrote:
>> On 10 September 2018 at 08:21, Stefan Agner wrote:
>>> Hi Ard,
>>>
>>> On 21.05.2017 03:23, Ard Biesheuvel wrote:
>>>> Make the module
> blocks, 16 bytes per update, 1 updates):
> tcrypt: hashing failed ret=-126
>
> Cc: # 4.6+
> Fixes: 0660511c0bee ("crypto: tcrypt - Use ahash")
> Tested-by: Franck Lenormand
> Signed-off-by: Horia Geantă
Acked-by: Ard Biesheuvel
> ---
> crypto/tcrypt.c |
For some reason, the asmlinkage prototypes of the NEON routines take
u8[] arguments for the round key arrays, while the actual round keys
are arrays of u32, and so passing them into those routines requires
u8* casts at each occurrence. Fix that.
Signed-off-by: Ard Biesheuvel
---
arch/arm64
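In other words, the prototypes move from u8[] to u32[] so the casts
disappear at the call sites; with a representative routine name, the
change looks like:

-	asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
-					int rounds, int blocks);
+	asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
+					int rounds, int blocks);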
additional TIF_NEED_RESCHED flag check inside the inner loop. So revert
the skcipher changes to aes-modes.S (but retain the mac ones)
This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-modes.S | 281 --
in a ~4% speedup.
Signed-off-by: Ard Biesheuvel
---
Raw performance numbers after the patch.
arch/arm64/crypto/aes-ce.S| 5 +++
arch/arm64/crypto/aes-modes.S | 40 ++--
arch/arm64/crypto/aes-neon.S | 6 +++
3 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
for cts(cbc(aes)) in the NEON chaining mode handling.
Patch #4 tweaks the XTS handling to remove a literal load from the inner
loop.
Cc: Eric Biggers
Cc: Theodore Ts'o
Cc: Steve Capper
Ard Biesheuvel (4):
crypto: arm64/aes-blk - remove pointless (u8 *) casts
crypto: arm64/aes-blk - revert NEON
the CTS handling into the SIMD routines.
On Cortex-A53, this results in a ~50% speedup for smaller input sizes.
Signed-off-by: Ard Biesheuvel
---
This patch supersedes '[RFC/RFT PATCH] crypto: arm64/aes-ce - add support
for CTS-CBC mode' sent out last Saturday.
Changes:
- keep subreq
On 10 September 2018 at 08:21, Stefan Agner wrote:
> Hi Ard,
>
> On 21.05.2017 03:23, Ard Biesheuvel wrote:
>> Make the module autoloadable by tying it to the CPU feature bit that
>> describes whether the optional instructions it relies on are implemented
>> by the cur
the CTS handling into the core algorithm.
On Cortex-A53, this results in a ~50% speedup for smaller block sizes.
Signed-off-by: Ard Biesheuvel
---
Raw performance numbers after the patch.
arch/arm64/crypto/aes-glue.c | 142
arch/arm64/crypto/aes-modes.S | 73 ++
2
d. Or maybe it was actually a bug in a non-upstream crypto
> driver.
>
> So, remove the dependency. If it turns out there's actually still a
> bug, we'll fix it properly.
>
> Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
This may be related to
11e3b725cfc2 crypto: arm64/ae
On 31 August 2018 at 17:56, Ard Biesheuvel wrote:
> Hi Eric,
>
> On 31 August 2018 at 10:01, Eric Biggers wrote:
>> From: Eric Biggers
>>
>> Optimize ChaCha20 NEON performance by:
>>
>> - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
Hi Eric,
On 31 August 2018 at 10:01, Eric Biggers wrote:
> From: Eric Biggers
>
> Optimize ChaCha20 NEON performance by:
>
> - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
> - Streamlining the part that adds the original state and XORs the data.
> - Making some other small
On 23 August 2018 at 21:04, Nick Desaulniers wrote:
> On Thu, Aug 23, 2018 at 9:48 AM Ard Biesheuvel wrote:
>>
>> Replace the literal load of the addend vector with a sequence that
>> performs each add individually. This sequence is only 2 instructions
>> l
not implement the GNU ARM asm syntax
completely, and does not support the =literal notation for FP registers
(more info at https://bugs.llvm.org/show_bug.cgi?id=38642)
Cc: Nick Desaulniers
Signed-off-by: Ard Biesheuvel
---
v2: replace convoluted code involving a SIMD add to increment four
to 3.0 cycles per byte.
Signed-off-by: Ard Biesheuvel
---
v2: modulo schedule the loads of the input
add AES/GCM performance numbers to commit log
arch/arm/crypto/Kconfig | 1 +
arch/arm/crypto/ghash-ce-core.S | 108 +++-
arch/arm/crypto/ghash-ce-glue.c | 38
Speed up the GHASH algorithm based on 64-bit polynomial multiplication
by adding support for 4-way aggregation. This improves throughput by
~60% on Cortex-A53, from 1.70 cycles per byte to 1.05 cycles per byte.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 1 +
arch/arm
On 21 August 2018 at 20:34, Nick Desaulniers wrote:
> On Tue, Aug 21, 2018 at 11:19 AM Ard Biesheuvel wrote:
>>
>> On 21 August 2018 at 20:04, Nick Desaulniers wrote:
>> > On Tue, Aug 21, 2018 at 9:46 AM Ard Biesheuvel wrote:
>> >>
>
On 21 August 2018 at 20:04, Nick Desaulniers wrote:
> On Tue, Aug 21, 2018 at 9:46 AM Ard Biesheuvel wrote:
>>
>> Replace the literal load of the addend vector with a sequence that
>> composes it using immediates. While at it, tweak the code that refers
>>
issue, whose integrated assembler does not implement the GNU ARM asm
syntax completely, and does not support the =literal notation for
FP registers.
Cc: Nick Desaulniers
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-modes.S | 18 --
1 file changed, 12 insertions(+), 6 deletions(-)
rm64/aes-ce-gcm - operate on two ...")
Reported-by: Vakul Garg
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/ghash-ce-glue.c | 29 +++--
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
fix that.
Signed-off-by: Ard Biesheuvel
---
It would be good to get this backported to -stable but there is no
need to merge this as a fix at -rc8
arch/arm64/crypto/sm4-ce-glue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64/crypto/sm4-ce-glue.c
at 1.1 cycles per byte on Cortex-A53 (down from
2.4 cycles per byte)
Ard Biesheuvel (2):
crypto: arm64/ghash-ce - replace NEON yield check with block limit
crypto: arm64/ghash-ce - implement 4-way aggregation
arch/arm64/crypto/ghash-ce-core.S | 153 ++--
arch/arm64/crypto/ghash-ce
Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This
more than doubles the performance, from 2.4 cycles per byte
to 1.1 cpb on Cortex-A53.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/ghash-ce-core.S | 122
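The aggregation works because GHASH is polynomial evaluation in
GF(2^128). Instead of the serial per-block update Y_i = (Y_{i-1} + X_i).H,
four blocks are folded at once using precomputed powers of H:

	Y_{i+4} = (Y_i + X_{i+1}).H^4 + X_{i+2}.H^3 + X_{i+3}.H^2 + X_{i+4}.H

where + is XOR and . is carryless multiplication. The four products can
be summed before reducing, so the costly reduction modulo
x^128 + x^7 + x^2 + x + 1 is performed once per four blocks rather than
once per block.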
for excessive amounts of time.
So let's simply cap the maximum input size that is processed in one go
to 64 KB.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/ghash-ce-core.S | 39 ++--
arch/arm64/crypto/ghash-ce-glue.c | 16 ++--
2 files changed, 23 insertions(+), 32 deletions(-)
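In glue-code terms, the cap bounds each kernel_neon_begin()/end()
section regardless of request size; roughly (modeled loosely on the
arm64 GHASH glue, declarations and details elided):

	do {
		/* never hold the NEON unit for more than 64 KB of input */
		unsigned int chunk = min_t(unsigned int, len, SZ_64K);

		kernel_neon_begin();
		pmull_ghash_update(chunk / GHASH_BLOCK_SIZE, dg, src,
				   key, NULL);
		kernel_neon_end();

		src += chunk;
		len -= chunk;
	} while (len);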
On 3 August 2018 at 17:47, Herbert Xu wrote:
> On Mon, Jul 30, 2018 at 11:06:39PM +0200, Ard Biesheuvel wrote:
>> Update the combined AES-GCM AEAD implementation to process two blocks
>> at a time, allowing us to switch to a faster version of the GHASH
>> implementation.
>
On 3 August 2018 at 10:17, Herbert Xu wrote:
> On Fri, Aug 03, 2018 at 09:10:08AM +0200, Ard Biesheuvel wrote:
>> But I think it's too late now to take this into v4.18. Could you
>> please queue this (and my other two pending arm64/aes-gcm patches, if
>> possible) for v4.19
On 3 August 2018 at 08:14, Herbert Xu wrote:
> On Sun, Jul 29, 2018 at 04:52:30PM +0200, Ard Biesheuvel wrote:
>> As it turns out, checking the TIF_NEED_RESCHED flag after each
>> iteration results in a significant performance regression (~10%)
>> when running fast algorithms
(+ Catalin, Will)
On 27 July 2018 at 14:59, Ard Biesheuvel wrote:
> Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
> kernel_neon_end() to be used since the routine touches the NEON
> register file. Add the missing calls.
>
> Also, since NEON re
Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.
Signed-off-by: Ard Biesheuvel
Squeeze out another 5% of performance by minimizing the number
of invocations of kernel_neon_begin()/kernel_neon_end() on the
common path, which also allows some reloads of the key schedule
to be optimized away.
The resulting code runs at 2.3 cycles per byte on a Cortex-A53.
Signed-off-by: Ard
Implement a faster version of the GHASH transform which amortizes
the reduction modulo the characteristic polynomial across two
input blocks at a time.
On a Cortex-A53, the gcm(aes) performance increases 24%, from
3.0 cycles per byte to 2.4 cpb for large input sizes.
Signed-off-by: Ard
the changes in patch 'crypto: arm64 - revert NEON yield for
fast AEAD implementations' which I sent out on July 29th
- add a patch to reduce the number of invocations of kernel_neon_begin()
and kernel_neon_end() on the common path
Ard Biesheuvel (3):
crypto/arm64: aes-ce-gcm - operate on two
This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and
partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3.
Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel
to benefit substantially from aggregation,
given that the multiplication phase is much more dominant in this case
(and it is only the reduction phase that is amortized over multiple
blocks)
Performance numbers for Cortex-A53 can be found after patch #2.
Ard Biesheuvel (2):
crypto/arm64: aes-ce
Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.
Signed-off-by: Ard Biesheuvel
On a Cortex-A53, the gcm(aes) performance increases 24%, from 3.0 cycles per
byte to 2.4 cpb for large input sizes.
Signed-off-by: Ard Biesheuvel
---
Raw numbers after the patch
arch/arm64/crypto/ghash-ce-core.S | 87 +++-
arch/arm64/crypto/ghash-ce-glue.c | 33 ++--
2 files changed, 54
Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/ghash-ce-glue.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
On 25 July 2018 at 18:50, bige...@linutronix.de wrote:
> On 2018-07-25 11:54:53 [+0200], Ard Biesheuvel wrote:
>> Indeed. OTOH, if the -rt people (Sebastian?) turn up and say that a
>> 1000 cycle limit to the quantum of work performed with preemption
>> disabled is unr
On 25 July 2018 at 11:45, Dave Martin wrote:
> On Wed, Jul 25, 2018 at 10:23:00AM +0100, Ard Biesheuvel wrote:
>> On 25 July 2018 at 11:09, Dave Martin wrote:
>> > On Tue, Jul 24, 2018 at 06:12:20PM +0100, Ard Biesheuvel wrote:
>> >> Vakul reports a considerabl
On 25 July 2018 at 11:48, Dave Martin wrote:
> On Wed, Jul 25, 2018 at 10:11:42AM +0100, Ard Biesheuvel wrote:
>> On 25 July 2018 at 11:05, Dave Martin wrote:
>> > On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote:
>> >> As reported by Vakul, checking t
On 25 July 2018 at 11:09, Dave Martin wrote:
> On Tue, Jul 24, 2018 at 06:12:20PM +0100, Ard Biesheuvel wrote:
>> Vakul reports a considerable performance hit when running the accelerated
>> arm64 crypto routines with CONFIG_PREEMPT=y configured, now that thay have
>>
On 25 July 2018 at 11:05, Dave Martin wrote:
> On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote:
>> As reported by Vakul, checking the TIF_NEED_RESCHED flag after every
>> iteration of the GHASH and AES-GCM core routines is having a considerable
>> perfor
On 25 July 2018 at 09:27, Ard Biesheuvel wrote:
> (+ Mark)
>
> On 25 July 2018 at 08:57, Vakul Garg wrote:
>>
>>
>>> -----Original Message-----
>>> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
>>> Sent: Tuesday, July 24, 2018 10:42 PM
(+ Mark)
On 25 July 2018 at 08:57, Vakul Garg wrote:
>
>
>> -----Original Message-----
>> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
>> Sent: Tuesday, July 24, 2018 10:42 PM
>> To: linux-crypto@vger.kernel.org
>> Cc: herb...@gondor.apana.org.
too.
>
> Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
> ---
> arch/arm/crypto/chacha20-neon-core.S | 10 --
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
due to disabling preemption to ~1000
cycles.
Cc: Vakul Garg
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/ghash-ce-core.S | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index dcffb9e77589
between throughput and worst case scheduling latency.
Ard Biesheuvel (4):
crypto/arm64: ghash - reduce performance impact of NEON yield checks
crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks
crypto/arm64: sha1 - reduce performance impact of NEON yield checks
crypto
Only perform the NEON yield check for every 4 blocks of input, to
prevent taking a considerable performance hit on cores with very
fast crypto instructions and comparatively slow memory accesses,
such as the Cortex-A53.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/sha1-ce-core.S | 3
Only perform the NEON yield check for every 4 blocks of input, to
prevent taking a considerable performance hit on cores with very
fast crypto instructions and comparatively slow memory accesses,
such as the Cortex-A53.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/sha2-ce-core.S | 3
Only perform the NEON yield check for every 8 blocks of input, to
prevent taking a considerable performance hit on cores with very
fast crypto instructions and comparatively slow memory accesses,
such as the Cortex-A53.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-ce-ccm-core.S | 3
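The common structure of these three patches, expressed as C-like
pseudocode (the actual changes are in the .S files; names here are
illustrative): the yield condition is only consulted once every N
blocks, so the per-block fast path stays branch-light:

	while (blocks--) {
		process_block();

		/* check TIF_NEED_RESCHED every Nth block (N = 4 or 8) */
		if ((++count & (N - 1)) == 0 && neon_yield_needed())
			break;	/* caller yields NEON and re-enters */
	}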
n 'sp'.
> */
> - sub sp, #128
> - bic sp, #0xf
> + sub r12, sp, #128
> + bic r12, #0xf
> + mov sp, r12
>
> .if \n == 64
> // Load first tweak
Acked-by: Ard Biesheuvel
On 18 June 2018 at 23:56, Eric Biggers wrote:
> On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
>> >>>>> +
>> >>>>> + // One-time XTS preparation
>> >>>>> +
>> >>>>
On 17 June 2018 at 12:41, Stefan Agner wrote:
> On 17.06.2018 11:40, Ard Biesheuvel wrote:
>> On 17 June 2018 at 11:30, Ard Biesheuvel wrote:
>>> On 17 June 2018 at 00:40, Stefan Agner wrote:
>>>> Hi Eric,
>>>>
>>>> On 14.02.2018 19:42,
On 17 June 2018 at 11:30, Ard Biesheuvel wrote:
> On 17 June 2018 at 00:40, Stefan Agner wrote:
>> Hi Eric,
>>
>> On 14.02.2018 19:42, Eric Biggers wrote:
>>> Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
>>> 128-byte chunks at a
On 17 June 2018 at 00:40, Stefan Agner wrote:
> Hi Eric,
>
> On 14.02.2018 19:42, Eric Biggers wrote:
>> Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
>> 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
>> Speck64. Each 128-byte chunk goes through
rm to help
> the compiler optimize")
> Reported-by: syzbot+37035ccfa9a0a017f...@syzkaller.appspotmail.com
> Reported-by: syzbot+e073e4740cfbb3ae2...@syzkaller.appspotmail.com
> Cc: linux-crypto@vger.kernel.org
> Cc: "David S. Miller"
> Cc: Herbert Xu
> Cc: Ard