Improve the performance of NEON based ChaCha:
Patch #1 adds a block size of 1472 to the tcrypt test template so we have
something that reflects the VPN case.
Patch #2 improves performance for arbitrary length inputs: on deep pipelines,
throughput increases ~30% when running on inputs blocks
In order to have better coverage of algorithms operating on block
sizes that are in the ballpark of a VPN packet, add 1472 to the
block_sizes array.
Signed-off-by: Ard Biesheuvel
---
crypto/tcrypt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/tcrypt.c
To some degree, most known AArch64 micro-architectures appear to be
able to issue ALU instructions in parellel to SIMD instructions
without affecting the SIMD throughput. This means we can use the ALU
to process a fifth ChaCha block while the SIMD is processing four
blocks in parallel.
Update the 4-way NEON ChaCha routine so it can handle input of any
length >64 bytes in its entirety, rather than having to call into
the 1-way routine and/or memcpy()s via temp buffers to handle the
tail of a ChaCha invocation that is not a multiple of 256 bytes.
On inputs that are a multiple of
From: Eric Biggers
If the stream cipher implementation is asynchronous, then the Adiantum
instance must be flagged as asynchronous as well. Otherwise someone
asking for a synchronous algorithm can get an asynchronous algorithm.
There are no asynchronous xchacha12 or xchacha20 implementations
On Thu, Sep 06, 2018 at 12:43:41PM +0200, Ard Biesheuvel wrote:
> On 5 September 2018 at 21:24, Eric Biggers wrote:
> > From: Eric Biggers
> >
> > fscrypt doesn't use the CTR mode of operation for anything, so there's
> > no need to select CRYPTO_CTR. It was added by commit 71dea01ea2ed
> >
I was curious if it might make implementing F() faster to use
instructions that are meant to work with sets of data similar to what
would be processed