Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-09-01 Thread Eric Biggers
On Fri, Aug 31, 2018 at 06:51:34PM +0200, Ard Biesheuvel wrote:
> >>
> >> + adr ip, .Lrol8_table
> >> mov r3, #10
> >>
> >> .Ldoubleround4:
> >> @@ -238,24 +268,25 @@ ENTRY(chacha20_4block_xor_neon)
> >> // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> >>
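
For context on the quoted hunk: the comment "x1 += x5, x13 = rotl32(x13 ^ x1, 8)" is the third step of the ChaCha quarter round applied to the column (x1, x5, x9, x13), and the "mov r3, #10" counter presumably counts the 10 double rounds that make up ChaCha20's 20 rounds. Below is a minimal portable C sketch of that round structure, not the NEON code from the patch. The names rotl32, quarter_round and chacha20_permute are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, int n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on words a, b, c, d of the 16-word state. */
static void quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);  /* e.g. x1 += x5, x13 = rotl32(x13 ^ x1, 8) */
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* ChaCha20 permutation: 10 double rounds (column round + diagonal round). */
void chacha20_permute(uint32_t x[16])
{
    for (int i = 0; i < 10; i++) {
        /* Column round */
        quarter_round(x, 0, 4,  8, 12);
        quarter_round(x, 1, 5,  9, 13);
        quarter_round(x, 2, 6, 10, 14);
        quarter_round(x, 3, 7, 11, 15);
        /* Diagonal round */
        quarter_round(x, 0, 5, 10, 15);
        quarter_round(x, 1, 6, 11, 12);
        quarter_round(x, 2, 7,  8, 13);
        quarter_round(x, 3, 4,  9, 14);
    }
}
```

As a check, quarter_round(x, 0, 1, 2, 3) on words {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567} should give {0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb}, the quarter-round test vector from RFC 7539, section 2.1.1.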

Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-09-01 Thread Eric Biggers
Hi Ard,

On Fri, Aug 31, 2018 at 05:56:24PM +0200, Ard Biesheuvel wrote:
> Hi Eric,
>
> On 31 August 2018 at 10:01, Eric Biggers wrote:
> > From: Eric Biggers
> >
> > Optimize ChaCha20 NEON performance by:
> >
> > - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
> > -

Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-08-31 Thread Ard Biesheuvel
On 31 August 2018 at 17:56, Ard Biesheuvel wrote:
> Hi Eric,
>
> On 31 August 2018 at 10:01, Eric Biggers wrote:
>> From: Eric Biggers
>>
>> Optimize ChaCha20 NEON performance by:
>>
>> - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
>> - Streamlining the part that adds the

Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-08-31 Thread Ard Biesheuvel
Hi Eric,

On 31 August 2018 at 10:01, Eric Biggers wrote:
> From: Eric Biggers
>
> Optimize ChaCha20 NEON performance by:
>
> - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
> - Streamlining the part that adds the original state and XORs the data.
> - Making some other small
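
The second bullet refers to the feed-forward at the end of each ChaCha20 block: after the 20 rounds, each word of the working state is added (mod 2^32) to the corresponding word of the original input state, the sums are serialized little-endian into 64 bytes of keystream, and that keystream is XORed with the data. The scalar C sketch below shows only that step, assuming the caller has already run the rounds; the function name chacha20_add_state_xor is hypothetical and this is not the patch's NEON code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * x     : working state after the 20 rounds
 * state : original input state (constants, key, counter, nonce)
 * len   : number of data bytes, at most 64 (allows a final partial block)
 */
void chacha20_add_state_xor(uint8_t *out, const uint8_t *in, size_t len,
                            const uint32_t x[16], const uint32_t state[16])
{
    uint8_t ks[64];

    assert(len <= sizeof(ks));
    for (int i = 0; i < 16; i++) {
        uint32_t k = x[i] + state[i];     /* feed-forward, wraps mod 2^32 */
        ks[4 * i + 0] = (uint8_t)k;       /* little-endian serialization */
        ks[4 * i + 1] = (uint8_t)(k >> 8);
        ks[4 * i + 2] = (uint8_t)(k >> 16);
        ks[4 * i + 3] = (uint8_t)(k >> 24);
    }
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ ks[i];           /* XOR keystream into the data */
}
```

The NEON routine named in the quoted diff (chacha20_4block_xor_neon) processes four blocks per call, so its version of this step is vectorized; the sketch only pins down the per-block result it has to produce.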

[PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

2018-08-31 Thread Eric Biggers
From: Eric Biggers

Optimize ChaCha20 NEON performance by:

- Implementing the 8-bit rotations using the 'vtbl.8' instruction.
- Streamlining the part that adds the original state and XORs the data.
- Making some other small tweaks.

On ARM Cortex-A7, these optimizations improve ChaCha20
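
The first bullet works because rotating a 32-bit word by 8 bits only moves whole bytes, so it can be expressed as a byte permutation, which is what 'vtbl.8' does in a single instruction (a general NEON rotate typically needs a shift plus a shift-and-insert). The C sketch below models 'vtbl.8' with one table register (out-of-range indices yield 0) and uses it to rotate two little-endian 32-bit lanes. The index vector {3, 0, 1, 2, 7, 4, 5, 6} is derived here rather than copied from the patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of 'vtbl.8 Dd, {Dn}, Dm': d[i] = table[idx[i]], or 0 if idx[i] >= 8. */
static void vtbl8_model(uint8_t d[8], const uint8_t table[8], const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        d[i] = idx[i] < 8 ? table[idx[i]] : 0;
}

/* Byte indices that rotate each little-endian 32-bit lane left by 8:
 * result byte 0 takes source byte 3, byte 1 takes byte 0, and so on. */
static const uint8_t rol8_idx[8] = { 3, 0, 1, 2, 7, 4, 5, 6 };

/* Rotate two 32-bit words left by 8 bits via the byte-table model. */
void rotl8_pair_via_tbl(uint32_t w[2])
{
    uint8_t src[8], dst[8];

    for (int lane = 0; lane < 2; lane++)          /* pack lanes little-endian */
        for (int b = 0; b < 4; b++)
            src[4 * lane + b] = (uint8_t)(w[lane] >> (8 * b));

    vtbl8_model(dst, src, rol8_idx);

    for (int lane = 0; lane < 2; lane++) {        /* unpack lanes */
        w[lane] = 0;
        for (int b = 0; b < 4; b++)
            w[lane] |= (uint32_t)dst[4 * lane + b] << (8 * b);
    }
}

/* Reference rotate for comparison (0 < n < 32). */
uint32_t rotl32(uint32_t v, int n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}
```

For example, {0x11223344, 0xdeadbeef} becomes {0x22334411, 0xadbeefde}, matching rotl32(v, 8) on each word.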