Re: [PATCH v2 3/9] arm64: fpsimd: run kernel mode NEON with softirqs disabled

2021-03-30 Thread Will Deacon
On Tue, Mar 02, 2021 at 10:01:12AM +0100, Ard Biesheuvel wrote:
> Kernel mode NEON can be used in task or softirq context, but only in
> a non-nesting manner, i.e., softirq context is only permitted if the
> interrupt was not taken at a point where the kernel was using the NEON
> in task context.
> 
> This means all users of kernel mode NEON have to be aware of this
> limitation, and either need to provide scalar fallbacks that may be much
> slower (up to 20x for AES instructions) and potentially less safe, or
> use an asynchronous interface that defers processing to a later time
> when the NEON is guaranteed to be available.
> 
> Given that grabbing and releasing the NEON is cheap, we can relax this
> restriction, by increasing the granularity of kernel mode NEON code, and
> always disabling softirq processing while the NEON is being used in task
> context.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/crypto/aes-modes.S  |  2 +-
>  arch/arm64/crypto/sha1-ce-core.S   |  2 +-
>  arch/arm64/crypto/sha2-ce-core.S   |  2 +-
>  arch/arm64/crypto/sha3-ce-core.S   |  4 +--
>  arch/arm64/crypto/sha512-ce-core.S |  2 +-
>  arch/arm64/include/asm/assembler.h | 28 +++-
>  arch/arm64/kernel/asm-offsets.c|  2 ++
>  arch/arm64/kernel/fpsimd.c |  4 +--
>  8 files changed, 31 insertions(+), 15 deletions(-)

Acked-by: Will Deacon 

Will
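
For context, the API under discussion is the kernel_neon_begin()/
kernel_neon_end() bracketing that every kernel-mode NEON user follows.
A minimal sketch of the pattern (the transform helpers here are
hypothetical, not taken from the series):

#include <linux/types.h>
#include <asm/neon.h>
#include <asm/simd.h>

/*
 * Sketch: a transform that uses NEON from task context. With this
 * series applied, softirq processing is disabled between begin/end,
 * so a softirq can no longer interrupt a task-context NEON region.
 */
static void do_transform(u8 *dst, const u8 *src, int blocks)
{
	if (!may_use_simd()) {
		do_transform_scalar(dst, src, blocks);	/* hypothetical fallback */
		return;
	}

	kernel_neon_begin();			/* claim the FP/SIMD registers */
	do_transform_neon(dst, src, blocks);	/* hypothetical NEON core */
	kernel_neon_end();			/* release them */
}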


Re: [PATCH v2 2/9] arm64: assembler: introduce wxN aliases for wN registers

2021-03-30 Thread Will Deacon
On Tue, Mar 02, 2021 at 10:01:11AM +0100, Ard Biesheuvel wrote:
> The AArch64 asm syntax has this slightly tedious property that the names
> used in mnemonics to refer to registers depend on whether the opcode in
> question targets the entire 64-bits (xN), or only the least significant
> 8, 16 or 32 bits (wN). When writing parameterized code such as macros,
> this can be annoying, as macro arguments don't lend themselves to
> indexed lookups, and so generating a reference to wN in a macro that
> receives xN as an argument is problematic.
> 
> For instance, an upcoming patch that modifies the implementation of the
> cond_yield macro to be able to refer to 32-bit registers would need to
> modify invocations such as
> 
>   cond_yield  3f, x8
> 
> to
> 
>   cond_yield  3f, 8
> 
> so that the second argument can be token pasted after x or w to emit the
> correct register reference. Unfortunately, this interferes with the self
> documenting nature of the first example, where the second argument is
> obviously a register, whereas in the second example, one would need to
> go and look at the code to find out what '8' means.
> 
> So let's fix this by defining wxN aliases for all xN registers, which
> resolve to the 32-bit alias of each respective 64-bit register. This
> allows the macro implementation to paste the xN reference after a w to
> obtain the correct register name.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/assembler.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index e0fc1d424f9b..7b076ccd1a54 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -23,6 +23,14 @@
>  #include 
>  #include 
>  
> +	/*
> +	 * Provide a wxN alias for each wN register so that we can paste an xN
> +	 * reference after a 'w' to obtain the 32-bit version.
> +	 */
> +	.irp	n,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
> +	wx\n	.req	w\n
> +	.endr

That's a pretty neat hack! I remember seeing code elsewhere which would
benefit from this, so might be worth a look at our other macros as I'm sure
I got annoyed by one the other day... ah yes, the SVE macros in fpsimdmacros.h

Acked-by: Will Deacon 

Will


Re: [PATCH v2 1/9] arm64: assembler: remove conditional NEON yield macros

2021-03-30 Thread Will Deacon
On Tue, Mar 02, 2021 at 10:01:10AM +0100, Ard Biesheuvel wrote:
> The users of the conditional NEON yield macros have all been switched to
> the simplified cond_yield macro, and so the NEON specific ones can be
> removed.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/assembler.h | 70 
>  1 file changed, 70 deletions(-)

Acked-by: Will Deacon 

Will


Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code

2021-02-04 Thread Will Deacon
On Thu, Feb 04, 2021 at 10:10:26PM +1100, Herbert Xu wrote:
> On Thu, Feb 04, 2021 at 09:29:16AM +0100, Ard Biesheuvel wrote:
> >
> > Half of the patches in this series conflict with
> > 
> > 0df07d8117c3 crypto: arm64/sha - add missing module aliases
> > 
> > in the cryptodev tree, so that won't work.
> 
> Fair enough.  I'll take the patches.

Cheers, Herbert. Please just leave the final patch ("arm64: assembler:
remove conditional NEON yield macro"), as we can come back to that for
5.13.

Will


Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code

2021-02-04 Thread Will Deacon
On Wed, Feb 03, 2021 at 09:31:31PM +, Will Deacon wrote:
> On Wed, 3 Feb 2021 12:36:17 +0100, Ard Biesheuvel wrote:
> > Given how kernel mode NEON code disables preemption (to ensure that the
> > FP/SIMD register state is protected without having to context switch it),
> > we need to take care not to let those algorithms operate on unbounded
> > input data, or we may end up with excessive scheduling blackouts on
> > CONFIG_PREEMPT kernels.
> > 
> > This is currently handled by the cond_yield_neon macros, which check the
> > preempt count and the TIF_NEED_RESCHED flag from assembler code, and call
> > into kernel_neon_end()+kernel_neon_begin(), triggering a reschedule.
> > This works as expected, but is a bit messy, given how much of the state
> > preserve/restore code in the algorithm needs to be duplicated, as well as
> > causing the need to manage the stack frame explicitly. All of this is better
> > handled by the compiler, especially now that we have enabled features such
> > as the shadow call stack and BTI, and are working to improve call stack
> > validation.
> > 
> > [...]
> 
> Applied first patch only to arm64 (for-next/crypto), thanks!

Oops, looks like I typo'd the external branch (for-next/crypo). No offense
intended! I'll rename it now; SHAs will stay the same.

Will


(subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code

2021-02-03 Thread Will Deacon
On Wed, 3 Feb 2021 12:36:17 +0100, Ard Biesheuvel wrote:
> Given how kernel mode NEON code disables preemption (to ensure that the
> FP/SIMD register state is protected without having to context switch it),
> we need to take care not to let those algorithms operate on unbounded
> input data, or we may end up with excessive scheduling blackouts on
> CONFIG_PREEMPT kernels.
> 
> This is currently handled by the cond_yield_neon macros, which check the
> preempt count and the TIF_NEED_RESCHED flag from assembler code, and call
> into kernel_neon_end()+kernel_neon_begin(), triggering a reschedule.
> This works as expected, but is a bit messy, given how much of the state
> preserve/restore code in the algorithm needs to be duplicated, as well as
> causing the need to manage the stack frame explicitly. All of this is better
> handled by the compiler, especially now that we have enabled features such
> as the shadow call stack and BTI, and are working to improve call stack
> validation.
> 
> [...]

Applied first patch only to arm64 (for-next/crypto), thanks!

[1/9] arm64: assembler: add cond_yield macro
  https://git.kernel.org/arm64/c/d13c613f136c

This is the only patch on the branch, so I'm happy for it to be pulled
into the crypto tree too if it enables some of the other patches to land
in 5.12.

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH 1/9] arm64: assembler: add cond_yield macro

2021-01-28 Thread Will Deacon
On Thu, Jan 28, 2021 at 08:24:01PM +, Will Deacon wrote:
> On Thu, Jan 28, 2021 at 02:06:17PM +0100, Ard Biesheuvel wrote:
> > Add a macro cond_yield that branches to a specified label when called if
> > the TIF_NEED_RESCHED flag is set and decreasing the preempt count would
> > make the task preemptible again, resulting in a schedule to occur. This
> > can be used by kernel mode SIMD code that keeps a lot of state in SIMD
> > registers, which would make chunking the input in order to perform the
> > cond_resched() check from C code disproportionately costly.
> > 
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  arch/arm64/include/asm/assembler.h | 16 
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/assembler.h 
> > b/arch/arm64/include/asm/assembler.h
> > index bf125c591116..5f977a7c6b43 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -745,6 +745,22 @@ USER(\label, ic	ivau, \tmp2)		// invalidate I line PoU
> >  .Lyield_out_\@ :
> > .endm
> >  
> > +   /*
> > +* Check whether preempt-disabled code should yield as soon as it
> > +* is able. This is the case if re-enabling preemption a single
> > +* time results in a preempt count of zero, and the TIF_NEED_RESCHED
> > +* flag is set. (Note that the latter is stored negated in the
> > +* top word of the thread_info::preempt_count field)
> > +*/
> > +   .macro  cond_yield, lbl:req, tmp:req
> > +#ifdef CONFIG_PREEMPTION
> > +   get_current_task \tmp
> > +   ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
> > +   cmp \tmp, #PREEMPT_DISABLE_OFFSET
> > +   beq \lbl
> 
> Fancy that, I didn't know the '.' was optional in "b.eq"!
> 
> Anyway, a very similar code sequence exists inside if_will_cond_yield_neon,
> only it doesn't touch the flags. Can we use that sequence instead, and then
> use the new macro from there?

... and now I noticed the last patch :)

But it would still be nice not to clobber the flags inside the macro.

Will
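
A rough C analogue of the two conditions the macro folds into a single
load-and-compare (a sketch only -- it assumes arm64's layout, where the
64-bit thread_info::preempt_count holds the count in the low word and
the inverted TIF_NEED_RESCHED state in the high word):

/*
 * Sketch: the whole 64-bit value equals PREEMPT_DISABLE_OFFSET only
 * when the high word is zero (need_resched set, since it is stored
 * inverted) and dropping one preempt disable would leave a count of
 * zero.
 */
static bool should_yield(void)
{
	u64 pc = READ_ONCE(current_thread_info()->preempt_count);

	return pc == PREEMPT_DISABLE_OFFSET;
}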


Re: [PATCH 1/9] arm64: assembler: add cond_yield macro

2021-01-28 Thread Will Deacon
On Thu, Jan 28, 2021 at 02:06:17PM +0100, Ard Biesheuvel wrote:
> Add a macro cond_yield that branches to a specified label when called if
> the TIF_NEED_RESCHED flag is set and decreasing the preempt count would
> make the task preemptible again, resulting in a schedule to occur. This
> can be used by kernel mode SIMD code that keeps a lot of state in SIMD
> registers, which would make chunking the input in order to perform the
> cond_resched() check from C code disproportionately costly.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/assembler.h | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index bf125c591116..5f977a7c6b43 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -745,6 +745,22 @@ USER(\label, ic	ivau, \tmp2)		// invalidate I line PoU
>  .Lyield_out_\@ :
>   .endm
>  
> + /*
> +  * Check whether preempt-disabled code should yield as soon as it
> +  * is able. This is the case if re-enabling preemption a single
> +  * time results in a preempt count of zero, and the TIF_NEED_RESCHED
> +  * flag is set. (Note that the latter is stored negated in the
> +  * top word of the thread_info::preempt_count field)
> +  */
> + .macro  cond_yield, lbl:req, tmp:req
> +#ifdef CONFIG_PREEMPTION
> + get_current_task \tmp
> + ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
> + cmp \tmp, #PREEMPT_DISABLE_OFFSET
> + beq \lbl

Fancy that, I didn't know the '.' was optional in "b.eq"!

Anyway, a very similar code sequence exists inside if_will_cond_yield_neon,
only it doesn't touch the flags. Can we use that sequence instead, and then
use the new macro from there?

Will


Re: [BUG][PATCH] crypto: arm64: Avoid indirect branch to bti_c

2020-10-06 Thread Will Deacon
On Tue, Oct 06, 2020 at 11:01:21AM +0100, Dave Martin wrote:
> On Tue, Oct 06, 2020 at 09:27:48AM +0100, Will Deacon wrote:
> > On Mon, Oct 05, 2020 at 10:48:54PM -0500, Jeremy Linton wrote:
> > > The AES code uses a 'br x7' as part of a function called by
> > > a macro. That branch needs a bti_j as a target. This results
> > > in a panic as seen below. Instead of trying to replace the branch
> > > target with a bti_jc, let's replace the indirect branch with a
> > > bl/ret, bl sequence that can target the existing bti_c.
> > > 
> > >   Bad mode in Synchronous Abort handler detected on CPU1, code 0x3403 -- BTI
> > >   CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1
> > >   pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-)
> > >   pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
> > >   lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs]
> > >   sp : 80001052b730
> > > 
> > >   aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
> > >__xts_crypt+0xb0/0x2dc [aes_neon_bs]
> > >xts_encrypt+0x28/0x3c [aes_neon_bs]
> > >   crypto_skcipher_encrypt+0x50/0x84
> > >   simd_skcipher_encrypt+0xc8/0xe0
> > >   crypto_skcipher_encrypt+0x50/0x84
> > >   test_skcipher_vec_cfg+0x224/0x5f0
> > >   test_skcipher+0xbc/0x120
> > >   alg_test_skcipher+0xa0/0x1b0
> > >   alg_test+0x3dc/0x47c
> > >   cryptomgr_test+0x38/0x60
> > > 
> > > Fixes: commit 0e89640b640d ("crypto: arm64 - Use modern annotations for 
> > > assembly functions")
> > 
> > nit: the "commit" string shouldn't be here, and I think the linux-next
> > scripts will yell at us if we don't remove it.
> > 
> > > Signed-off-by: Jeremy Linton 
> > > ---
> > >  arch/arm64/crypto/aes-neonbs-core.S | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/crypto/aes-neonbs-core.S 
> > > b/arch/arm64/crypto/aes-neonbs-core.S
> > > index b357164379f6..32f53ebe5e2c 100644
> > > --- a/arch/arm64/crypto/aes-neonbs-core.S
> > > +++ b/arch/arm64/crypto/aes-neonbs-core.S
> > > @@ -788,7 +788,7 @@ SYM_FUNC_START_LOCAL(__xts_crypt8)
> > >  
> > >  0:   mov bskey, x21
> > >   mov rounds, x22
> > > - br  x7
> > > + ret
> 
> Dang, replied on an old version.

They're not versioned, so who knows which one is older!

> Since this is logically a tail call, could we simply be using br x16 or
> br x17 for this?

Yup, that would work too. This was just "obviously correct" and it would
be nice to get a fix in for 5.9.

Will


Re: [BUG][PATCH] crypto: arm64: Avoid indirect branch to bti_c

2020-10-06 Thread Will Deacon
On Mon, Oct 05, 2020 at 10:48:54PM -0500, Jeremy Linton wrote:
> The AES code uses a 'br x7' as part of a function called by
> a macro. That branch needs a bti_j as a target. This results
> in a panic as seen below. Instead of trying to replace the branch
> target with a bti_jc, let's replace the indirect branch with a
> bl/ret, bl sequence that can target the existing bti_c.
> 
>   Bad mode in Synchronous Abort handler detected on CPU1, code 0x3403 -- BTI
>   CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1
>   pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-)
>   pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
>   lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs]
>   sp : 80001052b730
> 
>   aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
>__xts_crypt+0xb0/0x2dc [aes_neon_bs]
>xts_encrypt+0x28/0x3c [aes_neon_bs]
>   crypto_skcipher_encrypt+0x50/0x84
>   simd_skcipher_encrypt+0xc8/0xe0
>   crypto_skcipher_encrypt+0x50/0x84
>   test_skcipher_vec_cfg+0x224/0x5f0
>   test_skcipher+0xbc/0x120
>   alg_test_skcipher+0xa0/0x1b0
>   alg_test+0x3dc/0x47c
>   cryptomgr_test+0x38/0x60
> 
> Fixes: commit 0e89640b640d ("crypto: arm64 - Use modern annotations for 
> assembly functions")

nit: the "commit" string shouldn't be here, and I think the linux-next
scripts will yell at us if we don't remove it.

> Signed-off-by: Jeremy Linton 
> ---
>  arch/arm64/crypto/aes-neonbs-core.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/crypto/aes-neonbs-core.S 
> b/arch/arm64/crypto/aes-neonbs-core.S
> index b357164379f6..32f53ebe5e2c 100644
> --- a/arch/arm64/crypto/aes-neonbs-core.S
> +++ b/arch/arm64/crypto/aes-neonbs-core.S
> @@ -788,7 +788,7 @@ SYM_FUNC_START_LOCAL(__xts_crypt8)
>  
>  0:   mov bskey, x21
>   mov rounds, x22
> - br  x7
> + ret
>  SYM_FUNC_END(__xts_crypt8)
>  
>   .macro  __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
> @@ -806,8 +806,8 @@ SYM_FUNC_END(__xts_crypt8)
>   uzp1v30.4s, v30.4s, v25.4s
>   ld1 {v25.16b}, [x24]
>  
> -99:  adr x7, \do8
> - bl  __xts_crypt8
> +99:  bl      __xts_crypt8
> + bl  \do8
>  
>   ldp q16, q17, [sp, #.Lframe_local_offset]
>   ldp q18, q19, [sp, #.Lframe_local_offset + 32]

Acked-by: Will Deacon 

Catalin -- can you pick this for 5.9 please?

Will


Re: [PATCH] crypto: arm64: Use PTR_ERR_OR_ZERO rather than its implementation.

2019-09-04 Thread Will Deacon
On Tue, Sep 03, 2019 at 02:54:16PM +0800, zhong jiang wrote:
> PTR_ERR_OR_ZERO contains if (IS_ERR(...)) + PTR_ERR. It is better to
> use it directly, hence just replace it.
> 
> Signed-off-by: zhong jiang 
> ---
>  arch/arm64/crypto/aes-glue.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
> index ca0c84d..2a2e0a3 100644
> --- a/arch/arm64/crypto/aes-glue.c
> +++ b/arch/arm64/crypto/aes-glue.c
> @@ -409,10 +409,8 @@ static int essiv_cbc_init_tfm(struct crypto_skcipher 
> *tfm)
>   struct crypto_aes_essiv_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
>  
>   ctx->hash = crypto_alloc_shash("sha256", 0, 0);
> - if (IS_ERR(ctx->hash))
> - return PTR_ERR(ctx->hash);
>  
> - return 0;
> + return PTR_ERR_OR_ZERO(ctx->hash);
>  }
>  
>  static void essiv_cbc_exit_tfm(struct crypto_skcipher *tfm)

Acked-by: Will Deacon 

Assuming this will go via Herbert.

Will
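
For reference, PTR_ERR_OR_ZERO in include/linux/err.h is just this
pattern folded into one helper, roughly:

static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr)
{
	if (IS_ERR(ptr))
		return PTR_ERR(ptr);
	return 0;
}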


Re: [PATCH 0/3] Add support for Graviton TRNG

2019-07-01 Thread Will Deacon
[+Marc]

On Mon, Jul 01, 2019 at 09:28:06AM +0100, Will Deacon wrote:
> [Note: this was in my spam folder]
> 
> On Fri, Jun 28, 2019 at 06:05:10PM +, Saidi, Ali wrote:
> > On 6/7/19, 7:59 AM, " Ali Saidi"  wrote:
> > On 6/5/19, 7:20 AM, "Will Deacon"  wrote:
> > On Tue, Jun 04, 2019 at 08:30:57PM +, Ali Saidi wrote:
> > > AWS Graviton based systems provide an Arm SMC call in the vendor 
> > defined
> > > hypervisor region to read random numbers from a HW TRNG and 
> > return them to the
> > > guest. 
> > > 
> > > We've observed slower guest boot and especially reboot times due 
> > to lack of
> > > entropy and providing access to a TRNG is meant to address this. 
> > 
> > Curious, but why this over something like virtio-rng?
> > 
> > This interface allows us to provide the functionality from both EL2
> > and EL3 and support multiple different types of our instances which we
> > unfortunately can't do with virt-io.
> > 
> > Any additional comments?
> > Do you know when you'll have a chance to rebase arm64/smccc-cleanup?
> 
> Sorry, Ali, this slipped through the cracks. Marc and I will chat today and
> look at respinning what we had before; it should then hopefully be
> straightforward enough for you to take that as a base for what you want to
> do.

Ok, I hacked on this a bit today and hopefully you can use this as a
starting point:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/hvc

I haven't given it any real testing, so apologies for the bugs.

Will
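
For the shape of the guest side being discussed: vendor hypervisor
calls like this are normally issued through the SMCCC helpers. A rough
sketch, in which the function ID and the result layout are assumptions
rather than anything specified in this thread:

#include <linux/arm-smccc.h>

/* Hypothetical fast call in the vendor hypervisor service range */
#define HYP_TRNG_FID	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
					   ARM_SMCCC_SMC_64,		\
					   ARM_SMCCC_OWNER_VENDOR_HYP,	\
					   0x01)

static u64 trng_read64(void)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(HYP_TRNG_FID, &res); /* SMC or HVC conduit */
	return res.a1;	/* assumed: status in x0, entropy in x1 */
}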


Re: [PATCH 0/3] Add support for Graviton TRNG

2019-07-01 Thread Will Deacon
[Note: this was in my spam folder]

On Fri, Jun 28, 2019 at 06:05:10PM +, Saidi, Ali wrote:
> On 6/7/19, 7:59 AM, " Ali Saidi"  wrote:
> On 6/5/19, 7:20 AM, "Will Deacon"  wrote:
> On Tue, Jun 04, 2019 at 08:30:57PM +, Ali Saidi wrote:
> > AWS Graviton based systems provide an Arm SMC call in the vendor 
> defined
> > hypervisor region to read random numbers from a HW TRNG and return 
> them to the
> > guest. 
> > 
> > We've observed slower guest boot and especially reboot times due to 
> lack of
> > entropy and providing access to a TRNG is meant to address this. 
> 
> Curious, but why this over something like virtio-rng?
> 
> This interface allows us to provide the functionality from both EL2
> and EL3 and support multiple different types of our instances which we
> unfortunately can't do with virt-io.
> 
> Any additional comments?
> Do you know when you'll have a chance to rebase arm64/smccc-cleanup?

Sorry, Ali, this slipped through the cracks. Marc and I will chat today and
look at respinning what we had before; it should then hopefully be
straightforward enough for you to take that as a base for what you want to
do.

Will


Re: [PATCH 0/3] Add support for Graviton TRNG

2019-06-05 Thread Will Deacon
On Tue, Jun 04, 2019 at 08:30:57PM +, Ali Saidi wrote:
> AWS Graviton based systems provide an Arm SMC call in the vendor defined
> hypervisor region to read random numbers from a HW TRNG and return them to the
> guest. 
> 
> We've observed slower guest boot and especially reboot times due to lack of
> entropy and providing access to a TRNG is meant to address this. 

Curious, but why this over something like virtio-rng?

Will


Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include

2018-11-07 Thread Will Deacon
On Wed, Nov 07, 2018 at 09:40:05AM +, Vladimir Murzin wrote:
> There are cases where the whole feature, for instance arm64/lse or
> arm/crypto, can depend on the assembler. Current practice is to report
> at build time that the selected feature is not supported, which can be
> quite annoying...

Why is it annoying? You still end up with a working kernel.

> It'd be nicer if we could check the assembler first and opt in to
> feature visibility in Kconfig.
> 
> Cc: Masahiro Yamada 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: Ard Biesheuvel 
> Signed-off-by: Vladimir Murzin 
> ---
>  scripts/Kconfig.include | 4 
>  1 file changed, 4 insertions(+)

One issue I have with doing the check like this is that if somebody sends
you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
using that .config and an old toolchain, the option is silently dropped.

I think the diagnostic is actually useful in this case.

Will


Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions

2018-09-04 Thread Will Deacon
On Tue, Sep 04, 2018 at 11:18:55AM +0800, Herbert Xu wrote:
> On Tue, Aug 28, 2018 at 08:43:35PM +0200, Ard Biesheuvel wrote:
> > On 28 August 2018 at 19:01, Will Deacon  wrote:
> > > On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> > >> Add a CRC32 feature bit and wire it up to the CPU id register so we
> > >> will be able to use alternatives patching for CRC32 operations.
> > >>
> > >> Signed-off-by: Ard Biesheuvel 
> > >> ---
> > >>  arch/arm64/include/asm/cpucaps.h | 3 ++-
> > >>  arch/arm64/kernel/cpufeature.c   | 9 +
> > >>  2 files changed, 11 insertions(+), 1 deletion(-)
> > >
> > > Acked-by: Will Deacon 
> > >
> > > With the minor caveat below...
> > >
> > >> diff --git a/arch/arm64/include/asm/cpucaps.h 
> > >> b/arch/arm64/include/asm/cpucaps.h
> > >> index ae1f70450fb2..9932aca9704b 100644
> > >> --- a/arch/arm64/include/asm/cpucaps.h
> > >> +++ b/arch/arm64/include/asm/cpucaps.h
> > >> @@ -51,7 +51,8 @@
> > >>  #define ARM64_SSBD   30
> > >>  #define ARM64_MISMATCHED_CACHE_TYPE  31
> > >>  #define ARM64_HAS_STAGE2_FWB 32
> > >> +#define ARM64_HAS_CRC32  33
> > >>
> > >> -#define ARM64_NCAPS  33
> > >> +#define ARM64_NCAPS  34
> > >
> > >
> > > ... if this goes via crypto, you'll almost certainly get a (trivial)
> > > conflict with arm64, since these numbers get bumped all the time.
> > >
> > 
> > I think the first three patches should go through the arm64 tree. The
> > last one just removes the now redundant crc32 SIMD driver, and Herbert
> > could pick that up separately, i.e., it should be totally independent.
> 
> Yes let's do that.

Okey doke! In which case, please can we have your Ack on the first patch?

Cheers,

Will


Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions

2018-08-28 Thread Will Deacon
On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> Add a CRC32 feature bit and wire it up to the CPU id register so we
> will be able to use alternatives patching for CRC32 operations.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/cpucaps.h | 3 ++-
>  arch/arm64/kernel/cpufeature.c   | 9 +
>  2 files changed, 11 insertions(+), 1 deletion(-)

Acked-by: Will Deacon 

With the minor caveat below...

> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index ae1f70450fb2..9932aca9704b 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -51,7 +51,8 @@
>  #define ARM64_SSBD   30
>  #define ARM64_MISMATCHED_CACHE_TYPE  31
>  #define ARM64_HAS_STAGE2_FWB 32
> +#define ARM64_HAS_CRC32  33
>  
> -#define ARM64_NCAPS  33
> +#define ARM64_NCAPS  34


... if this goes via crypto, you'll almost certainly get a (trivial)
conflict with arm64, since these numbers get bumped all the time.

Will

>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index e238b7932096..7626b80128f5 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1222,6 +1222,15 @@ static const struct arm64_cpu_capabilities 
> arm64_features[] = {
>   .cpu_enable = cpu_enable_hw_dbm,
>   },
>  #endif
> + {
> + .desc = "CRC32 instructions",
> + .capability = ARM64_HAS_CRC32,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_cpuid_feature,
> + .sys_reg = SYS_ID_AA64ISAR0_EL1,
> + .field_pos = ID_AA64ISAR0_CRC32_SHIFT,
> + .min_field_value = 1,
> + },
>   {},
>  };
>  
> -- 
> 2.18.0
> 
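
To illustrate what the new capability enables at the C level (a sketch
only -- the helper names are hypothetical, and the series itself uses
alternatives patching rather than a runtime branch):

static u32 crc32c_update(u32 crc, const u8 *data, size_t len)
{
	if (cpus_have_const_cap(ARM64_HAS_CRC32))
		return crc32c_arm64_hw(crc, data, len);	/* hypothetical */
	return crc32c_sw(crc, data, len);		/* hypothetical */
}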


Re: [PATCH] arm64/crypto: remove non-standard notation

2018-08-20 Thread Will Deacon
On Mon, Aug 20, 2018 at 03:40:32PM -0700, Nick Desaulniers wrote:
> It seems that:
> ldr q8, =0x300020001
> 
> is a GNU as convenience notation for:
> ldr q8, .Lconstant
> .Lconstant
> .word 0x0001
> .word 0x0002
> .word 0x0003
> .word 0x

Does this still work correctly for a big-endian build?

Will


Re: [PATCH] crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair

2018-07-31 Thread Will Deacon
On Tue, Jul 31, 2018 at 09:22:52AM +0200, Ard Biesheuvel wrote:
> (+ Catalin, Will)
> 
> On 27 July 2018 at 14:59, Ard Biesheuvel  wrote:
> > Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
> > kernel_neon_end() to be used since the routine touches the NEON
> > register file. Add the missing calls.
> >
> > Also, since NEON register contents are not preserved outside of
> > a kernel mode NEON region, pass the key schedule array again.
> >
> > Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  arch/arm64/crypto/ghash-ce-glue.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/crypto/ghash-ce-glue.c 
> > b/arch/arm64/crypto/ghash-ce-glue.c
> > index 7cf0b1aa6ea8..8a10f1d7199a 100644
> > --- a/arch/arm64/crypto/ghash-ce-glue.c
> > +++ b/arch/arm64/crypto/ghash-ce-glue.c
> > @@ -488,9 +488,13 @@ static int gcm_decrypt(struct aead_request *req)
> > 			err = skcipher_walk_done(&walk,
> > 						 walk.nbytes % AES_BLOCK_SIZE);
> > 		}
> > -		if (walk.nbytes)
> > -			pmull_gcm_encrypt_block(iv, iv, NULL,
> > +		if (walk.nbytes) {
> > +			kernel_neon_begin();
> > +			pmull_gcm_encrypt_block(iv, iv, ctx->aes_key.key_enc,
> > 						num_rounds(&ctx->aes_key));
> > +			kernel_neon_end();
> > +		}
> > +
> > } else {
> > __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
> > num_rounds(&ctx->aes_key));
> > --
> > 2.18.0
> >
> 
> This fixes a rather nasty bug in the AES-GCM code: failing to call
> kernel_neon_begin()/_end() may clobber the NEON register state of
> unrelated userland processes.
> 
> Could we please get this queued before v4.18 is released? Thanks.

I can take this via the arm64 tree if Herbert is ok with that.

Herbert?

Will


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-26 Thread Will Deacon
On Fri, Jan 26, 2018 at 12:05:42PM +0300, Yury Norov wrote:
> On Wed, Jan 24, 2018 at 10:22:13AM +0000, Will Deacon wrote:
> > On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> > > This series adds API for 128-bit memory IO access and enables it for 
> > > ARM64.
> > > The original motivation for 128-bit API came from new Cavium network 
> > > device
> > > driver. The hardware requires 128-bit access to make things work. See
> > > description in patch 3 for details.
> > > 
> > > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > > API for 128-bit access would be helpful in core arm64 code.
> > 
> > Only for normal, cacheable memory, so they're not suitable for IO accesses
> > as you're proposing here.
> 
> Hi Will,
> 
> Thanks for clarification.
> 
> Could you elaborate, do you find 128-bit read/write API useless, or
> you just correct my comment?
> 
> I think that ordered uniform 128-bit access API would be helpful, even
> if not atomic.

Sorry, but I strongly disagree here. Having an IO accessor that isn't
guaranteed to be atomic is a recipe for disaster if it's not called out
explicitly. You're much better off implementing something along the lines
of the io-64-nonatomic headers, using 2x64-bit accessors like we already
have for the 2x32-bit case.

However, that doesn't solve your problem and is somewhat of a distraction.
I'd suggest that in your case, where you have a device that relies on
128-bit atomic access that is assumedly tightly integrated into your SoC,
then the driver just codes its own local implementation of the accessor,
given that there isn't a way to guarantee the atomicity architecturally
(and even within your SoC it might not be atomic to all endpoints).

Will
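
The 2x32-bit precedent is the io-64-nonatomic family of headers; a
128-bit analogue built from 2x64-bit accessors would look roughly like
this (a sketch under the explicit assumption that the device tolerates
the access being split -- nothing here is atomic):

#include <linux/io.h>
#include <linux/types.h>

/* Sketch: lo-hi composition of a 128-bit access from two 64-bit ones */
static inline void writeo_lo_hi(unsigned __int128 val,
				volatile void __iomem *addr)
{
	writeq((u64)val, addr);
	writeq((u64)(val >> 64), addr + 8);
}

static inline unsigned __int128 reado_lo_hi(const volatile void __iomem *addr)
{
	u64 lo = readq(addr);
	u64 hi = readq(addr + 8);

	return ((unsigned __int128)hi << 64) | lo;
}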


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Will Deacon
On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> This series adds API for 128-bit memory IO access and enables it for ARM64.
> The original motivation for 128-bit API came from new Cavium network device
> driver. The hardware requires 128-bit access to make things work. See
> description in patch 3 for details.
> 
> Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> API for 128-bit access would be helpful in core arm64 code.

Only for normal, cacheable memory, so they're not suitable for IO accesses
as you're proposing here.

Will


Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-01-27 Thread Will Deacon
On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote:
> On 27 January 2017 at 10:40, Matthias Brugger  wrote:
> > Older compilers may not be able to detect the crc32 extended cpu type.
> 
> What do you mean 'detect'? Could you describe the failure in more detail
> please?
> 
> > Anyway only inline assembler code is used, which gets passed to the
> > assembler. This patch moves the crc detection to the assembler.
> >
> > Suggested-by: Alexander Graf 
> > Signed-off-by: Matthias Brugger 
> > ---
> >  arch/arm64/crypto/Makefile  | 2 --
> >  arch/arm64/crypto/crc32-arm64.c | 3 +++
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> > index aad7b744..0d779dac75cd 100644
> > --- a/arch/arm64/crypto/Makefile
> > +++ b/arch/arm64/crypto/Makefile
> > @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o  := -DUSE_V8_CRYPTO_EXTENSIONS
> >
> >  obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o
> >
> > -CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
> > -
> >  $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
> > $(call if_changed_rule,cc_o_c)
> >
> > diff --git a/arch/arm64/crypto/crc32-arm64.c 
> > b/arch/arm64/crypto/crc32-arm64.c
> > index 6a37c3c6b11d..10f5dd075323 100644
> > --- a/arch/arm64/crypto/crc32-arm64.c
> > +++ b/arch/arm64/crypto/crc32-arm64.c
> > @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam ");
> >  MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions");
> >  MODULE_LICENSE("GPL v2");
> >
> > +/* Request crc extension capabilities from the assembler */
> > +asm(".arch_extension crc");
> > +
> 
> Will should confirm, but I think this is a recent feature in GAS for
> AArch64, so this may break older toolchains as well.

Yes, the .arch_extension directive isn't universally supported by AArch64
gas so we can't rely on it unconditionally. The best bet is to check for
the support and, if it's not present, then disable whatever feature relies
on it. See the lseinstr variable in the arm64 Makefile.

Will


Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Will Deacon
On Mon, Nov 28, 2016 at 02:17:34PM +0100, Ard Biesheuvel wrote:
> On 28 November 2016 at 13:05, Will Deacon  wrote:
> > On Sun, Nov 20, 2016 at 11:42:01AM +, Ard Biesheuvel wrote:
> >> This integrates both the accelerated scalar and the NEON implementations
> >> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
> >>
> >> Relative performance compared to the respective generic C versions:
> >>
> >>              | SHA256-scalar | SHA256-NEON* | SHA512 |
> >>  ------------+---------------+--------------+--------+
> >>  Cortex-A53  |     1.63x     |    1.63x     |  2.34x |
> >>  Cortex-A57  |     1.43x     |    1.59x     |  1.95x |
> >>  Cortex-A73  |     1.26x     |    1.56x     |    ?   |
> >>
> >> The core crypto code was authored by Andy Polyakov of the OpenSSL
> >> project, in collaboration with whom the upstream code was adapted so
> >> that this module can be built from the same version of sha512-armv8.pl.
> >>
> >> The version in this patch was taken from OpenSSL commit 32bbb62ea634
> >> ("sha/asm/sha512-armv8.pl: fix big-endian support in __KERNEL__ case.")
> >>
> >> * The core SHA algorithm is fundamentally sequential, but there is a
> >>   secondary transformation involved, called the schedule update, which
> >>   can be performed independently. The NEON version of SHA-224/SHA-256
> >>   only implements this part of the algorithm using NEON instructions,
> >>   the sequential part is always done using scalar instructions.
> >>
> >> Signed-off-by: Ard Biesheuvel 
> >> ---
> >>  arch/arm64/crypto/Kconfig   |8 +
> >>  arch/arm64/crypto/Makefile  |   17 +
> >>  arch/arm64/crypto/sha256-core.S_shipped | 2061 
> >>  arch/arm64/crypto/sha256-glue.c |  185 ++
> >>  arch/arm64/crypto/sha512-armv8.pl   |  778 
> >>  arch/arm64/crypto/sha512-core.S_shipped | 1085 +++
> >>  arch/arm64/crypto/sha512-glue.c |   94 +
> >>  7 files changed, 4228 insertions(+)
> >
> > If I build a kernel with this applied and CRYPTO_SHA{256,512}_ARM64=y,
> > then I end up with untracked .S files according to git:
> >
> > $ git status
> > Untracked files:
> > arch/arm64/crypto/sha256-core.S
> > arch/arm64/crypto/sha512-core.S
> >
> 
> Ah right, I forgot to add a .gitignore for these: that is required
> with .S_shipped files. I didn't spot this myself because I always
> build out of tree
> 
> Would you mind taking a separate patch for that?

I think this should all go via herbert, so I guess just send him the extra
patch.

Will


Re: [PATCH v4] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-28 Thread Will Deacon
On Sun, Nov 20, 2016 at 11:42:01AM +, Ard Biesheuvel wrote:
> This integrates both the accelerated scalar and the NEON implementations
> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
> 
> Relative performance compared to the respective generic C versions:
> 
>              | SHA256-scalar | SHA256-NEON* | SHA512 |
>  ------------+---------------+--------------+--------+
>  Cortex-A53  |     1.63x     |    1.63x     |  2.34x |
>  Cortex-A57  |     1.43x     |    1.59x     |  1.95x |
>  Cortex-A73  |     1.26x     |    1.56x     |    ?   |
> 
> The core crypto code was authored by Andy Polyakov of the OpenSSL
> project, in collaboration with whom the upstream code was adapted so
> that this module can be built from the same version of sha512-armv8.pl.
> 
> The version in this patch was taken from OpenSSL commit 32bbb62ea634
> ("sha/asm/sha512-armv8.pl: fix big-endian support in __KERNEL__ case.")
> 
> * The core SHA algorithm is fundamentally sequential, but there is a
>   secondary transformation involved, called the schedule update, which
>   can be performed independently. The NEON version of SHA-224/SHA-256
>   only implements this part of the algorithm using NEON instructions,
>   the sequential part is always done using scalar instructions.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/crypto/Kconfig   |8 +
>  arch/arm64/crypto/Makefile  |   17 +
>  arch/arm64/crypto/sha256-core.S_shipped | 2061 
>  arch/arm64/crypto/sha256-glue.c |  185 ++
>  arch/arm64/crypto/sha512-armv8.pl   |  778 
>  arch/arm64/crypto/sha512-core.S_shipped | 1085 +++
>  arch/arm64/crypto/sha512-glue.c |   94 +
>  7 files changed, 4228 insertions(+)

If I build a kernel with this applied and CRYPTO_SHA{256,512}_ARM64=y,
then I end up with untracked .S files according to git:

$ git status
Untracked files:
arch/arm64/crypto/sha256-core.S
arch/arm64/crypto/sha512-core.S

Will


Re: [PATCH v3] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-12 Thread Will Deacon
Hi Ard,

On Sat, Nov 12, 2016 at 01:32:33PM +0100, Ard Biesheuvel wrote:
> This integrates both the accelerated scalar and the NEON implementations
> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
> 
> Relative performance compared to the respective generic C versions:
> 
>              | SHA256-scalar | SHA256-NEON* | SHA512 |
>  ------------+---------------+--------------+--------+
>  Cortex-A53  |     1.63x     |    1.63x     |  2.34x |
>  Cortex-A57  |     1.43x     |    1.59x     |  1.95x |
>  Cortex-A73  |     1.26x     |    1.56x     |    ?   |
> 
> The core crypto code was authored by Andy Polyakov of the OpenSSL
> project, in collaboration with whom the upstream code was adapted so
> that this module can be built from the same version of sha512-armv8.pl.
> 
> The version in this patch was taken from OpenSSL commit
> 
>866e505e0d66 sha/asm/sha512-armv8.pl: add NEON version of SHA256.
> 
> * The core SHA algorithm is fundamentally sequential, but there is a
>   secondary transformation involved, called the schedule update, which
>   can be performed independently. The NEON version of SHA-224/SHA-256
>   only implements this part of the algorithm using NEON instructions,
>   the sequential part is always done using scalar instructions.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> v3: at Will's request, the generated assembly files are now included
> as .S_shipped files, for which generic build rules are defined
> already. Note that this has caused issues in the past with
> patchwork, so for Herbert's convenience, the patch can be pulled
> from http://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git,
> branch arm64-sha256 (based on today's cryptodev)

Thanks.

Looking at the generated code, I see references to __ARMEB__ and __ILP32__.
The former is probably a bug, whilst the second is not required. There are
also some commented out instructions, which is weird.

Will


Re: [PATCH] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512

2016-11-11 Thread Will Deacon
On Fri, Nov 11, 2016 at 09:51:13PM +0800, Ard Biesheuvel wrote:
> This integrates both the accelerated scalar and the NEON implementations
> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
> 
> Relative performance compared to the respective generic C versions:
> 
>              | SHA256-scalar | SHA256-NEON* | SHA512 |
>  ------------+---------------+--------------+--------+
>  Cortex-A53  |     1.63x     |    1.63x     |  2.34x |
>  Cortex-A57  |     1.43x     |    1.59x     |  1.95x |
>  Cortex-A73  |     1.26x     |    1.56x     |    ?   |
> 
> The core crypto code was authored by Andy Polyakov of the OpenSSL
> project, in collaboration with whom the upstream code was adapted so
> that this module can be built from the same version of sha512-armv8.pl.
> 
> The version in this patch was taken from OpenSSL commit
> 
>866e505e0d66 sha/asm/sha512-armv8.pl: add NEON version of SHA256.
> 
> * The core SHA algorithm is fundamentally sequential, but there is a
>   secondary transformation involved, called the schedule update, which
>   can be performed independently. The NEON version of SHA-224/SHA-256
>   only implements this part of the algorithm using NEON instructions,
>   the sequential part is always done using scalar instructions.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> 
> This supersedes the SHA-256-NEON-only patch I sent out about 6 weeks ago.
> 
> Will, Catalin: note that this pulls in a .pl script, and adds a build rule
> locally in arch/arm64/crypto to generate .S files on the fly from Perl
> scripts. I will leave it to you to decide whether you are ok with this as
> is, or whether you prefer .S_shipped files, in which case the Perl script
> is only included as a reference (this is how we did it for arch/arm in the
> past, but given that it adds about 3000 lines of generated code to the patch,
> I think we may want to simply keep it as below)

I think we should include the shipped files too. 3000 lines isn't that much
in the grand scheme of things, and there will be people who complain about
the unconditional perl dependency.

Will


Re: [PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-19 Thread Will Deacon
On Wed, Oct 19, 2016 at 09:49:46AM +0100, Ard Biesheuvel wrote:
> On 19 October 2016 at 09:46, Will Deacon  wrote:
> > On Wed, Oct 19, 2016 at 11:03:33AM +0800, Herbert Xu wrote:
> >> I was planning merging these for 4.10.  But I'm fine with them
> >> going through the arm tree.  Let me know what you guys want to
> >> do.
> >
> > I assumed you'd take them through crypto, as per usual, so I didn't
> > queue anything in the arm64 tree.
> >
> > Ard -- were you planning to get these in for 4.9?
> >
> 
> These are arguably bug fixes, but I spotted them by accident, they
> weren't reported to me or anything. But it seems strange to add a cc
> stable and then hold off until the next merge window.
> 
> In any case, I don't care deeply either way, as long as they get
> merged in the end. I think it makes sense to keep them together (arm64
> + ARM), so Herbert's tree is a more natural route for them to take. I
> will leave it up to Herbert whether they are sent onward as fixes or
> as part of v4.10

Sounds good to me.

Will


Re: [PATCH v2 0/8] crypto: ARM/arm64 - big endian fixes

2016-10-19 Thread Will Deacon
On Wed, Oct 19, 2016 at 11:03:33AM +0800, Herbert Xu wrote:
> On Tue, Oct 18, 2016 at 01:14:38PM +0100, Ard Biesheuvel wrote:
> > On 18 October 2016 at 12:49, Catalin Marinas  
> > wrote:
> > > On Tue, Oct 11, 2016 at 07:15:12PM +0100, Ard Biesheuvel wrote:
> > >> As it turns out, none of the accelerated crypto routines under 
> > >> arch/arm64/crypto
> > >> currently work, or have ever worked correctly when built for big endian. 
> > >> So this
> > >> series fixes all of them. This v2 now includes a similar fix for 32-bit 
> > >> ARM as
> > >> well, and an additional fix for XTS which escaped my attention before.
> > >>
> > >> Each of these patches carries a fixes tag, and could be backported to 
> > >> stable.
> > >> However, for patches #1 and #5, the fixes tag denotes the oldest commit 
> > >> that the
> > >> fix is compatible with, not the patch that introduced the algorithm.
> > >
> > > I think for future reference, the Fixes tag should denote the commit
> > > that introduced the issue. An explicit Cc: stable tag would state how
> > > far back it should be applied.
> > >
> > 
> > OK, that sounds reasonable.
> > 
> > >> Ard Biesheuvel (8):
> > >>   crypto: arm64/aes-ce - fix for big endian
> > >>   crypto: arm64/ghash-ce - fix for big endian
> > >>   crypto: arm64/sha1-ce - fix for big endian
> > >>   crypto: arm64/sha2-ce - fix for big endian
> > >>   crypto: arm64/aes-ccm-ce: fix for big endian
> > >>   crypto: arm64/aes-neon - fix for big endian
> > >>   crypto: arm64/aes-xts-ce: fix for big endian
> > >>   crypto: arm/aes-ce - fix for big endian
> > >
> > > The changes look fine to me but I can't claim I fully understand these
> > > algorithms. FWIW:
> > >
> > > Acked-by: Catalin Marinas 
> > >
> > > (Will may pick them up for 4.9-rcX)
> > 
> > Thanks, although I was kind of expecting Herbert to pick these up,
> > given that #8 affects ARM not arm64.
> > 
> > But if you (or Will) can pick up #1 to #7, that is also fine, then I
> > can drop #8 into rmk's patch database.
> 
> I was planning merging these for 4.10.  But I'm fine with them
> going through the arm tree.  Let me know what you guys want to
> do.

I assumed you'd take them through crypto, as per usual, so I didn't
queue anything in the arm64 tree.

Ard -- were you planning to get these in for 4.9?

Will


Re: [PATCH] arm/arm64/crypto: assure that ECB modes don't require an IV

2016-02-15 Thread Will Deacon
On Fri, Feb 12, 2016 at 06:00:01PM +0100, Ard Biesheuvel wrote:
> On 12 February 2016 at 16:47, Jeremy Linton  wrote:
> > ECB modes don't use an initialization vector. The kernel
> > /proc/crypto interface doesn't reflect this properly.
> >
> > Signed-off-by: Jeremy Linton 
> 
> Thanks for spotting that!
> 
> Acked-by: Ard Biesheuvel 

Thanks, I'll queue this for -rc5.

Will


Re: [RFC][PATCH 1/2] arm64: add ioread64be and iowrite64be macros

2015-09-02 Thread Will Deacon
On Fri, Aug 28, 2015 at 12:49:47PM +0100, Horia Geantă wrote:
> This will allow device drivers to consistently use io{read,write}XXbe
> macros also for 64-bit accesses.
> 
> Signed-off-by: Alex Porosanu 
> Signed-off-by: Horia Geantă 
> ---
>  arch/arm64/include/asm/io.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 44be1e03ed65..9b6e408cfa51 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -174,13 +174,15 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
>  #define iounmap	__iounmap
>  
>  /*
> - * io{read,write}{16,32}be() macros
> + * io{read,write}{16,32,64}be() macros
>   */
>  #define ioread16be(p)	({ __u16 __v = be16_to_cpu((__force __be16)__raw_readw(p)); __iormb(); __v; })
>  #define ioread32be(p)	({ __u32 __v = be32_to_cpu((__force __be32)__raw_readl(p)); __iormb(); __v; })
> +#define ioread64be(p)	({ __u64 __v = be64_to_cpu((__force __be64)__raw_readq(p)); __iormb(); __v; })
>  
>  #define iowrite16be(v,p)	({ __iowmb(); __raw_writew((__force __u16)cpu_to_be16(v), p); })
>  #define iowrite32be(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_be32(v), p); })
> +#define iowrite64be(v,p)	({ __iowmb(); __raw_writeq((__force __u64)cpu_to_be64(v), p); })

This looks like the first instance of this in the tree, so perhaps it's
also worth adding a version to asm-generic/io.h predicated on
CONFIG_64BIT?

Will
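
Will's suggestion would amount to something like the following in
asm-generic/io.h (a sketch, mirroring the 16/32-bit definitions above
and guarded so it only appears where __raw_readq/__raw_writeq exist):

#ifdef CONFIG_64BIT
#ifndef ioread64be
#define ioread64be(p)		be64_to_cpu((__force __be64)__raw_readq(p))
#endif
#ifndef iowrite64be
#define iowrite64be(v, p)	__raw_writeq((__force u64)cpu_to_be64(v), p)
#endif
#endif /* CONFIG_64BIT */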


Re: arm64 allmodconfig failure in expansion of macro ‘_X’

2015-06-26 Thread Will Deacon
On Fri, Jun 26, 2015 at 11:09:44AM +0100, Herbert Xu wrote:
> On Fri, Jun 26, 2015 at 11:08:05AM +0100, Will Deacon wrote:
> > Hi all,
> > 
> > arm64 allmodconfig fails to build with mainline due to the following:
> > 
> > 
> >   In file included from include/acpi/platform/aclinux.h:74:0,
> >from include/acpi/platform/acenv.h:173,
> >from include/acpi/acpi.h:56,
> >from include/linux/acpi.h:37,
> >from ./arch/arm64/include/asm/dma-mapping.h:21,
> >from include/linux/dma-mapping.h:86,
> >from include/linux/skbuff.h:34,
> >from include/crypto/algapi.h:18,
> >from crypto/asymmetric_keys/rsa.c:16:
> >   include/linux/ctype.h:15:12: error: expected ‘;’, ‘,’ or ‘)’ before 
> > numeric constant
> >#define _X 0x40 /* hex digit */
> >   ^
> >   crypto/asymmetric_keys/rsa.c:123:47: note: in expansion of macro ‘_X’
> >static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
> >  ^
> >   crypto/asymmetric_keys/rsa.c: In function ‘RSA_verify_signature’:
> >   crypto/asymmetric_keys/rsa.c:256:2: error: implicit declaration of 
> > function ‘RSA_I2OSP’ [-Werror=implicit-function-declaration]
> > ret = RSA_I2OSP(m, k, &EM);
> > 
> > 
> > This is thanks to the following function type:
> > 
  static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
> > 
> > conflicting with the _X #define in linux/ctype.h:
> > 
> >   #define _X  0x40/* hex digit */
> > 
> > Simple patch below, but solving this problem with more underscores feels
> > slightly inhumane.
> 
> Thanks but I've already merged a similar patch yesterday.

Perfect, sorry for the noise.

Will


arm64 allmodconfig failure in expansion of macro ‘_X’

2015-06-26 Thread Will Deacon
Hi all,

arm64 allmodconfig fails to build with mainline due to the following:


  In file included from include/acpi/platform/aclinux.h:74:0,
   from include/acpi/platform/acenv.h:173,
   from include/acpi/acpi.h:56,
   from include/linux/acpi.h:37,
   from ./arch/arm64/include/asm/dma-mapping.h:21,
   from include/linux/dma-mapping.h:86,
   from include/linux/skbuff.h:34,
   from include/crypto/algapi.h:18,
   from crypto/asymmetric_keys/rsa.c:16:
  include/linux/ctype.h:15:12: error: expected ‘;’, ‘,’ or ‘)’ before numeric 
constant
   #define _X 0x40 /* hex digit */
  ^
  crypto/asymmetric_keys/rsa.c:123:47: note: in expansion of macro ‘_X’
   static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
 ^
  crypto/asymmetric_keys/rsa.c: In function ‘RSA_verify_signature’:
  crypto/asymmetric_keys/rsa.c:256:2: error: implicit declaration of function 
‘RSA_I2OSP’ [-Werror=implicit-function-declaration]
ret = RSA_I2OSP(m, k, &EM);


This is thanks to the following function type:

  static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)

conflicting with the _X #define in linux/ctype.h:

  #define _X  0x40/* hex digit */

Simple patch below, but solving this problem with more underscores feels
slightly inhumane.

Will

--->8

diff --git a/crypto/asymmetric_keys/rsa.c b/crypto/asymmetric_keys/rsa.c
index 459cf97a75e2..a4e53239448b 100644
--- a/crypto/asymmetric_keys/rsa.c
+++ b/crypto/asymmetric_keys/rsa.c
@@ -120,7 +120,7 @@ static int RSAVP1(const struct public_key *key, MPI s, MPI 
*_m)
 /*
  * Integer to Octet String conversion [RFC3447 sec 4.1]
  */
-static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
+static int RSA_I2OSP(MPI x, size_t xLen, u8 **__X)
 {
unsigned X_size, x_size;
int X_sign;
@@ -147,7 +147,7 @@ static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
return -EBADMSG;
}
 
-   *_X = X;
+   *__X = X;
return 0;
 }
 


Re: [PATCH] arm64/crypto: issue aese/aesmc instructions in pairs

2015-03-17 Thread Will Deacon
On Tue, Mar 17, 2015 at 06:05:13PM +, Ard Biesheuvel wrote:
> This changes the AES core transform implementations to issue aese/aesmc
> (and aesd/aesimc) in pairs. This enables a micro-architectural optimization
> in recent Cortex-A5x cores that improves performance by 50-90%.
> 
> Measured performance in cycles per byte (Cortex-A57):
> 
>            CBC enc    CBC dec    CTR
>   before    3.64       1.34      1.32
>   after     1.95       0.85      0.93
> 
> Note that this results in a ~5% performance decrease for older cores.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
> 
> Will,
> 
> This is the optimization you yourself mentioned to me about a year ago
> (or even longer perhaps?) Anyway, we have now been able to confirm it
> on a sample 'in the wild', (i.e., a Galaxy S6 phone)

I barely remember one day to the next, but hey! I'll queue this for 4.1.

Will


Re: [PATCH] arm64: crypto: increase AES interleave to 4x

2015-02-20 Thread Will Deacon
On Thu, Feb 19, 2015 at 05:25:16PM +, Ard Biesheuvel wrote:
> This patch increases the interleave factor for parallel AES modes
> to 4x. This improves performance on Cortex-A57 by ~35%. This is
> due to the 3-cycle latency of AES instructions on the A57's
> relatively deep pipeline (compared to Cortex-A53 where the AES
> instruction latency is only 2 cycles).
> 
> At the same time, disable inline expansion of the core AES functions,
> as the performance benefit of this feature is negligible.
> 
>   Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1):
> 
>   Baseline (2x interleave, inline expansion)
>   --
>   testing speed of async cbc(aes) (cbc-aes-ce) decryption
>   test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds
>   test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds
> 
>   This patch (4x interleave, no inline expansion)
>   ---
>   testing speed of async cbc(aes) (cbc-aes-ce) decryption
>   test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds
>   test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds

Fine by me. Shall I queue this via the arm64 tree?

Will

> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/crypto/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index 5720608c50b1..abb79b3cfcfe 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,7 +29,7 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>  
> -AFLAGS_aes-ce.o  := -DINTERLEAVE=2 -DINTERLEAVE_INLINE
> +AFLAGS_aes-ce.o  := -DINTERLEAVE=4
>  AFLAGS_aes-neon.o:= -DINTERLEAVE=4
>  
>  CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
> -- 
> 1.8.3.2
> 
> 


Re: [PATCH] ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ (part1)

2014-07-01 Thread Will Deacon
Hi Mans,

On Tue, Jul 01, 2014 at 06:24:43PM +0100, Måns Rullgård wrote:
> Russell King - ARM Linux  writes:
> > As you point out, "bx lr" /may/ be treated specially (I've actually been
> 
> Most, if not all, Cortex-A cores do this according the public TRMs.
> They also do the same thing for "mov pc, lr" so there will probably be
> no performance gain from this change.  It's still a good idea though,
> since we don't know what future cores will do.

Funnily enough, that's not actually true (and is more or less what prompted
this patch after discussion with Russell). There are cores out there that
don't predict mov pc, lr at all (let alone do anything with the return
stack).

> > discussing this with Will Deacon over the last couple of days, who has
> > also been talking to the hardware people in ARM, and Will is happy with
> > this patch as in its current form.)  This is why I've changed all
> > "mov pc, reg" instructions which return in some way to use this macro,
> > and left others (those which are used to call some function and return
> > back to the same point) alone.
> 
> In that case the patch should be fine.  Your patch description didn't
> make it clear that only actual returns were being changed.

I'm led to believe that some predictors require lr in order to update the
return stack, whilst others don't. That part is all horribly
micro-architectural, so the current patch is doing the right thing by
sticking to the ARM ARM but enabling us to hook into other registers later
on if we choose.

Cheers,

Will


Re: [PATCH v2 0/3] ARM: NEON based fast(er) AES in CBC/CTR/XTS modes

2013-10-04 Thread Will Deacon
Hi Ard,

On Thu, Oct 03, 2013 at 10:59:23PM +0100, Ard Biesheuvel wrote:
> Note to reviewers:
> Reviewing the file aesbs-core.S may be a bit overwhelming, so if there are any
> questions or concerns, please refer the file bsaes-armv7.pl which can be found
> at the link below. This is the original Perl script that gets called by
> OpenSSL's build system during their build to generate the .S file on the fly.
> [In the case of OpenSSL, this is used in some cases to target different
> assemblers or ABIs]. This arrangement is not suitable (or required) for the
> kernel, so I have taken the generated .S file instead.
> 
> http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
> 
> This series still depends on commit a62b01cd (crypto: create generic version 
> of
> ablk_helper) which I omitted this time but which can be found in the cryptodev
> tree or in linux-next.

Why do you consider it unsuitable to ship the perl script with the kernel?
Perl 5 is already documented as a build dependency in Documentation/Changes
and I'd *much* rather the .S file was generated rather than shipped and
hacked. That amount of opaque assembly code for something like crypto
feels really dangerous from both a review and a maintenance perspective.

Will