Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-22 Thread Alex Bennée

Dave Martin  writes:

> On Tue, Aug 22, 2017 at 05:21:19PM +0100, Alex Bennée wrote:
>>
>> Dave Martin  writes:
>
> [...]
>
>> > --- a/arch/arm64/include/asm/processor.h
>> > +++ b/arch/arm64/include/asm/processor.h
>> > @@ -85,6 +85,8 @@ struct thread_struct {
>> >unsigned long   tp2_value;
>> >  #endif
>> >struct fpsimd_state fpsimd_state;
>> > +  void*sve_state; /* SVE registers, if any */
>> > +  u16 sve_vl; /* SVE vector length */
>>
>> sve_vl is implicitly cast to unsigned int bellow - it should be
>> consistent.
>
> Agreed -- I think this can just be unsigned int here, which is the type
> I use everywhere except when encoding the vector length in the signal
> frame and ptrace data.
>
> During development, there was an additional flags field here (now
> merged into the thread_info flags).  u16 was big enough for the largest
> architectural vector length, so it seemed "tidy" to make both u16s.
> Elsewhere, this created a risk of overflow if you try to compute say
> the size of the whole register file from a u16, so I generally used
> unsigned int for local variables in functions that handle the vl.
>
>> Given the allocation functions rely on sve_vl being valid it might be
>> worth noting where this is set/live from?
>
> Agreed, I need to look more closely at exactly how to justify the
> soundness of this in order to thin out BUG_ONs.
>
> If you need any pointers, please ping me.  In the meantime, I would
> rather not colour your judgement by infecting you with my assumptions ;)
>
>> >unsigned long   fault_address;  /* fault info */
>> >unsigned long   fault_code; /* ESR_EL1 value */
>> >struct debug_info   debug;  /* debugging */
>> > diff --git a/arch/arm64/include/asm/thread_info.h 
>> > b/arch/arm64/include/asm/thread_info.h
>> > index 46c3b93..1a4b30b 100644
>> > --- a/arch/arm64/include/asm/thread_info.h
>> > +++ b/arch/arm64/include/asm/thread_info.h
>> 
>>
>> And I see there are other comments from Ard.
>
> Yup, I've already picked those up.
>
> I'll be posting a v2 with feedback applied once I've reviewed the
> existing BUG_ON()s -- should happen sometime this week.

We shall see how far I get tomorrow - you won't get any more from me
after that until I get back from holiday so don't delay v2 on my account
;-)

>
> Cheers
> ---Dave


--
Alex Bennée
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-22 Thread Dave Martin
On Tue, Aug 22, 2017 at 05:21:19PM +0100, Alex Bennée wrote:
> 
> Dave Martin  writes:

[...]

> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -85,6 +85,8 @@ struct thread_struct {
> > unsigned long   tp2_value;
> >  #endif
> > struct fpsimd_state fpsimd_state;
> > +   void*sve_state; /* SVE registers, if any */
> > +   u16 sve_vl; /* SVE vector length */
> 
> sve_vl is implicitly cast to unsigned int bellow - it should be
> consistent.

Agreed -- I think this can just be unsigned int here, which is the type
I use everywhere except when encoding the vector length in the signal
frame and ptrace data.

During development, there was an additional flags field here (now
merged into the thread_info flags).  u16 was big enough for the largest
architectural vector length, so it seemed "tidy" to make both u16s.
Elsewhere, this created a risk of overflow if you try to compute say
the size of the whole register file from a u16, so I generally used
unsigned int for local variables in functions that handle the vl.

> Given the allocation functions rely on sve_vl being valid it might be
> worth noting where this is set/live from?

Agreed, I need to look more closely at exactly how to justify the
soundness of this in order to thin out BUG_ONs.

If you need any pointers, please ping me.  In the meantime, I would
rather not colour your judgement by infecting you with my assumptions ;)

> > unsigned long   fault_address;  /* fault info */
> > unsigned long   fault_code; /* ESR_EL1 value */
> > struct debug_info   debug;  /* debugging */
> > diff --git a/arch/arm64/include/asm/thread_info.h 
> > b/arch/arm64/include/asm/thread_info.h
> > index 46c3b93..1a4b30b 100644
> > --- a/arch/arm64/include/asm/thread_info.h
> > +++ b/arch/arm64/include/asm/thread_info.h
> 
> 
> And I see there are other comments from Ard.

Yup, I've already picked those up.

I'll be posting a v2 with feedback applied once I've reviewed the
existing BUG_ON()s -- should happen sometime this week.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-22 Thread Alex Bennée

Dave Martin  writes:

> This patch adds the core support for switching and managing the SVE
> architectural state of user tasks.
>
> Calls to the existing FPSIMD low-level save/restore functions are
> factored out as new functions task_fpsimd_{save,load}(), since SVE
> now dynamically may or may not need to be handled at these points
> depending on the kernel configuration, hardware features discovered
> at boot, and the runtime state of the task.  To make these
> decisions as fast as possible, const cpucaps are used where
> feasible, via the system_supports_sve() helper.
>
> The SVE registers are only tracked for threads that have explicitly
> used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
> FPSIMD view of the architectural state is stored in
> thread.fpsimd_state as usual.
>
> When in use, the SVE registers are not stored directly in
> thread_struct due to their potentially large and variable size.
> Because the task_struct slab allocator must be configured very
> early during kernel boot, it is also tricky to configure it
> correctly to match the maximum vector length provided by the
> hardware, since this depends on examining secondary CPUs as well as
> the primary.  Instead, a pointer sve_state in thread_struct points
> to a dynamically allocated buffer containing the SVE register data,
> and code is added to allocate, duplicate and free this buffer at
> appropriate times.
>
> TIF_SVE is set when taking an SVE access trap from userspace, if
> suitable hardware support has been detected.  This enables SVE for
> the thread: a subsequent return to userspace will disable the trap
> accordingly.  If such a trap is taken without sufficient hardware
> support, SIGILL is sent to the thread instead as if an undefined
> instruction had been executed: this may happen if userspace tries
> to use SVE in a system where not all CPUs support it for example.
>
> The kernel may clear TIF_SVE and disable SVE for the thread
> whenever an explicit syscall is made by userspace, though this is
> considered an optimisation opportunity rather than a deterministic
> guarantee: the kernel may not do this on every syscall, but it is
> permitted to do so.  For backwards compatibility reasons and
> conformance with the spirit of the base AArch64 procedure call
> standard, the subset of the SVE register state that aliases the
> FPSIMD registers is still preserved across a syscall even if this
> happens.
>
> TIF_SVE is also cleared, and SVE disabled, on exec: this is an
> obvious slow path and a hint that we are running a new binary that
> may not use SVE.
>
> Code is added to sync data between thread.fpsimd_state and
> thread.sve_state whenever enabling/disabling SVE, in a manner
> consistent with the SVE architectural programmer's model.
>
> Signed-off-by: Dave Martin 
> ---
>  arch/arm64/include/asm/fpsimd.h  |  19 +++
>  arch/arm64/include/asm/processor.h   |   2 +
>  arch/arm64/include/asm/thread_info.h |   1 +
>  arch/arm64/include/asm/traps.h   |   2 +
>  arch/arm64/kernel/entry.S|  14 +-
>  arch/arm64/kernel/fpsimd.c   | 241 
> ++-
>  arch/arm64/kernel/process.c  |   6 +-
>  arch/arm64/kernel/traps.c|   4 +-
>  8 files changed, 279 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 026a7c7..72090a1 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -20,6 +20,8 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include 
> +
>  /*
>   * FP/SIMD storage area has:
>   *  - FPSR and FPCR
> @@ -72,6 +74,23 @@ extern void sve_load_state(void const *state, u32 const 
> *pfpsr,
>  unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +extern size_t sve_state_size(struct task_struct const *task);
> +
> +extern void sve_alloc(struct task_struct *task);
> +extern void fpsimd_release_thread(struct task_struct *task);
> +extern void fpsimd_dup_sve(struct task_struct *dst,
> +struct task_struct const *src);
> +
> +#else /* ! CONFIG_ARM64_SVE */
> +
> +static void __maybe_unused sve_alloc(struct task_struct *task) { }
> +static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { 
> }
> +static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
> +   struct task_struct const *src) { }
> +#endif /* ! CONFIG_ARM64_SVE */
> +
>  /* For use by EFI runtime services calls only */
>  extern void __efi_fpsimd_begin(void);
>  extern void __efi_fpsimd_end(void);
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index b7334f1..969feed 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -85,6 +85,8 @@ struct thread_struct {
>   unsigned long   tp2_value;
>  

Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-17 Thread Ard Biesheuvel
On 17 August 2017 at 17:42, Dave Martin  wrote:
> On Tue, Aug 15, 2017 at 06:31:05PM +0100, Ard Biesheuvel wrote:
>> Hi Dave,
>>
>> On 9 August 2017 at 13:05, Dave Martin  wrote:
>> > This patch adds the core support for switching and managing the SVE
>> > architectural state of user tasks.
>
> [...]
>
>> > +static u64 sve_cpacr_trap_on(u64 cpacr)
>> > +{
>> > +   return cpacr & ~(u64)CPACR_EL1_ZEN_EL0EN;
>> > +}
>> > +
>> > +static u64 sve_cpacr_trap_off(u64 cpacr)
>> > +{
>> > +   return cpacr | CPACR_EL1_ZEN_EL0EN;
>> > +}
>> > +
>> > +static void change_cpacr(u64 old, u64 new)
>> > +{
>> > +   if (old != new)
>> > +   write_sysreg(new, CPACR_EL1);
>> > +}
>
> [...]
>
>> > +static void task_fpsimd_load(void)
>> > +{
>> > +   if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
>> > +   unsigned int vl = current->thread.sve_vl;
>> > +
>> > +   BUG_ON(!sve_vl_valid(vl));
>> > +   sve_load_state(sve_pffr(current),
>> > +  >thread.fpsimd_state.fpsr,
>> > +  sve_vq_from_vl(vl) - 1);
>> > +   } else
>> > +   fpsimd_load_state(>thread.fpsimd_state);
>> > +
>>
>> Please use braces consistently on all branches of an if ()
>>
>> > +   if (system_supports_sve()) {
>> > +   u64 cpacr = read_sysreg(CPACR_EL1);
>> > +   u64 new_cpacr;
>> > +
>> > +   /* Toggle SVE trapping for userspace if needed */
>> > +   if (test_thread_flag(TIF_SVE))
>> > +   new_cpacr = sve_cpacr_trap_off(cpacr);
>> > +   else
>> > +   new_cpacr = sve_cpacr_trap_on(cpacr);
>> > +
>> > +   change_cpacr(cpacr, new_cpacr);
>>
>> I understand you want to avoid setting CPACR to the same value, but
>> this does look a bit clunky IMO. Wouldn't it be much better to have a
>> generic accessor with a mask and a value that encapsulates this?
>
> For this I now have:
>
> static void change_cpacr(u64 val, u64 mask)
> {
> u64 cpacr = read_sysreg(CPACR_EL1);
> u64 new = (cpacr & ~mask) | val;
>
> if (new != cpacr)
> write_sysreg(new, CPACR_EL1);
> }
>
> static void sve_cpacr_trap_on(void)
> {
> change_cpacr(0, CPACR_EL1_ZEN_EL0EN);
> }
>
> static void sve_cpacr_trap_off(void)
> {
> change_cpacr(CPACR_EL1_ZEN_EL0EN, CPACR_EL1_ZEN_EL0EN);
> }
>
>
> This is stilla little verbose, but fairly clean.  Possibly this was the
> sort of thing you meant by a generic accessor.
>
> What do you think?
>

In my opinion, this does look a lot better. Having mask/val pairs like
this is a fairly common pattern, so it should be quite obvious to most
readers what is going on.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-17 Thread Dave Martin
On Tue, Aug 15, 2017 at 06:31:05PM +0100, Ard Biesheuvel wrote:
> Hi Dave,
> 
> On 9 August 2017 at 13:05, Dave Martin  wrote:
> > This patch adds the core support for switching and managing the SVE
> > architectural state of user tasks.

[...]

> > +static u64 sve_cpacr_trap_on(u64 cpacr)
> > +{
> > +   return cpacr & ~(u64)CPACR_EL1_ZEN_EL0EN;
> > +}
> > +
> > +static u64 sve_cpacr_trap_off(u64 cpacr)
> > +{
> > +   return cpacr | CPACR_EL1_ZEN_EL0EN;
> > +}
> > +
> > +static void change_cpacr(u64 old, u64 new)
> > +{
> > +   if (old != new)
> > +   write_sysreg(new, CPACR_EL1);
> > +}

[...]

> > +static void task_fpsimd_load(void)
> > +{
> > +   if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
> > +   unsigned int vl = current->thread.sve_vl;
> > +
> > +   BUG_ON(!sve_vl_valid(vl));
> > +   sve_load_state(sve_pffr(current),
> > +  >thread.fpsimd_state.fpsr,
> > +  sve_vq_from_vl(vl) - 1);
> > +   } else
> > +   fpsimd_load_state(>thread.fpsimd_state);
> > +
> 
> Please use braces consistently on all branches of an if ()
> 
> > +   if (system_supports_sve()) {
> > +   u64 cpacr = read_sysreg(CPACR_EL1);
> > +   u64 new_cpacr;
> > +
> > +   /* Toggle SVE trapping for userspace if needed */
> > +   if (test_thread_flag(TIF_SVE))
> > +   new_cpacr = sve_cpacr_trap_off(cpacr);
> > +   else
> > +   new_cpacr = sve_cpacr_trap_on(cpacr);
> > +
> > +   change_cpacr(cpacr, new_cpacr);
> 
> I understand you want to avoid setting CPACR to the same value, but
> this does look a bit clunky IMO. Wouldn't it be much better to have a
> generic accessor with a mask and a value that encapsulates this?

For this I now have:

static void change_cpacr(u64 val, u64 mask)
{
u64 cpacr = read_sysreg(CPACR_EL1);
u64 new = (cpacr & ~mask) | val;

if (new != cpacr)
write_sysreg(new, CPACR_EL1);
}

static void sve_cpacr_trap_on(void)
{
change_cpacr(0, CPACR_EL1_ZEN_EL0EN);
}

static void sve_cpacr_trap_off(void)
{
change_cpacr(CPACR_EL1_ZEN_EL0EN, CPACR_EL1_ZEN_EL0EN);
}


This is stilla little verbose, but fairly clean.  Possibly this was the
sort of thing you meant by a generic accessor.

What do you think?

[...]

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-16 Thread Dave Martin
On Tue, Aug 15, 2017 at 06:31:05PM +0100, Ard Biesheuvel wrote:
> Hi Dave,
> 
> On 9 August 2017 at 13:05, Dave Martin  wrote:
> > This patch adds the core support for switching and managing the SVE
> > architectural state of user tasks.
> >
> > Calls to the existing FPSIMD low-level save/restore functions are
> > factored out as new functions task_fpsimd_{save,load}(), since SVE
> > now dynamically may or may not need to be handled at these points
> > depending on the kernel configuration, hardware features discovered
> > at boot, and the runtime state of the task.  To make these
> > decisions as fast as possible, const cpucaps are used where
> > feasible, via the system_supports_sve() helper.
> >
> > The SVE registers are only tracked for threads that have explicitly
> > used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
> > FPSIMD view of the architectural state is stored in
> > thread.fpsimd_state as usual.
> >
> > When in use, the SVE registers are not stored directly in
> > thread_struct due to their potentially large and variable size.
> > Because the task_struct slab allocator must be configured very
> > early during kernel boot, it is also tricky to configure it
> > correctly to match the maximum vector length provided by the
> > hardware, since this depends on examining secondary CPUs as well as
> > the primary.  Instead, a pointer sve_state in thread_struct points
> > to a dynamically allocated buffer containing the SVE register data,
> > and code is added to allocate, duplicate and free this buffer at
> > appropriate times.
> >
> > TIF_SVE is set when taking an SVE access trap from userspace, if
> > suitable hardware support has been detected.  This enables SVE for
> > the thread: a subsequent return to userspace will disable the trap
> > accordingly.  If such a trap is taken without sufficient hardware
> > support, SIGILL is sent to the thread instead as if an undefined
> > instruction had been executed: this may happen if userspace tries
> > to use SVE in a system where not all CPUs support it for example.
> >
> > The kernel may clear TIF_SVE and disable SVE for the thread
> > whenever an explicit syscall is made by userspace, though this is
> > considered an optimisation opportunity rather than a deterministic
> > guarantee: the kernel may not do this on every syscall, but it is
> > permitted to do so.  For backwards compatibility reasons and
> > conformance with the spirit of the base AArch64 procedure call
> > standard, the subset of the SVE register state that aliases the
> > FPSIMD registers is still preserved across a syscall even if this
> > happens.
> >
> > TIF_SVE is also cleared, and SVE disabled, on exec: this is an
> > obvious slow path and a hint that we are running a new binary that
> > may not use SVE.
> >
> > Code is added to sync data between thread.fpsimd_state and
> > thread.sve_state whenever enabling/disabling SVE, in a manner
> > consistent with the SVE architectural programmer's model.
> >
> > Signed-off-by: Dave Martin 
> > ---
> >  arch/arm64/include/asm/fpsimd.h  |  19 +++
> >  arch/arm64/include/asm/processor.h   |   2 +
> >  arch/arm64/include/asm/thread_info.h |   1 +
> >  arch/arm64/include/asm/traps.h   |   2 +
> >  arch/arm64/kernel/entry.S|  14 +-
> >  arch/arm64/kernel/fpsimd.c   | 241 
> > ++-
> >  arch/arm64/kernel/process.c  |   6 +-
> >  arch/arm64/kernel/traps.c|   4 +-
> >  8 files changed, 279 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/fpsimd.h 
> > b/arch/arm64/include/asm/fpsimd.h
> > index 026a7c7..72090a1 100644
> > --- a/arch/arm64/include/asm/fpsimd.h
> > +++ b/arch/arm64/include/asm/fpsimd.h
> > @@ -20,6 +20,8 @@
> >
> >  #ifndef __ASSEMBLY__
> >
> > +#include 
> > +
> >  /*
> >   * FP/SIMD storage area has:
> >   *  - FPSR and FPCR
> > @@ -72,6 +74,23 @@ extern void sve_load_state(void const *state, u32 const 
> > *pfpsr,
> >unsigned long vq_minus_1);
> >  extern unsigned int sve_get_vl(void);
> >
> > +#ifdef CONFIG_ARM64_SVE
> > +
> > +extern size_t sve_state_size(struct task_struct const *task);
> > +
> > +extern void sve_alloc(struct task_struct *task);
> > +extern void fpsimd_release_thread(struct task_struct *task);
> > +extern void fpsimd_dup_sve(struct task_struct *dst,
> > +  struct task_struct const *src);
> > +
> > +#else /* ! CONFIG_ARM64_SVE */
> > +
> > +static void __maybe_unused sve_alloc(struct task_struct *task) { }
> > +static void __maybe_unused fpsimd_release_thread(struct task_struct *task) 
> > { }
> > +static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
> > + struct task_struct const *src) { }
> > +#endif /* ! CONFIG_ARM64_SVE */
> > +
> >  /* For use by EFI runtime services calls only */
> >  extern void __efi_fpsimd_begin(void);
> >  extern void 

Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-15 Thread Ard Biesheuvel
Hi Dave,

On 9 August 2017 at 13:05, Dave Martin  wrote:
> This patch adds the core support for switching and managing the SVE
> architectural state of user tasks.
>
> Calls to the existing FPSIMD low-level save/restore functions are
> factored out as new functions task_fpsimd_{save,load}(), since SVE
> now dynamically may or may not need to be handled at these points
> depending on the kernel configuration, hardware features discovered
> at boot, and the runtime state of the task.  To make these
> decisions as fast as possible, const cpucaps are used where
> feasible, via the system_supports_sve() helper.
>
> The SVE registers are only tracked for threads that have explicitly
> used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
> FPSIMD view of the architectural state is stored in
> thread.fpsimd_state as usual.
>
> When in use, the SVE registers are not stored directly in
> thread_struct due to their potentially large and variable size.
> Because the task_struct slab allocator must be configured very
> early during kernel boot, it is also tricky to configure it
> correctly to match the maximum vector length provided by the
> hardware, since this depends on examining secondary CPUs as well as
> the primary.  Instead, a pointer sve_state in thread_struct points
> to a dynamically allocated buffer containing the SVE register data,
> and code is added to allocate, duplicate and free this buffer at
> appropriate times.
>
> TIF_SVE is set when taking an SVE access trap from userspace, if
> suitable hardware support has been detected.  This enables SVE for
> the thread: a subsequent return to userspace will disable the trap
> accordingly.  If such a trap is taken without sufficient hardware
> support, SIGILL is sent to the thread instead as if an undefined
> instruction had been executed: this may happen if userspace tries
> to use SVE in a system where not all CPUs support it for example.
>
> The kernel may clear TIF_SVE and disable SVE for the thread
> whenever an explicit syscall is made by userspace, though this is
> considered an optimisation opportunity rather than a deterministic
> guarantee: the kernel may not do this on every syscall, but it is
> permitted to do so.  For backwards compatibility reasons and
> conformance with the spirit of the base AArch64 procedure call
> standard, the subset of the SVE register state that aliases the
> FPSIMD registers is still preserved across a syscall even if this
> happens.
>
> TIF_SVE is also cleared, and SVE disabled, on exec: this is an
> obvious slow path and a hint that we are running a new binary that
> may not use SVE.
>
> Code is added to sync data between thread.fpsimd_state and
> thread.sve_state whenever enabling/disabling SVE, in a manner
> consistent with the SVE architectural programmer's model.
>
> Signed-off-by: Dave Martin 
> ---
>  arch/arm64/include/asm/fpsimd.h  |  19 +++
>  arch/arm64/include/asm/processor.h   |   2 +
>  arch/arm64/include/asm/thread_info.h |   1 +
>  arch/arm64/include/asm/traps.h   |   2 +
>  arch/arm64/kernel/entry.S|  14 +-
>  arch/arm64/kernel/fpsimd.c   | 241 
> ++-
>  arch/arm64/kernel/process.c  |   6 +-
>  arch/arm64/kernel/traps.c|   4 +-
>  8 files changed, 279 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 026a7c7..72090a1 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -20,6 +20,8 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include 
> +
>  /*
>   * FP/SIMD storage area has:
>   *  - FPSR and FPCR
> @@ -72,6 +74,23 @@ extern void sve_load_state(void const *state, u32 const 
> *pfpsr,
>unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +extern size_t sve_state_size(struct task_struct const *task);
> +
> +extern void sve_alloc(struct task_struct *task);
> +extern void fpsimd_release_thread(struct task_struct *task);
> +extern void fpsimd_dup_sve(struct task_struct *dst,
> +  struct task_struct const *src);
> +
> +#else /* ! CONFIG_ARM64_SVE */
> +
> +static void __maybe_unused sve_alloc(struct task_struct *task) { }
> +static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { 
> }
> +static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
> + struct task_struct const *src) { }
> +#endif /* ! CONFIG_ARM64_SVE */
> +
>  /* For use by EFI runtime services calls only */
>  extern void __efi_fpsimd_begin(void);
>  extern void __efi_fpsimd_end(void);
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index b7334f1..969feed 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -85,6 +85,8 @@ struct thread_struct {
> 

[PATCH 11/27] arm64/sve: Core task context handling

2017-08-09 Thread Dave Martin
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.

Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task.  To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.

The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.

When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary.  Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate, duplicate and free this buffer at
appropriate times.

TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected.  This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly.  If such a trap is taken without sufficient hardware
support, SIGILL is sent to the thread instead as if an undefined
instruction had been executed: this may happen if userspace tries
to use SVE in a system where not all CPUs support it for example.

The kernel may clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace, though this is
considered an optimisation opportunity rather than a deterministic
guarantee: the kernel may not do this on every syscall, but it is
permitted to do so.  For backwards compatibility reasons and
conformance with the spirit of the base AArch64 procedure call
standard, the subset of the SVE register state that aliases the
FPSIMD registers is still preserved across a syscall even if this
happens.

TIF_SVE is also cleared, and SVE disabled, on exec: this is an
obvious slow path and a hint that we are running a new binary that
may not use SVE.

Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.

Signed-off-by: Dave Martin 
---
 arch/arm64/include/asm/fpsimd.h  |  19 +++
 arch/arm64/include/asm/processor.h   |   2 +
 arch/arm64/include/asm/thread_info.h |   1 +
 arch/arm64/include/asm/traps.h   |   2 +
 arch/arm64/kernel/entry.S|  14 +-
 arch/arm64/kernel/fpsimd.c   | 241 ++-
 arch/arm64/kernel/process.c  |   6 +-
 arch/arm64/kernel/traps.c|   4 +-
 8 files changed, 279 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 026a7c7..72090a1 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -20,6 +20,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 /*
  * FP/SIMD storage area has:
  *  - FPSR and FPCR
@@ -72,6 +74,23 @@ extern void sve_load_state(void const *state, u32 const 
*pfpsr,
   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+#ifdef CONFIG_ARM64_SVE
+
+extern size_t sve_state_size(struct task_struct const *task);
+
+extern void sve_alloc(struct task_struct *task);
+extern void fpsimd_release_thread(struct task_struct *task);
+extern void fpsimd_dup_sve(struct task_struct *dst,
+  struct task_struct const *src);
+
+#else /* ! CONFIG_ARM64_SVE */
+
+static void __maybe_unused sve_alloc(struct task_struct *task) { }
+static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { }
+static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
+ struct task_struct const *src) { }
+#endif /* ! CONFIG_ARM64_SVE */
+
 /* For use by EFI runtime services calls only */
 extern void __efi_fpsimd_begin(void);
 extern void __efi_fpsimd_end(void);
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index b7334f1..969feed 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -85,6 +85,8 @@ struct thread_struct {
unsigned long   tp2_value;
 #endif
struct fpsimd_state fpsimd_state;
+   void*sve_state; /* SVE registers, if any */
+   u16 sve_vl; /* SVE vector length */
unsigned long   fault_address;  /*