Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing

2018-04-09 Thread Dave Martin
On Sat, Apr 07, 2018 at 10:54:22AM +0100, Marc Zyngier wrote:
> On Fri, 06 Apr 2018 16:01:04 +0100,
> Dave Martin wrote:
> 
> Hi Dave,
> 
> > 
> > This patch refactors KVM to align the host and guest FPSIMD
> > save/restore logic with each other for arm64.  This reduces the
> > number of redundant save/restore operations that must occur, and
> > reduces the common-case IRQ blackout time during guest exit storms
> > by saving the host state lazily and optimising away the need to
> > restore the host state before returning to the run loop.
> > 
> > Four hooks are defined in order to enable this:
> > 
> >  * kvm_arch_vcpu_run_map_fp():
> >Called on PID change to map necessary bits of current to Hyp.
> > 
> >  * kvm_arch_vcpu_load_fp():
> >Set up FP/SIMD for entering the KVM run loop (parse as
> >"vcpu_load fp").
> > 
> >  * kvm_arch_vcpu_park_fp():
> >Get FP/SIMD into a safe state for re-enabling interrupts after a
> >guest exit back to the run loop.
> 
> It would be good if you could outline what this hook does, and what
> "safe state" means in this context.

[...]

> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index d3af3f4..cf0f457 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -362,10 +362,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
> > cpu)
> > kvm_arm_set_running_vcpu(vcpu);
> > kvm_vgic_load(vcpu);
> > kvm_timer_vcpu_load(vcpu);
> > +   kvm_arch_vcpu_load_fp(vcpu);
> 
> Can't find this function.

[...]

> > @@ -772,6 +774,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> > if (static_branch_unlikely(_irqchip_in_use))
> > kvm_timer_sync_hwstate(vcpu);
> >  
> > +   kvm_arch_vcpu_park_fp(vcpu);
> 
> This isn't defined either. I have the feeling that you've missed a
> "git add" at some point.
> 
> > +
> > /*
> >  * We may have taken a host interrupt in HYP mode (ie
> >  * while executing the guest). This interrupt is still
> > @@ -816,16 +820,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *run)
> > return ret;
> >  }

[...]

> The structure seems quite sensible, and I'm looking forward to seeing
> the missing bits. Also, it looks like this was done on top of
> 4.16. I'm afraid 4.17 is going to bring a number of conflicts, but
> nothing that should have any material impact.

Hmmm, looks like I dropped an added file when flattening my draft series.

See separate repost -- sorry about that.


I've kept it on v4.16 just while things stabilise.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing

2018-04-09 Thread Dave Martin
On Mon, Apr 09, 2018 at 11:48:18AM +0200, Christoffer Dall wrote:

[...]

> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 8605e04..797b259 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  static bool __hyp_text __fpsimd_enabled_nvhe(void)
> >  {
> > @@ -47,24 +48,40 @@ bool __hyp_text __fpsimd_enabled(void)
> > return __fpsimd_is_enabled()();
> >  }
> >  
> > -static void __hyp_text __activate_traps_vhe(void)
> > +static bool update_fp_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +   if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE) {
> > +   vcpu->arch.host_fpsimd_state = NULL;
> 
> I can't see where host_fpsimd_state gets set to anything else than NULL,
> what am I missing?

It's set in kvm_arch_vcpu_load_fp().  Once we enter the run loop the
pointer can only be changed to NULL, because the host state only needs
to be saved once.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing

2018-04-09 Thread Christoffer Dall
On Fri, Apr 06, 2018 at 04:01:04PM +0100, Dave Martin wrote:
> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64.  This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
> 
> Four hooks are defined in order to enable this:
> 
>  * kvm_arch_vcpu_run_map_fp():
>Called on PID change to map necessary bits of current to Hyp.
> 
>  * kvm_arch_vcpu_load_fp():
>Set up FP/SIMD for entering the KVM run loop (parse as
>"vcpu_load fp").
> 
>  * kvm_arch_vcpu_park_fp():
>Get FP/SIMD into a safe state for re-enabling interrupts after a
>guest exit back to the run loop.
> 
>  * kvm_arch_vcpu_put_fp():
>Save guest FP/SIMD state back to memory and dissociate from the
>CPU ("vcpu_put fp").
> 
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current.  A few
> helpers drive this:
> 
>  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
>mark this CPU as having context fp (which may belong to a vcpu)
>currently loaded in its registers.  This is the non-task
>equivalent of the static function fpsimd_bind_to_cpu() in
>fpsimd.c.
> 
>  * task_fpsimd_save():
>exported to allow KVM to save the guest's FPSIMD state back to
>memory on exit from the run loop.
> 
>  * fpsimd_flush_state():
>invalidate any context's FPSIMD state that is currently loaded.
>Used to disassociate the vcpu from the CPU regs on run loop exit.
> 
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
> 
> Some new vcpu_arch fields are added to make all this work.  Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state.  However, it is still needed for
> preserving other things such as the host's system registers.  To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
> 
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic.  It could provide implementations of
> the helpers later if desired.
> 
> Signed-off-by: Dave Martin 
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 +++
>  arch/arm64/include/asm/fpsimd.h   |  5 +
>  arch/arm64/include/asm/kvm_host.h | 18 +++
>  arch/arm64/kernel/fpsimd.c| 31 --
>  arch/arm64/kvm/Makefile   |  2 +-
>  arch/arm64/kvm/hyp/switch.c   | 46 
> ---
>  virt/kvm/arm/arm.c| 14 
>  7 files changed, 89 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 248b930..11cd64a3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  struct kvm_device_attr *attr);
>  
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
>  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
>  static inline void kvm_fpsimd_flush_cpu_state(void) {}
>  
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 1bfc920..dbe7340 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -40,6 +40,8 @@ struct task_struct;
>  extern void fpsimd_save_state(struct user_fpsimd_state *state);
>  extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  
> +extern void task_fpsimd_save(void);
> +
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>  
> @@ -48,7 +50,10 @@ extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const 
> *state);
>  
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
>  extern void sve_flush_cpu_state(void);
>  
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
> diff --git 

Re: [RFC PATCH v2 3/3] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing

2018-04-07 Thread Marc Zyngier
On Fri, 06 Apr 2018 16:01:04 +0100,
Dave Martin wrote:

Hi Dave,

> 
> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64.  This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
> 
> Four hooks are defined in order to enable this:
> 
>  * kvm_arch_vcpu_run_map_fp():
>Called on PID change to map necessary bits of current to Hyp.
> 
>  * kvm_arch_vcpu_load_fp():
>Set up FP/SIMD for entering the KVM run loop (parse as
>"vcpu_load fp").
> 
>  * kvm_arch_vcpu_park_fp():
>Get FP/SIMD into a safe state for re-enabling interrupts after a
>guest exit back to the run loop.

It would be good if you could outline what this hook does, and what
"safe state" means in this context.

> 
>  * kvm_arch_vcpu_put_fp():
>Save guest FP/SIMD state back to memory and dissociate from the
>CPU ("vcpu_put fp").
> 
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current.  A few
> helpers drive this:
> 
>  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
>mark this CPU as having context fp (which may belong to a vcpu)
>currently loaded in its registers.  This is the non-task
>equivalent of the static function fpsimd_bind_to_cpu() in
>fpsimd.c.
> 
>  * task_fpsimd_save():
>exported to allow KVM to save the guest's FPSIMD state back to
>memory on exit from the run loop.
> 
>  * fpsimd_flush_state():
>invalidate any context's FPSIMD state that is currently loaded.
>Used to disassociate the vcpu from the CPU regs on run loop exit.
> 
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
> 
> Some new vcpu_arch fields are added to make all this work.  Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state.  However, it is still needed for
> preserving other things such as the host's system registers.  To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
> 
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic.  It could provide implementations of
> the helpers later if desired.
> 
> Signed-off-by: Dave Martin 
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 +++
>  arch/arm64/include/asm/fpsimd.h   |  5 +
>  arch/arm64/include/asm/kvm_host.h | 18 +++
>  arch/arm64/kernel/fpsimd.c| 31 --
>  arch/arm64/kvm/Makefile   |  2 +-
>  arch/arm64/kvm/hyp/switch.c   | 46 
> ---
>  virt/kvm/arm/arm.c| 14 
>  7 files changed, 89 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 248b930..11cd64a3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  struct kvm_device_attr *attr);
>  
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_park_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
>  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
>  static inline void kvm_fpsimd_flush_cpu_state(void) {}
>  
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 1bfc920..dbe7340 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -40,6 +40,8 @@ struct task_struct;
>  extern void fpsimd_save_state(struct user_fpsimd_state *state);
>  extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  
> +extern void task_fpsimd_save(void);
> +
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>  
> @@ -48,7 +50,10 @@ extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const 
> *state);
>  
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
>  extern void