Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-30 Thread Marc Zyngier
On 26/11/17 18:58, Yury Norov wrote:
> On Sun, Nov 26, 2017 at 05:17:16PM +0100, Christoffer Dall wrote:
>> Hi Yury,
>>
>> On Sat, Nov 25, 2017 at 10:52:21AM +0300, Yury Norov wrote:
>>>
>>> On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
 Avoid saving the guest VFP registers and restoring the host VFP
 registers on every exit from the VM.  Only when we're about to run
 userspace or other threads in the kernel do we really have to switch the
 state back to the host state.

 We still initially configure the VFP registers to trap when entering the
 VM, but the difference is that we now leave the guest state in the
 hardware registers while running the VM.

 Signed-off-by: Christoffer Dall 
 ---
  arch/arm64/include/asm/kvm_emulate.h |  5 
  arch/arm64/include/asm/kvm_host.h|  3 +++
  arch/arm64/kernel/asm-offsets.c  |  1 +
  arch/arm64/kvm/hyp/entry.S   |  3 +++
  arch/arm64/kvm/hyp/switch.c  | 47 
 +++-
  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
  6 files changed, 44 insertions(+), 36 deletions(-)

 diff --git a/arch/arm64/include/asm/kvm_emulate.h 
 b/arch/arm64/include/asm/kvm_emulate.h
 index 1fbfe96..630dd60 100644
 --- a/arch/arm64/include/asm/kvm_emulate.h
 +++ b/arch/arm64/include/asm/kvm_emulate.h
 @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
 *vcpu)
return (unsigned long *)>arch.hcr_el2;
  }
  
 +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 +{
 +  return (!(vcpu->arch.hcr_el2 & HCR_RW));
 +}
 +
  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
  {
return (unsigned long *)_gp_regs(vcpu)->regs.pc;
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index 7d3bfa7..5e09eb9 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ b/arch/arm64/include/asm/kvm_host.h
 @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
/* Guest debug state */
u64 debug_flags;
  
 +  /* 1 if the guest VFP state is loaded into the hardware */
 +  u64 guest_vfp_loaded;
>>>
>>> May it be just u8/bool?
>>>
>> This particular field is accessed from assembly code, and I'm not sure
>> what guarantees the compiler makes in terms of how a u8/bool is
>> allocated with respect to padding and alignment, and I think that's why
>> we've been using u64 fields in the past.
>>
>> I don't actually remember the details, but I'd rather err on the side of
>> caution than trying to save a few bytes.  However, if someone can
>> convince me there's a completely safe way to do this, then I'm happy to
>> change it.
> 
> 'strb w0, [x3, #VCPU_GUEST_VFP_LOADED]' would work. See
> C6.6.181 STRB (register) in ARM64 ARM.
> 
> The only thing I would recommend is to reorder fields in kvm_vcpu_arch
> to avoid unneeded holes in the structure. It already spend 10 bytes for
> nothing in 3 holes.

Terrifying. How many vcpu are you going to run before this becomes a
real bottleneck? KVM on a 6502? ;-)

Now, when it comes to reordering fields, please keep in mind that the
order of the fields in the structure does matter. We want the hottest
fields grouped together so that they are fetched in the same cache line.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-27 Thread Andrew Jones
On Sun, Nov 26, 2017 at 05:17:56PM +0100, Christoffer Dall wrote:
> Hi Drew,
> 
> On Wed, Nov 15, 2017 at 05:04:40PM +0100, Andrew Jones wrote:
> > On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > > Avoid saving the guest VFP registers and restoring the host VFP
> > > registers on every exit from the VM.  Only when we're about to run
> > > userspace or other threads in the kernel do we really have to switch the
> > > state back to the host state.
> > 
> > Rik van Riel's recently post patch "[PATCH v2 0/2] x86,kvm: move qemu/guest
> > FPU switching out to kvm_arch_vcpu_ioctl_run" indicates that for x86 they
> > only need to swap guest and userspace VFP registers before exiting VCPU_RUN
> > to userspace, not for running other threads. I imagine that's the same for
> > ARM as well.
> > 
> > If so, then I think this hunk
> > 
> > > @@ -209,4 +212,16 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > >   */
> > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > >  {
> > > + struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > + struct kvm_cpu_context *guest_ctxt = >arch.ctxt;
> > > +
> > > + /* Restore host FP/SIMD state */
> > > + if (vcpu->arch.guest_vfp_loaded) {
> > > + if (vcpu_el1_is_32bit(vcpu))
> > > + kvm_call_hyp(__fpsimd32_save_state,
> > > +  kern_hyp_va(guest_ctxt));
> > > + __fpsimd_save_state(_ctxt->gp_regs.fp_regs);
> > > + __fpsimd_restore_state(_ctxt->gp_regs.fp_regs);
> > > + vcpu->arch.guest_vfp_loaded = 0;
> > > + }
> > >  }
> > 
> > could be moved to the return of kvm_arch_vcpu_ioctl_run().
> > 
> That sounds cool.
> 
> I'll keep this patch as it is now, and look at Rik's patches and post a
> follow up later, does that sound ok?

Works for me. I'm guessing there's also going to be some overlap with
Dave Martin's SVE work. So whichever series goes second can probably
be the one to put the most consideration into it.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-26 Thread Yury Norov
On Sun, Nov 26, 2017 at 08:18:34PM +0100, Christoffer Dall wrote:
> On Sun, Nov 26, 2017 at 09:58:52PM +0300, Yury Norov wrote:
> > On Sun, Nov 26, 2017 at 05:17:16PM +0100, Christoffer Dall wrote:
> > > Hi Yury,
> > > 
> > > On Sat, Nov 25, 2017 at 10:52:21AM +0300, Yury Norov wrote:
> > > > 
> > > > On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > userspace or other threads in the kernel do we really have to switch 
> > > > > the
> > > > > state back to the host state.
> > > > > 
> > > > > We still initially configure the VFP registers to trap when entering 
> > > > > the
> > > > > VM, but the difference is that we now leave the guest state in the
> > > > > hardware registers while running the VM.
> > > > > 
> > > > > Signed-off-by: Christoffer Dall 
> > > > > ---
> > > > >  arch/arm64/include/asm/kvm_emulate.h |  5 
> > > > >  arch/arm64/include/asm/kvm_host.h|  3 +++
> > > > >  arch/arm64/kernel/asm-offsets.c  |  1 +
> > > > >  arch/arm64/kvm/hyp/entry.S   |  3 +++
> > > > >  arch/arm64/kvm/hyp/switch.c  | 47 
> > > > > +++-
> > > > >  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
> > > > >  6 files changed, 44 insertions(+), 36 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > > > > b/arch/arm64/include/asm/kvm_emulate.h
> > > > > index 1fbfe96..630dd60 100644
> > > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > > @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct 
> > > > > kvm_vcpu *vcpu)
> > > > >   return (unsigned long *)>arch.hcr_el2;
> > > > >  }
> > > > >  
> > > > > +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + return (!(vcpu->arch.hcr_el2 & HCR_RW));
> > > > > +}
> > > > > +
> > > > >  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
> > > > >  {
> > > > >   return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> > > > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > > > b/arch/arm64/include/asm/kvm_host.h
> > > > > index 7d3bfa7..5e09eb9 100644
> > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
> > > > >   /* Guest debug state */
> > > > >   u64 debug_flags;
> > > > >  
> > > > > + /* 1 if the guest VFP state is loaded into the hardware */
> > > > > + u64 guest_vfp_loaded;
> > > > 
> > > > May it be just u8/bool?
> > > > 
> > > This particular field is accessed from assembly code, and I'm not sure
> > > what guarantees the compiler makes in terms of how a u8/bool is
> > > allocated with respect to padding and alignment, and I think that's why
> > > we've been using u64 fields in the past.
> > > 
> > > I don't actually remember the details, but I'd rather err on the side of
> > > caution than trying to save a few bytes.  However, if someone can
> > > convince me there's a completely safe way to do this, then I'm happy to
> > > change it.
> > 
> > 'strb w0, [x3, #VCPU_GUEST_VFP_LOADED]' would work. See
> > C6.6.181 STRB (register) in ARM64 ARM.
> 
> I'm well aware of this instruction.  Thank you though.
> 
> The concern was that we haven't done this in the past.  I think that was
> because the size of a _Bool is not well-defined and we really didn't
> care about a couple of handful of bytes when talking about vcpu
> structures.  Really.
> 
> A u8 should work though, but probably this will all be moot if I combine
> the flags into a single field.
> 
> > 
> > The only thing I would recommend is to reorder fields in kvm_vcpu_arch
> > to avoid unneeded holes in the structure. It already spend 10 bytes for
> > nothing in 3 holes.
> > 
> Patches are welcome.

Heh :) I meant reordering only this field if it is changed.

If you want me to reorder the whole structure and remove all holes...

Patches of that sort (I mean moving fields here and there just to save
couple of bytes) are looking weird. At most because there is general
assumption that the hole exists because author prefers to have clean
logic in field order even with the cost of few holes. But if you give
me indulgence...

Nevertheless, for this specific structure:
Before:
/* size: 8176, cachelines: 128, members: 23 */
/* sum members: 8152, holes: 3, sum holes: 10 */
/* padding: 14 */
/* last cacheline: 48 bytes */

After:
/* size: 8160, cachelines: 128, members: 23 */
/* padding: 8 */
/* last cacheline: 32 bytes */

The patch is below.

Yury

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index dcded44b4180..3739471c39ac 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -200,10 +200,6 @@ 

Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-26 Thread Christoffer Dall
On Sun, Nov 26, 2017 at 09:58:52PM +0300, Yury Norov wrote:
> On Sun, Nov 26, 2017 at 05:17:16PM +0100, Christoffer Dall wrote:
> > Hi Yury,
> > 
> > On Sat, Nov 25, 2017 at 10:52:21AM +0300, Yury Norov wrote:
> > > 
> > > On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > registers on every exit from the VM.  Only when we're about to run
> > > > userspace or other threads in the kernel do we really have to switch the
> > > > state back to the host state.
> > > > 
> > > > We still initially configure the VFP registers to trap when entering the
> > > > VM, but the difference is that we now leave the guest state in the
> > > > hardware registers while running the VM.
> > > > 
> > > > Signed-off-by: Christoffer Dall 
> > > > ---
> > > >  arch/arm64/include/asm/kvm_emulate.h |  5 
> > > >  arch/arm64/include/asm/kvm_host.h|  3 +++
> > > >  arch/arm64/kernel/asm-offsets.c  |  1 +
> > > >  arch/arm64/kvm/hyp/entry.S   |  3 +++
> > > >  arch/arm64/kvm/hyp/switch.c  | 47 
> > > > +++-
> > > >  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
> > > >  6 files changed, 44 insertions(+), 36 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > > > b/arch/arm64/include/asm/kvm_emulate.h
> > > > index 1fbfe96..630dd60 100644
> > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct 
> > > > kvm_vcpu *vcpu)
> > > > return (unsigned long *)>arch.hcr_el2;
> > > >  }
> > > >  
> > > > +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +   return (!(vcpu->arch.hcr_el2 & HCR_RW));
> > > > +}
> > > > +
> > > >  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
> > > >  {
> > > > return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > > b/arch/arm64/include/asm/kvm_host.h
> > > > index 7d3bfa7..5e09eb9 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
> > > > /* Guest debug state */
> > > > u64 debug_flags;
> > > >  
> > > > +   /* 1 if the guest VFP state is loaded into the hardware */
> > > > +   u64 guest_vfp_loaded;
> > > 
> > > May it be just u8/bool?
> > > 
> > This particular field is accessed from assembly code, and I'm not sure
> > what guarantees the compiler makes in terms of how a u8/bool is
> > allocated with respect to padding and alignment, and I think that's why
> > we've been using u64 fields in the past.
> > 
> > I don't actually remember the details, but I'd rather err on the side of
> > caution than trying to save a few bytes.  However, if someone can
> > convince me there's a completely safe way to do this, then I'm happy to
> > change it.
> 
> 'strb w0, [x3, #VCPU_GUEST_VFP_LOADED]' would work. See
> C6.6.181 STRB (register) in ARM64 ARM.

I'm well aware of this instruction.  Thank you though.

The concern was that we haven't done this in the past.  I think that was
because the size of a _Bool is not well-defined and we really didn't
care about a couple of handful of bytes when talking about vcpu
structures.  Really.

A u8 should work though, but probably this will all be moot if I combine
the flags into a single field.

> 
> The only thing I would recommend is to reorder fields in kvm_vcpu_arch
> to avoid unneeded holes in the structure. It already spend 10 bytes for
> nothing in 3 holes.
> 
Patches are welcome.

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-26 Thread Yury Norov
On Sun, Nov 26, 2017 at 05:17:16PM +0100, Christoffer Dall wrote:
> Hi Yury,
> 
> On Sat, Nov 25, 2017 at 10:52:21AM +0300, Yury Norov wrote:
> > 
> > On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > > Avoid saving the guest VFP registers and restoring the host VFP
> > > registers on every exit from the VM.  Only when we're about to run
> > > userspace or other threads in the kernel do we really have to switch the
> > > state back to the host state.
> > > 
> > > We still initially configure the VFP registers to trap when entering the
> > > VM, but the difference is that we now leave the guest state in the
> > > hardware registers while running the VM.
> > > 
> > > Signed-off-by: Christoffer Dall 
> > > ---
> > >  arch/arm64/include/asm/kvm_emulate.h |  5 
> > >  arch/arm64/include/asm/kvm_host.h|  3 +++
> > >  arch/arm64/kernel/asm-offsets.c  |  1 +
> > >  arch/arm64/kvm/hyp/entry.S   |  3 +++
> > >  arch/arm64/kvm/hyp/switch.c  | 47 
> > > +++-
> > >  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
> > >  6 files changed, 44 insertions(+), 36 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > > b/arch/arm64/include/asm/kvm_emulate.h
> > > index 1fbfe96..630dd60 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
> > > *vcpu)
> > >   return (unsigned long *)>arch.hcr_el2;
> > >  }
> > >  
> > > +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> > > +{
> > > + return (!(vcpu->arch.hcr_el2 & HCR_RW));
> > > +}
> > > +
> > >  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
> > >  {
> > >   return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 7d3bfa7..5e09eb9 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
> > >   /* Guest debug state */
> > >   u64 debug_flags;
> > >  
> > > + /* 1 if the guest VFP state is loaded into the hardware */
> > > + u64 guest_vfp_loaded;
> > 
> > May it be just u8/bool?
> > 
> This particular field is accessed from assembly code, and I'm not sure
> what guarantees the compiler makes in terms of how a u8/bool is
> allocated with respect to padding and alignment, and I think that's why
> we've been using u64 fields in the past.
> 
> I don't actually remember the details, but I'd rather err on the side of
> caution than trying to save a few bytes.  However, if someone can
> convince me there's a completely safe way to do this, then I'm happy to
> change it.

'strb w0, [x3, #VCPU_GUEST_VFP_LOADED]' would work. See
C6.6.181 STRB (register) in ARM64 ARM.

The only thing I would recommend is to reorder fields in kvm_vcpu_arch
to avoid unneeded holes in the structure. It already spend 10 bytes for
nothing in 3 holes.

Yury
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-26 Thread Christoffer Dall
On Tue, Nov 07, 2017 at 02:15:50PM +0100, Andrew Jones wrote:
> On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > Avoid saving the guest VFP registers and restoring the host VFP
> > registers on every exit from the VM.  Only when we're about to run
> > userspace or other threads in the kernel do we really have to switch the
> > state back to the host state.
> > 
> > We still initially configure the VFP registers to trap when entering the
> > VM, but the difference is that we now leave the guest state in the
> > hardware registers while running the VM.
> 
> running the host.
> 

I actually did mean the VM, but I should clarify to mean as long as
we're running the VCPU on this physical CPU, even if we trap to the
host.

> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h |  5 
> >  arch/arm64/include/asm/kvm_host.h|  3 +++
> >  arch/arm64/kernel/asm-offsets.c  |  1 +
> >  arch/arm64/kvm/hyp/entry.S   |  3 +++
> >  arch/arm64/kvm/hyp/switch.c  | 47 
> > +++-
> >  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
> >  6 files changed, 44 insertions(+), 36 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 1fbfe96..630dd60 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
> > *vcpu)
> > return (unsigned long *)>arch.hcr_el2;
> >  }
> >  
> > +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> > +{
> > +   return (!(vcpu->arch.hcr_el2 & HCR_RW));
> 
> nit: no need for the outer ().
> 
> > +}
> > +
> >  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
> >  {
> > return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 7d3bfa7..5e09eb9 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
> > /* Guest debug state */
> > u64 debug_flags;
> >  
> > +   /* 1 if the guest VFP state is loaded into the hardware */
> > +   u64 guest_vfp_loaded;
> > +
> 
> Is there a chance we'll want other flags like this? Should we just make
> this a lazy state flags field with the (currently only) flag VFP? If not,
> then a bool would be nicer, although I see below the u64 was chosen in
> order for the 'str' to be used.
> 

See my reply to Yury.  In terms of merging flags I thought about merging
it with the debug flags, but I didn't think it would look very nice, and
I couldn't come up with a name for the variable that would describe the
logic.

Honestly, I didn't care about the few extra bytes per CPU, and much prefer
clarity, but it may make sense to combine this with for example the
sysreg and timer state later, I'll have a look.

> > /*
> >  * We maintain more than a single set of debug registers to support
> >  * debugging the guest from the host and to maintain separate host and
> > diff --git a/arch/arm64/kernel/asm-offsets.c 
> > b/arch/arm64/kernel/asm-offsets.c
> > index 612021d..9946732 100644
> > --- a/arch/arm64/kernel/asm-offsets.c
> > +++ b/arch/arm64/kernel/asm-offsets.c
> > @@ -133,6 +133,7 @@ int main(void)
> >DEFINE(CPU_GP_REGS,  offsetof(struct kvm_cpu_context, 
> > gp_regs));
> >DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
> >DEFINE(CPU_FP_REGS,  offsetof(struct kvm_regs, fp_regs));
> > +  DEFINE(VCPU_GUEST_VFP_LOADED,offsetof(struct kvm_vcpu, 
> > arch.guest_vfp_loaded));
> >DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, 
> > arch.ctxt.sys_regs[FPEXC32_EL2]));
> >DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, 
> > arch.host_cpu_context));
> >DEFINE(HOST_CONTEXT_VCPU,offsetof(struct kvm_cpu_context, 
> > __hyp_running_vcpu));
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index 76cd48f..b3e7191 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -185,6 +185,9 @@ alternative_endif
> > add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > bl  __fpsimd_restore_state
> >  
> > +   mov x0, #1
> > +   str x0, [x3, #VCPU_GUEST_VFP_LOADED]
> > +
> > // Skip restoring fpexc32 for AArch64 guests
> > mrs x1, hcr_el2
> > tbnzx1, #HCR_RW_SHIFT, 1f
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 7703d63..ef05c59 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -23,43 +23,31 @@
> >  #include 
> >  #include 
> >  
> > -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> > -{
> > -   return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> > -}
> > -
> > -static bool __hyp_text 

Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-26 Thread Christoffer Dall
Hi Yury,

On Sat, Nov 25, 2017 at 10:52:21AM +0300, Yury Norov wrote:
> 
> On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> > Avoid saving the guest VFP registers and restoring the host VFP
> > registers on every exit from the VM.  Only when we're about to run
> > userspace or other threads in the kernel do we really have to switch the
> > state back to the host state.
> > 
> > We still initially configure the VFP registers to trap when entering the
> > VM, but the difference is that we now leave the guest state in the
> > hardware registers while running the VM.
> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h |  5 
> >  arch/arm64/include/asm/kvm_host.h|  3 +++
> >  arch/arm64/kernel/asm-offsets.c  |  1 +
> >  arch/arm64/kvm/hyp/entry.S   |  3 +++
> >  arch/arm64/kvm/hyp/switch.c  | 47 
> > +++-
> >  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
> >  6 files changed, 44 insertions(+), 36 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 1fbfe96..630dd60 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
> > *vcpu)
> > return (unsigned long *)>arch.hcr_el2;
> >  }
> >  
> > +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> > +{
> > +   return (!(vcpu->arch.hcr_el2 & HCR_RW));
> > +}
> > +
> >  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
> >  {
> > return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 7d3bfa7..5e09eb9 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
> > /* Guest debug state */
> > u64 debug_flags;
> >  
> > +   /* 1 if the guest VFP state is loaded into the hardware */
> > +   u64 guest_vfp_loaded;
> 
> May it be just u8/bool?
> 
This particular field is accessed from assembly code, and I'm not sure
what guarantees the compiler makes in terms of how a u8/bool is
allocated with respect to padding and alignment, and I think that's why
we've been using u64 fields in the past.

I don't actually remember the details, but I'd rather err on the side of
caution than trying to save a few bytes.  However, if someone can
convince me there's a completely safe way to do this, then I'm happy to
change it.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-24 Thread Yury Norov
Hi Christoffer,

On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers while running the VM.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  5 
>  arch/arm64/include/asm/kvm_host.h|  3 +++
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kvm/hyp/entry.S   |  3 +++
>  arch/arm64/kvm/hyp/switch.c  | 47 
> +++-
>  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
>  6 files changed, 44 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 1fbfe96..630dd60 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
> *vcpu)
>   return (unsigned long *)>arch.hcr_el2;
>  }
>  
> +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> +{
> + return (!(vcpu->arch.hcr_el2 & HCR_RW));
> +}
> +
>  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
>  {
>   return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 7d3bfa7..5e09eb9 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
>   /* Guest debug state */
>   u64 debug_flags;
>  
> + /* 1 if the guest VFP state is loaded into the hardware */
> + u64 guest_vfp_loaded;

May it be just u8/bool?

Yury

> +
>   /*
>* We maintain more than a single set of debug registers to support
>* debugging the guest from the host and to maintain separate host and
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 612021d..9946732 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -133,6 +133,7 @@ int main(void)
>DEFINE(CPU_GP_REGS,offsetof(struct kvm_cpu_context, 
> gp_regs));
>DEFINE(CPU_USER_PT_REGS,   offsetof(struct kvm_regs, regs));
>DEFINE(CPU_FP_REGS,offsetof(struct kvm_regs, fp_regs));
> +  DEFINE(VCPU_GUEST_VFP_LOADED,  offsetof(struct kvm_vcpu, 
> arch.guest_vfp_loaded));
>DEFINE(VCPU_FPEXC32_EL2,   offsetof(struct kvm_vcpu, 
> arch.ctxt.sys_regs[FPEXC32_EL2]));
>DEFINE(VCPU_HOST_CONTEXT,  offsetof(struct kvm_vcpu, 
> arch.host_cpu_context));
>DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu));
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 76cd48f..b3e7191 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -185,6 +185,9 @@ alternative_endif
>   add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>   bl  __fpsimd_restore_state
>  
> + mov x0, #1
> + str x0, [x3, #VCPU_GUEST_VFP_LOADED]
> +
>   // Skip restoring fpexc32 for AArch64 guests
>   mrs x1, hcr_el2
>   tbnzx1, #HCR_RW_SHIFT, 1f
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 7703d63..ef05c59 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -23,43 +23,31 @@
>  #include 
>  #include 
>  
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> -{
> - return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> -
> -static bool __hyp_text __fpsimd_enabled_vhe(void)
> -{
> - return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> -}
> -
> -static hyp_alternate_select(__fpsimd_is_enabled,
> - __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> - ARM64_HAS_VIRT_HOST_EXTN);
> -
> -bool __hyp_text __fpsimd_enabled(void)
> -{
> - return __fpsimd_is_enabled()();
> -}
> -
> -static void __hyp_text __activate_traps_vhe(void)
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
>   val = read_sysreg(cpacr_el1);
>   val |= CPACR_EL1_TTA;
> - val &= ~CPACR_EL1_FPEN;
> + if (vcpu->arch.guest_vfp_loaded)
> + val |= CPACR_EL1_FPEN;
> + else
> + val &= ~CPACR_EL1_FPEN;
>   write_sysreg(val, cpacr_el1);
>  
>   write_sysreg(__kvm_hyp_vector, vbar_el1);
>  }
>  
> -static void __hyp_text __activate_traps_nvhe(void)
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
>   val = CPTR_EL2_DEFAULT;
> - val |= CPTR_EL2_TTA | 

Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-15 Thread Andrew Jones
On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.

Rik van Riel's recently post patch "[PATCH v2 0/2] x86,kvm: move qemu/guest
FPU switching out to kvm_arch_vcpu_ioctl_run" indicates that for x86 they
only need to swap guest and userspace VFP registers before exiting VCPU_RUN
to userspace, not for running other threads. I imagine that's the same for
ARM as well.

If so, then I think this hunk

> @@ -209,4 +212,16 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   */
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  {
> + struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> + struct kvm_cpu_context *guest_ctxt = >arch.ctxt;
> +
> + /* Restore host FP/SIMD state */
> + if (vcpu->arch.guest_vfp_loaded) {
> + if (vcpu_el1_is_32bit(vcpu))
> + kvm_call_hyp(__fpsimd32_save_state,
> +  kern_hyp_va(guest_ctxt));
> + __fpsimd_save_state(_ctxt->gp_regs.fp_regs);
> + __fpsimd_restore_state(_ctxt->gp_regs.fp_regs);
> + vcpu->arch.guest_vfp_loaded = 0;
> + }
>  }

could be moved to the return of kvm_arch_vcpu_ioctl_run().

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 08/37] KVM: arm64: Defer restoring host VFP state to vcpu_put

2017-11-07 Thread Andrew Jones
On Thu, Oct 12, 2017 at 12:41:12PM +0200, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers while running the VM.

running the host.

> 
> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  5 
>  arch/arm64/include/asm/kvm_host.h|  3 +++
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kvm/hyp/entry.S   |  3 +++
>  arch/arm64/kvm/hyp/switch.c  | 47 
> +++-
>  arch/arm64/kvm/hyp/sysreg-sr.c   | 21 +---
>  6 files changed, 44 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 1fbfe96..630dd60 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -56,6 +56,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu 
> *vcpu)
>   return (unsigned long *)>arch.hcr_el2;
>  }
>  
> +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> +{
> + return (!(vcpu->arch.hcr_el2 & HCR_RW));

nit: no need for the outer ().

> +}
> +
>  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
>  {
>   return (unsigned long *)_gp_regs(vcpu)->regs.pc;
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 7d3bfa7..5e09eb9 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -210,6 +210,9 @@ struct kvm_vcpu_arch {
>   /* Guest debug state */
>   u64 debug_flags;
>  
> + /* 1 if the guest VFP state is loaded into the hardware */
> + u64 guest_vfp_loaded;
> +

Is there a chance we'll want other flags like this? Should we just make
this a lazy state flags field with the (currently only) flag VFP? If not,
then a bool would be nicer, although I see below the u64 was chosen in
order for the 'str' to be used.

>   /*
>* We maintain more than a single set of debug registers to support
>* debugging the guest from the host and to maintain separate host and
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 612021d..9946732 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -133,6 +133,7 @@ int main(void)
>DEFINE(CPU_GP_REGS,offsetof(struct kvm_cpu_context, 
> gp_regs));
>DEFINE(CPU_USER_PT_REGS,   offsetof(struct kvm_regs, regs));
>DEFINE(CPU_FP_REGS,offsetof(struct kvm_regs, fp_regs));
> +  DEFINE(VCPU_GUEST_VFP_LOADED,  offsetof(struct kvm_vcpu, 
> arch.guest_vfp_loaded));
>DEFINE(VCPU_FPEXC32_EL2,   offsetof(struct kvm_vcpu, 
> arch.ctxt.sys_regs[FPEXC32_EL2]));
>DEFINE(VCPU_HOST_CONTEXT,  offsetof(struct kvm_vcpu, 
> arch.host_cpu_context));
>DEFINE(HOST_CONTEXT_VCPU,  offsetof(struct kvm_cpu_context, 
> __hyp_running_vcpu));
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 76cd48f..b3e7191 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -185,6 +185,9 @@ alternative_endif
>   add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>   bl  __fpsimd_restore_state
>  
> + mov x0, #1
> + str x0, [x3, #VCPU_GUEST_VFP_LOADED]
> +
>   // Skip restoring fpexc32 for AArch64 guests
>   mrs x1, hcr_el2
>   tbnzx1, #HCR_RW_SHIFT, 1f
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 7703d63..ef05c59 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -23,43 +23,31 @@
>  #include 
>  #include 
>  
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> -{
> - return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> -
> -static bool __hyp_text __fpsimd_enabled_vhe(void)
> -{
> - return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> -}
> -
> -static hyp_alternate_select(__fpsimd_is_enabled,
> - __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> - ARM64_HAS_VIRT_HOST_EXTN);
> -
> -bool __hyp_text __fpsimd_enabled(void)
> -{
> - return __fpsimd_is_enabled()();
> -}
> -
> -static void __hyp_text __activate_traps_vhe(void)
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>   u64 val;
>  
>   val = read_sysreg(cpacr_el1);
>   val |= CPACR_EL1_TTA;
> - val &= ~CPACR_EL1_FPEN;
> + if (vcpu->arch.guest_vfp_loaded)
> + val |= CPACR_EL1_FPEN;
> + else
> + val &= ~CPACR_EL1_FPEN;
>   write_sysreg(val, cpacr_el1);
>  
>