Re: [PATCH v3 01/10] kvm: arm64: vgic: Introduce vgic_check_iorange

2021-09-29 Thread Ricardo Koller
On Wed, Sep 29, 2021 at 06:29:21PM +0200, Eric Auger wrote:
> Hi Ricardo,
> 
> On 9/28/21 8:47 PM, Ricardo Koller wrote:
> > Add the new vgic_check_iorange helper that checks that an iorange is
> > sane: the start address and size have valid alignments, the range is
> > within the addressable PA range, start+size doesn't overflow, and the
> > start wasn't already defined.
> >
> > No functional change.
> >
> > Signed-off-by: Ricardo Koller 
> > ---
> >  arch/arm64/kvm/vgic/vgic-kvm-device.c | 22 ++
> >  arch/arm64/kvm/vgic/vgic.h|  4 
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c 
> > b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > index 7740995de982..f714aded67b2 100644
> > --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > @@ -29,6 +29,28 @@ int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t 
> > *ioaddr,
> > return 0;
> >  }
> >  
> > +int vgic_check_iorange(struct kvm *kvm, phys_addr_t *ioaddr,
> > +  phys_addr_t addr, phys_addr_t alignment,
> > +  phys_addr_t size)
> > +{
> > +   int ret;
> > +
> > +   ret = vgic_check_ioaddr(kvm, ioaddr, addr, alignment);
> nit: not related to this patch, but I am just wondering why we are
> passing phys_addr_t *ioaddr down to vgic_check_ioaddr() and thus to
> vgic_check_iorange()? This must be a leftover of some old code?
> 

It's used to check that the base of a region is not already set.
kvm_vgic_addr() uses it to make that check;
vgic_v3_alloc_redist_region() does not:

  rdreg->base = VGIC_ADDR_UNDEF; // so the "not already defined" check passes
  ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K);

Thanks,
Ricardo

> > +   if (ret)
> > +   return ret;
> > +
> > +   if (!IS_ALIGNED(size, alignment))
> > +   return -EINVAL;
> > +
> > +   if (addr + size < addr)
> > +   return -EINVAL;
> > +
> > +   if (addr + size > kvm_phys_size(kvm))
> > +   return -E2BIG;
> > +
> > +   return 0;
> > +}
> > +
> >  static int vgic_check_type(struct kvm *kvm, int type_needed)
> >  {
> > if (kvm->arch.vgic.vgic_model != type_needed)
> > diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> > index 14a9218641f5..c4df4dcef31f 100644
> > --- a/arch/arm64/kvm/vgic/vgic.h
> > +++ b/arch/arm64/kvm/vgic/vgic.h
> > @@ -175,6 +175,10 @@ void vgic_irq_handle_resampling(struct vgic_irq *irq,
> >  int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
> >   phys_addr_t addr, phys_addr_t alignment);
> >  
> > +int vgic_check_iorange(struct kvm *kvm, phys_addr_t *ioaddr,
> > +  phys_addr_t addr, phys_addr_t alignment,
> > +  phys_addr_t size);
> > +
> >  void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu);
> >  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int 
> > lr);
> >  void vgic_v2_clear_lr(struct kvm_vcpu *vcpu, int lr);
> Besides
> Reviewed-by: Eric Auger 
> Eric
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 05/10] KVM: arm64: selftests: Make vgic_init gic version agnostic

2021-09-29 Thread Ricardo Koller
On Wed, Sep 29, 2021 at 07:12:59PM +0200, Eric Auger wrote:
> 
> 
> On 9/28/21 8:47 PM, Ricardo Koller wrote:
> > As a preparation for the next commits which will add some tests for
> > GICv2, make aarch64/vgic_init GIC version agnostic. Add a new generic
> > run_tests(gic_dev_type) function that starts all applicable tests using
> > GICv3 or GICv2. GICv2 tests are attempted if GICv3 is not available in
> > the system. There are currently no GICv2 tests, but the test passes now
> > in GICv2 systems.
> >
> > Signed-off-by: Ricardo Koller 
> > ---
> >  .../testing/selftests/kvm/aarch64/vgic_init.c | 156 +++---
> >  1 file changed, 95 insertions(+), 61 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c 
> > b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> > index 623f31a14326..896a29f2503d 100644
> > --- a/tools/testing/selftests/kvm/aarch64/vgic_init.c
> > +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> > @@ -22,6 +22,9 @@
> >  
> >  #define GICR_TYPER 0x8
> >  
> > +#define VGIC_DEV_IS_V2(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V2)
> > +#define VGIC_DEV_IS_V3(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V3)
> > +
> >  struct vm_gic {
> > struct kvm_vm *vm;
> > int gic_fd;
> > @@ -30,8 +33,8 @@ struct vm_gic {
> >  static int max_ipa_bits;
> >  
> >  /* helper to access a redistributor register */
> > -static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
> > -uint32_t *val, bool write)
> > +static int access_v3_redist_reg(int gicv3_fd, int vcpu, int offset,
> > +   uint32_t *val, bool write)
> >  {
> > uint64_t attr = REG_OFFSET(vcpu, offset);
> >  
> > @@ -58,7 +61,7 @@ static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
> > return 0;
> >  }
> >  
> > -static struct vm_gic vm_gic_create(void)
> > +static struct vm_gic vm_gic_v3_create(void)
> >  {
> > struct vm_gic v;
> >  
> > @@ -80,7 +83,7 @@ static void vm_gic_destroy(struct vm_gic *v)
> >   * device gets created, a legacy RDIST region is set at @0x0
> >   * and a DIST region is set @0x6
> >   */
> > -static void subtest_dist_rdist(struct vm_gic *v)
> > +static void subtest_v3_dist_rdist(struct vm_gic *v)
> >  {
> > int ret;
> > uint64_t addr;
> > @@ -145,7 +148,7 @@ static void subtest_dist_rdist(struct vm_gic *v)
> >  }
> >  
> >  /* Test the new REDIST region API */
> > -static void subtest_redist_regions(struct vm_gic *v)
> > +static void subtest_v3_redist_regions(struct vm_gic *v)
> >  {
> > uint64_t addr, expected_addr;
> > int ret;
> > @@ -249,7 +252,7 @@ static void subtest_redist_regions(struct vm_gic *v)
> >   * VGIC KVM device is created and initialized before the secondary CPUs
> >   * get created
> >   */
> > -static void test_vgic_then_vcpus(void)
> > +static void test_v3_vgic_then_vcpus(uint32_t gic_dev_type)
> >  {
> > struct vm_gic v;
> > int ret, i;
> > @@ -257,7 +260,7 @@ static void test_vgic_then_vcpus(void)
> > v.vm = vm_create_default(0, 0, guest_code);
> > v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false);
> >  
> > -   subtest_dist_rdist(&v);
> > +   subtest_v3_dist_rdist(&v);
> >  
> > /* Add the rest of the VCPUs */
> > for (i = 1; i < NR_VCPUS; ++i)
> > @@ -270,14 +273,14 @@ static void test_vgic_then_vcpus(void)
> >  }
> >  
> >  /* All the VCPUs are created before the VGIC KVM device gets initialized */
> > -static void test_vcpus_then_vgic(void)
> > +static void test_v3_vcpus_then_vgic(uint32_t gic_dev_type)
> >  {
> > struct vm_gic v;
> > int ret;
> >  
> > -   v = vm_gic_create();
> > +   v = vm_gic_v3_create();
> >  
> > -   subtest_dist_rdist(&v);
> > +   subtest_v3_dist_rdist(&v);
> >  
> > ret = run_vcpu(v.vm, 3);
> > TEST_ASSERT(ret == -EINVAL, "dist/rdist overlap detected on 1st vcpu 
> > run");
> > @@ -285,15 +288,15 @@ static void test_vcpus_then_vgic(void)
> > vm_gic_destroy(&v);
> >  }
> >  
> > -static void test_new_redist_regions(void)
> > +static void test_v3_new_redist_regions(void)
> >  {
> > void *dummy = NULL;
> > struct vm_gic v;
> > uint64_t addr;
> > int ret;
> >  
> > -   v = vm_gic_create();
> > -   subtest_redist_regions(&v);
> > +   v = vm_gic_v3_create();
> > +   subtest_v3_redist_regions(&v);
> > kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> >   KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true);
> >  
> > @@ -303,8 +306,8 @@ static void test_new_redist_regions(void)
> >  
> > /* step2 */
> >  
> > -   v = vm_gic_create();
> > -   subtest_redist_regions(&v);
> > +   v = vm_gic_v3_create();
> > +   subtest_v3_redist_regions(&v);
> >  
> > addr = REDIST_REGION_ATTR_ADDR(1, 0x28, 0, 2);
> > kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> > @@ -317,8 +320,8 @@ static void test_new_redist_regions(void)
> >  
> > /* step 3 */
> >  
> > -   v = vm_gic_create();
> > -   subtest_redist_regions(&v);
> > +   v = vm_gic_v3_create();
> > +   

Re: [PATCH v3 02/10] KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size

2021-09-29 Thread Ricardo Koller
Hi Eric,

On Wed, Sep 29, 2021 at 06:23:04PM +0200, Eric Auger wrote:
> Hi Ricardo,
> 
> On 9/28/21 8:47 PM, Ricardo Koller wrote:
> > Verify that the redistributor regions do not extend beyond the
> > VM-specified IPA range (phys_size). This can happen when using
> > KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
> > with:
> >
> >   base + size > phys_size AND base < phys_size
> >
> > Add the missing check into vgic_v3_alloc_redist_region() which is called
> > when setting the regions, and into vgic_v3_check_base() which is called
> > when attempting the first vcpu-run. The vcpu-run check does not apply to
> > KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
> > before the first vcpu-run. Note that using the REDIST_REGIONS API
> > results in a different check, which already exists, at first vcpu run:
> > that the number of redist regions is enough for all vcpus.
> >
> > Finally, this patch also enables some extra tests in
> > vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
> > redist api: like checking that the REDIST region can fit all the already
> > created vcpus.
> >
> > Signed-off-by: Ricardo Koller 
> > ---
> >  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 6 --
> >  arch/arm64/kvm/vgic/vgic-v3.c  | 4 
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> > b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > index a09cdc0b953c..9be02bf7865e 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > @@ -796,7 +796,9 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, 
> > uint32_t index,
> > struct vgic_dist *d = &kvm->arch.vgic;
> > struct vgic_redist_region *rdreg;
> > struct list_head *rd_regions = &d->rd_regions;
> > -   size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
> > +   int nr_vcpus = atomic_read(&kvm->online_vcpus);
> > +   size_t size = count ? count * KVM_VGIC_V3_REDIST_SIZE
> > +   : nr_vcpus * KVM_VGIC_V3_REDIST_SIZE;
> 
> This actually fixes the vgic_dist_overlap(kvm, base, size=0) case
> when the number of online vcpus at that time is less than the final
> one (at 1st run). But if count=0 (legacy API), do we ever check that
> the RDIST (with the accumulated vcpu rdists) does not overlap with
> the dist? In other words, shouldn't we call vgic_dist_overlap(kvm,
> base, size) again in vgic_v3_check_base()?
> 

I think we're good; that's checked by vgic_v3_rdist_overlap(dist_base)
in vgic_v3_check_base(). In the legacy case there is a single region,
and that function sizes it based on online_vcpus (via
vgic_v3_rd_region_size()). This exact situation is tested by
test_vgic_then_vcpus() in the vgic_init selftest.

Thanks,
Ricardo

> Thanks
> 
> Eric
> 
> > int ret;
> >  
> > /* cross the end of memory ? */
> > @@ -840,7 +842,7 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, 
> > uint32_t index,
> >  
> > rdreg->base = VGIC_ADDR_UNDEF;
> >  
> > -   ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K);
> > +   ret = vgic_check_iorange(kvm, &rdreg->base, base, SZ_64K, size);
> > if (ret)
> > goto free;
> >  
> > diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> > index 21a6207fb2ee..27ee674631b3 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> > @@ -486,6 +486,10 @@ bool vgic_v3_check_base(struct kvm *kvm)
> > if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) <
> > rdreg->base)
> > return false;
> > +
> > +   if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) >
> > +   kvm_phys_size(kvm))
> > +   return false;
> > }
> >  
> > if (IS_VGIC_ADDR_UNDEF(d->vgic_dist_base))
> 


Re: KVM/arm64: Guest ABI changes do not appear rollback-safe

2021-09-29 Thread Oliver Upton
On Fri, Aug 27, 2021 at 12:40 AM Andrew Jones  wrote:
>
> On Thu, Aug 26, 2021 at 06:49:27PM +, Oliver Upton wrote:
> > On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 19:14:59 +0100,
> > > Oliver Upton  wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones  wrote:
> > >
> > > [...]
> > >
> > > > > Thanks for including me Marc. I think you've mentioned all the 
> > > > > examples
> > > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > > can think of. While some of the examples like get-reg-list could
> > > > > eventually be eliminated if we had CPU models to tighten our machine 
> > > > > type
> > > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > > >
> > > > > I agree with giving userspace control over the exposure of the 
> > > > > hypercalls
> > > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > > CAPs also seems reasonable to me.
> > > > >
> > > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > > point out that the opt-out approach doesn't help much with expanding
> > > > > our migration support unless we require the VMM to be upgraded first.
> > > > >
> > > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > > work as expected, since the source enforce opt-out, but the 
> > > > > destination
> > > > > won't.
> > > >
> > > > Right, there's going to need to be a fence in both kernel and VMM
> > > > versions. Before the fence, you can't rollback with either component.
> > > > Once on the other side of the fence, the user may freely migrate
> > > > between kernel + VMM combinations.
> > > >
> > > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > > leaving it to the user to ensure they consider everything. opt-in
> > > > > usually only needs the user to consider what machine type they want to
> > > > > launch.
> > > >
> > > > Going the register route will implicitly require opt-out for all old
> > > > hypercalls. We exposed them unconditionally to the guest before, and
> > > > we must uphold that behavior. The default value for the bitmap will
> > > > have those features set. Any hypercalls added after that register
> > > > interface will then require explicit opt-in from userspace.
> > >
> > > I disagree here. This makes the ABI inconsistent, and means that no
> > > feature can be implemented without changing userspace. If you can deal
> > > with the existing features, you should be able to deal with the next
> > > lot.
> > >
> > > > With regards to the pseudoregister interface, how would a VMM discover
> > > > new bits? From my perspective, you need to have two bitmaps that the
> > > > VMM can get at: the set of supported feature bits and the active
> > > > bitmap of features for a running guest.
> > >
> > > My proposal is that we have a single pseudo-register exposing the list
> > > of features implemented by the kernel. Clear the bits you don't want,
> > > back the result. As long as you haven't written anything, you have the
> > > full feature set. That's pretty similar to the virtio feature
> > > negotiation.
> >
> > Ah, yes I agree. Thinking about it more, we will not need something
> > similar to KVM_GET_SUPPORTED_CPUID.
> >
> > So then, for any register where userspace/KVM need to negotiate
> > features, the default value will return the maximum feature set that is
> > supported. If userspace wants to constrain features, read out the
> > register, make sure everything you want is there, and write it back
> > blowing away the superfluous bits. Given this should we enforce ordering
> > on feature registers, such that a VMM can only write to the registers
> > before a VM is started?
>
> That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
> we can model the feature register control off that.
>
> >
> > Also, Reiji is working on making the identity registers writable for the
> > sake of feature restriction. The suggested negotiation interface would
> > be applicable there too, IMO.
>
> This is interesting news. I'll look forward to the posting.
>
> >
> > Many thanks to both you and Drew for working this out with me.
> >
>
> Thanks,
> drew
>

Hey folks,

I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so 

Re: [PATCH 2/2] KVM: arm64: Depend on HAVE_KVM => OF instead of directly on OF

2021-09-29 Thread Sean Christopherson
On Wed, Sep 29, 2021, Will Deacon wrote:
> On Tue, Sep 21, 2021 at 03:22:31PM -0700, Sean Christopherson wrote:
> > Select HAVE_KVM if the KVM dependency is met (OF / Open Firmware), and
> > make KVM depend on HAVE_KVM instead of directly on OF.  This fixes the
> > oddity where arm64 configs can end up with KVM=y and HAVE_KVM=n, and will
> > hopefully prevent breakage if there are future users of HAVE_KVM.
> > 
> > Note, arm64 unconditionally selects OF, and has always done so (see
> > commit 8c2c3df31e3b ("arm64: Build infrastructure").  Keep the somewhat
> > pointless HAVE_KVM dependency on OF to document that KVM requires Open
> > Firmware support.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson 
> > ---
> >  arch/arm64/Kconfig | 1 +
> >  arch/arm64/kvm/Kconfig | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index b5b13a932561..38c0f36a5ed4 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -187,6 +187,7 @@ config ARM64
> > select HAVE_GCC_PLUGINS
> > select HAVE_HW_BREAKPOINT if PERF_EVENTS
> > select HAVE_IRQ_TIME_ACCOUNTING
> > +   select HAVE_KVM if OF
> 
> Honestly, I'd just drop the 'if OF' here. We select it unconditionally a
> few lines below and so I think it's more confusing to have the check.

Works for me.  I all but flipped a coin when deciding whether or not to keep the
OF dependency.

Thanks!


Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-09-29 Thread Marcelo Tosatti
Oliver,

Do you have any numbers for the improvement in guests' CLOCK_REALTIME
accuracy across migration, when this is in place?

On Thu, Sep 16, 2021 at 06:15:35PM +, Oliver Upton wrote:
> Handling the migration of TSCs correctly is difficult, in part because
> Linux does not provide userspace with the ability to retrieve a (TSC,
> realtime) clock pair for a single instant in time. In lieu of a more
> convenient facility, KVM can report similar information in the kvm_clock
> structure.
> 
> Provide userspace with a host TSC & realtime pair iff the realtime clock
> is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> realtime value, advance the KVM clock by the amount of elapsed time. Do
> not step the KVM clock backwards, though, as it is a monotonic
> oscillator.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Oliver Upton 
> ---
>  Documentation/virt/kvm/api.rst  | 42 ++---
>  arch/x86/include/asm/kvm_host.h |  3 +++
>  arch/x86/kvm/x86.c  | 36 +---
>  include/uapi/linux/kvm.h|  7 +-
>  4 files changed, 70 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a6729c8cf063..d0b9c986cf6c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -993,20 +993,34 @@ such as migration.
>  When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
>  set of bits that KVM can return in struct kvm_clock_data's flag member.
>  
> -The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
> -value is the exact kvmclock value seen by all VCPUs at the instant
> -when KVM_GET_CLOCK was called.  If clear, the returned value is simply
> -CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> -with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
> -but the exact value read by each VCPU could differ, because the host
> -TSC is not stable.
> +FLAGS:
> +
> +KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
> +value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
> +If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
> +offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
> +to make all VCPUs follow this clock, but the exact value read by each
> +VCPU could differ, because the host TSC is not stable.
> +
> +KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
> +structure is populated with the value of the host's real time
> +clocksource at the instant when KVM_GET_CLOCK was called. If clear,
> +the `realtime` field does not contain a value.
> +
> +KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
> +structure is populated with the value of the host's timestamp counter (TSC)
> +at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
> +does not contain a value.
>  
>  ::
>  
>struct kvm_clock_data {
>   __u64 clock;  /* kvmclock current value */
>   __u32 flags;
> - __u32 pad[9];
> + __u32 pad0;
> + __u64 realtime;
> + __u64 host_tsc;
> + __u32 pad[4];
>};
>  
>  
> @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value 
> specified in its parameter.
>  In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on 
> scenarios
>  such as migration.
>  
> +FLAGS:
> +
> +KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` 
> field
> +with the value of the host's real time clocksource at the instant when
> +KVM_SET_CLOCK was called. The difference in elapsed time is added to the 
> final
> +kvmclock value that will be provided to guests.
> +
>  ::
>  
>struct kvm_clock_data {
>   __u64 clock;  /* kvmclock current value */
>   __u32 flags;
> - __u32 pad[9];
> + __u32 pad0;
> + __u64 realtime;
> + __u64 host_tsc;
> + __u32 pad[4];
>};
>  
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index be6805fc0260..9c34b5b63e39 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
>  
>  int alloc_all_memslots_rmaps(struct kvm *kvm);
>  
> +#define KVM_CLOCK_VALID_FLAGS
> \
> + (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 523c4e5c109f..cb5d5cad5124 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct 
> kvm_clock_data *data)
>   get_cpu();
>  
>   if (__this_cpu_read(cpu_tsc_khz)) {
> +#ifdef CONFIG_X86_64
> + struct timespec64 ts;
> +
> + if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> + 

Re: [PATCH 4/5] KVM: arm64: Prevent re-finalisation of pKVM for a given CPU

2021-09-29 Thread Quentin Perret
On Thursday 23 Sep 2021 at 12:22:55 (+0100), Will Deacon wrote:
> __pkvm_prot_finalize() completes the deprivilege of the host when pKVM
> is in use by installing a stage-2 translation table for the calling CPU.
> 
> Issuing the hypercall multiple times for a given CPU makes little sense,
> but in such a case just return early with -EPERM rather than go through
> the whole page-table dance again.
> 
> Cc: Marc Zyngier 
> Cc: Quentin Perret 
> Signed-off-by: Will Deacon 
> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index bacd493a4eac..cafe17e5fa8f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -123,6 +123,9 @@ int __pkvm_prot_finalize(void)
>   struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
>   struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
>  
> + if (params->hcr_el2 & HCR_VM)
> + return -EPERM;

And you check this rather than the static key because we flip it upfront
I guess. Makes sense to me, but maybe a little comment would be useful :)
In any case:

Reviewed-by: Quentin Perret 

Thanks,
Quentin


Re: [PATCH 2/5] KVM: arm64: Reject stub hypercalls after pKVM has been initialised

2021-09-29 Thread Quentin Perret
On Thursday 23 Sep 2021 at 12:22:53 (+0100), Will Deacon wrote:
> The stub hypercalls provide mechanisms to reset and replace the EL2 code,
> so uninstall them once pKVM has been initialised in order to ensure the
> integrity of the hypervisor code.
> 
> To ensure pKVM initialisation remains functional, split cpu_hyp_reinit()
> into two helper functions to separate usage of the stub from usage of
> pkvm hypercalls either side of __pkvm_init on the boot CPU.
> 
> Cc: Marc Zyngier 
> Cc: Quentin Perret 
> Signed-off-by: Will Deacon 

Reviewed-by: Quentin Perret 

Thanks,
Quentin


Re: [PATCH 3/5] KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall

2021-09-29 Thread Quentin Perret
On Thursday 23 Sep 2021 at 12:22:54 (+0100), Will Deacon wrote:
> If the __pkvm_prot_finalize hypercall returns an error, we WARN but fail
> to propagate the failure code back to kvm_arch_init().
> 
> Pass a pointer to a zero-initialised return variable so that failure
> to finalise the pKVM protections on a host CPU can be reported back to
> KVM.
> 
> Cc: Marc Zyngier 
> Cc: Quentin Perret 
> Signed-off-by: Will Deacon 
> ---
>  arch/arm64/kvm/arm.c | 30 +++---
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 9506cf88fa0e..13bbf35896cd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1986,9 +1986,25 @@ static int init_hyp_mode(void)
>   return err;
>  }
>  
> -static void _kvm_host_prot_finalize(void *discard)
> +static void _kvm_host_prot_finalize(void *arg)
>  {
> - WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize));
> + int *err = arg;
> +
> + if (WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize)))
> + WRITE_ONCE(*err, -EINVAL);
> +}

I was going to suggest to propagate the hypercall's error code directly,
but this becomes very racy so n/m...

But this got me thinking about what we should do when the hyp init fails
while the protected mode has been explicitly enabled on the kernel
cmdline. That is, if we continue and boot the kernel w/o KVM support,
then I don't know how e.g. EL3 can know that it shouldn't give keys to
VMs because the kernel (and EL2) can't be trusted. It feels like it is
the kernel's responsibility to do something while it _is_ still
trustworthy.

I guess we could make any error code fatal in kvm_arch_init() when
is_protected_kvm_enabled() is on, or something along those lines? Maybe
dependent on CONFIG_NVHE_EL2_DEBUG=n?

It's probably a bit theoretical because there really shouldn't be any
reason to fail hyp init in production when using a signed kernel image
etc etc, but then if that is the case the additional check I'm
suggesting shouldn't hurt and will give us some peace of mind. Thoughts?

Thanks,
Quentin


Re: [PATCH v3 05/10] KVM: arm64: selftests: Make vgic_init gic version agnostic

2021-09-29 Thread Eric Auger



On 9/28/21 8:47 PM, Ricardo Koller wrote:
> As a preparation for the next commits which will add some tests for
> GICv2, make aarch64/vgic_init GIC version agnostic. Add a new generic
> run_tests(gic_dev_type) function that starts all applicable tests using
> GICv3 or GICv2. GICv2 tests are attempted if GICv3 is not available in
> the system. There are currently no GICv2 tests, but the test passes now
> in GICv2 systems.
>
> Signed-off-by: Ricardo Koller 
> ---
>  .../testing/selftests/kvm/aarch64/vgic_init.c | 156 +++---
>  1 file changed, 95 insertions(+), 61 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c 
> b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> index 623f31a14326..896a29f2503d 100644
> --- a/tools/testing/selftests/kvm/aarch64/vgic_init.c
> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
> @@ -22,6 +22,9 @@
>  
>  #define GICR_TYPER 0x8
>  
> +#define VGIC_DEV_IS_V2(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V2)
> +#define VGIC_DEV_IS_V3(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V3)
> +
>  struct vm_gic {
>   struct kvm_vm *vm;
>   int gic_fd;
> @@ -30,8 +33,8 @@ struct vm_gic {
>  static int max_ipa_bits;
>  
>  /* helper to access a redistributor register */
> -static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
> -  uint32_t *val, bool write)
> +static int access_v3_redist_reg(int gicv3_fd, int vcpu, int offset,
> + uint32_t *val, bool write)
>  {
>   uint64_t attr = REG_OFFSET(vcpu, offset);
>  
> @@ -58,7 +61,7 @@ static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
>   return 0;
>  }
>  
> -static struct vm_gic vm_gic_create(void)
> +static struct vm_gic vm_gic_v3_create(void)
>  {
>   struct vm_gic v;
>  
> @@ -80,7 +83,7 @@ static void vm_gic_destroy(struct vm_gic *v)
>   * device gets created, a legacy RDIST region is set at @0x0
>   * and a DIST region is set @0x6
>   */
> -static void subtest_dist_rdist(struct vm_gic *v)
> +static void subtest_v3_dist_rdist(struct vm_gic *v)
>  {
>   int ret;
>   uint64_t addr;
> @@ -145,7 +148,7 @@ static void subtest_dist_rdist(struct vm_gic *v)
>  }
>  
>  /* Test the new REDIST region API */
> -static void subtest_redist_regions(struct vm_gic *v)
> +static void subtest_v3_redist_regions(struct vm_gic *v)
>  {
>   uint64_t addr, expected_addr;
>   int ret;
> @@ -249,7 +252,7 @@ static void subtest_redist_regions(struct vm_gic *v)
>   * VGIC KVM device is created and initialized before the secondary CPUs
>   * get created
>   */
> -static void test_vgic_then_vcpus(void)
> +static void test_v3_vgic_then_vcpus(uint32_t gic_dev_type)
>  {
>   struct vm_gic v;
>   int ret, i;
> @@ -257,7 +260,7 @@ static void test_vgic_then_vcpus(void)
>   v.vm = vm_create_default(0, 0, guest_code);
>   v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false);
>  
> - subtest_dist_rdist(&v);
> + subtest_v3_dist_rdist(&v);
>  
>   /* Add the rest of the VCPUs */
>   for (i = 1; i < NR_VCPUS; ++i)
> @@ -270,14 +273,14 @@ static void test_vgic_then_vcpus(void)
>  }
>  
>  /* All the VCPUs are created before the VGIC KVM device gets initialized */
> -static void test_vcpus_then_vgic(void)
> +static void test_v3_vcpus_then_vgic(uint32_t gic_dev_type)
>  {
>   struct vm_gic v;
>   int ret;
>  
> - v = vm_gic_create();
> + v = vm_gic_v3_create();
>  
> - subtest_dist_rdist(&v);
> + subtest_v3_dist_rdist(&v);
>  
>   ret = run_vcpu(v.vm, 3);
>   TEST_ASSERT(ret == -EINVAL, "dist/rdist overlap detected on 1st vcpu 
> run");
> @@ -285,15 +288,15 @@ static void test_vcpus_then_vgic(void)
>   vm_gic_destroy(&v);
>  }
>  
> -static void test_new_redist_regions(void)
> +static void test_v3_new_redist_regions(void)
>  {
>   void *dummy = NULL;
>   struct vm_gic v;
>   uint64_t addr;
>   int ret;
>  
> - v = vm_gic_create();
> - subtest_redist_regions(&v);
> + v = vm_gic_v3_create();
> + subtest_v3_redist_regions(&v);
>   kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true);
>  
> @@ -303,8 +306,8 @@ static void test_new_redist_regions(void)
>  
>   /* step2 */
>  
> - v = vm_gic_create();
> - subtest_redist_regions();
> + v = vm_gic_v3_create();
> + subtest_v3_redist_regions();
>  
>   addr = REDIST_REGION_ATTR_ADDR(1, 0x28, 0, 2);
>   kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> @@ -317,8 +320,8 @@ static void test_new_redist_regions(void)
>  
>   /* step 3 */
>  
> - v = vm_gic_create();
> - subtest_redist_regions();
> + v = vm_gic_v3_create();
> + subtest_v3_redist_regions();
>  
>   _kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
> KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, dummy, true);
> @@ -338,7 +341,7 @@ static void test_new_redist_regions(void)
>   

Re: [PATCH v3 03/10] KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size

2021-09-29 Thread Eric Auger
Hi Ricardo,

On 9/28/21 8:47 PM, Ricardo Koller wrote:
> Verify that the GICv2 CPU interface does not extend beyond the
> VM-specified IPA range (phys_size).
>
>   base + size > phys_size AND base < phys_size
>
> Add the missing check into kvm_vgic_addr() which is called when setting
> the region. This patch also enables some superfluous checks for the
> distributor (vgic_check_ioaddr was enough as alignment == size for the
> distributors).
>
> Signed-off-by: Ricardo Koller 
> ---
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c 
> b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index f714aded67b2..b379eb81fddb 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -79,7 +79,7 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 
> *addr, bool write)
>  {
>   int r = 0;
>   struct vgic_dist *vgic = &kvm->arch.vgic;
> - phys_addr_t *addr_ptr, alignment;
> + phys_addr_t *addr_ptr, alignment, size;
>   u64 undef_value = VGIC_ADDR_UNDEF;
>  
>   mutex_lock(&kvm->lock);
> @@ -88,16 +88,19 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, 
> u64 *addr, bool write)
>   r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
>   addr_ptr = &vgic->vgic_dist_base;
>   alignment = SZ_4K;
> + size = KVM_VGIC_V2_DIST_SIZE;
>   break;
>   case KVM_VGIC_V2_ADDR_TYPE_CPU:
>   r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
>   addr_ptr = &vgic->vgic_cpu_base;
>   alignment = SZ_4K;
> + size = KVM_VGIC_V2_CPU_SIZE;
>   break;
>   case KVM_VGIC_V3_ADDR_TYPE_DIST:
>   r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V3);
>   addr_ptr = &vgic->vgic_dist_base;
>   alignment = SZ_64K;
> + size = KVM_VGIC_V3_DIST_SIZE;
>   break;
>   case KVM_VGIC_V3_ADDR_TYPE_REDIST: {
>   struct vgic_redist_region *rdreg;
> @@ -162,7 +165,7 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, 
> u64 *addr, bool write)
>   goto out;
>  
>   if (write) {
> - r = vgic_check_ioaddr(kvm, addr_ptr, *addr, alignment);
> + r = vgic_check_iorange(kvm, addr_ptr, *addr, alignment, size);
>   if (!r)
>   *addr_ptr = *addr;
>   } else {
Looks good to me

Reviewed-by: Eric Auger 


Eric

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 01/10] kvm: arm64: vgic: Introduce vgic_check_iorange

2021-09-29 Thread Eric Auger
Hi Ricardo,

On 9/28/21 8:47 PM, Ricardo Koller wrote:
> Add the new vgic_check_iorange helper that checks that an iorange is
> sane: the start address and size have valid alignments, the range is
> within the addressable PA range, start+size doesn't overflow, and the
> start wasn't already defined.
>
> No functional change.
>
> Signed-off-by: Ricardo Koller 
> ---
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 22 ++
>  arch/arm64/kvm/vgic/vgic.h|  4 
>  2 files changed, 26 insertions(+)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c 
> b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index 7740995de982..f714aded67b2 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -29,6 +29,28 @@ int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
>   return 0;
>  }
>  
> +int vgic_check_iorange(struct kvm *kvm, phys_addr_t *ioaddr,
> +phys_addr_t addr, phys_addr_t alignment,
> +phys_addr_t size)
> +{
> + int ret;
> +
> + ret = vgic_check_ioaddr(kvm, ioaddr, addr, alignment);
nit: not related to this patch but I am just wondering why we are
passing phys_addr_t *ioaddr downto vgic_check_ioaddr and thus to

vgic_check_iorange()? This must be a leftover of some old code?

> + if (ret)
> + return ret;
> +
> + if (!IS_ALIGNED(size, alignment))
> + return -EINVAL;
> +
> + if (addr + size < addr)
> + return -EINVAL;
> +
> + if (addr + size > kvm_phys_size(kvm))
> + return -E2BIG;
> +
> + return 0;
> +}
> +
>  static int vgic_check_type(struct kvm *kvm, int type_needed)
>  {
>   if (kvm->arch.vgic.vgic_model != type_needed)
> diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> index 14a9218641f5..c4df4dcef31f 100644
> --- a/arch/arm64/kvm/vgic/vgic.h
> +++ b/arch/arm64/kvm/vgic/vgic.h
> @@ -175,6 +175,10 @@ void vgic_irq_handle_resampling(struct vgic_irq *irq,
>  int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
> phys_addr_t addr, phys_addr_t alignment);
>  
> +int vgic_check_iorange(struct kvm *kvm, phys_addr_t *ioaddr,
> +phys_addr_t addr, phys_addr_t alignment,
> +phys_addr_t size);
> +
>  void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu);
>  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
>  void vgic_v2_clear_lr(struct kvm_vcpu *vcpu, int lr);
Besides
Reviewed-by: Eric Auger 
Eric



Re: [PATCH v3 04/10] KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size

2021-09-29 Thread Eric Auger



On 9/28/21 8:47 PM, Ricardo Koller wrote:
> Verify that the ITS region does not extend beyond the VM-specified IPA
> range (phys_size).
>
>   base + size > phys_size AND base < phys_size
>
> Add the missing check into vgic_its_set_attr() which is called when
> setting the region.
>
> Signed-off-by: Ricardo Koller 
> ---
>  arch/arm64/kvm/vgic/vgic-its.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 61728c543eb9..321743b87002 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2710,8 +2710,8 @@ static int vgic_its_set_attr(struct kvm_device *dev,
>   if (copy_from_user(&addr, uaddr, sizeof(addr)))
>   return -EFAULT;
>  
> - ret = vgic_check_ioaddr(dev->kvm, &its->vgic_its_base,
> - addr, SZ_64K);
> + ret = vgic_check_iorange(dev->kvm, &its->vgic_its_base,
> +  addr, SZ_64K, KVM_VGIC_V3_ITS_SIZE);
>   if (ret)
>   return ret;
>  
Reviewed-by: Eric Auger 

Eric



Re: [PATCH v3 02/10] KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size

2021-09-29 Thread Eric Auger
Hi Ricardo,

On 9/28/21 8:47 PM, Ricardo Koller wrote:
> Verify that the redistributor regions do not extend beyond the
> VM-specified IPA range (phys_size). This can happen when using
> KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
> with:
>
>   base + size > phys_size AND base < phys_size
>
> Add the missing check into vgic_v3_alloc_redist_region() which is called
> when setting the regions, and into vgic_v3_check_base() which is called
> when attempting the first vcpu-run. The vcpu-run check does not apply to
> KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
> before the first vcpu-run. Note that using the REDIST_REGIONS API
> results in a different check, which already exists, at first vcpu run:
> that the number of redist regions is enough for all vcpus.
>
> Finally, this patch also enables some extra tests in
> vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
> redist api: like checking that the REDIST region can fit all the already
> created vcpus.
>
> Signed-off-by: Ricardo Koller 
> ---
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 6 --
>  arch/arm64/kvm/vgic/vgic-v3.c  | 4 
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index a09cdc0b953c..9be02bf7865e 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -796,7 +796,9 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, 
> uint32_t index,
>   struct vgic_dist *d = &kvm->arch.vgic;
>   struct vgic_redist_region *rdreg;
>   struct list_head *rd_regions = &d->rd_regions;
> - size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
> + int nr_vcpus = atomic_read(&kvm->online_vcpus);
> + size_t size = count ? count * KVM_VGIC_V3_REDIST_SIZE
> + : nr_vcpus * KVM_VGIC_V3_REDIST_SIZE;

This actually fixes the vgic_dist_overlap(kvm, base, size=0)

in case the number of online vcpus at that time is less than the final
one (1st run). If count=0 (legacy API), do we ever check that the RDIST
(with accumulated vcpu rdists) does not overlap with the dist?
In other words, shouldn't we call vgic_dist_overlap(kvm, base, size)
again in vgic_v3_check_base()?

Thanks

Eric

>   int ret;
>  
>   /* cross the end of memory ? */
> @@ -840,7 +842,7 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, 
> uint32_t index,
>  
>   rdreg->base = VGIC_ADDR_UNDEF;
>  
> - ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K);
> + ret = vgic_check_iorange(kvm, &rdreg->base, base, SZ_64K, size);
>   if (ret)
>   goto free;
>  
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 21a6207fb2ee..27ee674631b3 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -486,6 +486,10 @@ bool vgic_v3_check_base(struct kvm *kvm)
>   if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) <
>   rdreg->base)
>   return false;
> +
> + if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) >
> + kvm_phys_size(kvm))
> + return false;
>   }
>  
>   if (IS_VGIC_ADDR_UNDEF(d->vgic_dist_base))



Re: [PATCH 1/5] KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3

2021-09-29 Thread Marc Zyngier
Hi Alex,

On Wed, 29 Sep 2021 16:29:09 +0100,
Alexandru Elisei  wrote:
> 
> Hi Marc,
> 
> On 9/24/21 09:25, Marc Zyngier wrote:
> > Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value
> > visible on the host, even if we were running a GICv2-enabled VM
> > on a GICv3+compat host.
> >
> > That's fine, but we also now have the case of a host that does not
> > expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is
> > confusing. Thank you M1.
> >
> > Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1
> > when a GICv3 is exposed to the guest. This also hides a GICv4.1
> > CPU interface from the guest which has no business knowing about
> > the v4.1 extension.
> 
> Had a look at the gic-v3 driver, and as far as I can tell it does
> not check that a GICv3 is advertised in ID_AA64PFR0_EL1. If I didn't
> get this wrong, then this patch is to ensure architectural
> compliance for a guest even if the hardware is not necessarily
> compliant, right?

Indeed. Not having this made some of my own tests fail on M1 as they
rely on ID_AA64PFR0_EL1.GIC being correct. I also pondered setting it
to 0 when emulating a GICv2, but that'd be a change in behaviour, and
I want to think a bit more about the effects of that.

> 
> GICv4.1 is an extension to GICv4 (which itself was an extension to
> GICv3) to add support for virtualization features (virtual SGIs), so
> I don't see any harm in hiding it from the guest, since the guest
> cannot use virtual SGIs.

Indeed. The guest already has another way to look into this by
checking whether the distributor allows active-less SGIs.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/2] KVM: arm64: Depend on HAVE_KVM => OF instead of directly on OF

2021-09-29 Thread Will Deacon
On Tue, Sep 21, 2021 at 03:22:31PM -0700, Sean Christopherson wrote:
> Select HAVE_KVM if the KVM dependency is met (OF / Open Firmware), and
> make KVM depend on HAVE_KVM instead of directly on OF.  This fixes the
> oddity where arm64 configs can end up with KVM=y and HAVE_KVM=n, and will
> hopefully prevent breakage if there are future users of HAVE_KVM.
> 
> Note, arm64 unconditionally selects OF, and has always done so (see
> commit 8c2c3df31e3b ("arm64: Build infrastructure")).  Keep the somewhat
> pointless HAVE_KVM dependency on OF to document that KVM requires Open
> Firmware support.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/arm64/Kconfig | 1 +
>  arch/arm64/kvm/Kconfig | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b5b13a932561..38c0f36a5ed4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -187,6 +187,7 @@ config ARM64
>   select HAVE_GCC_PLUGINS
>   select HAVE_HW_BREAKPOINT if PERF_EVENTS
>   select HAVE_IRQ_TIME_ACCOUNTING
> + select HAVE_KVM if OF

Honestly, I'd just drop the 'if OF' here. We select it unconditionally a
few lines below and so I think it's more confusing to have the check.

With that:

Acked-by: Will Deacon 

Will


Re: [PATCH 1/5] KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3

2021-09-29 Thread Alexandru Elisei
Hi Marc,

On 9/24/21 09:25, Marc Zyngier wrote:
> Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value
> visible on the host, even if we were running a GICv2-enabled VM
> on a GICv3+compat host.
>
> That's fine, but we also now have the case of a host that does not
> expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is
> confusing. Thank you M1.
>
> Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1
> when a GICv3 is exposed to the guest. This also hides a GICv4.1
> CPU interface from the guest which has no business knowing about
> the v4.1 extension.

Had a look at the gic-v3 driver, and as far as I can tell it does not check 
that a
GICv3 is advertised in ID_AA64PFR0_EL1. If I didn't get this wrong, then this
patch is to ensure architectural compliance for a guest even if the hardware is
not necessarily compliant, right?

GICv4.1 is an extension to GICv4 (which itself was an extension to GICv3) to add
support for virtualization features (virtual SGIs), so I don't see any harm in
hiding it from the guest, since the guest cannot use virtual SGIs.

Thanks,

Alex

>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1d46e185f31e..0e8fc29df19c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1075,6 +1075,11 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>   val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV2), 
> (u64)vcpu->kvm->arch.pfr0_csv2);
>   val &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_CSV3);
>   val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV3), 
> (u64)vcpu->kvm->arch.pfr0_csv3);
> + if (irqchip_in_kernel(vcpu->kvm) &&
> + vcpu->kvm->arch.vgic.vgic_model == 
> KVM_DEV_TYPE_ARM_VGIC_V3) {
> + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_GIC);
> + val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_GIC), 
> 1);
> + }
>   break;
>   case SYS_ID_AA64PFR1_EL1:
>   val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE);


Re: [PATCH v8 3/7] KVM: x86: Fix potential race in KVM_GET_CLOCK

2021-09-29 Thread Marcelo Tosatti
On Thu, Sep 16, 2021 at 06:15:34PM +, Oliver Upton wrote:
> Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
> outside of the pvclock sync lock. This is problematic, as the clock
> value written to the user may or may not actually correspond to a stable
> TSC.
> 
> Fix the race by populating the entire kvm_clock_data structure behind
> the pvclock_gtod_sync_lock.
> 
> Suggested-by: Sean Christopherson 
> Signed-off-by: Oliver Upton 

ACK patches 1-3, still reviewing the remaining ones...



Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-09-29 Thread Paolo Bonzini

On 28/09/21 20:53, Marcelo Tosatti wrote:

+KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.

If the host TSCs are not stable, then the KVM_CLOCK_HOST_TSC bit (and
host_tsc field) are ambiguous. Shouldn't exposing them be conditional on
a stable TSC for the host?



Yes, good point.

Paolo



Re: [PATCH 06/14] KVM: Drop obsolete kvm_arch_vcpu_block_finish()

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:20PM -0700, Sean Christopherson wrote:
> Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
> nops.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  arch/arm64/include/asm/kvm_host.h   | 1 -
>  arch/mips/include/asm/kvm_host.h| 1 -
>  arch/powerpc/include/asm/kvm_host.h | 1 -
>  arch/s390/include/asm/kvm_host.h| 2 --
>  arch/s390/kvm/kvm-s390.c| 5 -
>  arch/x86/include/asm/kvm_host.h | 2 --
>  virt/kvm/kvm_main.c | 1 -
>  7 files changed, 13 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index f8be56d5342b..4e0ad0fff540 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -716,7 +716,6 @@ void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
> diff --git a/arch/mips/include/asm/kvm_host.h 
> b/arch/mips/include/asm/kvm_host.h
> index 696f6b009377..72b90d45a46e 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -897,7 +897,6 @@ static inline void kvm_arch_memslots_updated(struct kvm 
> *kvm, u64 gen) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
>  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
>  int kvm_arch_flush_remote_tlb(struct kvm *kvm);
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 59cb38b04ede..8a84448d77a6 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -864,6 +864,5 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu 
> *vcpu, int cpu) {}
>  static inline void kvm_arch_exit(void) {}
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/s390/include/asm/kvm_host.h 
> b/arch/s390/include/asm/kvm_host.h
> index a604d51acfc8..a22c9266ea05 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -1010,6 +1010,4 @@ static inline void kvm_arch_flush_shadow_memslot(struct 
> kvm *kvm,
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  
> -void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
> -
>  #endif
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 08ed68639a21..17fabb260c35 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -5080,11 +5080,6 @@ static inline unsigned long nonhyp_mask(int i)
>   return 0xUL >> (nonhyp_fai << 4);
>  }
>  
> -void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu)
> -{
> -
> -}
> -
>  static int __init kvm_s390_init(void)
>  {
>   int i;
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1e0e909b98b2..4e8c21083bdb 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1916,8 +1916,6 @@ static inline void kvm_arch_vcpu_unblocking(struct 
> kvm_vcpu *vcpu)
>   static_call_cond(kvm_x86_vcpu_unblocking)(vcpu);
>  }
>  
> -static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> -
>  static inline int kvm_cpu_get_apicid(int mps_cpu)
>  {
>  #ifdef CONFIG_X86_LOCAL_APIC
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2ba22b68af3b..2015a1f532ce 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3301,7 +3301,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   }
>  
>   trace_kvm_vcpu_wakeup(block_ns, waited, vcpu_valid_wakeup(vcpu));
> - kvm_arch_vcpu_block_finish(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_block);
>  
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 14/14] KVM: x86: Invoke kvm_vcpu_block() directly for non-HALTED wait states

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:28PM -0700, Sean Christopherson wrote:
> Call kvm_vcpu_block() directly for all wait states except HALTED so that
> kvm_vcpu_halt() is no longer a misnomer on x86.
> 
> Functionally, this means KVM will never attempt halt-polling or adjust
> vcpu->halt_poll_ns for INIT_RECEIVED (a.k.a. Wait-For-SIPI (WFS)) or
> AP_RESET_HOLD; UNINITIALIZED is handled in kvm_arch_vcpu_ioctl_run(),
> and x86 doesn't use any other "wait" states.
> 
> As mentioned above, the motivation of this is purely so that "halt" isn't
> overloaded on x86, e.g. in KVM's stats.  Skipping halt-polling for WFS
> (and RESET_HOLD) has no meaningful effect on guest performance as there
> are typically single-digit numbers of INIT-SIPI sequences per AP vCPU,
> per boot, versus thousands of HLTs just to boot to console.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  arch/x86/kvm/x86.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b444f9315766..a0f313c4bc49 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9893,7 +9893,10 @@ static inline int vcpu_block(struct kvm *kvm, struct 
> kvm_vcpu *vcpu)
>   if (!kvm_arch_vcpu_runnable(vcpu) &&
>   (!kvm_x86_ops.pre_block || static_call(kvm_x86_pre_block)(vcpu) == 0)) {
>   srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> - kvm_vcpu_halt(vcpu);
> + if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
> + kvm_vcpu_halt(vcpu);
> + else
> + kvm_vcpu_block(vcpu);
>   vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>  
>   if (kvm_x86_ops.post_block)
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 02/14] KVM: Update halt-polling stats if and only if halt-polling was attempted

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:16PM -0700, Sean Christopherson wrote:
> Don't update halt-polling stats if halt-polling wasn't attempted.  This
> is a nop as @poll_ns is guaranteed to be '0' (poll_end == start), but it
> will allow a future patch to move the histogram stats into the helper to
> resolve a discrepancy in what is considered a "successful" halt-poll.
> 
> No functional change intended.
> 
> Cc: David Matlack 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  virt/kvm/kvm_main.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 768a4cbb26a6..8b33f5045b4d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3214,6 +3214,7 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 
> poll_ns, bool waited)
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
>   bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
> + bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
>   ktime_t start, cur, poll_end;
>   bool waited = false;
>   u64 block_ns;
> @@ -3221,7 +3222,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   kvm_arch_vcpu_blocking(vcpu);
>  
>   start = cur = poll_end = ktime_get();
> - if (vcpu->halt_poll_ns && halt_poll_allowed) {
> + if (do_halt_poll) {
>   ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>  
>   ++vcpu->stat.generic.halt_attempted_poll;
> @@ -3273,8 +3274,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   kvm_arch_vcpu_unblocking(vcpu);
>   block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
>  
> - update_halt_poll_stats(
> - vcpu, ktime_to_ns(ktime_sub(poll_end, start)), waited);
> + if (do_halt_poll)
> + update_halt_poll_stats(
> + vcpu, ktime_to_ns(ktime_sub(poll_end, start)), waited);
>  
>   if (halt_poll_allowed) {
>   if (!vcpu_valid_wakeup(vcpu)) {
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 12/14] KVM: Don't redo ktime_get() when calculating halt-polling stop/deadline

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:26PM -0700, Sean Christopherson wrote:
> Calculate the halt-polling "stop" time using "cur" instead of redoing
> ktime_get().  In the happy case where hardware correctly predicts
> do_halt_poll, "cur" is only a few cycles old.  And if the branch is
> mispredicted, arguably that extra latency should count toward the
> halt-polling time.
> 
> In all likelihood, the numbers involved are in the noise and either
> approach is perfectly ok.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2980d2b88559..80f78daa6b8d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3267,7 +3267,7 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  
>   start = cur = poll_end = ktime_get();
>   if (do_halt_poll) {
> - ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
> + ktime_t stop = ktime_add_ns(cur, vcpu->halt_poll_ns);
>  
>   do {
>   /*
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 13/14] KVM: x86: Directly block (instead of "halting") UNINITIALIZED vCPUs

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:27PM -0700, Sean Christopherson wrote:
> Go directly to kvm_vcpu_block() when handling the case where userspace
> attempts to run an UNINITIALIZED vCPU.  The vCPU isn't halted and its time
> spent in limbo arguably should not be factored into halt-polling as the
> behavior of the VM at this point is not at all indicative of the behavior
> of the VM once it is up and running, i.e. executing HLT in idle tasks.
> 
> Note, because this case is encountered only on the first run of an AP vCPU,
> vcpu->halt_poll_ns is guaranteed to be '0', and so KVM will not attempt
> halt-polling, i.e. this really only affects the post-block bookkeeping.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  arch/x86/kvm/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0d71c73a61bb..b444f9315766 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10127,7 +10127,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>   r = -EINTR;
>   goto out;
>   }
> - kvm_vcpu_halt(vcpu);
> + kvm_vcpu_block(vcpu);
>   if (kvm_apic_accept_events(vcpu) < 0) {
>   r = 0;
>   goto out;
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 03/14] KVM: Refactor and document halt-polling stats update helper

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:17PM -0700, Sean Christopherson wrote:
> Add a comment to document that halt-polling is considered successful even
> if the polling loop itself didn't detect a wake event, i.e. if a wake
> event was detect in the final kvm_vcpu_check_block().  Invert the param
> to the update helper so that the helper is a dumb function that is "told"
> whether or not polling was successful, as opposed to having it determine
> success/failure based on blocking behavior.
> 
> Opportunistically tweak the params to the update helper to reduce the
> line length for the call site so that it fits on a single line, and so
> that the prototype conforms to the more traditional kernel style.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  virt/kvm/kvm_main.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b33f5045b4d..12fe91a0a4c8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3199,13 +3199,15 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>   return ret;
>  }
>  
> -static inline void
> -update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
> +static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
> +   ktime_t end, bool success)
>  {
> - if (waited)
> - vcpu->stat.generic.halt_poll_fail_ns += poll_ns;
> - else
> + u64 poll_ns = ktime_to_ns(ktime_sub(end, start));
> +
> + if (success)
>   vcpu->stat.generic.halt_poll_success_ns += poll_ns;
> + else
> + vcpu->stat.generic.halt_poll_fail_ns += poll_ns;
>  }
>  
>  /*
> @@ -3274,9 +3276,13 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   kvm_arch_vcpu_unblocking(vcpu);
>   block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
>  
> + /*
> +  * Note, halt-polling is considered successful so long as the vCPU was
> +  * never actually scheduled out, i.e. even if the wake event arrived
> +  * after the halt-polling loop itself, but before the full wait.
> +  */
>   if (do_halt_poll)
> - update_halt_poll_stats(
> - vcpu, ktime_to_ns(ktime_sub(poll_end, start)), waited);
> + update_halt_poll_stats(vcpu, start, poll_end, !waited);
>  
>   if (halt_poll_allowed) {
>   if (!vcpu_valid_wakeup(vcpu)) {
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 10/14] KVM: Split out a kvm_vcpu_block() helper from kvm_vcpu_halt()

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:24PM -0700, Sean Christopherson wrote:
> Factor out the "block" part of kvm_vcpu_halt() so that x86 can emulate
> non-halt wait/sleep/block conditions that should not be subjected to
> halt-polling.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c  | 50 
>  2 files changed, 36 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d2a8be3fb9ba..655c2b24db2d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -966,6 +966,7 @@ void kvm_sigset_activate(struct kvm_vcpu *vcpu);
>  void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);
>  
>  void kvm_vcpu_halt(struct kvm_vcpu *vcpu);
> +bool kvm_vcpu_block(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
>  bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 280cf1dca7db..fe34457530c2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3199,6 +3199,34 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>   return ret;
>  }
>  
> +/*
> + * Block the vCPU until the vCPU is runnable, an event arrives, or a signal is
> + * pending.  This is mostly used when halting a vCPU, but may also be used
> + * directly for other vCPU non-runnable states, e.g. x86's Wait-For-SIPI.
> + */
> +bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
> +{
> + bool waited = false;
> +
> + kvm_arch_vcpu_blocking(vcpu);
> +
> + prepare_to_rcuwait(&vcpu->wait);
> + for (;;) {
> + set_current_state(TASK_INTERRUPTIBLE);
> +
> + if (kvm_vcpu_check_block(vcpu) < 0)
> + break;
> +
> + waited = true;
> + schedule();
> + }
> + finish_rcuwait(&vcpu->wait);
> +
> + kvm_arch_vcpu_unblocking(vcpu);
> +
> + return waited;
> +}
> +
>  static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
> ktime_t end, bool success)
>  {
> @@ -3221,6 +3249,12 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
>   }
>  }
>  
> +/*
> + * Emulate a vCPU halt condition, e.g. HLT on x86, WFI on arm, etc...  If 
> halt
> + * polling is enabled, busy wait for a short time before blocking to avoid 
> the
> + * expensive block+unblock sequence if a wake event arrives soon after the 
> vCPU
> + * is halted.
> + */
>  void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>   bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
> @@ -3245,21 +3279,7 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>   } while (kvm_vcpu_can_poll(cur, stop));
>   }
>  
> - kvm_arch_vcpu_blocking(vcpu);
> -
> - prepare_to_rcuwait(&vcpu->wait);
> - for (;;) {
> - set_current_state(TASK_INTERRUPTIBLE);
> -
> - if (kvm_vcpu_check_block(vcpu) < 0)
> - break;
> -
> - waited = true;
> - schedule();
> - }
> - finish_rcuwait(&vcpu->wait);
> -
> - kvm_arch_vcpu_unblocking(vcpu);
> + waited = kvm_vcpu_block(vcpu);
>  
>   cur = ktime_get();
>   if (waited) {
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 04/14] KVM: Reconcile discrepancies in halt-polling stats

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:18PM -0700, Sean Christopherson wrote:
> Move the halt-polling "success" and histogram stats update into the
> dedicated helper to fix a discrepancy where the success/fail "time" stats
> consider polling successful so long as the wait is avoided, but the main
> "success" and histogram stats consider polling successful if and only if
> a wake event was detected by the halt-polling loop.
> 
> Move halt_attempted_poll to the helper as well so that all the stats are
> updated in a single location.  While it's a bit odd to update the stat
> well after the fact, practically speaking there's no meaningful advantage
> to updating before polling.
> 
> Note, there is a functional change in addition to the success vs. fail
> change.  The histogram updates previously called ktime_get() instead of
> using "cur".  But that change is desirable as it means all the stats are
> now updated with the same polling time, and avoids the extra ktime_get(),
> which isn't expensive but isn't free either.
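The consolidation the changelog describes can be sketched as a tiny standalone helper. This is a hypothetical, simplified illustration with invented names and a plain struct, not the kernel code: the point is that one helper accounts the attempt counter and the success/fail time from the same captured poll duration, so the counters can never disagree about what "success" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the vCPU's generic halt-polling stats. */
struct halt_poll_stats {
    uint64_t attempted;
    uint64_t successful;
    uint64_t success_ns;
    uint64_t fail_ns;
};

/* Single helper: every stat is derived from the same start/end pair,
 * mirroring the patch's "all stats use the same polling time" idea. */
static void update_halt_poll_stats(struct halt_poll_stats *s,
                                   uint64_t start_ns, uint64_t end_ns,
                                   bool success)
{
    uint64_t poll_ns = end_ns - start_ns;

    s->attempted++;
    if (success) {
        s->successful++;
        s->success_ns += poll_ns;
    } else {
        s->fail_ns += poll_ns;
    }
}
```

Updating the stat "well after the fact" is harmless here for the same reason the changelog gives: nothing observes the attempt counter between the start of polling and the helper call.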
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  virt/kvm/kvm_main.c | 35 ---
>  1 file changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 12fe91a0a4c8..2ba22b68af3b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3202,12 +3202,23 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>  static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
> ktime_t end, bool success)
>  {
> + struct kvm_vcpu_stat_generic *stats = &vcpu->stat.generic;
>   u64 poll_ns = ktime_to_ns(ktime_sub(end, start));
>  
> - if (success)
> - vcpu->stat.generic.halt_poll_success_ns += poll_ns;
> - else
> - vcpu->stat.generic.halt_poll_fail_ns += poll_ns;
> + ++vcpu->stat.generic.halt_attempted_poll;
> +
> + if (success) {
> + ++vcpu->stat.generic.halt_successful_poll;
> +
> + if (!vcpu_valid_wakeup(vcpu))
> + ++vcpu->stat.generic.halt_poll_invalid;
> +
> + stats->halt_poll_success_ns += poll_ns;
> + KVM_STATS_LOG_HIST_UPDATE(stats->halt_poll_success_hist, poll_ns);
> + } else {
> + stats->halt_poll_fail_ns += poll_ns;
> + KVM_STATS_LOG_HIST_UPDATE(stats->halt_poll_fail_hist, poll_ns);
> + }
>  }
>  
>  /*
> @@ -3227,30 +3238,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   if (do_halt_poll) {
>   ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>  
> - ++vcpu->stat.generic.halt_attempted_poll;
>   do {
>   /*
>* This sets KVM_REQ_UNHALT if an interrupt
>* arrives.
>*/
> - if (kvm_vcpu_check_block(vcpu) < 0) {
> - ++vcpu->stat.generic.halt_successful_poll;
> - if (!vcpu_valid_wakeup(vcpu))
> - ++vcpu->stat.generic.halt_poll_invalid;
> -
> - KVM_STATS_LOG_HIST_UPDATE(
> -   vcpu->stat.generic.halt_poll_success_hist,
> -   ktime_to_ns(ktime_get()) -
> -   ktime_to_ns(start));
> + if (kvm_vcpu_check_block(vcpu) < 0)
>   goto out;
> - }
>   cpu_relax();
>   poll_end = cur = ktime_get();
>   } while (kvm_vcpu_can_poll(cur, stop));
> -
> - KVM_STATS_LOG_HIST_UPDATE(
> - vcpu->stat.generic.halt_poll_fail_hist,
> - ktime_to_ns(ktime_get()) - ktime_to_ns(start));
>   }
>  
>  
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 09/14] KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:23PM -0700, Sean Christopherson wrote:
> Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting
> the actual "block" sequences into a separate helper (to be named
> kvm_vcpu_block()).  x86 will use the standalone block-only path to handle
> non-halt cases where the vCPU is not runnable.
> 
> Rename block_ns to halt_ns to match the new function name.
> 
> Opportunistically move an x86-specific comment to x86, and enhance it, too.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  arch/arm64/kvm/arch_timer.c   |  2 +-
>  arch/arm64/kvm/handle_exit.c  |  4 ++--
>  arch/arm64/kvm/psci.c |  2 +-
>  arch/mips/kvm/emulate.c   |  2 +-
>  arch/powerpc/kvm/book3s_pr.c  |  2 +-
>  arch/powerpc/kvm/book3s_pr_papr.c |  2 +-
>  arch/powerpc/kvm/booke.c  |  2 +-
>  arch/powerpc/kvm/powerpc.c|  2 +-
>  arch/s390/kvm/interrupt.c |  2 +-
>  arch/x86/kvm/x86.c| 11 +--
>  include/linux/kvm_host.h  |  2 +-
>  virt/kvm/kvm_main.c   | 20 +---
>  12 files changed, 29 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 3df67c127489..7e8396f74010 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -467,7 +467,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  }
>  
>  /*
> - * Schedule the background timer before calling kvm_vcpu_block, so that this
> + * Schedule the background timer before calling kvm_vcpu_halt, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
>   * interrupt to handle.
>   */
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 275a27368a04..08f823984712 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -82,7 +82,7 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
>   *
>   * WFE: Yield the CPU and come back to this vcpu when the scheduler
>   * decides to.
> - * WFI: Simply call kvm_vcpu_block(), which will halt execution of
> + * WFI: Simply call kvm_vcpu_halt(), which will halt execution of
>   * world-switches and schedule other host processes until there is an
>   * incoming IRQ or FIQ to the VM.
>   */
> @@ -95,7 +95,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
>   } else {
>   trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
>   vcpu->stat.wfi_exit_stat++;
> - kvm_vcpu_block(vcpu);
> + kvm_vcpu_halt(vcpu);
>   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>   }
>  
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 74c47d420253..e275b2ca08b9 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -46,7 +46,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
>* specification (ARM DEN 0022A). This means all suspend states
>* for KVM will preserve the register state.
>*/
> - kvm_vcpu_block(vcpu);
> + kvm_vcpu_halt(vcpu);
>   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>  
>   return PSCI_RET_SUCCESS;
> diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
> index 22e745e49b0a..b494d8d39290 100644
> --- a/arch/mips/kvm/emulate.c
> +++ b/arch/mips/kvm/emulate.c
> @@ -952,7 +952,7 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
>   if (!vcpu->arch.pending_exceptions) {
>   kvm_vz_lose_htimer(vcpu);
>   vcpu->arch.wait = 1;
> - kvm_vcpu_block(vcpu);
> + kvm_vcpu_halt(vcpu);
>  
>   /*
>* We we are runnable, then definitely go off to user space to
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 6bc9425acb32..0ced1b16f0e5 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -492,7 +492,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
>  
>   if (msr & MSR_POW) {
>   if (!vcpu->arch.pending_exceptions) {
> - kvm_vcpu_block(vcpu);
> + kvm_vcpu_halt(vcpu);
>   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>   vcpu->stat.generic.halt_wakeup++;
>  
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index ac14239f3424..1f10e7dfcdd0 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -376,7 +376,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>   return kvmppc_h_pr_stuff_tce(vcpu);
>   case H_CEDE:
>   kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
> - kvm_vcpu_block(vcpu);
> + kvm_vcpu_halt(vcpu);
>   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>   

Re: [PATCH 11/14] KVM: stats: Add stat to detect if vcpu is currently blocking

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:25PM -0700, Sean Christopherson wrote:
> From: Jing Zhang 
> 
> Add a "blocking" stat that userspace can use to detect the case where a
> vCPU is not being run because of a vCPU/guest action, e.g. HLT or WFS on
> x86, WFI on arm64, etc...  Current guest/host/halt stats don't show this
> well, e.g. if a guest halts for a long period of time then the vCPU could
> appear pathologically blocked due to a host condition, when in reality the
> vCPU has been put into a not-runnable state by the guest.
> 
> Originally-by: Cannon Matthews 
> Suggested-by: Sean Christopherson 
> Signed-off-by: Jing Zhang 
> [sean: renamed stat to "blocking", massaged changelog]
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 
> ---
>  include/linux/kvm_host.h  | 3 ++-
>  include/linux/kvm_types.h | 1 +
>  virt/kvm/kvm_main.c   | 2 ++
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 655c2b24db2d..9bb1972e396a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1453,7 +1453,8 @@ struct _kvm_stats_desc {
>   STATS_DESC_LOGHIST_TIME_NSEC(VCPU_GENERIC, halt_poll_fail_hist,\
>   HALT_POLL_HIST_COUNT), \
>   STATS_DESC_LOGHIST_TIME_NSEC(VCPU_GENERIC, halt_wait_hist, \
> - HALT_POLL_HIST_COUNT)
> + HALT_POLL_HIST_COUNT), \
> + STATS_DESC_ICOUNTER(VCPU_GENERIC, blocking)
>  
>  extern struct dentry *kvm_debugfs_dir;
>  
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index 2237abb93ccd..c4f9257bf32d 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -94,6 +94,7 @@ struct kvm_vcpu_stat_generic {
>   u64 halt_poll_success_hist[HALT_POLL_HIST_COUNT];
>   u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
>   u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
> + u64 blocking;
>  };
>  
>  #define KVM_STATS_NAME_SIZE  48
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fe34457530c2..2980d2b88559 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3208,6 +3208,7 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
>   bool waited = false;
>  
> + vcpu->stat.generic.blocking = 1;
>   kvm_arch_vcpu_blocking(vcpu);
>  
>   prepare_to_rcuwait(&vcpu->wait);
> @@ -3223,6 +3224,7 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   finish_rcuwait(&vcpu->wait);
>  
>   kvm_arch_vcpu_unblocking(vcpu);
> + vcpu->stat.generic.blocking = 0;
>  
>   return waited;
>  }
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH 08/14] KVM: x86: Tweak halt emulation helper names to free up kvm_vcpu_halt()

2021-09-29 Thread David Matlack
On Fri, Sep 24, 2021 at 05:55:22PM -0700, Sean Christopherson wrote:
> Rename a variety of HLT-related helpers to free up the function name
> "kvm_vcpu_halt" for future use in generic KVM code, e.g. to differentiate
> between "block" and "halt".
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson 

Reviewed-by: David Matlack 

> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/vmx/nested.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c  |  4 ++--
>  arch/x86/kvm/x86.c  | 13 +++--
>  4 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4e8c21083bdb..cfebef10b89c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1679,7 +1679,7 @@ int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
>  int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int 
> in);
>  int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
>  int kvm_emulate_halt(struct kvm_vcpu *vcpu);
> -int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
> +int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
>  int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
>  int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index eedcebf58004..f689e463b678 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3618,7 +3618,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>   !(nested_cpu_has(vmcs12, CPU_BASED_INTR_WINDOW_EXITING) &&
> (vmcs12->guest_rflags & X86_EFLAGS_IF))) {
>   vmx->nested.nested_run_pending = 0;
> - return kvm_vcpu_halt(vcpu);
> + return kvm_emulate_halt_noskip(vcpu);
>   }
>   break;
>   case GUEST_ACTIVITY_WAIT_SIPI:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d118daed0530..858f5f1f1273 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4740,7 +4740,7 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
>   if (kvm_emulate_instruction(vcpu, 0)) {
>   if (vcpu->arch.halt_request) {
>   vcpu->arch.halt_request = 0;
> - return kvm_vcpu_halt(vcpu);
> + return kvm_emulate_halt_noskip(vcpu);
>   }
>   return 1;
>   }
> @@ -5414,7 +5414,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  
>   if (vcpu->arch.halt_request) {
>   vcpu->arch.halt_request = 0;
> - return kvm_vcpu_halt(vcpu);
> + return kvm_emulate_halt_noskip(vcpu);
>   }
>  
>   /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b0c21d42f453..eade8a2bdccf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8643,7 +8643,7 @@ void kvm_arch_exit(void)
>  #endif
>  }
>  
> -static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
> +static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason)
>  {
>   ++vcpu->stat.halt_exits;
>   if (lapic_in_kernel(vcpu)) {
> @@ -8655,11 +8655,11 @@ static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
>   }
>  }
>  
> -int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> +int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu)
>  {
> - return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
> + return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
>  }
> -EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
> +EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip);
>  
>  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
>  {
> @@ -8668,7 +8668,7 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
>* TODO: we might be squashing a GUESTDBG_SINGLESTEP-triggered
>* KVM_EXIT_DEBUG here.
>*/
> - return kvm_vcpu_halt(vcpu) && ret;
> + return kvm_emulate_halt_noskip(vcpu) && ret;
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_halt);
>  
> @@ -8676,7 +8676,8 @@ int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
>  {
>   int ret = kvm_skip_emulated_instruction(vcpu);
>  
> - return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
> + return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD,
> + KVM_EXIT_AP_RESET_HOLD) && ret;
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
>  
> -- 
> 2.33.0.685.g46640cef36-goog
> 


Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line

2021-09-29 Thread Suzuki K Poulose

On 03/09/2021 10:16, Marc Zyngier wrote:

Although KVM can be compiled out of the kernel, it cannot be disabled
at runtime. Allow this possibility by introducing a new mode that
will prevent KVM from initialising.

This is useful in the (limited) circumstances where you don't want
KVM to be available (what is wrong with you?), or when you want
to install another hypervisor instead (good luck with that).

Signed-off-by: Marc Zyngier 
---
  Documentation/admin-guide/kernel-parameters.txt |  3 +++
  arch/arm64/include/asm/kvm_host.h   |  1 +
  arch/arm64/kernel/idreg-override.c  |  1 +
  arch/arm64/kvm/arm.c| 14 +-
  4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..cc5f68846434 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2365,6 +2365,9 @@
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.
  
+			none: Forcefully disable KVM and run in nVHE mode,
+			      preventing KVM from ever initialising.
+
nvhe: Standard nVHE-based mode, without support for
  protected guests.
  
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..019490c67976 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -58,6 +58,7 @@
  enum kvm_mode {
KVM_MODE_DEFAULT,
KVM_MODE_PROTECTED,
+   KVM_MODE_NONE,
  };
  enum kvm_mode kvm_get_mode(void);
  
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index d8e606fe3c21..57013c1b6552 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -95,6 +95,7 @@ static const struct {
charalias[FTR_ALIAS_NAME_LEN];
charfeature[FTR_ALIAS_OPTION_LEN];
  } aliases[] __initconst = {
+   { "kvm-arm.mode=none","id_aa64mmfr1.vh=0" },
{ "kvm-arm.mode=nvhe","id_aa64mmfr1.vh=0" },
{ "kvm-arm.mode=protected",   "id_aa64mmfr1.vh=0" },
{ "arm64.nobti",  "id_aa64pfr1.bt=0" },
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..cdc70e238316 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2064,6 +2064,11 @@ int kvm_arch_init(void *opaque)
return -ENODEV;
}
  
+	if (kvm_get_mode() == KVM_MODE_NONE) {
+   kvm_info("KVM disabled from command line\n");
+   return -ENODEV;
+   }
+
in_hyp_mode = is_kernel_in_hyp_mode();
  
 	if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
@@ -2137,8 +2142,15 @@ static int __init early_kvm_mode_cfg(char *arg)
return 0;
}
  
-	if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode()))
+   if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) {
+   kvm_mode = KVM_MODE_DEFAULT;
return 0;
+   }
+
+   if (strcmp(arg, "none") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) {


nit: Does this really need to WARN here ? Unlike the "nvhe" case, if the
user wants to keep the KVM out of the picture for, say debugging
something, it is perfectly Ok to allow the kernel to be running at EL2
without having to change the Firmware to alter the landing EL for the
kernel ?

Otherwise,

Acked-by: Suzuki K Poulose 


Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line

2021-09-29 Thread Will Deacon
On Fri, Sep 03, 2021 at 10:16:52AM +0100, Marc Zyngier wrote:
> Although KVM can be compiled out of the kernel, it cannot be disabled
> at runtime. Allow this possibility by introducing a new mode that
> will prevent KVM from initialising.
> 
> This is useful in the (limited) circumstances where you don't want
> KVM to be available (what is wrong with you?), or when you want
> to install another hypervisor instead (good luck with that).
> 
> Signed-off-by: Marc Zyngier 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +++
>  arch/arm64/include/asm/kvm_host.h   |  1 +
>  arch/arm64/kernel/idreg-override.c  |  1 +
>  arch/arm64/kvm/arm.c| 14 +-
>  4 files changed, 18 insertions(+), 1 deletion(-)

Acked-by: Will Deacon 

Will


[PATCH v4 4/4] KVM: mmu: remove over-aggressive warnings

2021-09-29 Thread David Stevens
From: David Stevens 

Remove two warnings that require ref counts for pages to be non-zero, as
mapped pfns from follow_pfn may not have an initialized ref count.
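The rationale above can be illustrated with a small standalone predicate. This is a hypothetical sketch with simplified, invented types, not the kernel code: a pfn whose backing page has a zero refcount (e.g. one resolved via follow_pfn on a VM_PFNMAP mapping) is quietly treated as untracked rather than triggering a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page. */
struct page {
    int refcount;
    bool zone_device;
};

/* Report zone-device status only when the refcount shows the page is
 * actually tracked; otherwise return false without warning, since a
 * zero refcount is a legitimate state for follow_pfn-derived pfns. */
static bool pfn_is_zone_device(const struct page *page)
{
    if (page == NULL || page->refcount == 0)
        return false;
    return page->zone_device;
}
```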

Signed-off-by: David Stevens 
---
 arch/x86/kvm/mmu/mmu.c | 7 ---
 virt/kvm/kvm_main.c| 2 +-
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5a1adcc9cfbc..3b469df63bcf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -617,13 +617,6 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 
pfn = spte_to_pfn(old_spte);
 
-   /*
-* KVM does not hold the refcount of the page used by
-* kvm mmu, before reclaiming the page, we should
-* unmap it from mmu first.
-*/
-   WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
-
if (is_accessed_spte(old_spte))
kvm_set_pfn_accessed(pfn);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 421146bd1360..c72ad995a359 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -168,7 +168,7 @@ bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
 * the device has been pinned, e.g. by get_user_pages().  WARN if the
 * page_count() is zero to help detect bad usage of this helper.
 */
-   if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+   if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
return false;
 
return is_zone_device_page(pfn_to_page(pfn));
-- 
2.33.0.685.g46640cef36-goog



[PATCH v4 2/4] KVM: x86/mmu: use gfn_to_pfn_page

2021-09-29 Thread David Stevens
From: David Stevens 

Convert usages of the deprecated gfn_to_pfn functions to the new
gfn_to_pfn_page functions.

Signed-off-by: David Stevens 
---
 arch/x86/kvm/mmu/mmu.c  | 43 -
 arch/x86/kvm/mmu/mmu_internal.h |  3 ++-
 arch/x86/kvm/mmu/paging_tmpl.h  | 23 +++---
 arch/x86/kvm/mmu/tdp_mmu.c  |  6 ++---
 arch/x86/kvm/mmu/tdp_mmu.h  |  4 +--
 arch/x86/kvm/x86.c  |  6 +++--
 6 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d7e61122af8..5a1adcc9cfbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2783,8 +2783,9 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
return ret;
 }
 
-static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
-bool no_dirty_log)
+static kvm_pfn_t pte_prefetch_gfn_to_pfn_page(struct kvm_vcpu *vcpu,
+ gfn_t gfn, bool no_dirty_log,
+ struct page **page)
 {
struct kvm_memory_slot *slot;
 
@@ -2792,7 +2793,7 @@ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
if (!slot)
return KVM_PFN_ERR_FAULT;
 
-   return gfn_to_pfn_memslot_atomic(slot, gfn);
+   return gfn_to_pfn_page_memslot_atomic(slot, gfn, page);
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -2923,7 +2924,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
int max_level, kvm_pfn_t *pfnp,
-   bool huge_page_disallowed, int *req_level)
+   struct page *page, bool huge_page_disallowed,
+   int *req_level)
 {
struct kvm_memory_slot *slot;
kvm_pfn_t pfn = *pfnp;
@@ -2935,6 +2937,9 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
if (unlikely(max_level == PG_LEVEL_4K))
return PG_LEVEL_4K;
 
+   if (!page)
+   return PG_LEVEL_4K;
+
if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
return PG_LEVEL_4K;
 
@@ -2984,7 +2989,8 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-   int map_writable, int max_level, kvm_pfn_t pfn,
+   int map_writable, int max_level,
+   kvm_pfn_t pfn, struct page *page,
bool prefault, bool is_tdp)
 {
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
@@ -2997,7 +3003,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
gfn_t gfn = gpa >> PAGE_SHIFT;
gfn_t base_gfn = gfn;
 
-   level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+   level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, page,
huge_page_disallowed, &req_level);
 
trace_kvm_mmu_spte_requested(gpa, level, pfn);
@@ -3901,8 +3907,9 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 }
 
 static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
-gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-bool write, bool *writable, int *r)
+gpa_t cr2_or_gpa, kvm_pfn_t *pfn,
+hva_t *hva, bool write, bool *writable,
+struct page **page, int *r)
 {
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
bool async;
@@ -3936,8 +3943,8 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
}
 
async = false;
-   *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
-   write, writable, hva);
+   *pfn = __gfn_to_pfn_page_memslot(slot, gfn, false, &async,
+write, writable, hva, page);
if (!async)
return false; /* *pfn has correct page already */
 
@@ -3951,8 +3958,8 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
goto out_retry;
}
 
-   *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
-   write, writable, hva);
+   *pfn = __gfn_to_pfn_page_memslot(slot, gfn, false, NULL,
+write, writable, hva, page);
 
 out_retry:
*r = RET_PF_RETRY;
@@ -3969,6 +3976,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
kvm_pfn_t pfn;
+   struct page *page;
hva_t hva;
int r;
 
@@ -3987,7 +3995,7 @@ static int 

[PATCH v4 3/4] KVM: arm64/mmu: use gfn_to_pfn_page

2021-09-29 Thread David Stevens
From: David Stevens 

Convert usages of the deprecated gfn_to_pfn functions to the new
gfn_to_pfn_page functions.

Signed-off-by: David Stevens 
---
 arch/arm64/kvm/mmu.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1a94a7ca48f2..dc03cc66858e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -829,7 +829,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 static unsigned long
 transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long hva, kvm_pfn_t *pfnp,
-   phys_addr_t *ipap)
+   struct page **page, phys_addr_t *ipap)
 {
kvm_pfn_t pfn = *pfnp;
 
@@ -838,7 +838,8 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 * sure that the HVA and IPA are sufficiently aligned and that the
 * block map is contained within the memslot.
 */
-   if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
+   if (*page &&
+   fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
get_user_mapping_size(kvm, hva) >= PMD_SIZE) {
/*
 * The address we faulted on is backed by a transparent huge
@@ -859,10 +860,11 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 * page accordingly.
 */
*ipap &= PMD_MASK;
-   kvm_release_pfn_clean(pfn);
+   put_page(*page);
pfn &= ~(PTRS_PER_PMD - 1);
-   get_page(pfn_to_page(pfn));
*pfnp = pfn;
+   *page = pfn_to_page(pfn);
+   get_page(*page);
 
return PMD_SIZE;
}
@@ -955,6 +957,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
short vma_shift;
gfn_t gfn;
kvm_pfn_t pfn;
+   struct page *page;
bool logging_active = memslot_is_logging(memslot);
unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
unsigned long vma_pagesize, fault_granule;
@@ -1056,8 +1059,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 */
smp_rmb();
 
-   pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-  write_fault, &writable, NULL);
+   pfn = __gfn_to_pfn_page_memslot(memslot, gfn, false, NULL,
+   write_fault, &writable, NULL, &page);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
@@ -1102,7 +1105,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
vma_pagesize = fault_granule;
else
vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
-  hva, &pfn,
+  hva,
+  &pfn, &page,
   &fault_ipa);
}
 
@@ -1142,14 +1146,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
-   kvm_set_pfn_dirty(pfn);
+   if (page)
+   kvm_set_pfn_dirty(pfn);
mark_page_dirty_in_slot(kvm, memslot, gfn);
}
 
 out_unlock:
spin_unlock(&kvm->mmu_lock);
-   kvm_set_pfn_accessed(pfn);
-   kvm_release_pfn_clean(pfn);
+   if (page) {
+   kvm_set_pfn_accessed(pfn);
+   put_page(page);
+   }
return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.33.0.685.g46640cef36-goog



[PATCH v4 1/4] KVM: mmu: introduce new gfn_to_pfn_page functions

2021-09-29 Thread David Stevens
From: David Stevens 

Introduce new gfn_to_pfn_page functions that parallel existing
gfn_to_pfn functions. The new functions are identical except they take
an additional out parameter that is used to return the struct page if
the hva was resolved by gup. This allows callers to differentiate the
gup and follow_pte cases, which in turn allows callers to only touch the
page refcount when necessitated by gup.

The old gfn_to_pfn functions are deprecated, and all callers should be
migrated to the new gfn_to_pfn_page functions. In the interim, the
gfn_to_pfn functions are reimplemented as wrappers of the corresponding
gfn_to_pfn_page functions. The wrappers take a reference to the pfn's
page that had previously been taken in hva_to_pfn_remapped.
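The interim wrapper scheme described above can be sketched with a toy model (hypothetical names, not the kernel API): the _page variant reports whether gup resolved the hva, and the legacy wrapper preserves its old contract by taking a reference on that page before handing out the bare pfn.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { int refcount; };

/* Toy gfn_to_pfn_page(): *page is non-NULL only when "gup" resolved the
 * hva (here simulated by the backing argument); follow_pte-style
 * mappings yield page == NULL. */
static unsigned long toy_gfn_to_pfn_page(unsigned long gfn,
					 struct toy_page **page,
					 struct toy_page *backing)
{
	*page = backing;
	return gfn + 0x1000;          /* fake pfn */
}

/* Toy legacy wrapper: reimplemented on top of the _page variant, taking
 * the extra reference the old callers expect to later release. */
static unsigned long toy_gfn_to_pfn(unsigned long gfn,
				    struct toy_page *backing)
{
	struct toy_page *page;
	unsigned long pfn = toy_gfn_to_pfn_page(gfn, &page, backing);

	if (page)
		page->refcount++;     /* reference the wrapper hands out */
	return pfn;
}
```

The wrapper thus keeps old callers working unchanged while new callers migrate to the _page API and manage the reference explicitly.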

Signed-off-by: David Stevens 
---
 include/linux/kvm_host.h |  17 
 virt/kvm/kvm_main.c  | 196 +--
 2 files changed, 162 insertions(+), 51 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 041ca7f15ea4..5ee43afa7e27 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -868,6 +868,19 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
   bool atomic, bool *async, bool write_fault,
   bool *writable, hva_t *hva);
 
+kvm_pfn_t gfn_to_pfn_page(struct kvm *kvm, gfn_t gfn, struct page **page);
+kvm_pfn_t gfn_to_pfn_page_prot(struct kvm *kvm, gfn_t gfn,
+  bool write_fault, bool *writable,
+  struct page **page);
+kvm_pfn_t gfn_to_pfn_page_memslot(struct kvm_memory_slot *slot,
+ gfn_t gfn, struct page **page);
+kvm_pfn_t gfn_to_pfn_page_memslot_atomic(struct kvm_memory_slot *slot,
+gfn_t gfn, struct page **page);
+kvm_pfn_t __gfn_to_pfn_page_memslot(struct kvm_memory_slot *slot,
+   gfn_t gfn, bool atomic, bool *async,
+   bool write_fault, bool *writable,
+   hva_t *hva, struct page **page);
+
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
 void kvm_set_pfn_dirty(kvm_pfn_t pfn);
@@ -948,6 +961,10 @@ struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+kvm_pfn_t kvm_vcpu_gfn_to_pfn_page_atomic(struct kvm_vcpu *vcpu, gfn_t gfn,
+ struct page **page);
+kvm_pfn_t kvm_vcpu_gfn_to_pfn_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+  struct page **page);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
 int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache, bool atomic);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 439d3b4cd1a9..421146bd1360 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2204,9 +2204,9 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * only part that runs if we can in atomic context.
  */
 static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-   bool *writable, kvm_pfn_t *pfn)
+   bool *writable, kvm_pfn_t *pfn,
+   struct page **page)
 {
-   struct page *page[1];
 
/*
 * Fast pin a writable pfn only if it is a write fault request
@@ -2217,7 +2217,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
return false;
 
if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
-   *pfn = page_to_pfn(page[0]);
+   *pfn = page_to_pfn(*page);
 
if (writable)
*writable = true;
@@ -2232,10 +2232,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-  bool *writable, kvm_pfn_t *pfn)
+  bool *writable, kvm_pfn_t *pfn, struct page **page)
 {
unsigned int flags = FOLL_HWPOISON;
-   struct page *page;
int npages = 0;
 
might_sleep();
@@ -2248,7 +2247,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
if (async)
flags |= FOLL_NOWAIT;
 
-   npages = get_user_pages_unlocked(addr, 1, &page, flags);
+   npages = get_user_pages_unlocked(addr, 1, page, flags);
if (npages != 1)
return npages;
 
@@ -2258,11 +2257,11 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async,

[PATCH v4 0/4] KVM: allow mapping non-refcounted pages

2021-09-29 Thread David Stevens
From: David Stevens 

This patch series adds support for mapping non-refcount VM_IO and
VM_PFNMAP memory into the guest.

Currently, the gfn_to_pfn functions require being able to pin the target
pfn, so they will fail if the pfn returned by follow_pte isn't a
ref-counted page.  However, the KVM secondary MMUs do not require that
the pfn be pinned, since they are integrated with the mmu notifier API.
This series adds a new set of gfn_to_pfn_page functions which parallel
the gfn_to_pfn functions but do not pin the pfn. The new functions
return the page from gup if it was present, so callers can use it and
call put_page when done.
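A hedged sketch of the caller-side pattern the cover letter describes, again with toy names rather than the real kernel API: the caller uses the pfn under mmu-notifier protection and drops the reference only when gup actually produced a page.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { int refcount; };

/* Toy gfn_to_pfn_page(): "gup_result" simulates whether gup resolved
 * the hva; when it did, the lookup pins the page. */
static unsigned long toy_lookup_pfn(unsigned long gfn,
				    struct toy_page **page,
				    struct toy_page *gup_result)
{
	*page = gup_result;
	if (gup_result)
		gup_result->refcount++;   /* gup takes a reference */
	return gfn + 0x1000;              /* fake pfn */
}

/* Toy secondary-MMU caller following the new convention. */
static void toy_map_into_guest(unsigned long gfn,
			       struct toy_page *gup_result)
{
	struct toy_page *page;
	unsigned long pfn = toy_lookup_pfn(gfn, &page, gup_result);

	(void)pfn;                        /* install pfn in the stage-2 tables */
	if (page)
		page->refcount--;         /* put_page() when done */
	/* page == NULL (follow_pte case): nothing to drop */
}
```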

The gfn_to_pfn functions should be deprecated, since they are unsafe:
they rely on obtaining a struct page from a pfn returned by
follow_pte. I added new functions instead of simply adding another
optional parameter to the existing functions to make it easier to track
down users of the deprecated functions.

This series updates x86 and arm64 secondary MMUs to the new API.

v3 -> v4:
 - rebase on kvm next branch again
 - Add some more context to a comment in ensure_pfn_ref
v2 -> v3:
 - rebase on kvm next branch
v1 -> v2:
 - Introduce new gfn_to_pfn_page functions instead of modifying the
   behavior of existing gfn_to_pfn functions, to make the change less
   invasive.
 - Drop changes to mmu_audit.c
 - Include Nicholas Piggin's patch to avoid corrupting the refcount in the
   follow_pte case, and use it in the deprecated gfn_to_pfn functions.
 - Rebase on kvm/next

David Stevens (4):
  KVM: mmu: introduce new gfn_to_pfn_page functions
  KVM: x86/mmu: use gfn_to_pfn_page
  KVM: arm64/mmu: use gfn_to_pfn_page
  KVM: mmu: remove over-aggressive warnings

 arch/arm64/kvm/mmu.c|  27 +++--
 arch/x86/kvm/mmu/mmu.c  |  50 
 arch/x86/kvm/mmu/mmu_internal.h |   3 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |  23 ++--
 arch/x86/kvm/mmu/tdp_mmu.c  |   6 +-
 arch/x86/kvm/mmu/tdp_mmu.h  |   4 +-
 arch/x86/kvm/x86.c  |   6 +-
 include/linux/kvm_host.h|  17 +++
 virt/kvm/kvm_main.c | 198 +++-
 9 files changed, 231 insertions(+), 103 deletions(-)

-- 
2.33.0.685.g46640cef36-goog



Re: disabling halt polling broken? (was Re: [PATCH 00/14] KVM: Halt-polling fixes, cleanups and a new stat)

2021-09-29 Thread Christian Borntraeger




Am 27.09.21 um 18:58 schrieb David Matlack:

On Mon, Sep 27, 2021 at 8:17 AM Christian Borntraeger
 wrote:




Am 27.09.21 um 17:03 schrieb Paolo Bonzini:

On 27/09/21 16:59, Sean Christopherson wrote:

commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
Author: David Matlack
AuthorDate: Fri Apr 17 15:14:46 2020 -0700
Commit: Paolo Bonzini
CommitDate: Fri Apr 24 12:53:17 2020 -0400

  kvm: add capability for halt polling

broke the possibility for an admin to disable halt polling for already running 
KVM guests.
In past times doing
echo 0 > /sys/module/kvm/parameters/halt_poll_ns

stopped polling system wide.
Now all KVM guests will use the halt_poll_ns value that was active during
startup - even those that do not use KVM_CAP_HALT_POLL.

I guess this was not intended?


No, but...


I would go so far as to say that halt_poll_ns should be a hard limit on
the capability


... this would not be a good idea I think.  Anything that wants to do a lot of polling 
can just do "for (;;)".


I agree. It would also be a maintenance burden and subtle "gotcha" to
have to increase halt_poll_ns anytime one wants to increase
KVM_CAP_HALT_POLL.


I think the idea of the upper bound is not about preventing wasted CPUs
but about reconfiguring existing poll intervals on a global level. So I
think it is a bad idea in itself, especially as the admin might not have
access to the monitor of users' QEMUs.



So I think there are two possibilities that makes sense:

* track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow 
that


What about using halt_poll_ns for those VMs that did not use
KVM_CAP_HALT_POLL, and the private number for those that did?


None of these options would cover Christian's original use-case
though. (Write to module to disable halt-polling system-wide.)

What about adding a writable "enable_halt_polling" module parameter
that affects all VMs? Once that is in place we could also consider
getting rid of halt_poll_ns entirely.


that would then affect both classes with and without KVM_CAP_HALT_POLL.


As far as I can tell QEMU does not yet use KVM_CAP_HALT_POLL.
So having a system wide halt_poll_ns makes sense. And I think for all
processes not using KVM_CAP_HALT_POLL we should really follow what
halt_poll_ns is NOW and not what it used to be.
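The selection rule being discussed can be sketched as follows (a toy model of the proposed semantics, not the actual KVM code): VMs that opted in via KVM_CAP_HALT_POLL keep their private value, while all others follow the module parameter as it is now, so writing 0 to it again disables polling for them system-wide.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kvm module parameter. */
static unsigned int halt_poll_ns = 200000;

struct toy_vm {
	bool cap_halt_poll_set;       /* VM used KVM_CAP_HALT_POLL */
	unsigned int vm_halt_poll_ns; /* per-VM value set via the cap */
};

/* Proposed rule: the capability overrides; everyone else tracks the
 * current module parameter, not the value at VM creation time. */
static unsigned int effective_halt_poll_ns(const struct toy_vm *vm)
{
	return vm->cap_halt_poll_set ? vm->vm_halt_poll_ns : halt_poll_ns;
}
```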
