Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 7:30 PM, Marc Zyngier wrote:
> On Thu, 01 Apr 2021 18:03:25 +0100,
> Auger Eric  wrote:
>>
>> Hi Marc,
>>
>> On 4/1/21 3:42 PM, Marc Zyngier wrote:
>>> Hi Eric,
>>>
>>> On Thu, 01 Apr 2021 09:52:37 +0100,
>>> Eric Auger  wrote:

 Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
 reporting of GICR_TYPER.Last for userspace") temporarily fixed
 a bug identified when attempting to access the GICR_TYPER
 register before the redistributor region setting, but dropped
 the support of the LAST bit.

 Emulating the GICR_TYPER.Last bit still makes sense for
 architecture compliance though. This patch restores its support
 (if the redistributor region was set) while keeping the code safe.

 We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
 computes whether a redistributor is the highest one of a series
 of redistributor contributor pages.

 The spec says "Indicates whether this Redistributor is the
 highest-numbered Redistributor in a series of contiguous
 Redistributor pages."

 The code is a bit convulated since there is no guarantee
>>>
>>> nit: convoluted
>>>
 redistributors are added in a given reditributor region in
 ascending order. In that case the current implementation was
 wrong. Also redistributor regions can be contiguous
 and registered in non increasing base address order.

 So the index of redistributors are stored in an array within
 the redistributor region structure.

 With this new implementation we do not need to have a uaccess
 read accessor anymore.

 Signed-off-by: Eric Auger 
>>>
>>> This patch also hurt my head, a lot more than the first one.  See
>>> below.
>>>
 ---
  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
  arch/arm64/kvm/vgic/vgic.h |  1 +
  include/kvm/arm_vgic.h |  3 +
  4 files changed, 73 insertions(+), 35 deletions(-)

 diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
 b/arch/arm64/kvm/vgic/vgic-init.c
 index cf6faa0aeddb2..61150c34c268c 100644
 --- a/arch/arm64/kvm/vgic/vgic-init.c
 +++ b/arch/arm64/kvm/vgic/vgic-init.c
 @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
int i;
  
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
 +  vgic_cpu->index = vcpu->vcpu_id;
>>>
>>> Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
>>> do we need another field? We can always get to the vcpu using a
>>> container_of().
>>>
  
	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
	raw_spin_lock_init(&vgic_cpu->ap_list_lock);
 @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
  
if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
 -	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
 -		list_del(&rdreg->list);
 -		kfree(rdreg);
 -	}
 +	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
 +		vgic_v3_free_redist_region(rdreg);
>>>
>>> Consider moving the introduction of vgic_v3_free_redist_region() into
>>> a separate patch. On its own, that's a good readability improvement.
>>>
	INIT_LIST_HEAD(&dist->rd_regions);
} else {
dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
 diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
 b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
 index 987e366c80008..f6a7eed1d6adb 100644
 --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
 +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
 @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
 *vcpu,
vgic_enable_lpis(vcpu);
  }
  
 +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
 +{
 +	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
 +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 +  struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
 +
 +  if (!rdreg)
 +  return false;
 +
 +  if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
 +  /* check whether there is no other contiguous rdist region */
 +		struct list_head *rd_regions = &vgic->rd_regions;
 +  struct vgic_redist_region *iter;
 +
 +  list_for_each_entry(iter, rd_regions, list) {
 +  if (iter->base == rdreg->base + rdreg->count * 
 KVM_VGIC_V3_REDIST_SIZE &&
 +  iter->free_index > 0) {
 +  /* check the first rdist index of this region, if any */
 +  if (vgic_cpu->index < iter->rdist_indices[0])
 +  return false;
>>>
>>> rdist_indices[] contains the vcpu_id of the vcpu associated with a
>>> given RD in the 

Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Marc Zyngier
On Thu, 01 Apr 2021 18:03:25 +0100,
Auger Eric  wrote:
> 
> Hi Marc,
> 
> On 4/1/21 3:42 PM, Marc Zyngier wrote:
> > Hi Eric,
> > 
> > On Thu, 01 Apr 2021 09:52:37 +0100,
> > Eric Auger  wrote:
> >>
> >> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
> >> reporting of GICR_TYPER.Last for userspace") temporarily fixed
> >> a bug identified when attempting to access the GICR_TYPER
> >> register before the redistributor region setting, but dropped
> >> the support of the LAST bit.
> >>
> >> Emulating the GICR_TYPER.Last bit still makes sense for
> >> architecture compliance though. This patch restores its support
> >> (if the redistributor region was set) while keeping the code safe.
> >>
> >> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
> >> computes whether a redistributor is the highest one of a series
> >> of redistributor contributor pages.
> >>
> >> The spec says "Indicates whether this Redistributor is the
> >> highest-numbered Redistributor in a series of contiguous
> >> Redistributor pages."
> >>
> >> The code is a bit convulated since there is no guarantee
> > 
> > nit: convoluted
> > 
> >> redistributors are added in a given reditributor region in
> >> ascending order. In that case the current implementation was
> >> wrong. Also redistributor regions can be contiguous
> >> and registered in non increasing base address order.
> >>
> >> So the index of redistributors are stored in an array within
> >> the redistributor region structure.
> >>
> >> With this new implementation we do not need to have a uaccess
> >> read accessor anymore.
> >>
> >> Signed-off-by: Eric Auger 
> > 
> > This patch also hurt my head, a lot more than the first one.  See
> > below.
> > 
> >> ---
> >>  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
> >>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
> >>  arch/arm64/kvm/vgic/vgic.h |  1 +
> >>  include/kvm/arm_vgic.h |  3 +
> >>  4 files changed, 73 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
> >> b/arch/arm64/kvm/vgic/vgic-init.c
> >> index cf6faa0aeddb2..61150c34c268c 100644
> >> --- a/arch/arm64/kvm/vgic/vgic-init.c
> >> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> >> @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
> >>int i;
> >>  
> >>vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
> >> +  vgic_cpu->index = vcpu->vcpu_id;
> > 
> > Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
> > do we need another field? We can always get to the vcpu using a
> > container_of().
> > 
> >>  
> >>	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
> >>	raw_spin_lock_init(&vgic_cpu->ap_list_lock);
> >> @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
> >>dist->vgic_dist_base = VGIC_ADDR_UNDEF;
> >>  
> >>if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
> >> -	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
> >> -		list_del(&rdreg->list);
> >> -		kfree(rdreg);
> >> -	}
> >> +	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
> >> +		vgic_v3_free_redist_region(rdreg);
> > 
> > Consider moving the introduction of vgic_v3_free_redist_region() into
> > a separate patch. On its own, that's a good readability improvement.
> > 
> >>	INIT_LIST_HEAD(&dist->rd_regions);
> >>} else {
> >>dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
> >> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> >> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> index 987e366c80008..f6a7eed1d6adb 100644
> >> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> >> @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
> >> *vcpu,
> >>vgic_enable_lpis(vcpu);
> >>  }
> >>  
> >> +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
> >> +{
> >> +	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
> >> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >> +  struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
> >> +
> >> +  if (!rdreg)
> >> +  return false;
> >> +
> >> +  if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
> >> +  /* check whether there is no other contiguous rdist region */
> >> +		struct list_head *rd_regions = &vgic->rd_regions;
> >> +  struct vgic_redist_region *iter;
> >> +
> >> +  list_for_each_entry(iter, rd_regions, list) {
> >> +  if (iter->base == rdreg->base + rdreg->count * 
> >> KVM_VGIC_V3_REDIST_SIZE &&
> >> +  iter->free_index > 0) {
> >> +  /* check the first rdist index of this region, if any */
> >> +  if (vgic_cpu->index < iter->rdist_indices[0])
> >> +  return false;
> > 
> > rdist_indices[] contains the vcpu_id of the vcpu associated with a
> > given RD in the region. At this stage, you have established 

Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 3:42 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Thu, 01 Apr 2021 09:52:37 +0100,
> Eric Auger  wrote:
>>
>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>> a bug identified when attempting to access the GICR_TYPER
>> register before the redistributor region setting, but dropped
>> the support of the LAST bit.
>>
>> Emulating the GICR_TYPER.Last bit still makes sense for
>> architecture compliance though. This patch restores its support
>> (if the redistributor region was set) while keeping the code safe.
>>
>> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
>> computes whether a redistributor is the highest one of a series
>> of redistributor contributor pages.
>>
>> The spec says "Indicates whether this Redistributor is the
>> highest-numbered Redistributor in a series of contiguous
>> Redistributor pages."
>>
>> The code is a bit convulated since there is no guarantee
> 
> nit: convoluted
> 
>> redistributors are added in a given reditributor region in
>> ascending order. In that case the current implementation was
>> wrong. Also redistributor regions can be contiguous
>> and registered in non increasing base address order.
>>
>> So the index of redistributors are stored in an array within
>> the redistributor region structure.
>>
>> With this new implementation we do not need to have a uaccess
>> read accessor anymore.
>>
>> Signed-off-by: Eric Auger 
> 
> This patch also hurt my head, a lot more than the first one.  See
> below.
> 
>> ---
>>  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
>>  arch/arm64/kvm/vgic/vgic.h |  1 +
>>  include/kvm/arm_vgic.h |  3 +
>>  4 files changed, 73 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
>> b/arch/arm64/kvm/vgic/vgic-init.c
>> index cf6faa0aeddb2..61150c34c268c 100644
>> --- a/arch/arm64/kvm/vgic/vgic-init.c
>> +++ b/arch/arm64/kvm/vgic/vgic-init.c
>> @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>  int i;
>>  
>>  vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
>> +vgic_cpu->index = vcpu->vcpu_id;
> 
> Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
> do we need another field? We can always get to the vcpu using a
> container_of().
> 
>>  
>>	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>>	raw_spin_lock_init(&vgic_cpu->ap_list_lock);
>> @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>>  dist->vgic_dist_base = VGIC_ADDR_UNDEF;
>>  
>>  if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>> -	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
>> -		list_del(&rdreg->list);
>> -		kfree(rdreg);
>> -	}
>> +	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
>> +		vgic_v3_free_redist_region(rdreg);
> 
> Consider moving the introduction of vgic_v3_free_redist_region() into
> a separate patch. On its own, that's a good readability improvement.
> 
>>	INIT_LIST_HEAD(&dist->rd_regions);
>>  } else {
>>  dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 987e366c80008..f6a7eed1d6adb 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
>> *vcpu,
>>  vgic_enable_lpis(vcpu);
>>  }
>>  
>> +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
>> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>> +
>> +if (!rdreg)
>> +return false;
>> +
>> +if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
>> +/* check whether there is no other contiguous rdist region */
>> +		struct list_head *rd_regions = &vgic->rd_regions;
>> +struct vgic_redist_region *iter;
>> +
>> +list_for_each_entry(iter, rd_regions, list) {
>> +if (iter->base == rdreg->base + rdreg->count * 
>> KVM_VGIC_V3_REDIST_SIZE &&
>> +iter->free_index > 0) {
>> +/* check the first rdist index of this region, if any */
>> +if (vgic_cpu->index < iter->rdist_indices[0])
>> +return false;
> 
> rdist_indices[] contains the vcpu_id of the vcpu associated with a
> given RD in the region. At this stage, you have established that there
> is another region that is contiguous with the one associated with our
> vcpu. You also know that this adjacent region has a vcpu mapped in
> (free_index isn't 0). Isn't that enough to declare that our vcpu isn't
> last?  I 

Re: [PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

2021-04-01 Thread Marc Zyngier
On Thu, 01 Apr 2021 14:55:54 +0100,
Alexandru Elisei  wrote:

[...]

> Had another go at this, and as I was looking at the code, I realized that
> conceptually, trapping debug registers access (MDCR_EL2.TDA) is tied to:
> 
> - KVM_ARM64_DEBUG_DIRTY *not* being set (guest is debugging itself and KVM 
> will
> world-switch the debug registers).
> 
> - KVM_GUESTDBG_USE_HW being set, which also *sets* KVM_ARM64_DEBUG_DIRTY 
> (host is
> debugging the guest using hardware breakpoints).
> 
> So I cannot set the MDCR_EL2.TDA bit based on KVM_ARM64_DEBUG_DIRTY,
> because I would lose one of the two cases. It looks to me that
> keeping kvm_arm_setup_mdcr_el2() unchanged and calling it at the
> start of kvm_arm_setup_debug() is the way to go here.

Just post the revised patch, and we'll take it from there.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


[PATCH v2 8/9] KVM: x86: add force_intercept_exceptions_mask

2021-04-01 Thread Maxim Levitsky
This parameter will be used by VMX and SVM code to force
interception of a set of exceptions, given by a bitmask
for guest debug and/or kvm debug.

This option is not intended for production.

This is based on an idea first shown here:
https://patchwork.kernel.org/project/kvm/patch/20160301192822.gd22...@pd.tnic/

CC: Borislav Petkov 
Signed-off-by: Maxim Levitsky 
---
 arch/x86/kvm/x86.c | 3 +++
 arch/x86/kvm/x86.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3627ce8fe5bb..1a51031d64d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -176,6 +176,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+uint force_intercept_exceptions_mask;
+module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
+EXPORT_SYMBOL_GPL(force_intercept_exceptions_mask);
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index daccf20fbcd5..644480711ff7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -311,6 +311,8 @@ extern struct static_key kvm_no_apic_vcpu;
 
 extern bool report_ignored_msrs;
 
+extern uint force_intercept_exceptions_mask;
+
 static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 {
return pvclock_scale_delta(nsec, vcpu->arch.virtual_tsc_mult,
-- 
2.26.2
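
For context, the mask is a plain bitmap of exception vectors, so a setup that
wants to see, say, #UD (vector 6) and #GP (vector 13) in the trace would load
kvm with (1 << 6) | (1 << 13) = 0x2040. A rough usage sketch, not spelled out
in the patch itself; the sysfs path assumes the parameter ends up in the
common kvm module as described in the cover letter:

  modprobe kvm force_intercept_exceptions_mask=0x2040
  # or, since the parameter is writable at runtime (S_IWUSR):
  echo 0x2040 > /sys/module/kvm/parameters/force_intercept_exceptions_mask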



[PATCH v2 6/9] KVM: x86: implement KVM_GUESTDBG_BLOCKEVENTS

2021-04-01 Thread Maxim Levitsky
KVM_GUESTDBG_BLOCKEVENTS is a guest debug feature that
will allow KVM to block all interrupts while running.
It is mostly intended to be used together with single stepping,
to make it more robust, and has the following benefits:

* Resuming from a breakpoint is much more reliable:
  When resuming execution from a breakpoint, with interrupts enabled,
  more often than not, KVM would inject an interrupt and make the CPU
  jump immediately to the interrupt handler and eventually return to
  the breakpoint, only to trigger it again.

  From the gdb user's point of view it looks like the CPU has never
  executed a single instruction and in some cases that can even
  prevent forward progress, for example, when the breakpoint
  is placed by an automated script (e.g lx-symbols), which does
  something in response to the breakpoint and then continues
  the guest automatically.
  If the script execution takes enough time for another interrupt to
  arrive, the guest will be stuck on the same breakpoint forever.

* Normal single stepping is much more predictable, since it won't
  land the debugger into an interrupt handler.

* Chances of RFLAGS.TF being leaked to the guest are reduced:

  KVM sets that flag behind the guest's back to single step it,
  but if the single step lands the vCPU into an
  interrupt/exception handler the RFLAGS.TF will be leaked to the
  guest in the form of being pushed to the stack.
  This doesn't completely eliminate this problem as exceptions
  can still happen, but at least this eliminates the common
  case.

Signed-off-by: Maxim Levitsky 
---
 Documentation/virt/kvm/api.rst  | 1 +
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/include/uapi/asm/kvm.h | 1 +
 arch/x86/kvm/x86.c  | 4 
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9778b2434c03..a4f2dc84741f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3338,6 +3338,7 @@ flags which can include the following:
   - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
   - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
+  - KVM_GUESTDBG_BLOCKIRQ:  avoid injecting interrupts/NMI/SMI [x86]
 
 For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
 are enabled in memory so we need to ensure breakpoint exceptions are
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cc7c82a449d5..8c529ae9dbbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -227,7 +227,8 @@ enum x86_intercept_stage;
KVM_GUESTDBG_USE_HW_BP | \
KVM_GUESTDBG_USE_SW_BP | \
KVM_GUESTDBG_INJECT_BP | \
-   KVM_GUESTDBG_INJECT_DB)
+   KVM_GUESTDBG_INJECT_DB | \
+   KVM_GUESTDBG_BLOCKIRQ)
 
 
 #define PFERR_PRESENT_BIT 0
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5a3022c8af82..b0f9945067f7 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -282,6 +282,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW_BP 0x00020000
 #define KVM_GUESTDBG_INJECT_DB 0x00040000
 #define KVM_GUESTDBG_INJECT_BP 0x00080000
+#define KVM_GUESTDBG_BLOCKIRQ  0x00100000
 
 /* for KVM_SET_GUEST_DEBUG */
 struct kvm_guest_debug_arch {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 956e8e0bd6af..3627ce8fe5bb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8460,6 +8460,10 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, 
bool *req_immediate_exit
can_inject = false;
}
 
+   /* Don't inject interrupts if the user asked to avoid doing so */
+   if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ)
+   return;
+
/*
 * Finally, inject interrupt events.  If an event cannot be injected
 * due to architectural conditions (e.g. IF=0) a window-open exit
-- 
2.26.2
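
As a usage sketch (not part of the patch): a debugger stub that wants the new
behaviour would simply OR the flag into the control field it already passes to
KVM_SET_GUEST_DEBUG, e.g. when resuming with single-step. This assumes the
usual <linux/kvm.h> and <sys/ioctl.h> definitions and a vcpu_fd obtained from
KVM_CREATE_VCPU:

	struct kvm_guest_debug dbg = {
		.control = KVM_GUESTDBG_ENABLE |
			   KVM_GUESTDBG_SINGLESTEP |
			   KVM_GUESTDBG_BLOCKIRQ,	/* new flag from this patch */
	};

	if (ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg) < 0)
		perror("KVM_SET_GUEST_DEBUG");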



[PATCH v2 9/9] KVM: SVM: implement force_intercept_exceptions_mask

2021-04-01 Thread Maxim Levitsky
Currently #TS interception is only done once.
Also exception interception is not enabled for SEV guests.

Signed-off-by: Maxim Levitsky 
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm/svm.c  | 70 +
 arch/x86/kvm/svm/svm.h  |  6 ++-
 arch/x86/kvm/x86.c  |  5 ++-
 4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8c529ae9dbbe..d15ae64a2c4e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1574,6 +1574,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long 
payload);
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+u32 error_code, unsigned long payload);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 
error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2aa951bc470c..de7fd7922ec7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -220,6 +220,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 #define MSRS_RANGE_SIZE 2048
 #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
 
+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
+
 u32 svm_msrpm_offset(u32 msr)
 {
u32 offset;
@@ -1113,6 +1115,22 @@ static void svm_check_invpcid(struct vcpu_svm *svm)
}
 }
 
+static void svm_init_force_exceptions_intercepts(struct vcpu_svm *svm)
+{
+   int exc;
+
+   svm->force_intercept_exceptions_mask = force_intercept_exceptions_mask;
+   for (exc = 0 ; exc < 32 ; exc++) {
+   if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
+   continue;
+
+   /* Those are defined to have undefined behavior in the SVM spec 
*/
+   if (exc != 2 && exc != 9)
+   continue;
+   set_exception_intercept(svm, exc);
+   }
+}
+
 static void init_vmcb(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1288,6 +1306,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
enable_gif(svm);
 
+   if (!sev_es_guest(vcpu->kvm))
+   svm_init_force_exceptions_intercepts(svm);
+
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1913,6 +1934,17 @@ static int pf_interception(struct kvm_vcpu *vcpu)
u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
 
+   if ((svm->force_intercept_exceptions_mask & (1 << PF_VECTOR)))
+   if (npt_enabled && !vcpu->arch.apf.host_apf_flags) {
+   /* If the #PF was only intercepted for debug, inject
+* it directly to the guest, since the kvm's mmu code
+* is not ready to deal with such page faults.
+*/
+   kvm_queue_exception_e_p(vcpu, PF_VECTOR,
+   error_code, fault_address);
+   return 1;
+   }
+
return kvm_handle_page_fault(vcpu, error_code, fault_address,
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
svm->vmcb->control.insn_bytes : NULL,
@@ -1988,6 +2020,40 @@ static int ac_interception(struct kvm_vcpu *vcpu)
return 1;
 }
 
+static int gen_exc_interception(struct kvm_vcpu *vcpu)
+{
+   /*
+* Generic exception intercept handler which forwards a guest exception
+* as-is to the guest.
+* For exceptions that don't have a special intercept handler.
+*
+* Used only for 'force_intercept_exceptions_mask' KVM debug feature.
+*/
+   struct vcpu_svm *svm = to_svm(vcpu);
+   int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
+
+   /* SVM doesn't provide us with an error code for the #DF */
+   u32 err_code = exc == DF_VECTOR ? 0 : svm->vmcb->control.exit_info_1;
+
+   if (!(svm->force_intercept_exceptions_mask & (1 << exc)))
+   return svm_handle_invalid_exit(vcpu, 
svm->vmcb->control.exit_code);
+
+   if (exc == TS_VECTOR) {
+   /*
+* SVM doesn't provide us with an error code to be able to
+* re-inject the #TS exception, so just disable its
+* intercept, and let the guest re-execute the instruction.
+*/
+   vmcb_clr_intercept(&svm->vmcb01.ptr->control,
+  INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
+   

[PATCH v2 7/9] KVM: SVM: split svm_handle_invalid_exit

2021-04-01 Thread Maxim Levitsky
Split the check for having a vmexit handler into
svm_check_exit_valid, and make svm_handle_invalid_exit
only handle a vmexit that is already known to be invalid.

Signed-off-by: Maxim Levitsky 
---
 arch/x86/kvm/svm/svm.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495..2aa951bc470c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3220,12 +3220,14 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
   "excp_to:", save->last_excp_to);
 }
 
-static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+static bool svm_check_exit_valid(struct kvm_vcpu *vcpu, u64 exit_code)
 {
-   if (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
-   svm_exit_handlers[exit_code])
-   return 0;
+   return (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
+   svm_exit_handlers[exit_code]);
+}
 
+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+{
vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
dump_vmcb(vcpu);
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -3233,14 +3235,13 @@ static int svm_handle_invalid_exit(struct kvm_vcpu 
*vcpu, u64 exit_code)
vcpu->run->internal.ndata = 2;
vcpu->run->internal.data[0] = exit_code;
vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
-
-   return -EINVAL;
+   return 0;
 }
 
 int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 {
-   if (svm_handle_invalid_exit(vcpu, exit_code))
-   return 0;
+   if (!svm_check_exit_valid(vcpu, exit_code))
+   return svm_handle_invalid_exit(vcpu, exit_code);
 
 #ifdef CONFIG_RETPOLINE
if (exit_code == SVM_EXIT_MSR)
-- 
2.26.2



[PATCH v2 2/9] KVM: introduce KVM_CAP_SET_GUEST_DEBUG2

2021-04-01 Thread Maxim Levitsky
This capability will allow the user to know which KVM_GUESTDBG_* bits
are supported.

Signed-off-by: Maxim Levitsky 
---
 Documentation/virt/kvm/api.rst | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38e327d4b479..9778b2434c03 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3357,6 +3357,9 @@ indicating the number of supported registers.
 For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether
 the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported.
 
+Also when supported, KVM_CAP_SET_GUEST_DEBUG2 capability indicates the
+supported KVM_GUESTDBG_* bits in the control field.
+
 When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..727010788eff 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_SET_GUEST_DEBUG2 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2
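
As an illustration of how userspace would consume this (not part of the
patch), the capability can be queried with KVM_CHECK_EXTENSION and the result
used to filter the control flags before calling KVM_SET_GUEST_DEBUG. Sketch,
assuming <linux/kvm.h>, <sys/ioctl.h> and a vm_fd from KVM_CREATE_VM:

	int supported = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SET_GUEST_DEBUG2);

	if (supported > 0 && !(supported & KVM_GUESTDBG_USE_HW_BP))
		fprintf(stderr, "no hardware breakpoint support on this arch\n");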



[PATCH v2 4/9] KVM: aarch64: implement KVM_CAP_SET_GUEST_DEBUG2

2021-04-01 Thread Maxim Levitsky
Move KVM_GUESTDBG_VALID_MASK to kvm_host.h
and use it to return the value of this capability.
Compile tested only.

Signed-off-by: Maxim Levitsky 
---
 arch/arm64/include/asm/kvm_host.h | 4 
 arch/arm64/kvm/arm.c  | 2 ++
 arch/arm64/kvm/guest.c| 5 -
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..613421454ab6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -401,6 +401,10 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_PENDING_EXCEPTION(1 << 8) /* Exception pending */
 #define KVM_ARM64_EXCEPT_MASK  (7 << 9) /* Target EL/MODE */
 
+#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
+KVM_GUESTDBG_USE_SW_BP | \
+KVM_GUESTDBG_USE_HW | \
+KVM_GUESTDBG_SINGLESTEP)
 /*
  * When KVM_ARM64_PENDING_EXCEPTION is set, KVM_ARM64_EXCEPT_MASK can
  * take the following values:
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7f06ba76698d..e575eff76e97 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -208,6 +208,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VCPU_ATTRIBUTES:
r = 1;
break;
+   case KVM_CAP_SET_GUEST_DEBUG2:
+   return KVM_GUESTDBG_VALID_MASK;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
r = 1;
break;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9bbd30e62799..6cb39ee74acd 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -888,11 +888,6 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
-#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\
-   KVM_GUESTDBG_USE_SW_BP | \
-   KVM_GUESTDBG_USE_HW | \
-   KVM_GUESTDBG_SINGLESTEP)
-
 /**
  * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
  * @kvm:   pointer to the KVM struct
-- 
2.26.2



[PATCH] KVM: arm64: Fix a typo in the kvm_init_stage2_mmu documentation

2021-04-01 Thread Yoan Picchi
A single letter : strucrure -> structure

Signed-off-by: Yoan Picchi 
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f..eed9fed86 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -352,7 +352,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t 
size,
 }
 
 /**
- * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
+ * kvm_init_stage2_mmu - Initialise a S2 MMU structure
  * @kvm:   The pointer to the KVM structure
  * @mmu:   The pointer to the s2 MMU structure
  *
-- 
2.17.1



[PATCH v2 3/9] KVM: x86: implement KVM_CAP_SET_GUEST_DEBUG2

2021-04-01 Thread Maxim Levitsky
Store the supported bits into KVM_GUESTDBG_VALID_MASK
macro, similar to how arm does this.

Signed-off-by: Maxim Levitsky 
---
 arch/x86/include/asm/kvm_host.h | 9 +
 arch/x86/kvm/x86.c  | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a52f973bdff6..cc7c82a449d5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -221,6 +221,15 @@ enum x86_intercept_stage;
 #define DR7_FIXED_1    0x00000400
 #define DR7_VOLATILE   0xffff2bff
 
+#define KVM_GUESTDBG_VALID_MASK \
+   (KVM_GUESTDBG_ENABLE | \
+   KVM_GUESTDBG_SINGLESTEP | \
+   KVM_GUESTDBG_USE_HW_BP | \
+   KVM_GUESTDBG_USE_SW_BP | \
+   KVM_GUESTDBG_INJECT_BP | \
+   KVM_GUESTDBG_INJECT_DB)
+
+
 #define PFERR_PRESENT_BIT 0
 #define PFERR_WRITE_BIT 1
 #define PFERR_USER_BIT 2
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d95f90a048..956e8e0bd6af 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3798,6 +3798,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
r = 1;
break;
+   case KVM_CAP_SET_GUEST_DEBUG2:
+   return KVM_GUESTDBG_VALID_MASK;
 #ifdef CONFIG_KVM_XEN
case KVM_CAP_XEN_HVM:
r = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
-- 
2.26.2



[PATCH v2 5/9] KVM: s390x: implement KVM_CAP_SET_GUEST_DEBUG2

2021-04-01 Thread Maxim Levitsky
Define KVM_GUESTDBG_VALID_MASK and use it to implement this capability.
Compile tested only.

Signed-off-by: Maxim Levitsky 
---
 arch/s390/include/asm/kvm_host.h | 4 
 arch/s390/kvm/kvm-s390.c | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 6bcfc5614bbc..a3902b57b825 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -700,6 +700,10 @@ struct kvm_hw_bp_info_arch {
 #define guestdbg_exit_pending(vcpu) (guestdbg_enabled(vcpu) && \
(vcpu->guest_debug & KVM_GUESTDBG_EXIT_PENDING))
 
+#define KVM_GUESTDBG_VALID_MASK \
+   (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP |\
+   KVM_GUESTDBG_USE_HW_BP | KVM_GUESTDBG_EXIT_PENDING)
+
 struct kvm_guestdbg_info_arch {
unsigned long cr0;
unsigned long cr9;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2f09e9d7dc95..2049fc8c222a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -544,6 +544,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_DIAG318:
r = 1;
break;
+   case KVM_CAP_SET_GUEST_DEBUG2:
+   r = KVM_GUESTDBG_VALID_MASK;
+   break;
case KVM_CAP_S390_HPAGE_1M:
r = 0;
if (hpage && !kvm_is_ucontrol(kvm))
-- 
2.26.2



[PATCH v2 1/9] scripts/gdb: rework lx-symbols gdb script

2021-04-01 Thread Maxim Levitsky
Fix several issues that are present in lx-symbols script:

* Track module unloads by placing another software breakpoint at
  'free_module'
  (force uninline this symbol just in case), and use remove-symbol-file
  gdb command to unload the symbols of the module that is unloading.

  That gives the gdb a chance to mark all software breakpoints from
  this module as pending again.
  Also remove the module from the 'known' module list once it is unloaded.

* Since we now track module unload, we don't need to reload all
  symbols anymore when 'known' module loaded again
  (that can't happen anymore).
  This allows reloading a module in the debugged kernel to finish
  much faster, while lx-symbols tracks module loads and unloads.

* Disable/enable all gdb breakpoints on both module load and unload
  breakpoint hits, and not only in 'load_all_symbols' as was done before.
  (load_all_symbols is no longer called on breakpoint hit)
  That allows gdb to avoid getting confused about the state of the
  (now two) internal breakpoints we place.
  Otherwise it will leave them in the kernel code segment when
  continuing, which triggers a guest kernel panic as soon as it skips
  over the 'int3' instruction and executes the garbage tail of the opcode
  on which the breakpoint was placed.

* Block SIGINT while symbols are reloading as this seems to crash gdb.
  (new in V2)

* Add a basic check that kernel is already loaded into the guest memory
  to avoid confusing errors.
  (new in V2)

Signed-off-by: Maxim Levitsky 
---
 kernel/module.c  |   8 +-
 scripts/gdb/linux/symbols.py | 203 +++
 2 files changed, 143 insertions(+), 68 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 30479355ab85..ea81fc06ea1f 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
 }
 EXPORT_SYMBOL(module_refcount);
 
-/* This exists whether we can unload or not */
-static void free_module(struct module *mod);
+/* This exists whether we can unload or not
+ * Keep it uninlined to provide a reliable breakpoint target,
+ * e.g. for the gdb helper command 'lx-symbols'.
+ */
+
+static noinline void free_module(struct module *mod);
 
 SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
unsigned int, flags)
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb..e1374a6e06f7 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -14,45 +14,23 @@
 import gdb
 import os
 import re
+import signal
 
 from linux import modules, utils
 
 
 if hasattr(gdb, 'Breakpoint'):
-class LoadModuleBreakpoint(gdb.Breakpoint):
-def __init__(self, spec, gdb_command):
-super(LoadModuleBreakpoint, self).__init__(spec, internal=True)
+
+class BreakpointWrapper(gdb.Breakpoint):
+def __init__(self, callback, **kwargs):
+super(BreakpointWrapper, self).__init__(internal=True, **kwargs)
 self.silent = True
-self.gdb_command = gdb_command
+self.callback = callback
 
 def stop(self):
-module = gdb.parse_and_eval("mod")
-module_name = module['name'].string()
-cmd = self.gdb_command
-
-# enforce update if object file is not found
-cmd.module_files_updated = False
-
-# Disable pagination while reporting symbol (re-)loading.
-# The console input is blocked in this context so that we would
-# get stuck waiting for the user to acknowledge paged output.
-show_pagination = gdb.execute("show pagination", to_string=True)
-pagination = show_pagination.endswith("on.\n")
-gdb.execute("set pagination off")
-
-if module_name in cmd.loaded_modules:
-gdb.write("refreshing all symbols to reload module "
-  "'{0}'\n".format(module_name))
-cmd.load_all_symbols()
-else:
-cmd.load_module_symbols(module)
-
-# restore pagination state
-gdb.execute("set pagination %s" % ("on" if pagination else "off"))
-
+self.callback()
 return False
 
-
 class LxSymbols(gdb.Command):
 """(Re-)load symbols of Linux kernel and currently loaded modules.
 
@@ -61,15 +39,52 @@ are scanned recursively, starting in the same directory. 
Optionally, the module
 search path can be extended by a space separated list of paths passed to the
 lx-symbols command."""
 
-module_paths = []
-module_files = []
-module_files_updated = False
-loaded_modules = []
-breakpoint = None
-
 def __init__(self):
 super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
 gdb.COMPLETE_FILENAME)
+self.module_paths = []
+self.module_files = []
+self.module_files_updated = False
+
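
For reference, a typical session with the reworked script looks roughly like
this (illustration only, output abridged; it assumes the guest kernel runs
under QEMU with its gdb stub enabled, e.g. -s, and that the module .ko files
live under the given directory):

  (gdb) target remote :1234
  (gdb) lx-symbols /path/to/modules
  loading vmlinux
  scanning for modules in /path/to/modules
  ...

Module loads and unloads in the guest are then picked up automatically via the
two internal breakpoints described above.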

[PATCH v2 0/9] KVM: my debug patch queue

2021-04-01 Thread Maxim Levitsky
Hi!

I would like to publish two debug features which were needed for other stuff
I work on.

One is the reworked lx-symbols script which now actually works on at least
gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel
for some reason, not related to this patch) and upstream qemu.

The other feature is the ability to trap all guest exceptions (on SVM for now)
and see them in kvmtrace prior to potential merge to double/triple fault.

This can be very useful and I already had to manually patch KVM a few
times for this.
I will, once time permits, implement this feature on Intel as well.

V2:

 * Some more refactoring and workarounds for lx-symbols script

 * added KVM_GUESTDBG_BLOCKEVENTS flag to enable 'block interrupts on
   single step' together with KVM_CAP_SET_GUEST_DEBUG2 capability
   to indicate which guest debug flags are supported.

   This is a replacement for unconditional block of interrupts on single
   step that was done in previous version of this patch set.
   Patches to qemu to use that feature will be sent soon.

 * Reworked the 'intercept all exceptions for debug' feature according
   to the review feedback:

   - renamed the parameter that enables the feature and
 moved it to common kvm module.
 (only SVM part is currently implemented though)

   - disable the feature for SEV guests as was suggested during the review
   - made the vmexit table const again, as was suggested in the review as well.

Best regards,
Maxim Levitsky

Maxim Levitsky (9):
  scripts/gdb: rework lx-symbols gdb script
  KVM: introduce KVM_CAP_SET_GUEST_DEBUG2
  KVM: x86: implement KVM_CAP_SET_GUEST_DEBUG2
  KVM: aarch64: implement KVM_CAP_SET_GUEST_DEBUG2
  KVM: s390x: implement KVM_CAP_SET_GUEST_DEBUG2
  KVM: x86: implement KVM_GUESTDBG_BLOCKEVENTS
  KVM: SVM: split svm_handle_invalid_exit
  KVM: x86: add force_intercept_exceptions_mask
  KVM: SVM: implement force_intercept_exceptions_mask

 Documentation/virt/kvm/api.rst|   4 +
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/kvm/arm.c  |   2 +
 arch/arm64/kvm/guest.c|   5 -
 arch/s390/include/asm/kvm_host.h  |   4 +
 arch/s390/kvm/kvm-s390.c  |   3 +
 arch/x86/include/asm/kvm_host.h   |  12 ++
 arch/x86/include/uapi/asm/kvm.h   |   1 +
 arch/x86/kvm/svm/svm.c|  87 +++--
 arch/x86/kvm/svm/svm.h|   6 +-
 arch/x86/kvm/x86.c|  14 ++-
 arch/x86/kvm/x86.h|   2 +
 include/uapi/linux/kvm.h  |   1 +
 kernel/module.c   |   8 +-
 scripts/gdb/linux/symbols.py  | 203 --
 15 files changed, 272 insertions(+), 84 deletions(-)

-- 
2.26.2




Re: [PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

2021-04-01 Thread Alexandru Elisei
Hi Marc,

On 3/30/21 8:57 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:49:54 +0100,
> Alexandru Elisei  wrote:
>> Hi Marc,
>>
>> On 3/30/21 6:13 PM, Alexandru Elisei wrote:
>>> [..]
> +}
> +
>  /**
>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu 
> state
>   */
> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   * @vcpu:the vcpu pointer
>   *
>   * This is called before each entry into the hypervisor to setup any
> - * debug related registers. Currently this just ensures we will trap
> - * access to:
> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> - *  - Debug ROM Address (MDCR_EL2_TDRA)
> - *  - OS related registers (MDCR_EL2_TDOSA)
> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> + * debug related registers.
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>  
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  {
> - bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>   unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>  
>   trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>  
> - /*
> -  * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> -  * to the profiling buffer.
> -  */
> - vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> - vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> - MDCR_EL2_TPMS |
> - MDCR_EL2_TPMCR |
> - MDCR_EL2_TDRA |
> - MDCR_EL2_TDOSA);
> + kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>  
>   /* Is Guest debugging in effect? */
>   if (vcpu->guest_debug) {
> - /* Route all software debug exceptions to EL2 */
> - vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> -
>   /* Save guest debug state */
>   save_guest_debug_regs(vcpu);
>  
> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  
>   vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>   vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> - trap_debug = true;
 There is something that slightly worries me here: there is now a
 disconnect between flagging debug as dirty and setting the
 trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
 set the trap bits *before* setting the dirty bit itself.

 Here, I believe you end up with guest/host confusion of breakpoints,
 which isn't great. Or did I miss something?
>>> I'm sorry, but I don't understand what you mean. This is my understanding 
>>> of what
>>> is happening.
>>>
>>> Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY 
>>> flag
>>> is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug 
>>> is
>>> only used when computing mdcr_el2.
>>>
>>> With this patch, trap_debug is set to true if vcpu->guest_debug &
>>> KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this 
>>> happens in
>>> kvm_arm_setup_mdcr_el2(), which is called at the start of 
>>> kvm_arm_setup_debug().
>>> The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
>>> vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.
>>>
>>> The guest never runs with the value computed in kvm_vcpu_first_run_init() 
>>> unless
>>> it's identical with the value recomputed in kvm_arm_setup_debug().
>>>
>>> The only difference I see is that mdcr_el2 is computed at the start of
>>> kvm_arm_setup_debug(). I get the feeling I'm also missing something.
>> I think I understand what you mean, you are worried that we won't
>> set the bit in mdcr_el2 to trap debug in the same place where we set
>> the debug dirty flag.
> Yes, that's what I mean. The code is conceptually as such ATM:
>
>   debug_trap = (something based on vcpu->flags);
>   if (something else) {
>   check stuff;
>   vcpu->flags |= stuff;
>   debug_trap = true;
>   }
>
>   if (debug_trap)
>   set trap conditions;
>
> You are turning this into:
>
>   debug_trap = (something based on vcpu->flags);
>   if (debug_trap) {
>   set trap conditions;
>   }
>   if (something else) {
>   check stuff;
>   vcpu->flags |= stuff;
>   }
>
> which isn't the same thing. In your case, it probably works because of
> KVM_GUESTDBG_USE_HW, but that's really hard to follow, and we have had
> so many bugs in the debug code that it really needs to be kept as
> stupid as possible.
>
>> If that's the case, then I can move kvm_arm_setup_mdcr_el2 

Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Marc Zyngier
Hi Eric,

On Thu, 01 Apr 2021 09:52:37 +0100,
Eric Auger  wrote:
> 
> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
> reporting of GICR_TYPER.Last for userspace") temporarily fixed
> a bug identified when attempting to access the GICR_TYPER
> register before the redistributor region setting, but dropped
> the support of the LAST bit.
> 
> Emulating the GICR_TYPER.Last bit still makes sense for
> architecture compliance though. This patch restores its support
> (if the redistributor region was set) while keeping the code safe.
> 
> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
> computes whether a redistributor is the highest one of a series
> of redistributor contributor pages.
> 
> The spec says "Indicates whether this Redistributor is the
> highest-numbered Redistributor in a series of contiguous
> Redistributor pages."
> 
> The code is a bit convulated since there is no guarantee

nit: convoluted

> redistributors are added in a given reditributor region in
> ascending order. In that case the current implementation was
> wrong. Also redistributor regions can be contiguous
> and registered in non increasing base address order.
> 
> So the index of redistributors are stored in an array within
> the redistributor region structure.
> 
> With this new implementation we do not need to have a uaccess
> read accessor anymore.
> 
> Signed-off-by: Eric Auger 

This patch also hurt my head, a lot more than the first one.  See
below.

> ---
>  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
>  arch/arm64/kvm/vgic/vgic.h |  1 +
>  include/kvm/arm_vgic.h |  3 +
>  4 files changed, 73 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index cf6faa0aeddb2..61150c34c268c 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>   int i;
>  
>   vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
> + vgic_cpu->index = vcpu->vcpu_id;

Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
do we need another field? We can always get to the vcpu using a
container_of().
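
(Illustration of that suggestion, not part of the patch: because vgic_cpu is
embedded as vcpu->arch.vgic_cpu, a hypothetical helper along these lines would
recover the vcpu and hence its vcpu_id, making the extra field unnecessary.)

	static inline struct kvm_vcpu *vgic_cpu_to_vcpu(struct vgic_cpu *vgic_cpu)
	{
		struct kvm_vcpu_arch *arch;

		arch = container_of(vgic_cpu, struct kvm_vcpu_arch, vgic_cpu);
		return container_of(arch, struct kvm_vcpu, arch);
	}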

>  
>   INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>   raw_spin_lock_init(&vgic_cpu->ap_list_lock);
> @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
>  
>   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
> -	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
> -		list_del(&rdreg->list);
> -		kfree(rdreg);
> -	}
> +	list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
> +		vgic_v3_free_redist_region(rdreg);

Consider moving the introduction of vgic_v3_free_redist_region() into
a separate patch. On its own, that's a good readability improvement.

>   INIT_LIST_HEAD(&dist->rd_regions);
>   } else {
>   dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 987e366c80008..f6a7eed1d6adb 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
> *vcpu,
>   vgic_enable_lpis(vcpu);
>  }
>  
> +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> + struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
> +
> + if (!rdreg)
> + return false;
> +
> + if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
> + /* check whether there is no other contiguous rdist region */
> +		struct list_head *rd_regions = &vgic->rd_regions;
> + struct vgic_redist_region *iter;
> +
> + list_for_each_entry(iter, rd_regions, list) {
> + if (iter->base == rdreg->base + rdreg->count * 
> KVM_VGIC_V3_REDIST_SIZE &&
> + iter->free_index > 0) {
> + /* check the first rdist index of this region, if any */
> + if (vgic_cpu->index < iter->rdist_indices[0])
> + return false;

rdist_indices[] contains the vcpu_id of the vcpu associated with a
given RD in the region. At this stage, you have established that there
is another region that is contiguous with the one associated with our
vcpu. You also know that this adjacent region has a vcpu mapped in
(free_index isn't 0). Isn't that enough to declare that our vcpu isn't
last?  I definitely don't understand what the index comparison does
here.
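
(For the record, the simplification being hinted at would shrink the loop body
to something like the sketch below: if a region starts right after ours and
already has at least one redistributor mapped in, we are not last. Illustration
only, not a tested change.)

	list_for_each_entry(iter, rd_regions, list) {
		if (iter->base == rdreg->base + rdreg->count * KVM_VGIC_V3_REDIST_SIZE &&
		    iter->free_index > 0)
			return false;
	}
	return true;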

It also seem to me that some of the complexity could be eliminated if
the regions were kept 

Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 31/03/21 23:05, Sean Christopherson wrote:
> > > Wouldn't it be incorrect to lock a mutex (e.g. inside*another*  MMU
> > > notifier's invalidate callback) while holding an rwlock_t?  That makes 
> > > sense
> > > because anybody that's busy waiting in write_lock potentially cannot be
> > > preempted until the other task gets the mutex.  This is a potential
> > > deadlock.
> > 
> > Yes?  I don't think I follow your point though.  Nesting a spinlock or 
> > rwlock
> > inside a rwlock is ok, so long as the locks are always taken in the same 
> > order,
> > i.e. it's never mmu_lock -> mmu_notifier_slots_lock.
> 
> *Another* MMU notifier could nest a mutex inside KVM's rwlock.
> 
> But... is it correct that the MMU notifier invalidate callbacks are always
> called with the mmap_sem taken (sometimes for reading, e.g.
> try_to_merge_with_ksm_page->try_to_merge_one_page->write_protect_page)?

No :-(

File-based invalidations through the rmaps do not take mmap_sem.  They get at
the VMAs via the address_space's interval tree, which is protected by its own
i_mmap_rwsem.

E.g. try_to_unmap() -> rmap_walk_file() -> try_to_unmap_one() 

> We could take it temporarily in install_memslots, since the MMU notifier's mm
> is stored in kvm->mm.
> 
> In this case, a pair of kvm_mmu_notifier_lock/unlock functions would be the
> best way to abstract it.
> 
> Paolo
> 


Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 31/03/21 18:41, Sean Christopherson wrote:
> > > That said, the easiest way to avoid this would be to always update
> > > mmu_notifier_count.
> > Updating mmu_notifier_count requires taking mmu_lock, which would defeat the
> > purpose of these shenanigans.
> 
> Okay; I wasn't sure if the problem was contention with page faults in
> general, or just the long critical sections from the MMU notifier callbacks.
> Still updating mmu_notifier_count unconditionally is a good way to break up
> the patch in two and keep one commit just for the rwsem nastiness.

Rereading things, a small chunk of the rwsem nastiness can go away.  I don't see
any reason to use rw_semaphore instead of rwlock_t.  install_new_memslots() only
holds the lock for a handful of instructions.  Readers could get queued up
behind a writer, but since install_new_memslots() is serialized by slots_lock
(the existing mutex), there is no chance of multiple writers, i.e. the worst
case wait duration is bounded at the length of an in-flight notification.  And
that's _already_ the worst case since notifications are currently serialized by
mmu_lock.  In practice, the existing worst case is probably far worse since
there can be far more writers trying to acquire mmu_lock.

In other words, there's no strong argument for sleeping instead of busy waiting
in the notifiers.

By switching to rwlock_t, taking mmu_notifier_slots_lock doesn't have to depend
on mmu_notifier_range_blockable(), and the must_lock path also goes away.

> > > > +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > > > +   down_write(&kvm->mmu_notifier_slots_lock);
> > > > +#endif
> > > > rcu_assign_pointer(kvm->memslots[as_id], slots);
> > > > +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > > > +   up_write(&kvm->mmu_notifier_slots_lock);
> > > > +#endif
> > > Please do this unconditionally, the cost is minimal if the rwsem is not
> > > contended (as is the case if the architecture doesn't use MMU notifiers at
> > > all).
> > It's not the cost, it's that mmu_notifier_slots_lock doesn't exist.  That's 
> > an
> > easily solved problem, but then the lock wouldn't be initialized since
> > kvm_init_mmu_notifier() is a nop.  That's again easy to solve, but IMO would
> > look rather weird.  I guess the counter argument is that __kvm_memslots()
> > wouldn't need #ifdeffery.
> 
> Yep.  Less #ifdefs usually wins. :)
> 
> > These are the to ideas I've come up with:
> > 
> > Option 1:
> > static int kvm_init_mmu_notifier(struct kvm *kvm)
> > {
> > init_rwsem(&kvm->mmu_notifier_slots_lock);
> > 
> > #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops;
> > return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
> > #else
> > return 0;
> > #endif
> > }
> 
> Option 2 is also okay I guess, but the simplest is option 1 + just init it
> in kvm_create_vm.

Arr.  I'll play around with it to try and purge the #ifdefs.
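
(For completeness, the unconditional variant being discussed would reduce the
memslot swap in install_new_memslots() to roughly the following, with the lock
initialized once in kvm_create_vm(); sketch only, not the posted patch.)

	down_write(&kvm->mmu_notifier_slots_lock);
	rcu_assign_pointer(kvm->memslots[as_id], slots);
	up_write(&kvm->mmu_notifier_slots_lock);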


Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> > +   /*
> > +* Reset the lock used to prevent memslot updates between MMU notifier
> > +* range_start and range_end.  At this point no more MMU notifiers will
> > +* run, but the lock could still be held if KVM's notifier was removed
> > +* between range_start and range_end.  No threads can be waiting on the
> > +* lock as the last reference on KVM has been dropped.  If the lock is
> > +* still held, freeing memslots will deadlock.
> > +*/
> > +   init_rwsem(&kvm->mmu_notifier_slots_lock);
> 
> I was going to say that this is nasty, then I noticed that
> mmu_notifier_unregister uses SRCU to ensure completion of concurrent calls
> to the MMU notifier.  So I guess it's fine, but it's better to point it out:
> 
>   /*
>* At this point no more MMU notifiers will run and pending
>* calls to range_start have completed, but the lock would
>* still be held and never released if the MMU notifier was
>* removed between range_start and range_end.  Since the last
>* reference to the struct kvm has been dropped, no threads can
>* be waiting on the lock, but we might still end up taking it
>* when freeing memslots in kvm_arch_destroy_vm.  Reset the lock
>* to avoid deadlocks.
>*/

An alternative would be to not take the lock in install_new_memslots() if
kvm->users_count == 0.  It'd be weirder to document, and the conditional locking
would still be quite ugly.  Not sure if that's better than blasting a lock
during destruction?
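
Something along these lines (sketch only; users_count being the existing
refcount in struct kvm):

	if (refcount_read(&kvm->users_count)) {
		down_write(&kvm->mmu_notifier_slots_lock);
		rcu_assign_pointer(kvm->memslots[as_id], slots);
		up_write(&kvm->mmu_notifier_slots_lock);
	} else {
		/* VM is being destroyed, no notifier can be in flight. */
		rcu_assign_pointer(kvm->memslots[as_id], slots);
	}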


Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> Also, related to the first part of the series, perhaps you could structure
> the series in a slightly different way:
> 
> 1) introduce the HVA walking API in common code, complete with on_lock and
> patch 15, so that you can use on_lock to increase mmu_notifier_seq
> 
> 2) then migrate all architectures including x86 to the new API
> 
> IOW, first half of patch 10 and all of patch 15; then the second half of
> patch 10; then patches 11-14.

100% agree with introducing on_lock separately from the conditional locking.

Not so sure about introducing conditional locking and then converting non-x86
archs.  I'd prefer to keep the conditional locking after arch conversion.
If something does go awry, it would be nice to be able to precisely bisect to
the conditional locking.  Ditto if it needs to be reverted because it breaks an
arch.


Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 31/03/21 21:47, Sean Christopherson wrote:
> > Rereading things, a small chunk of the rwsem nastiness can go away.  I 
> > don't see
> > any reason to use rw_semaphore instead of rwlock_t.
> 
> Wouldn't it be incorrect to lock a mutex (e.g. inside *another* MMU
> notifier's invalidate callback) while holding an rwlock_t?  That makes sense
> because anybody that's busy waiting in write_lock potentially cannot be
> preempted until the other task gets the mutex.  This is a potential
> deadlock.

Yes?  I don't think I follow your point though.  Nesting a spinlock or rwlock
inside a rwlock is ok, so long as the locks are always taken in the same order,
i.e. it's never mmu_lock -> mmu_notifier_slots_lock.
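
I.e. this nesting would be fine (sketch), so long as it's never done the
other way around:

	read_lock(&kvm->mmu_notifier_slots_lock);	/* outer lock */
	KVM_MMU_LOCK(kvm);				/* inner lock: ok */
	/* ... */
	KVM_MMU_UNLOCK(kvm);
	read_unlock(&kvm->mmu_notifier_slots_lock);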

> I also thought of busy waiting on down_read_trylock if the MMU notifier
> cannot block, but that would also be invalid for the opposite reason (the
> down_write task might be asleep, waiting for other readers to release the
> task, and the down_read_trylock busy loop might not let that task run).
> 
> > And that's _already_ the worst case since notifications are currently
> > serialized by mmu_lock.
> 
> But right now notifications are not a single critical section, they're two,
> aren't they?

Ah, crud, yes.  Holding a spinlock across the entire start() ... end() would be
bad, especially when the notifier can block since that opens up the possibility
of the task sleeping/blocking/yielding while the spinlock is held.  Bummer.


Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> > +   /*
> > +* Reset the lock used to prevent memslot updates between MMU notifier
> > +* range_start and range_end.  At this point no more MMU notifiers will
> > +* run, but the lock could still be held if KVM's notifier was removed
> > +* between range_start and range_end.  No threads can be waiting on the
> > +* lock as the last reference on KVM has been dropped.  If the lock is
> > +* still held, freeing memslots will deadlock.
> > +*/
> > +   init_rwsem(>mmu_notifier_slots_lock);
> 
> I was going to say that this is nasty,

Heh, I still think it's nasty.

> then I noticed that
> mmu_notifier_unregister uses SRCU to ensure completion of concurrent calls
> to the MMU notifier.  So I guess it's fine, but it's better to point it out:
> 
>   /*
>* At this point no more MMU notifiers will run and pending
>* calls to range_start have completed, but the lock would
>* still be held and never released if the MMU notifier was
>* removed between range_start and range_end.  Since the last
>* reference to the struct kvm has been dropped, no threads can
>* be waiting on the lock, but we might still end up taking it
>* when freeing memslots in kvm_arch_destroy_vm.  Reset the lock
>* to avoid deadlocks.
>*/
> 
> That said, the easiest way to avoid this would be to always update
> mmu_notifier_count.

Updating mmu_notifier_count requires taking mmu_lock, which would defeat the
purpose of these shenanigans.  I think it could be made atomic, since mmu_lock
would be taken before the elevated count _must_ be visible, but that would
break the mmu_notifier_range_{start,end} optimization that was recently added.

Or did I completely misunderstand what you're suggesting?

> I don't mind the rwsem, but at least I suggest that you
> split the patch in two---the first one keeping the mmu_notifier_count update
> unconditional, and the second one introducing the rwsem and the on_lock
> function kvm_inc_notifier_count.  Please document the new lock in
> Documentation/virt/kvm/locking.rst too.

Noted, will update docs.

> Also, related to the first part of the series, perhaps you could structure
> the series in a slightly different way:
> 
> 1) introduce the HVA walking API in common code, complete with on_lock and
> patch 15, so that you can use on_lock to increase mmu_notifier_seq
> 
> 2) then migrate all architectures including x86 to the new API
> 
> IOW, first half of patch 10 and all of patch 15; then the second half of
> patch 10; then patches 11-14.
> 
> > +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > +   down_write(>mmu_notifier_slots_lock);
> > +#endif
> > rcu_assign_pointer(kvm->memslots[as_id], slots);
> > +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > +   up_write(>mmu_notifier_slots_lock);
> > +#endif
> 
> Please do this unconditionally, the cost is minimal if the rwsem is not
> contended (as is the case if the architecture doesn't use MMU notifiers at
> all).

It's not the cost, it's that mmu_notifier_slots_lock doesn't exist.  That's an
easily solved problem, but then the lock wouldn't be initialized since
kvm_init_mmu_notifier() is a nop.  That's again easy to solve, but IMO would
look rather weird.  I guess the counter argument is that __kvm_memslots()
wouldn't need #ifdeffery.

These are the two ideas I've come up with:

Option 1:
static int kvm_init_mmu_notifier(struct kvm *kvm)
{
init_rwsem(>mmu_notifier_slots_lock);

#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
kvm->mmu_notifier.ops = _mmu_notifier_ops;
return mmu_notifier_register(>mmu_notifier, current->mm);
#else
return 0;
#endif
}


Option 2:
kvm_mmu_notifier_lock(kvm);
rcu_assign_pointer(kvm->memslots[as_id], slots);
kvm_mmu_notifier_unlock(kvm);
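
where the wrappers hide the #ifdefs in one place, e.g. (sketch):

#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
static inline void kvm_mmu_notifier_lock(struct kvm *kvm)
{
	down_write(&kvm->mmu_notifier_slots_lock);
}
static inline void kvm_mmu_notifier_unlock(struct kvm *kvm)
{
	up_write(&kvm->mmu_notifier_slots_lock);
}
#else
static inline void kvm_mmu_notifier_lock(struct kvm *kvm) { }
static inline void kvm_mmu_notifier_unlock(struct kvm *kvm) { }
#endif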






Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Sean Christopherson wrote:
> On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> > On 31/03/21 21:47, Sean Christopherson wrote:
> > I also thought of busy waiting on down_read_trylock if the MMU notifier
> > cannot block, but that would also be invalid for the opposite reason (the
> > down_write task might be asleep, waiting for other readers to release the
> > task, and the down_read_trylock busy loop might not let that task run).
> > 
> > > And that's _already_ the worst case since notifications are currently
> > > serialized by mmu_lock.
> > 
> > But right now notifications are not a single critical section, they're two,
> > aren't they?
> 
> Ah, crud, yes.  Holding a spinlock across the entire start() ... end() would 
> be
> bad, especially when the notifier can block since that opens up the 
> possibility
> of the task sleeping/blocking/yielding while the spinlock is held.  Bummer.

On a related topic, any preference on whether to have an explicit "must_lock"
flag (what I posted), or derive the logic based on other params?

The helper I posted does:

if (range->must_lock &&
kvm_mmu_lock_and_check_handler(kvm, range, ))
goto out_unlock;

but it could be:

if (!IS_KVM_NULL_FN(range->on_lock) && !range->may_block &&
kvm_mmu_lock_and_check_handler(kvm, range, ))
goto out_unlock;

The generated code should be nearly identical on a modern compiler, so it's
purely a question of aesthetics.  I slightly prefer the explicit "must_lock" to
avoid spreading out the logic too much, but it also feels a bit superfluous.


Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code

2021-04-01 Thread Sean Christopherson
On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> > +#ifdef KVM_ARCH_WANT_NEW_MMU_NOTIFIER_APIS
> > +   kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
> > +#else
> > struct kvm *kvm = mmu_notifier_to_kvm(mn);
> > int idx;
> > trace_kvm_set_spte_hva(address);
> > idx = srcu_read_lock(>srcu);
> > 
> > KVM_MMU_LOCK(kvm);
> > 
> > kvm->mmu_notifier_seq++;
> > 
> > if (kvm_set_spte_hva(kvm, address, pte))
> > kvm_flush_remote_tlbs(kvm);
> > 
> > KVM_MMU_UNLOCK(kvm);
> > srcu_read_unlock(>srcu, idx);
> > +#endif
> 
> The kvm->mmu_notifier_seq is missing in the new API side.  I guess you can
> add an argument to __kvm_handle_hva_range and handle it also in patch 15
> ("KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
> memslot").

Yikes.  Superb eyes!

That does bring up an oddity I discovered when digging into this.  Every call
to .change_pte() is bookended by .invalidate_range_{start,end}(), i.e. the above
missing kvm->mmu_notifier_seq++ is benign because kvm->mmu_notifier_count is
guaranteed to be non-zero.

I'm also fairly certain it means kvm_set_spte_gfn() is effectively dead code on
_all_ architectures.  x86 and MIPS are clearcut nops if the old SPTE is
not-present, and that's guaranteed due to the prior invalidation.  PPC simply
unmaps the SPTE, which again should be a nop due to the invalidation.  arm64 is
a bit murky, but if I'm reading the code correctly, it's also a nop because
kvm_pgtable_stage2_map() is called without a cache pointer, which I think means
it will map an entry if and only if an existing PTE was found.

I haven't actually tested the above analysis, e.g. by asserting that
kvm->mmu_notifier_count is indeed non-zero.  I'll do that sooner than later.
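
E.g. something like this (untested sketch) at the top of
kvm_mmu_notifier_change_pte() would confirm it:

	struct kvm *kvm = mmu_notifier_to_kvm(mn);

	/*
	 * .change_pte() is expected to be bookended by
	 * .invalidate_range_{start,end}(), i.e. the count must be elevated.
	 */
	WARN_ON_ONCE(!kvm->mmu_notifier_count);
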
But, given the shortlog of commit:

  6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start
 and invalidate_range_end")

I'm fairly confident my analysis is correct.  And if so, it also means that the
whole point of adding .change_pte() in the first place (for KSM, commit
828502d30073, "ksm: add mmu_notifier set_pte_at_notify()"), has since been lost.

When it was originally added, .change_pte() was a pure alternative to
invalidating the entry.

  void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
   pte_t pte)
  {
struct mmu_notifier *mn;
struct hlist_node *n;

rcu_read_lock();
hlist_for_each_entry_rcu(mn, n, >mmu_notifier_mm->list, hlist) {
if (mn->ops->change_pte)
mn->ops->change_pte(mn, mm, address, pte);
/*
 * Some drivers don't have change_pte,
 * so we must call invalidate_page in that case.
 */
else if (mn->ops->invalidate_page)
mn->ops->invalidate_page(mn, mm, address);
}
rcu_read_unlock();
  }

The aforementioned commit 6bdb913f0a70 wrapped set_pte_at_notify() with
invalidate_range_{start,end}() so that .invalidate_page() implementations could
sleep.  But, no one noticed that in doing so, .change_pte() was completely
neutered.

Assuming all of the above is correct, I'm very tempted to rip out .change_pte()
entirely.  It's been dead weight for 8+ years and no one has complained about
KSM+KVM performance (I'd also be curious to know how much performance was gained
by shaving VM-Exits).  As KVM is the only user of .change_pte(), dropping it in
KVM would mean the entire MMU notifier could also go away.


Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Shameer,
On 4/1/21 2:38 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 01 April 2021 12:49
>> To: yuzenghui 
>> Cc: eric.auger@gmail.com; io...@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
>> kvmarm@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
>> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
>> t...@semihalf.com; zhukeqian ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; wangxingang
>> ; jiangkunkun ;
>> jean-phili...@linaro.org; zhangfei@linaro.org; zhangfei@gmail.com;
>> vivek.gau...@arm.com; Shameerali Kolothum Thodi
>> ; nicoleots...@gmail.com;
>> lushenming ; vse...@nvidia.com; Wanghaibin (D)
>> 
>> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
>> one context descriptor
>>
>> Hi Zenghui,
>>
>> On 3/30/21 11:23 AM, Zenghui Yu wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
 In preparation for vSVA, let's accept userspace provided configs
 with more than one CD. We check the max CD against the host iommu
 capability and also the format (linear versus 2 level).

 Signed-off-by: Eric Auger 
 Signed-off-by: Shameer Kolothum
>> 
 ---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
   1 file changed, 8 insertions(+), 5 deletions(-)

 diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 index 332d31c0680f..ab74a0289893 100644
 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 @@ -3038,14 +3038,17 @@ static int
>> arm_smmu_attach_pasid_table(struct
 iommu_domain *domain,
   if (smmu_domain->s1_cfg.set)
   goto out;
   -    /*
 - * we currently support a single CD so s1fmt and s1dss
 - * fields are also ignored
 - */
 -    if (cfg->pasid_bits)
 +    list_for_each_entry(master, _domain->devices,
 domain_head) {
 +    if (cfg->pasid_bits > master->ssid_bits)
 +    goto out;
 +    }
 +    if (cfg->vendor_data.smmuv3.s1fmt ==
 STRTAB_STE_0_S1FMT_64K_L2 &&
 +    !(smmu->features &
>> ARM_SMMU_FEAT_2_LVL_CDTAB))
   goto out;
     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
 +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
 +    smmu_domain->s1_cfg.s1fmt =
>> cfg->vendor_data.smmuv3.s1fmt;
>>>
>>> And what about the SIDSS field?
>>>
>> I added this patch upon Shameer's request, to be more vSVA friendly.
>> Hower this series does not really target multiple CD support. At the
>> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
>> At this moment maybe I can only check the s1dss field is 0x2. Or simply
>> removes this patch?
>>
>> Thoughts?
> 
> Right. This was useful for vSVA tests. But yes, to properly support multiple 
> CDs
> we need to pass the S1DSS from Qemu. And that requires further changes.
> So I think it's better to remove this patch and reject S1CDMAX != 0 cases.
OK I will remove it

Thanks

Eric
> 
> Thanks,
> Shameer
>
>>
>> Eric
> 



RE: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 01 April 2021 12:49
> To: yuzenghui 
> Cc: eric.auger@gmail.com; io...@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
> t...@semihalf.com; zhukeqian ;
> jacob.jun@linux.intel.com; yi.l@intel.com; wangxingang
> ; jiangkunkun ;
> jean-phili...@linaro.org; zhangfei@linaro.org; zhangfei@gmail.com;
> vivek.gau...@arm.com; Shameerali Kolothum Thodi
> ; nicoleots...@gmail.com;
> lushenming ; vse...@nvidia.com; Wanghaibin (D)
> 
> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
> one context descriptor
> 
> Hi Zenghui,
> 
> On 3/30/21 11:23 AM, Zenghui Yu wrote:
> > Hi Eric,
> >
> > On 2021/2/24 4:56, Eric Auger wrote:
> >> In preparation for vSVA, let's accept userspace provided configs
> >> with more than one CD. We check the max CD against the host iommu
> >> capability and also the format (linear versus 2 level).
> >>
> >> Signed-off-by: Eric Auger 
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
> >>   1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> index 332d31c0680f..ab74a0289893 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> @@ -3038,14 +3038,17 @@ static int
> arm_smmu_attach_pasid_table(struct
> >> iommu_domain *domain,
> >>   if (smmu_domain->s1_cfg.set)
> >>   goto out;
> >>   -    /*
> >> - * we currently support a single CD so s1fmt and s1dss
> >> - * fields are also ignored
> >> - */
> >> -    if (cfg->pasid_bits)
> >> +    list_for_each_entry(master, _domain->devices,
> >> domain_head) {
> >> +    if (cfg->pasid_bits > master->ssid_bits)
> >> +    goto out;
> >> +    }
> >> +    if (cfg->vendor_data.smmuv3.s1fmt ==
> >> STRTAB_STE_0_S1FMT_64K_L2 &&
> >> +    !(smmu->features &
> ARM_SMMU_FEAT_2_LVL_CDTAB))
> >>   goto out;
> >>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> >> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
> >> +    smmu_domain->s1_cfg.s1fmt =
> cfg->vendor_data.smmuv3.s1fmt;
> >
> > And what about the SIDSS field?
> >
> I added this patch upon Shameer's request, to be more vSVA friendly.
> Hower this series does not really target multiple CD support. At the
> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
> At this moment maybe I can only check the s1dss field is 0x2. Or simply
> removes this patch?
> 
> Thoughts?

Right. This was useful for vSVA tests. But yes, to properly support multiple CDs
we need to pass the S1DSS from Qemu. And that requires further changes.
So I think it's better to remove this patch and reject S1CDMAX != 0 cases.

Thanks,
Shameer
   
> 
> Eric



Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-01 Thread Kunkun Jiang

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:

With nested stage support, we will soon need to invalidate
S1 contexts and ranges tagged with an unmanaged ASID, the
latter being managed by the guest. So let's introduce two helpers
that allow invalidation with externally managed ASIDs.

Signed-off-by: Eric Auger 

---

v13 -> v14
- Actually send the NH_ASID command (reported by Xingang Wang)
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
  1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5579ec4fccc8..4c19a1114de4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain 
*smmu_domain, int ssid,
  }
  
  /* IO_PGTABLE API */

-static void arm_smmu_tlb_inv_context(void *cookie)
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
+  int ext_asid)
  {
-   struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cmdq_ent cmd;
  
@@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)

 * insertion to guarantee those are observed before the TLBI. Do be
 * careful, 007.
 */
-   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+   if (ext_asid >= 0) { /* guest stage 1 invalidation */
+   cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
+   cmd.tlbi.asid   = ext_asid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+   arm_smmu_cmdq_issue_cmd(smmu, );
+   arm_smmu_cmdq_issue_sync(smmu);
+   } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
} else {
cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
@@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
  }
  
+static void arm_smmu_tlb_inv_context(void *cookie)

+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
  static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 unsigned long iova, size_t size,
 size_t granule,
@@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct 
arm_smmu_cmdq_ent *cmd,
arm_smmu_cmdq_batch_submit(smmu, );
  }
  

Here is the part of code in __arm_smmu_tlb_inv_range():

    if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
    /* Get the leaf page size */
    tg = __ffs(smmu_domain->domain.pgsize_bitmap);

    /* Convert page size of 12,14,16 (log2) to 1,2,3 */
    cmd->tlbi.tg = (tg - 10) / 2;

    /* Determine what level the granule is at */
    cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));

    num_pages = size >> tg;
    }

When pSMMU supports RIL, we get the leaf page size from
__ffs(smmu_domain->domain.pgsize_bitmap). In nested mode, it is determined
by the host PAGE_SIZE. If the host kernel and guest kernel have different
translation granules (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range()
will issue an incorrect TLBI command.
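
For example (back-of-the-envelope, hypothetical values):

	/* host uses 16K pages, guest stage-1 uses 4K pages,
	 * guest invalidates a single 4K page */
	tg = 14;				/* __ffs(pgsize_bitmap) on a 16K host */
	cmd->tlbi.tg = (tg - 10) / 2;		/* = 2, i.e. 16K, not the guest's 4K */
	num_pages = size >> tg;			/* 0x1000 >> 14 = 0, nothing is invalidated */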

Do you have any idea about this issue?

Best Regards,
Kunkun Jiang

-static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
- size_t granule, bool leaf,
- struct arm_smmu_domain *smmu_domain)
+static void
+arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
+ size_t granule, bool leaf, int ext_asid,
+ struct arm_smmu_domain *smmu_domain)
  {
struct arm_smmu_cmdq_ent cmd = {
.tlbi = {
@@ -1936,7 +1950,12 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long 
iova, size_t size,
},
};
  
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {

+   if (ext_asid >= 0) {  /* guest stage 1 invalidation */
+   cmd.opcode  = smmu_domain->smmu->features & 
ARM_SMMU_FEAT_E2H ?
+ CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
+   cmd.tlbi.asid   = ext_asid;
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+   } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = smmu_domain->smmu->features & 
ARM_SMMU_FEAT_E2H ?
  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
cmd.tlbi.asid   = smmu_domain->s1_cfg.cd.asid;
@@ -1944,6 +1963,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long 
iova, size_t size,
cmd.opcode  = 

Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 4/1/21 8:11 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
>> *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +    return -EINVAL;
>> +
>> +    if (!smmu)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
> 
> I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
> for guest and invalidate the stale CD entries on the physical side. Is
> PASID-cache type designed for that effect?
Yes it is. PASID-cache matches the CD table.
> 
>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +    return -ENOENT;
>> +    }
>> +
>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +    return -EINVAL;
>> +
>> +    /* IOTLB invalidation */
>> +
>> +    switch (inv_info->granularity) {
>> +    case IOMMU_INV_GRANU_PASID:
>> +    {
>> +    struct iommu_inv_pasid_info *info =
>> +    _info->granu.pasid_info;
>> +
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +    if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +    return -EINVAL;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +    return 0;
>> +    }
>> +    case IOMMU_INV_GRANU_ADDR:
>> +    {
>> +    struct iommu_inv_addr_info *info = _info->granu.addr_info;
>> +    size_t size = info->nb_granules * info->granule_size;
>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +
>> +    if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +    break;
>> +
>> +    arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +  info->granule_size, leaf,
>> +  info->archid, smmu_domain);
>> +
>> +    arm_smmu_cmdq_issue_sync(smmu);
> 
> There is no need to issue one more SYNC.
Hum yes, I did not notice it was already done by arm_smmu_cmdq_issue_cmdlist()

Thanks!

Eric
> 



Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> In preparation for vSVA, let's accept userspace provided configs
>> with more than one CD. We check the max CD against the host iommu
>> capability and also the format (linear versus 2 level).
>>
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Shameer Kolothum 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 332d31c0680f..ab74a0289893 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct
>> iommu_domain *domain,
>>   if (smmu_domain->s1_cfg.set)
>>   goto out;
>>   -    /*
>> - * we currently support a single CD so s1fmt and s1dss
>> - * fields are also ignored
>> - */
>> -    if (cfg->pasid_bits)
>> +    list_for_each_entry(master, _domain->devices,
>> domain_head) {
>> +    if (cfg->pasid_bits > master->ssid_bits)
>> +    goto out;
>> +    }
>> +    if (cfg->vendor_data.smmuv3.s1fmt ==
>> STRTAB_STE_0_S1FMT_64K_L2 &&
>> +    !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>>   goto out;
>>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>> +    smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
> 
> And what about the SIDSS field?
> 
I added this patch upon Shameer's request, to be more vSVA friendly.
However this series does not really target multiple CD support. At the
moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
For now maybe I can only check that the s1dss field is 0x2. Or simply
remove this patch?

Thoughts?

Eric



Re: [PATCH v4 1/8] KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 12:52 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Thu, 01 Apr 2021 09:52:31 +0100,
> Eric Auger  wrote:
>>
>> KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
>> -EEXIST in case the base address of the redist is already set.
>> We currently return -EINVAL.
>>
>> However we need to return -EINVAL in case a legacy REDIST address
>> is attempted to be set while REDIST_REGIONS were set. This case
>> is discriminated by looking at the count field.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - simplify the check sequence
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 15a6c98ee92f0..013b737b658f8 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -791,10 +791,6 @@ static int vgic_v3_insert_redist_region(struct kvm 
>> *kvm, uint32_t index,
>>  size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
>>  int ret;
>>  
>> -/* single rdist region already set ?*/
>> -if (!count && !list_empty(rd_regions))
>> -return -EINVAL;
>> -
>>  /* cross the end of memory ? */
>>  if (base + size < base)
>>  return -EINVAL;
>> @@ -805,11 +801,14 @@ static int vgic_v3_insert_redist_region(struct kvm 
>> *kvm, uint32_t index,
>>  } else {
>>  rdreg = list_last_entry(rd_regions,
>>  struct vgic_redist_region, list);
>> -if (index != rdreg->index + 1)
>> -return -EINVAL;
>>  
>> -/* Cannot add an explicitly sized regions after legacy region */
>> -if (!rdreg->count)
>> +if ((!count) != (!rdreg->count))
>> +return -EINVAL; /* Mix REDIST and REDIST_REGION */
> 
> Urgh... The triple negation killed me. Can we come up with a more
> intuitive expression? Something like:

Yes sometimes I can be "different" ;-)
> 
>   /* Don't mix single region and discrete redist regions */
>   if (!count && rdreg->count)
>   return -EINVAL;>
> Does it capture what you want to express?

yes it does!

Thanks

Eric
> 
> Thanks,
> 
>   M.
> 



Re: [PATCH v4 1/8] KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base

2021-04-01 Thread Marc Zyngier
Hi Eric,

On Thu, 01 Apr 2021 09:52:31 +0100,
Eric Auger  wrote:
> 
> KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
> -EEXIST in case the base address of the redist is already set.
> We currently return -EINVAL.
> 
> However we need to return -EINVAL in case a legacy REDIST address
> is attempted to be set while REDIST_REGIONS were set. This case
> is discriminated by looking at the count field.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v1 -> v2:
> - simplify the check sequence
> ---
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 15a6c98ee92f0..013b737b658f8 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -791,10 +791,6 @@ static int vgic_v3_insert_redist_region(struct kvm *kvm, 
> uint32_t index,
>   size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
>   int ret;
>  
> - /* single rdist region already set ?*/
> - if (!count && !list_empty(rd_regions))
> - return -EINVAL;
> -
>   /* cross the end of memory ? */
>   if (base + size < base)
>   return -EINVAL;
> @@ -805,11 +801,14 @@ static int vgic_v3_insert_redist_region(struct kvm 
> *kvm, uint32_t index,
>   } else {
>   rdreg = list_last_entry(rd_regions,
>   struct vgic_redist_region, list);
> - if (index != rdreg->index + 1)
> - return -EINVAL;
>  
> - /* Cannot add an explicitly sized regions after legacy region */
> - if (!rdreg->count)
> + if ((!count) != (!rdreg->count))
> + return -EINVAL; /* Mix REDIST and REDIST_REGION */

Urgh... The triple negation killed me. Can we come up with a more
intuitive expression? Something like:

/* Don't mix single region and discrete redist regions */
if (!count && rdreg->count)
return -EINVAL;

Does it capture what you want to express?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:17 AM, Zenghui Yu wrote:
> On 2021/2/24 4:56, Eric Auger wrote:
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
> 
> If I understand it correctly, the true nested mode effectively gives us
> a *NS-EL1* StreamWorld. We should therefore use CMDQ_OP_TLBI_NH_VA to
> invalidate the stage-1 NS-EL1 entries, no matter E2H is selected or not.
> 

Yes at the moment you're right. Support for nested virt may induce some
changes here but we are not there. I will fix it and add a comment.
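
Roughly, for the range path, something like (sketch only, not the actual
respin):

	if (ext_asid >= 0) {	/* guest stage 1 invalidation */
		/* Guest stage 1 is the NS-EL1 StreamWorld, regardless of E2H */
		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
		cmd.tlbi.asid	= ext_asid;
		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
	}
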
Thank you!

Best Regards

Eric



Re: [PATCH v3 0/5] Debug info for nVHE hyp panics

2021-04-01 Thread Marc Zyngier
On Thu, 18 Mar 2021 14:33:06 +, Andrew Scull wrote:
> Panics from arm64's nVHE hyp mode are hard to interpret. This series
> adds some more debug info to help with diagnosis.
> 
> Using BUG() in nVHE hyp gives a meaningful address to locate invariants
> that fail to hold. The host can also look up the bug to provide the file
> and line, if the debug configs are enabled. Otherwise a kimg address is
> much more useful than a hyp VA since it can be looked up in vmlinux.
> 
> [...]

Applied to kvm-arm64/nvhe-panic-info, thanks!

[1/5] bug: Remove redundant condition check in report_bug
  commit: 3ad1a6cb0abc63d036fc866bd7c2c5983516dec5
[2/5] bug: Factor out a getter for a bug's file line
  commit: 26dbc7e299c7ebbb6a95e2c620b21b5280b37c57
[3/5] bug: Assign values once in bug_get_file_line()
  commit: 5b8be5d875a996776708ba174fcd08c8bcd721a5
[4/5] KVM: arm64: Use BUG and BUG_ON in nVHE hyp
  commit: f79e616f27ab6cd74deb0995a8eead3d1c9d65af
[5/5] KVM: arm64: Log source when panicking from nVHE hyp
  commit: aec0fae62e47050019474936248a311a0ab08705

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




[PATCH v4 8/8] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-04-01 Thread Eric Auger
The tests exercise the VGIC_V3 device creation including the
associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:

- KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
- KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION

Some other tests are dedicated to the KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
and especially the GICR_TYPER read. The goal was to test the case
recently fixed by commit 23bde34771f1
("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").

The API under test can be found at
Documentation/virt/kvm/devices/arm-vgic-v3.rst

Signed-off-by: Eric Auger 

---

v3 -> v4:
- update .gitignore
- Moved more vgic-mmio-v3.c changes into the previous patch
- rename fuzz_dist_rdist into test_dist_rdist
- cleanup in run_vcpu and guest_code
- max_ipa_bits is global
- s/fuzz/subtest
- added test_kvm_device,
- moved ucall_init() just before the cpu run
- use vm_create_default_with_vcpus
- use vm_gic struct, vm_gic_create, vm_gic_destroy
- rewrite util.c helpers to comply with the usual style
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/aarch64/vgic_init.c | 652 ++
 .../testing/selftests/kvm/include/kvm_util.h  |   9 +
 tools/testing/selftests/kvm/lib/kvm_util.c|  77 +++
 5 files changed, 740 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c

diff --git a/tools/testing/selftests/kvm/.gitignore 
b/tools/testing/selftests/kvm/.gitignore
index 7bd7e776c266a..bb862f91f6409 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /aarch64/get-reg-list
 /aarch64/get-reg-list-sve
+/aarch64/vgic_init
 /s390x/memop
 /s390x/resets
 /s390x/sync_regs_test
diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index 67eebb53235fd..2fd4801de9ca7 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -78,6 +78,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
 
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
+TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c 
b/tools/testing/selftests/kvm/aarch64/vgic_init.c
new file mode 100644
index 0..04e29c4d3e065
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
@@ -0,0 +1,652 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vgic init sequence tests
+ *
+ * Copyright (C) 2020, Red Hat, Inc.
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define NR_VCPUS   4
+
+#define REDIST_REGION_ATTR_ADDR(count, base, flags, index) (((uint64_t)(count) 
<< 52) | \
+   ((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index)
+#define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset)
+
+#define GICR_TYPER 0x8
+
+struct vm_gic {
+   struct kvm_vm *vm;
+   int gic_fd;
+};
+
+int max_ipa_bits;
+
+/* helper to access a redistributor register */
+static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
+uint32_t *val, bool write)
+{
+   uint64_t attr = REG_OFFSET(vcpu, offset);
+
+   return _kvm_device_access(gicv3_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
+ attr, val, write);
+}
+
+/* dummy guest code */
+static void guest_code(void)
+{
+   GUEST_SYNC(0);
+   GUEST_SYNC(1);
+   GUEST_SYNC(2);
+   GUEST_DONE();
+}
+
+/* we don't want to assert on run execution, hence that helper */
+static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
+{
+   int ret;
+
+   vcpu_args_set(vm, vcpuid, 1);
+   ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
+   get_ucall(vm, vcpuid, NULL);
+
+   if (ret)
+   return -errno;
+   return 0;
+}
+
+static struct vm_gic vm_gic_create(void)
+{
+   struct vm_gic v;
+
+   v.vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL);
+   v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false);
+   TEST_ASSERT(v.gic_fd > 0, "GICv3 device created");
+
+   return v;
+}
+
+static void vm_gic_destroy(struct vm_gic *v)
+{
+   close(v->gic_fd);
+   kvm_vm_free(v->vm);
+}
+
+/**
+ * Helper routine that performs KVM device tests in general and
+ * especially ARM_VGIC_V3 ones. Eventually the ARM_VGIC_V3
+ * device gets created, a legacy RDIST region is set at @0x0
+ * and a DIST region is set @0x6
+ */
+static void subtest_dist_rdist(struct vm_gic *v)
+{
+   int ret;
+   uint64_t addr;
+
+   /* Check existing group/attributes */
+   ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
+  

[PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Eric Auger
Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
reporting of GICR_TYPER.Last for userspace") temporarily fixed
a bug identified when attempting to access the GICR_TYPER
register before the redistributor region setting, but dropped
the support of the LAST bit.

Emulating the GICR_TYPER.Last bit still makes sense for
architecture compliance though. This patch restores its support
(if the redistributor region was set) while keeping the code safe.

We introduce a new helper, vgic_mmio_vcpu_rdist_is_last(), which
computes whether a redistributor is the highest one of a series
of contiguous redistributor pages.

The spec says "Indicates whether this Redistributor is the
highest-numbered Redistributor in a series of contiguous
Redistributor pages."

The code is a bit convoluted since there is no guarantee
redistributors are added to a given redistributor region in
ascending order; in that case the current implementation was
wrong. Also redistributor regions can be contiguous
and registered in non-increasing base address order.

So the indices of the redistributors are stored in an array within
the redistributor region structure.

With this new implementation we do not need to have a uaccess
read accessor anymore.

Signed-off-by: Eric Auger 
---
 arch/arm64/kvm/vgic/vgic-init.c|  7 +--
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
 arch/arm64/kvm/vgic/vgic.h |  1 +
 include/kvm/arm_vgic.h |  3 +
 4 files changed, 73 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index cf6faa0aeddb2..61150c34c268c 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
int i;
 
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
+   vgic_cpu->index = vcpu->vcpu_id;
 
INIT_LIST_HEAD(_cpu->ap_list_head);
raw_spin_lock_init(_cpu->ap_list_lock);
@@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
 
if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
-   list_for_each_entry_safe(rdreg, next, >rd_regions, list) {
-   list_del(>list);
-   kfree(rdreg);
-   }
+   list_for_each_entry_safe(rdreg, next, >rd_regions, list)
+   vgic_v3_free_redist_region(rdreg);
INIT_LIST_HEAD(>rd_regions);
} else {
dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 987e366c80008..f6a7eed1d6adb 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
*vcpu,
vgic_enable_lpis(vcpu);
 }
 
+static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
+{
+   struct vgic_dist *vgic = >kvm->arch.vgic;
+   struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
+   struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
+
+   if (!rdreg)
+   return false;
+
+   if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
+   /* check whether there is no other contiguous rdist region */
+   struct list_head *rd_regions = >rd_regions;
+   struct vgic_redist_region *iter;
+
+   list_for_each_entry(iter, rd_regions, list) {
+   if (iter->base == rdreg->base + rdreg->count * 
KVM_VGIC_V3_REDIST_SIZE &&
+   iter->free_index > 0) {
+   /* check the first rdist index of this region, if any */
+   if (vgic_cpu->index < iter->rdist_indices[0])
+   return false;
+   }
+   }
+   } else if (vgic_cpu->rdreg_index < rdreg->free_index - 1) {
+   /* look at the index of next rdist */
+   int next_rdist_index = 
rdreg->rdist_indices[vgic_cpu->rdreg_index + 1];
+
+   if (vgic_cpu->index < next_rdist_index)
+   return false;
+   }
+   return true;
+}
+
 static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
  gpa_t addr, unsigned int len)
 {
unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
-   struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
-   struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
int target_vcpu_id = vcpu->vcpu_id;
-   gpa_t last_rdist_typer = rdreg->base + GICR_TYPER +
-   (rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;
u64 value;
 
value = (u64)(mpidr & GENMASK(23, 0)) << 32;
value |= ((target_vcpu_id & 0x) << 8);
 
-   if (addr == last_rdist_typer)
+   if (vgic_has_its(vcpu->kvm))
+

[PATCH v4 6/8] KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]

2021-04-01 Thread Eric Auger
vgic_uaccess() takes a struct vgic_io_device argument, converts it
to a struct kvm_io_device and passes it to the read/write accessor
functions, which convert it back to a struct vgic_io_device.
Avoid the indirection by passing the struct vgic_io_device argument
directly to vgic_uaccess_{read,write}.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- reworded the commit message as suggested by Alexandru
---
 arch/arm64/kvm/vgic/vgic-mmio.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index b2d73fc0d1ef4..48c6067fc5ecb 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -938,10 +938,9 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct 
vgic_io_device *iodev,
return region;
 }
 
-static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
+static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct vgic_io_device 
*iodev,
 gpa_t addr, u32 *val)
 {
-   struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
const struct vgic_register_region *region;
struct kvm_vcpu *r_vcpu;
 
@@ -960,10 +959,9 @@ static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct 
kvm_io_device *dev,
return 0;
 }
 
-static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
+static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct vgic_io_device 
*iodev,
  gpa_t addr, const u32 *val)
 {
-   struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
const struct vgic_register_region *region;
struct kvm_vcpu *r_vcpu;
 
@@ -986,9 +984,9 @@ int vgic_uaccess(struct kvm_vcpu *vcpu, struct 
vgic_io_device *dev,
 bool is_write, int offset, u32 *val)
 {
if (is_write)
-   return vgic_uaccess_write(vcpu, >dev, offset, val);
+   return vgic_uaccess_write(vcpu, dev, offset, val);
else
-   return vgic_uaccess_read(vcpu, >dev, offset, val);
+   return vgic_uaccess_read(vcpu, dev, offset, val);
 }
 
 static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
-- 
2.26.3



[PATCH v4 5/8] docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT doc

2021-04-01 Thread Eric Auger
kvm_arch_vcpu_precreate() returns -EBUSY if the vgic is
already initialized. So let's document that KVM_DEV_ARM_VGIC_CTRL_INIT
must be called after all vcpu creations.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- Must be called after all vcpu creations ->
  Must be called after all VCPUs have been created as per
  Alexandru's suggestion
---
 Documentation/virt/kvm/devices/arm-vgic-v3.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst 
b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
index 5dd3bff519783..51e5e57625716 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
@@ -228,7 +228,7 @@ Groups:
 
 KVM_DEV_ARM_VGIC_CTRL_INIT
   request the initialization of the VGIC, no additional parameter in
-  kvm_device_attr.addr.
+  kvm_device_attr.addr. Must be called after all VCPUs have been created.
 KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
   save all LPI pending bits into guest RAM pending tables.
 
-- 
2.26.3



[PATCH v4 4/8] KVM: arm/arm64: vgic: Reset base address on kvm_vgic_dist_destroy()

2021-04-01 Thread Eric Auger
On vgic_dist_destroy(), the base addresses are not reset. However, for
KVM selftest purposes, resetting them allows the test execution to
continue even after a failure when running KVM_RUN. So let's reset the
base addresses.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- use dist-> in the else and add braces
---
 arch/arm64/kvm/vgic/vgic-init.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 052917deb1495..cf6faa0aeddb2 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -335,13 +335,16 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
kfree(dist->spis);
dist->spis = NULL;
dist->nr_spis = 0;
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
 
-   if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
+   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
list_for_each_entry_safe(rdreg, next, >rd_regions, list) {
list_del(>list);
kfree(rdreg);
}
INIT_LIST_HEAD(>rd_regions);
+   } else {
+   dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
}
 
if (vgic_has_its(kvm))
@@ -362,6 +365,7 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
vgic_flush_pending_lpis(vcpu);
 
INIT_LIST_HEAD(_cpu->ap_list_head);
+   vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
 }
 
 /* To be called with kvm->lock held */
-- 
2.26.3



[PATCH v4 3/8] KVM: arm64: vgic-v3: Fix error handling in vgic_v3_set_redist_base()

2021-04-01 Thread Eric Auger
vgic_v3_insert_redist_region() may succeed while
vgic_register_all_redist_iodevs() fails. For example this happens
when adding a redistributor region that overlaps the dist region. The
failure is only detected in vgic_register_all_redist_iodevs(), when
vgic_v3_check_base() gets called from vgic_register_redist_iodev().

In such a case, remove the newly added redistributor region and free
it.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- fix the commit message and split declaration/assignment of rdreg
---
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 013b737b658f8..987e366c80008 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -860,8 +860,14 @@ int vgic_v3_set_redist_base(struct kvm *kvm, u32 index, 
u64 addr, u32 count)
 * afterwards will register the iodevs when needed.
 */
ret = vgic_register_all_redist_iodevs(kvm);
-   if (ret)
+   if (ret) {
+   struct vgic_redist_region *rdreg;
+
+   rdreg = vgic_v3_rdist_region_from_index(kvm, index);
+   list_del(>list);
+   kfree(rdreg);
return ret;
+   }
 
return 0;
 }
-- 
2.26.3



[PATCH v4 1/8] KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base

2021-04-01 Thread Eric Auger
KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
-EEXIST in case the base address of the redist is already set.
We currently return -EINVAL.

However we need to return -EINVAL in case a legacy REDIST address
is attempted to be set while REDIST_REGIONS were set. This case
is discriminated by looking at the count field.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- simplify the check sequence
---
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 15a6c98ee92f0..013b737b658f8 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -791,10 +791,6 @@ static int vgic_v3_insert_redist_region(struct kvm *kvm, 
uint32_t index,
size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
int ret;
 
-   /* single rdist region already set ?*/
-   if (!count && !list_empty(rd_regions))
-   return -EINVAL;
-
/* cross the end of memory ? */
if (base + size < base)
return -EINVAL;
@@ -805,11 +801,14 @@ static int vgic_v3_insert_redist_region(struct kvm *kvm, 
uint32_t index,
} else {
rdreg = list_last_entry(rd_regions,
struct vgic_redist_region, list);
-   if (index != rdreg->index + 1)
-   return -EINVAL;
 
-   /* Cannot add an explicitly sized regions after legacy region */
-   if (!rdreg->count)
+   if ((!count) != (!rdreg->count))
+   return -EINVAL; /* Mix REDIST and REDIST_REGION */
+
+   if (!count)
+   return -EEXIST;
+
+   if (index != rdreg->index + 1)
return -EINVAL;
}
 
-- 
2.26.3



[PATCH v4 2/8] KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read

2021-04-01 Thread Eric Auger
The doc says:
"The characteristics of a specific redistributor region can
 be read by presetting the index field in the attr data.
 Only valid for KVM_DEV_TYPE_ARM_VGIC_V3"

Unfortunately the existing code fails to read the input attr data.

Fixes: 04c110932225 ("KVM: arm/arm64: Implement 
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION")
Cc: sta...@vger.kernel.org#v4.17+
Signed-off-by: Eric Auger 
Reviewed-by: Alexandru Elisei 

---

v1 -> v2:
- in the commit message, remove the statement that the index always is 0
- add Alexandru's R-b
---
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c 
b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 44419679f91ad..2f66cf2472825 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -226,6 +226,9 @@ static int vgic_get_common_attr(struct kvm_device *dev,
u64 addr;
unsigned long type = (unsigned long)attr->attr;
 
+   if (copy_from_user(, uaddr, sizeof(addr)))
+   return -EFAULT;
+
r = kvm_vgic_addr(dev->kvm, type, , false);
if (r)
return (r == -ENODEV) ? -ENXIO : r;
-- 
2.26.3



[PATCH v4 0/8] KVM/ARM: Some vgic fixes and init sequence KVM selftests

2021-04-01 Thread Eric Auger
While writing vgic v3 init sequence KVM selftests I noticed some
relatively minor issues. This was also the opportunity to try to
fix the issue recently reported by Zenghui, related to the GICR_TYPER
Last bit emulation. The final patch is a first batch of VGIC init
sequence selftests. Of course they can be augmented with a lot more
register access tests, but let's try to move forward incrementally ...

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/vgic_kvmselftests_v4

History:
v3 -> v4:
- take into account Drew's comment on the kvm selftests. No
  change to the KVM related patches compared to v3
v2 ->v3:
- reworked last bit read accessor to handle contiguous redist
  regions and rdist not registered in ascending order
- removed [PATCH 5/9] KVM: arm: move has_run_once after the
  map_resources
v1 -> v2:
- Took into account all comments from Marc and Alexandru except
  keeping has_run_once after map_resources (this would oblige
  me to revisit the selftests in depth)


Eric Auger (8):
  KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base
  KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read
  KVM: arm64: vgic-v3: Fix error handling in vgic_v3_set_redist_base()
  KVM: arm/arm64: vgic: Reset base address on kvm_vgic_dist_destroy()
  docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT doc
  KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]
  KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
  KVM: selftests: aarch64/vgic-v3 init sequence tests

 .../virt/kvm/devices/arm-vgic-v3.rst  |   2 +-
 arch/arm64/kvm/vgic/vgic-init.c   |  13 +-
 arch/arm64/kvm/vgic/vgic-kvm-device.c |   3 +
 arch/arm64/kvm/vgic/vgic-mmio-v3.c| 116 +++-
 arch/arm64/kvm/vgic/vgic-mmio.c   |  10 +-
 arch/arm64/kvm/vgic/vgic.h|   1 +
 include/kvm/arm_vgic.h|   3 +
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/aarch64/vgic_init.c | 652 ++
 .../testing/selftests/kvm/include/kvm_util.h  |   9 +
 tools/testing/selftests/kvm/lib/kvm_util.c|  77 +++
 12 files changed, 838 insertions(+), 50 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c

-- 
2.26.3



Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-04-01 Thread Zenghui Yu

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:

+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||


I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
for guest and invalidate the stale CD entries on the physical side. Is
PASID-cache type designed for that effect?


+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   _info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = _info->granu.addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ info->granule_size, leaf,
+ info->archid, smmu_domain);
+
+   arm_smmu_cmdq_issue_sync(smmu);


There is no need to issue one more SYNC.