Re: [PATCH v4 15/16] KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification

2022-11-22 Thread Reiji Watanabe
On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:
>
> The way we compute the target vcpu on getting an overflow is
> a bit odd, as we use the PMC array as an anchor for kvm_pmc_to_vcpu,
> while we could directly compute the correct address.
>
> Get rid of the intermediate step and directly compute the target
> vcpu.
>
> Signed-off-by: Marc Zyngier 
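
The simplification described above amounts to something like this sketch
(illustrative only, the actual diff isn't quoted here; the field layout,
with overflow_work embedded in vcpu->arch.pmu, is an assumption based on
the description):

	/* Before: use the PMC array as the container_of anchor... */
	pmu = container_of(work, struct kvm_pmu, overflow_work);
	vcpu = kvm_pmc_to_vcpu(pmu->pmc);

	/* ...after: compute the vcpu address directly (assumed layout). */
	vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work);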

Reviewed-by: Reiji Watanabe 


Re: [PATCH v4 14/16] KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest

2022-11-22 Thread Reiji Watanabe
On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:
>
> Now that the infrastructure is in place, bump the PMU support up
> to PMUv3p5.
>
> Signed-off-by: Marc Zyngier 

Reviewed-by: Reiji Watanabe 


Re: [PATCH v4 13/16] KVM: arm64: PMU: Implement PMUv3p5 long counter support

2022-11-22 Thread Reiji Watanabe
Hi Marc,

On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:
>
> PMUv3p5 (which is mandatory with ARMv8.5) comes with some extra
> features:
>
> - All counters are 64bit
>
> - The overflow point is controlled by the PMCR_EL0.LP bit
>
> Add the required checks in the helpers that control counter
> width and overflow, as well as the sysreg handling for the LP
> bit. A new kvm_pmu_is_3p5() helper makes it easy to spot the
> PMUv3p5 specific handling.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/pmu-emul.c | 8 +++++---
>  arch/arm64/kvm/sys_regs.c | 4 ++++
>  include/kvm/arm_pmu.h     | 7 +++++++
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index 4320c389fa7f..c37cc67ff1d7 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -52,13 +52,15 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
>   */
>  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
>  {
> -   return (select_idx == ARMV8_PMU_CYCLE_IDX);
> +   return (select_idx == ARMV8_PMU_CYCLE_IDX || kvm_pmu_is_3p5(vcpu));
>  }
>
>  static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 select_idx)
>  {
> -   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> -   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> +   u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);
> +
> +   return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) ||
> +  (select_idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));

Since the vCPU's PMCR_EL0 value is not always in sync with
kvm->arch.dfr0_pmuver.imp, shouldn't kvm_pmu_idx_has_64bit_overflow()
also check kvm_pmu_is_3p5()?
(e.g. when the host supports PMUv3p5, PMCR.LP will initially be set by
reset_pmcr(). Then, even if userspace sets ID_AA64DFR0_EL1.PMUVer to
PMUVer_V3P1, PMCR.LP stays set unless PMCR is written. So,
kvm_pmu_idx_has_64bit_overflow() might return true even though the
guest's PMU version is lower than PMUVer_V3P5.)
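
Something like the following would keep the helper honest even with a
stale PMCR value (untested sketch; it simply masks LP whenever the
guest's PMU is not PMUv3p5):

	u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);

	/* LP is RES0 below PMUv3p5; ignore a stale value. */
	if (!kvm_pmu_is_3p5(vcpu))
		val &= ~ARMV8_PMU_PMCR_LP;

	return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) ||
	       (select_idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));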


>  }
>
>  static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index dc201a0557c0..615cb148e22a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -654,6 +654,8 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>| (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
> if (!kvm_supports_32bit_el0())
> val |= ARMV8_PMU_PMCR_LC;
> +   if (!kvm_pmu_is_3p5(vcpu))
> +   val &= ~ARMV8_PMU_PMCR_LP;
> __vcpu_sys_reg(vcpu, r->reg) = val;
>  }
>
> @@ -703,6 +705,8 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> val |= p->regval & ARMV8_PMU_PMCR_MASK;
> if (!kvm_supports_32bit_el0())
> val |= ARMV8_PMU_PMCR_LC;
> +   if (!kvm_pmu_is_3p5(vcpu))
> +   val &= ~ARMV8_PMU_PMCR_LP;
> __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
> kvm_pmu_handle_pmcr(vcpu, val);
> kvm_vcpu_pmu_restore_guest(vcpu);

For the read case of access_pmcr() (the code below), since PMCR.LP is
RES0 when FEAT_PMUv3p5 is not implemented, shouldn't it also clear
PMCR.LP when kvm_pmu_is_3p5(vcpu) is false?
(Similar issue to kvm_pmu_idx_has_64bit_overflow().)

} else {
/* PMCR.P & PMCR.C are RAZ */
val = __vcpu_sys_reg(vcpu, PMCR_EL0)
  & ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
p->regval = val;
}
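
For example (untested sketch, reusing the same masking as above):

	} else {
		/* PMCR.P & PMCR.C are RAZ; LP is RES0 without FEAT_PMUv3p5 */
		val = __vcpu_sys_reg(vcpu, PMCR_EL0)
		      & ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
		if (!kvm_pmu_is_3p5(vcpu))
			val &= ~ARMV8_PMU_PMCR_LP;
		p->regval = val;
	}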

Thank you,
Reiji

> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 812f729c9108..628775334d5e 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -89,6 +89,12 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
> vcpu->arch.pmu.events = *kvm_get_pmu_events();  \
> } while (0)
>
> +/*
> + * Evaluates as true when emulating PMUv3p5, and false otherwise.
> + */
> +#define kvm_pmu_is_3p5(vcpu)   \
> +   (vcpu->kvm->arch.dfr0_pmuver.imp >= ID_AA64DFR0_EL1_PMUVer_V3P5)
> +
>  u8 kvm_arm_pmu_get_pmuver_limit(void);
>
>  #else
> @@ -153,6 +159,7 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
>  }
>
>  #define kvm_vcpu_has_pmu(vcpu) ({ false; })
> +#define kvm_pmu_is_3p5(vcpu)   ({ false; })
>  static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
> --
> 2.34.1
>


Re: [PATCH v2 1/2] KVM: selftests: Have perf_test_util signal when to stop vCPUs

2022-11-22 Thread Gavin Shan

On 11/19/22 5:15 AM, Oliver Upton wrote:

Signal that a test run is complete through perf_test_args instead of
having tests open code a similar solution. Ensure that the field resets
to false at the beginning of a test run as the structure is reused
between test runs, eliminating a couple of bugs:

access_tracking_perf_test hangs indefinitely on a subsequent test run,
as 'done' remains true. The bug doesn't amount to much right now, as x86
supports a single guest mode. However, this is a precondition of
enabling the test for other architectures with >1 guest mode, like
arm64.

memslot_modification_stress_test has the exact opposite problem, where
subsequent test runs complete immediately as 'run_vcpus' remains false.

Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
[oliver: added commit message, preserve spin_wait_for_next_iteration()]
Signed-off-by: Oliver Upton 
---
  tools/testing/selftests/kvm/access_tracking_perf_test.c     | 8 +-------
  tools/testing/selftests/kvm/include/perf_test_util.h        | 3 +++
  tools/testing/selftests/kvm/lib/perf_test_util.c            | 3 +++
  .../selftests/kvm/memslot_modification_stress_test.c        | 6 +-----
  4 files changed, 8 insertions(+), 12 deletions(-)



Reviewed-by: Gavin Shan 
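
The resulting lifecycle, condensed from the diff below (vcpu_iteration()
is a hypothetical stand-in for each test's per-iteration work):

	/* Harness: reset the flag when (re)starting, set it when joining. */
	perf_test_start_vcpu_threads(nr_vcpus, vcpu_fn);  /* stop_vcpus = false */
	/* ... test body runs ... */
	perf_test_join_vcpu_threads(nr_vcpus);            /* stop_vcpus = true */

	/* Each vCPU worker polls the shared flag instead of a per-test global. */
	while (!READ_ONCE(perf_test_args.stop_vcpus))
		vcpu_iteration();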


diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 76c583a07ea2..942370d57392 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -58,9 +58,6 @@ static enum {
ITERATION_MARK_IDLE,
  } iteration_work;
  
-/* Set to true when vCPU threads should exit. */
-static bool done;
-
  /* The iteration that was last completed by each vCPU. */
  static int vcpu_last_completed_iteration[KVM_MAX_VCPUS];
  
@@ -211,7 +208,7 @@ static bool spin_wait_for_next_iteration(int *current_iteration)
	int last_iteration = *current_iteration;

	do {
-   if (READ_ONCE(done))
+   if (READ_ONCE(perf_test_args.stop_vcpus))
return false;
  
  		*current_iteration = READ_ONCE(iteration);

@@ -321,9 +318,6 @@ static void run_test(enum vm_guest_mode mode, void *arg)
	mark_memory_idle(vm, nr_vcpus);
	access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory");

-	/* Set done to signal the vCPU threads to exit */
-	done = true;
-
perf_test_join_vcpu_threads(nr_vcpus);
perf_test_destroy_vm(vm);
  }
diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index eaa88df0555a..536d7c3c3f14 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -40,6 +40,9 @@ struct perf_test_args {
	/* Run vCPUs in L2 instead of L1, if the architecture supports it. */
	bool nested;

+	/* Test is done, stop running vCPUs. */
+	bool stop_vcpus;
+
struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
  };
  
diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
index 9618b37c66f7..ee3f499ccbd2 100644
--- a/tools/testing/selftests/kvm/lib/perf_test_util.c
+++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
@@ -267,6 +267,7 @@ void perf_test_start_vcpu_threads(int nr_vcpus,

	vcpu_thread_fn = vcpu_fn;
	WRITE_ONCE(all_vcpu_threads_running, false);
+	WRITE_ONCE(perf_test_args.stop_vcpus, false);

	for (i = 0; i < nr_vcpus; i++) {
		struct vcpu_thread *vcpu = &vcpu_threads[i];
@@ -289,6 +290,8 @@ void perf_test_join_vcpu_threads(int nr_vcpus)
 {
	int i;

+	WRITE_ONCE(perf_test_args.stop_vcpus, true);
+
	for (i = 0; i < nr_vcpus; i++)
		pthread_join(vcpu_threads[i].thread, NULL);
 }
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index bb1d17a1171b..3a5e4518307c 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -34,8 +34,6 @@
  static int nr_vcpus = 1;
  static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
  
-static bool run_vcpus = true;
-
  static void vcpu_worker(struct perf_test_vcpu_args *vcpu_args)
  {
struct kvm_vcpu *vcpu = vcpu_args->vcpu;
@@ -45,7 +43,7 @@ static void vcpu_worker(struct perf_test_vcpu_args *vcpu_args)
run = vcpu->run;
  
	/* Let the guest access its memory until a stop signal is received */
-   while (READ_ONCE(run_vcpus)) {
+   while (!READ_ONCE(perf_test_args.stop_vcpus)) {
ret = _vcpu_run(vcpu);
TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
  
@@ -110,8 +108,6 @@ static void run_test(enum vm_guest_mode mode, void *arg)
add_remove_memslot(vm,