[PATCH v3 0/4] Handle #GP for SVM execution instructions

2021-01-26 Thread Wei Huang
While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before #VMEXIT. This causes unexpected #GP
under nested virtualization. To solve this problem, this patchset makes
KVM trap #GP and emulate these SVM instructions accordingly.

Also newer AMD CPUs will change this behavior by triggering #VMEXIT
before #GP. This change is indicated by CPUID_0x800A_EDX[28]. Under
this circumstance, #GP interception is not required. This patchset supports
the new feature.
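
At a high level, the #GP handling flow added by this series looks roughly like
the sketch below. This is a simplified summary for the cover letter only; the
real code is in patch 2/4 (gp_interception) and is further adjusted for the
nested-on-nested case in patch 4/4:

static int gp_interception(struct vcpu_svm *svm)
{
	struct kvm_vcpu *vcpu = &svm->vcpu;
	int opcode;

	/* Both handled cases (SVM erratum, VMware backdoor) have a zero
	 * error code; anything else is re-injected as #GP. */
	if (svm->vmcb->control.exit_info_1)
		goto reinject;

	/* Decode the faulting instruction once, then dispatch. */
	if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
		goto reinject;

	opcode = svm_instr_opcode(vcpu);
	if (opcode != NONE_SVM_INSTR)
		return emulate_svm_instr(vcpu, opcode);

	/* Not VMRUN/VMSAVE/VMLOAD: fall back to VMware backdoor emulation. */
	return kvm_emulate_instruction(vcpu,
				       EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE);

reinject:
	kvm_queue_exception_e(vcpu, GP_VECTOR, svm->vmcb->control.exit_info_1);
	return 1;
}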

This patchset has been verified with vmrun_errata_test and vmware_backdoor
tests of kvm_unit_test on the following configs. Also it was verified that
vmware_backdoor can be turned on under nested on nested.
  * Current CPU: nested, nested on nested
  * New CPU with X86_FEATURE_SVME_ADDR_CHK: nested, nested on nested

v2->v3:
  * Change the decode function name to x86_decode_emulated_instruction()
  * Add a new variable, svm_gp_erratum_intercept, to control interception
  * Turn on VM's X86_FEATURE_SVME_ADDR_CHK feature in svm_set_cpu_caps()
  * Fix instruction emulation for vmware_backdoor under nested-on-nested
  * Minor comment fixes

v1->v2:
  * Factor out instruction decode for sharing
  * Re-org gp_interception() handling for both #GP and vmware_backdoor
  * Use kvm_cpu_cap for X86_FEATURE_SVME_ADDR_CHK feature support
  * Add nested on nested support

Thanks,
-Wei

Wei Huang (4):
  KVM: x86: Factor out x86 instruction emulation with decoding
  KVM: SVM: Add emulation support for #GP triggered by SVM instructions
  KVM: SVM: Add support for SVM instruction address check change
  KVM: SVM: Support #GP handling for the case of nested on nested

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/kvm/svm/svm.c | 128 +
 arch/x86/kvm/x86.c |  62 --
 arch/x86/kvm/x86.h |   2 +
 4 files changed, 152 insertions(+), 41 deletions(-)

-- 
2.27.0



Re: [PATCH v3 3/4] KVM: SVM: Add support for SVM instruction address check change

2021-01-26 Thread Wei Huang




On 1/26/21 5:52 AM, Maxim Levitsky wrote:

On Tue, 2021-01-26 at 03:18 -0500, Wei Huang wrote:

New AMD CPUs have a change that checks #VMEXIT intercept on special SVM
instructions before checking their EAX against reserved memory region.
This change is indicated by CPUID_0x800A_EDX[28]. If it is 1, #VMEXIT
is triggered before #GP. KVM doesn't need to intercept and emulate #GP
faults in that case, as #GP isn't supposed to be triggered.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
Reviewed-by: Maxim Levitsky 
---
  arch/x86/include/asm/cpufeatures.h | 1 +
  arch/x86/kvm/svm/svm.c | 3 +++
  2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..ea89d6fdd79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
  #define X86_FEATURE_AVIC  (15*32+13) /* Virtual Interrupt 
Controller */
  #define X86_FEATURE_V_VMSAVE_VMLOAD   (15*32+15) /* Virtual VMSAVE VMLOAD */
  #define X86_FEATURE_VGIF  (15*32+16) /* Virtual GIF */
+#define X86_FEATURE_SVME_ADDR_CHK  (15*32+28) /* "" SVME addr check */
  
  /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */

  #define X86_FEATURE_AVX512VBMI(16*32+ 1) /* AVX512 Vector Bit 
Manipulation instructions*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e5ca01e25e89..f9233c79265b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1036,6 +1036,9 @@ static __init int svm_hardware_setup(void)
}
}
  
+	if (boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))

+   svm_gp_erratum_intercept = false;
+

Again, I would make svm_gp_erratum_intercept a tri-state module param,
and here if it is in 'auto' state do this.



I will try to craft a param patch and see if it flies...
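
A tri-state parameter along the lines Maxim suggests might look roughly like
the sketch below. This is only a hypothetical illustration, not part of this
series; the parameter name, its type, and the -1 "auto" encoding are
assumptions here:

/* -1 = auto (follow X86_FEATURE_SVME_ADDR_CHK), 0 = off, 1 = on */
static int svm_gp_erratum_intercept = -1;
module_param(svm_gp_erratum_intercept, int, 0444);

static __init void svm_resolve_gp_erratum_intercept(void)
{
	/* In "auto" mode, intercept #GP only on CPUs that lack the fix. */
	if (svm_gp_erratum_intercept < 0)
		svm_gp_erratum_intercept =
			!boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK);
}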



Re: [PATCH v3 0/4] Handle #GP for SVM execution instructions

2021-01-26 Thread Wei Huang




On 1/26/21 5:39 AM, Paolo Bonzini wrote:

On 26/01/21 09:18, Wei Huang wrote:

While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before #VMEXIT. This causes unexpected #GP
under nested virtualization. To solve this problem, this patchset makes
KVM trap #GP and emulate these SVM instructions accordingly.

Also newer AMD CPUs will change this behavior by triggering #VMEXIT
before #GP. This change is indicated by CPUID_0x800A_EDX[28]. Under
this circumstance, #GP interception is not required. This patchset 
supports

the new feature.

This patchset has been verified with vmrun_errata_test and 
vmware_backdoor
tests of kvm_unit_test on the following configs. Also it was verified 
that

vmware_backdoor can be turned on under nested on nested.
   * Current CPU: nested, nested on nested
   * New CPU with X86_FEATURE_SVME_ADDR_CHK: nested, nested on nested

v2->v3:
   * Change the decode function name to x86_decode_emulated_instruction()
   * Add a new variable, svm_gp_erratum_intercept, to control 
interception

   * Turn on VM's X86_FEATURE_SVME_ADDR_CHK feature in svm_set_cpu_caps()
   * Fix instruction emulation for vmware_backdoor under nested-on-nested
   * Minor comment fixes

v1->v2:
   * Factor out instruction decode for sharing
   * Re-org gp_interception() handling for both #GP and vmware_backdoor
   * Use kvm_cpu_cap for X86_FEATURE_SVME_ADDR_CHK feature support
   * Add nested on nested support

Thanks,
-Wei

Wei Huang (4):
   KVM: x86: Factor out x86 instruction emulation with decoding
   KVM: SVM: Add emulation support for #GP triggered by SVM instructions
   KVM: SVM: Add support for SVM instruction address check change
   KVM: SVM: Support #GP handling for the case of nested on nested

  arch/x86/include/asm/cpufeatures.h |   1 +
  arch/x86/kvm/svm/svm.c | 128 +
  arch/x86/kvm/x86.c |  62 --
  arch/x86/kvm/x86.h |   2 +
  4 files changed, 152 insertions(+), 41 deletions(-)



Queued, thanks.


Thanks. BTW, because we use kvm_cpu_cap_set() in svm_set_cpu_caps(), this 
will be reflected in the CPUID received by QEMU. QEMU needs a one-line 
patch to declare the new feature. I will send it out this morning.


-Wei



Paolo





[PATCH v3 3/4] KVM: SVM: Add support for SVM instruction address check change

2021-01-26 Thread Wei Huang
New AMD CPUs have a change that checks #VMEXIT intercept on special SVM
instructions before checking their EAX against reserved memory region.
This change is indicated by CPUID_0x800A_EDX[28]. If it is 1, #VMEXIT
is triggered before #GP. KVM doesn't need to intercept and emulate #GP
faults in that case, as #GP isn't supposed to be triggered.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
Reviewed-by: Maxim Levitsky 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kvm/svm/svm.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..ea89d6fdd79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
 #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
+#define X86_FEATURE_SVME_ADDR_CHK  (15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit 
Manipulation instructions*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e5ca01e25e89..f9233c79265b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1036,6 +1036,9 @@ static __init int svm_hardware_setup(void)
}
}
 
+   if (boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))
+   svm_gp_erratum_intercept = false;
+
if (vgif) {
if (!boot_cpu_has(X86_FEATURE_VGIF))
vgif = false;
-- 
2.27.0



[PATCH v3 4/4] KVM: SVM: Support #GP handling for the case of nested on nested

2021-01-26 Thread Wei Huang
Under the case of nested on nested (L0->L1->L2->L3), #GP triggered by
SVM instructions can be hidden from L1. Instead the hypervisor can
inject the proper #VMEXIT to inform L1 of what is happening. Thus L1
can avoid invoking the #GP workaround. For this reason we turn on the
guest VM's X86_FEATURE_SVME_ADDR_CHK bit for KVM running inside the VM
to receive the notification and change behavior.

Similarly, we check whether the vCPU is in guest mode before emulating
the vmware-backdoor instructions. For the nested-on-nested case, we let
the guest handle it.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
Tested-by: Maxim Levitsky 
Reviewed-by: Maxim Levitsky 
---
 arch/x86/kvm/svm/svm.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f9233c79265b..83c401d2709f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -929,6 +929,9 @@ static __init void svm_set_cpu_caps(void)
 
if (npt_enabled)
kvm_cpu_cap_set(X86_FEATURE_NPT);
+
+   /* Nested VM can receive #VMEXIT instead of triggering #GP */
+   kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
}
 
/* CPUID 0x8008 */
@@ -2198,6 +2201,11 @@ static int svm_instr_opcode(struct kvm_vcpu *vcpu)
 
 static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
 {
+   const int guest_mode_exit_codes[] = {
+   [SVM_INSTR_VMRUN] = SVM_EXIT_VMRUN,
+   [SVM_INSTR_VMLOAD] = SVM_EXIT_VMLOAD,
+   [SVM_INSTR_VMSAVE] = SVM_EXIT_VMSAVE,
+   };
int (*const svm_instr_handlers[])(struct vcpu_svm *svm) = {
[SVM_INSTR_VMRUN] = vmrun_interception,
[SVM_INSTR_VMLOAD] = vmload_interception,
@@ -2205,7 +2213,14 @@ static int emulate_svm_instr(struct kvm_vcpu *vcpu, int 
opcode)
};
struct vcpu_svm *svm = to_svm(vcpu);
 
-   return svm_instr_handlers[opcode](svm);
+   if (is_guest_mode(vcpu)) {
+   svm->vmcb->control.exit_code = guest_mode_exit_codes[opcode];
+   svm->vmcb->control.exit_info_1 = 0;
+   svm->vmcb->control.exit_info_2 = 0;
+
+   return nested_svm_vmexit(svm);
+   } else
+   return svm_instr_handlers[opcode](svm);
 }
 
 /*
@@ -2239,7 +2254,8 @@ static int gp_interception(struct vcpu_svm *svm)
 * VMware backdoor emulation on #GP interception only handles
 * IN{S}, OUT{S}, and RDPMC.
 */
-   return kvm_emulate_instruction(vcpu,
+   if (!is_guest_mode(vcpu))
+   return kvm_emulate_instruction(vcpu,
EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE);
} else
return emulate_svm_instr(vcpu, opcode);
-- 
2.27.0



[PATCH v3 2/4] KVM: SVM: Add emulation support for #GP triggered by SVM instructions

2021-01-26 Thread Wei Huang
From: Bandan Das 

While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problems under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions.

Co-developed-by: Wei Huang 
Signed-off-by: Wei Huang 
Signed-off-by: Bandan Das 
---
 arch/x86/kvm/svm/svm.c | 109 ++---
 1 file changed, 91 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..e5ca01e25e89 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -200,6 +200,8 @@ module_param(sev_es, int, 0444);
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
+bool svm_gp_erratum_intercept = true;
+
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
 
 static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -288,6 +290,9 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
if (!(efer & EFER_SVME)) {
svm_leave_nested(svm);
svm_set_gif(svm, true);
+   /* #GP intercept is still needed in vmware_backdoor */
+   if (!enable_vmware_backdoor)
+   clr_exception_intercept(svm, GP_VECTOR);
 
/*
 * Free the nested guest state, unless we are in SMM.
@@ -309,6 +314,10 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+   /* Enable #GP interception for SVM instructions */
+   if (svm_gp_erratum_intercept)
+   set_exception_intercept(svm, GP_VECTOR);
+
return 0;
 }
 
@@ -1957,24 +1966,6 @@ static int ac_interception(struct vcpu_svm *svm)
return 1;
 }
 
-static int gp_interception(struct vcpu_svm *svm)
-{
-   struct kvm_vcpu *vcpu = &svm->vcpu;
-   u32 error_code = svm->vmcb->control.exit_info_1;
-
-   WARN_ON_ONCE(!enable_vmware_backdoor);
-
-   /*
-* VMware backdoor emulation on #GP interception only handles IN{S},
-* OUT{S}, and RDPMC, none of which generate a non-zero error code.
-*/
-   if (error_code) {
-   kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
-   return 1;
-   }
-   return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
-}
-
 static bool is_erratum_383(void)
 {
int err, i;
@@ -2173,6 +2164,88 @@ static int vmrun_interception(struct vcpu_svm *svm)
return nested_svm_vmrun(svm);
 }
 
+enum {
+   NONE_SVM_INSTR,
+   SVM_INSTR_VMRUN,
+   SVM_INSTR_VMLOAD,
+   SVM_INSTR_VMSAVE,
+};
+
+/* Return NONE_SVM_INSTR if not SVM instrs, otherwise return decode result */
+static int svm_instr_opcode(struct kvm_vcpu *vcpu)
+{
+   struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+
+   if (ctxt->b != 0x1 || ctxt->opcode_len != 2)
+   return NONE_SVM_INSTR;
+
+   switch (ctxt->modrm) {
+   case 0xd8: /* VMRUN */
+   return SVM_INSTR_VMRUN;
+   case 0xda: /* VMLOAD */
+   return SVM_INSTR_VMLOAD;
+   case 0xdb: /* VMSAVE */
+   return SVM_INSTR_VMSAVE;
+   default:
+   break;
+   }
+
+   return NONE_SVM_INSTR;
+}
+
+static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
+{
+   int (*const svm_instr_handlers[])(struct vcpu_svm *svm) = {
+   [SVM_INSTR_VMRUN] = vmrun_interception,
+   [SVM_INSTR_VMLOAD] = vmload_interception,
+   [SVM_INSTR_VMSAVE] = vmsave_interception,
+   };
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   return svm_instr_handlers[opcode](svm);
+}
+
+/*
+ * #GP handling code. Note that #GP can be triggered under the following two
+ * cases:
+ *   1) SVM VM-related instructions (VMRUN/VMSAVE/VMLOAD) that trigger #GP on
+ *  some AMD CPUs when EAX of these instructions are in the reserved memory
+ *  regions (e.g. SMM memory on host).
+ *   2) VMware backdoor
+ */
+static int gp_interception(struct vcpu_svm *svm)
+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   u32 error_code = svm->vmcb->control.exit_info_1;
+   int opcode;
+
+   /* Both #GP cases have zero error_code */
+   if (error_code)
+   goto reinject;
+
+   /* Decode the instruction for usage later */
+   if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
+   goto reinject;
+
+   opcode = svm_instr_opcode(vcpu);
+
+   if (opcode == NONE_SVM_INSTR) {
+   WARN_ON_ONCE(!enable_vmware_backdoor);
+
+   /

[PATCH v3 1/4] KVM: x86: Factor out x86 instruction emulation with decoding

2021-01-26 Thread Wei Huang
Move the instruction decode part out of x86_emulate_instruction() for it
to be used in other places. Also kvm_clear_exception_queue() is moved
inside the if-statement as it doesn't apply when KVM is coming back from
userspace.
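
With the decode step factored out, a caller that only needs decoding (such as
the SVM #GP handler added later in this series) can do, roughly:

	if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
		goto reinject;	/* could not decode: re-inject #GP */

while x86_emulate_instruction() keeps its current behavior for callers that
still want decode plus emulation in one call.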

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
---
 arch/x86/kvm/x86.c | 62 +-
 arch/x86/kvm/x86.h |  2 ++
 2 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a8969a6dd06..a1c83cd43c1a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7298,6 +7298,42 @@ static bool is_vmware_backdoor_opcode(struct 
x86_emulate_ctxt *ctxt)
return false;
 }
 
+/*
+ * Decode the instruction to be emulated. Return EMULATION_OK on success.
+ */
+int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
+   void *insn, int insn_len)
+{
+   int r = EMULATION_OK;
+   struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+
+   init_emulate_ctxt(vcpu);
+
+   /*
+* We will reenter on the same instruction since we do not set
+* complete_userspace_io. This does not handle watchpoints yet,
+* those would be handled in the emulate_ops.
+*/
+   if (!(emulation_type & EMULTYPE_SKIP) &&
+   kvm_vcpu_check_breakpoint(vcpu, &r))
+   return r;
+
+   ctxt->interruptibility = 0;
+   ctxt->have_exception = false;
+   ctxt->exception.vector = -1;
+   ctxt->perm_ok = false;
+
+   ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
+
+   r = x86_decode_insn(ctxt, insn, insn_len);
+
+   trace_kvm_emulate_insn_start(vcpu);
+   ++vcpu->stat.insn_emulation;
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(x86_decode_emulated_instruction);
+
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len)
 {
@@ -7317,32 +7353,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, 
gpa_t cr2_or_gpa,
 */
write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
vcpu->arch.write_fault_to_shadow_pgtable = false;
-   kvm_clear_exception_queue(vcpu);
 
if (!(emulation_type & EMULTYPE_NO_DECODE)) {
-   init_emulate_ctxt(vcpu);
-
-   /*
-* We will reenter on the same instruction since
-* we do not set complete_userspace_io.  This does not
-* handle watchpoints yet, those would be handled in
-* the emulate_ops.
-*/
-   if (!(emulation_type & EMULTYPE_SKIP) &&
-   kvm_vcpu_check_breakpoint(vcpu, &r))
-   return r;
-
-   ctxt->interruptibility = 0;
-   ctxt->have_exception = false;
-   ctxt->exception.vector = -1;
-   ctxt->perm_ok = false;
-
-   ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
-
-   r = x86_decode_insn(ctxt, insn, insn_len);
+   kvm_clear_exception_queue(vcpu);
 
-   trace_kvm_emulate_insn_start(vcpu);
-   ++vcpu->stat.insn_emulation;
+   r = x86_decode_emulated_instruction(vcpu, emulation_type,
+   insn, insn_len);
if (r != EMULATION_OK)  {
if ((emulation_type & EMULTYPE_TRAP_UD) ||
(emulation_type & EMULTYPE_TRAP_UD_FORCED)) {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c5ee0f5ce0f1..482e7f24801e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -273,6 +273,8 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu 
*vcpu, gfn_t gfn,
  int page_num);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 
error_code);
+int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
+   void *insn, int insn_len);
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
 fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
-- 
2.27.0



Re: [PATCH v2 2/4] KVM: SVM: Add emulation support for #GP triggered by SVM instructions

2021-01-21 Thread Wei Huang



On 1/21/21 8:07 AM, Maxim Levitsky wrote:
> On Thu, 2021-01-21 at 01:55 -0500, Wei Huang wrote:
>> From: Bandan Das 
>>
>> While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
>> CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
>> before checking VMCB's instruction intercept. If EAX falls into such
>> memory areas, #GP is triggered before VMEXIT. This causes problem under
>> nested virtualization. To solve this problem, KVM needs to trap #GP and
>> check the instructions triggering #GP. For VM execution instructions,
>> KVM emulates these instructions.
>>
>> Co-developed-by: Wei Huang 
>> Signed-off-by: Wei Huang 
>> Signed-off-by: Bandan Das 
>> ---
>>  arch/x86/kvm/svm/svm.c | 99 ++
>>  1 file changed, 81 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 7ef171790d02..6ed523cab068 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -288,6 +288,9 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>>  if (!(efer & EFER_SVME)) {
>>  svm_leave_nested(svm);
>>  svm_set_gif(svm, true);
>> +/* #GP intercept is still needed in vmware_backdoor */
>> +if (!enable_vmware_backdoor)
>> +clr_exception_intercept(svm, GP_VECTOR);
> Again I would prefer a flag for the errata workaround, but this is still
> better.

Instead of using !enable_vmware_backdoor, would the following be better?
Or is the existing form acceptable?

if (!kvm_cpu_cap_has(X86_FEATURE_SVME_ADDR_CHK))
        clr_exception_intercept(svm, GP_VECTOR);

> 
>>  
>>  /*
>>   * Free the nested guest state, unless we are in SMM.
>> @@ -309,6 +312,9 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>>  
>>  svm->vmcb->save.efer = efer | EFER_SVME;
>>  vmcb_mark_dirty(svm->vmcb, VMCB_CR);
>> +/* Enable #GP interception for SVM instructions */
>> +set_exception_intercept(svm, GP_VECTOR);
>> +
>>  return 0;
>>  }
>>  
>> @@ -1957,24 +1963,6 @@ static int ac_interception(struct vcpu_svm *svm)
>>  return 1;
>>  }
>>  
>> -static int gp_interception(struct vcpu_svm *svm)
>> -{
>> -struct kvm_vcpu *vcpu = &svm->vcpu;
>> -u32 error_code = svm->vmcb->control.exit_info_1;
>> -
>> -WARN_ON_ONCE(!enable_vmware_backdoor);
>> -
>> -/*
>> - * VMware backdoor emulation on #GP interception only handles IN{S},
>> - * OUT{S}, and RDPMC, none of which generate a non-zero error code.
>> - */
>> -if (error_code) {
>> -kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
>> -return 1;
>> -}
>> -return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
>> -}
>> -
>>  static bool is_erratum_383(void)
>>  {
>>  int err, i;
>> @@ -2173,6 +2161,81 @@ static int vmrun_interception(struct vcpu_svm *svm)
>>  return nested_svm_vmrun(svm);
>>  }
>>  
>> +enum {
>> +NOT_SVM_INSTR,
>> +SVM_INSTR_VMRUN,
>> +SVM_INSTR_VMLOAD,
>> +SVM_INSTR_VMSAVE,
>> +};
>> +
>> +/* Return NOT_SVM_INSTR if not SVM instrs, otherwise return decode result */
>> +static int svm_instr_opcode(struct kvm_vcpu *vcpu)
>> +{
>> +struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>> +
>> +if (ctxt->b != 0x1 || ctxt->opcode_len != 2)
>> +return NOT_SVM_INSTR;
>> +
>> +switch (ctxt->modrm) {
>> +case 0xd8: /* VMRUN */
>> +return SVM_INSTR_VMRUN;
>> +case 0xda: /* VMLOAD */
>> +return SVM_INSTR_VMLOAD;
>> +case 0xdb: /* VMSAVE */
>> +return SVM_INSTR_VMSAVE;
>> +default:
>> +break;
>> +}
>> +
>> +return NOT_SVM_INSTR;
>> +}
>> +
>> +static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
>> +{
>> +int (*const svm_instr_handlers[])(struct vcpu_svm *svm) = {
>> +[SVM_INSTR_VMRUN] = vmrun_interception,
>> +[SVM_INSTR_VMLOAD] = vmload_interception,
>> +[SVM_INSTR_VMSAVE] = vmsave_interception,
>> +};
>> +struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +return svm_instr_handlers[opcode](svm);
>> +}
>> +
>> +/*
>> + * 

Re: [PATCH v2 1/4] KVM: x86: Factor out x86 instruction emulation with decoding

2021-01-21 Thread Wei Huang



On 1/21/21 8:23 AM, Paolo Bonzini wrote:
> On 21/01/21 15:04, Maxim Levitsky wrote:
>>> +int x86_emulate_decoded_instruction(struct kvm_vcpu *vcpu, int
>>> emulation_type,
>>> +    void *insn, int insn_len)
>> Isn't the name of this function wrong? This function decodes the
>> instruction.
>> So I would expect something like x86_decode_instruction.
>>
> 
> Yes, that or x86_decode_emulated_instruction.

I was debating it while making the change. I will update it to the new
name in v3.

> 
> Paolo
> 


[PATCH v2 3/4] KVM: SVM: Add support for VMCB address check change

2021-01-20 Thread Wei Huang
New AMD CPUs have a change that checks VMEXIT intercept on special SVM
instructions before checking their EAX against reserved memory region.
This change is indicated by CPUID_0x800A_EDX[28]. If it is 1, #VMEXIT
is triggered before #GP. KVM doesn't need to intercept and emulate #GP
faults in that case, as #GP isn't supposed to be triggered.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kvm/svm/svm.c | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..ea89d6fdd79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
 #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
+#define X86_FEATURE_SVME_ADDR_CHK  (15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit 
Manipulation instructions*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6ed523cab068..2a12870ac71a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -313,7 +313,8 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
/* Enable #GP interception for SVM instructions */
-   set_exception_intercept(svm, GP_VECTOR);
+   if (!kvm_cpu_cap_has(X86_FEATURE_SVME_ADDR_CHK))
+   set_exception_intercept(svm, GP_VECTOR);
 
return 0;
 }
@@ -933,6 +934,9 @@ static __init void svm_set_cpu_caps(void)
boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
 
+   if (boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))
+   kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
+
/* Enable INVPCID feature */
kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
 }
-- 
2.27.0



[PATCH v2 4/4] KVM: SVM: Support #GP handling for the case of nested on nested

2021-01-20 Thread Wei Huang
Under the case of nested on nested (e.g. L0->L1->L2->L3), #GP triggered
by SVM instructions can be hidden from L1. Instead the hypervisor can
inject the proper #VMEXIT to inform L1 of what is happening. Thus L1
can avoid invoking the #GP workaround. For this reason we turn on the
guest VM's X86_FEATURE_SVME_ADDR_CHK bit for KVM running inside the VM
to receive the notification and change behavior.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
---
 arch/x86/kvm/svm/svm.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2a12870ac71a..89512c0e7663 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2196,6 +2196,11 @@ static int svm_instr_opcode(struct kvm_vcpu *vcpu)
 
 static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
 {
+   const int guest_mode_exit_codes[] = {
+   [SVM_INSTR_VMRUN] = SVM_EXIT_VMRUN,
+   [SVM_INSTR_VMLOAD] = SVM_EXIT_VMLOAD,
+   [SVM_INSTR_VMSAVE] = SVM_EXIT_VMSAVE,
+   };
int (*const svm_instr_handlers[])(struct vcpu_svm *svm) = {
[SVM_INSTR_VMRUN] = vmrun_interception,
[SVM_INSTR_VMLOAD] = vmload_interception,
@@ -2203,7 +2208,14 @@ static int emulate_svm_instr(struct kvm_vcpu *vcpu, int 
opcode)
};
struct vcpu_svm *svm = to_svm(vcpu);
 
-   return svm_instr_handlers[opcode](svm);
+   if (is_guest_mode(vcpu)) {
+   svm->vmcb->control.exit_code = guest_mode_exit_codes[opcode];
+   svm->vmcb->control.exit_info_1 = 0;
+   svm->vmcb->control.exit_info_2 = 0;
+
+   return nested_svm_vmexit(svm);
+   } else
+   return svm_instr_handlers[opcode](svm);
 }
 
 /*
@@ -4034,6 +4046,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu 
*vcpu)
/* Check again if INVPCID interception if required */
svm_check_invpcid(svm);
 
+   if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM)) {
+   best = kvm_find_cpuid_entry(vcpu, 0x800A, 0);
+   best->edx |= (1 << 28);
+   }
+
/* For sev guests, the memory encryption bit is not reserved in CR3.  */
if (sev_guest(vcpu->kvm)) {
best = kvm_find_cpuid_entry(vcpu, 0x801F, 0);
-- 
2.27.0



[PATCH v2 2/4] KVM: SVM: Add emulation support for #GP triggered by SVM instructions

2021-01-20 Thread Wei Huang
From: Bandan Das 

While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problems under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions.

Co-developed-by: Wei Huang 
Signed-off-by: Wei Huang 
Signed-off-by: Bandan Das 
---
 arch/x86/kvm/svm/svm.c | 99 ++
 1 file changed, 81 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..6ed523cab068 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -288,6 +288,9 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
if (!(efer & EFER_SVME)) {
svm_leave_nested(svm);
svm_set_gif(svm, true);
+   /* #GP intercept is still needed in vmware_backdoor */
+   if (!enable_vmware_backdoor)
+   clr_exception_intercept(svm, GP_VECTOR);
 
/*
 * Free the nested guest state, unless we are in SMM.
@@ -309,6 +312,9 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+   /* Enable #GP interception for SVM instructions */
+   set_exception_intercept(svm, GP_VECTOR);
+
return 0;
 }
 
@@ -1957,24 +1963,6 @@ static int ac_interception(struct vcpu_svm *svm)
return 1;
 }
 
-static int gp_interception(struct vcpu_svm *svm)
-{
-   struct kvm_vcpu *vcpu = &svm->vcpu;
-   u32 error_code = svm->vmcb->control.exit_info_1;
-
-   WARN_ON_ONCE(!enable_vmware_backdoor);
-
-   /*
-* VMware backdoor emulation on #GP interception only handles IN{S},
-* OUT{S}, and RDPMC, none of which generate a non-zero error code.
-*/
-   if (error_code) {
-   kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
-   return 1;
-   }
-   return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
-}
-
 static bool is_erratum_383(void)
 {
int err, i;
@@ -2173,6 +2161,81 @@ static int vmrun_interception(struct vcpu_svm *svm)
return nested_svm_vmrun(svm);
 }
 
+enum {
+   NOT_SVM_INSTR,
+   SVM_INSTR_VMRUN,
+   SVM_INSTR_VMLOAD,
+   SVM_INSTR_VMSAVE,
+};
+
+/* Return NOT_SVM_INSTR if not SVM instrs, otherwise return decode result */
+static int svm_instr_opcode(struct kvm_vcpu *vcpu)
+{
+   struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+
+   if (ctxt->b != 0x1 || ctxt->opcode_len != 2)
+   return NOT_SVM_INSTR;
+
+   switch (ctxt->modrm) {
+   case 0xd8: /* VMRUN */
+   return SVM_INSTR_VMRUN;
+   case 0xda: /* VMLOAD */
+   return SVM_INSTR_VMLOAD;
+   case 0xdb: /* VMSAVE */
+   return SVM_INSTR_VMSAVE;
+   default:
+   break;
+   }
+
+   return NOT_SVM_INSTR;
+}
+
+static int emulate_svm_instr(struct kvm_vcpu *vcpu, int opcode)
+{
+   int (*const svm_instr_handlers[])(struct vcpu_svm *svm) = {
+   [SVM_INSTR_VMRUN] = vmrun_interception,
+   [SVM_INSTR_VMLOAD] = vmload_interception,
+   [SVM_INSTR_VMSAVE] = vmsave_interception,
+   };
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   return svm_instr_handlers[opcode](svm);
+}
+
+/*
+ * #GP handling code. Note that #GP can be triggered under the following two
+ * cases:
+ *   1) SVM VM-related instructions (VMRUN/VMSAVE/VMLOAD) that trigger #GP on
+ *  some AMD CPUs when EAX of these instructions are in the reserved memory
+ *  regions (e.g. SMM memory on host).
+ *   2) VMware backdoor
+ */
+static int gp_interception(struct vcpu_svm *svm)
+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   u32 error_code = svm->vmcb->control.exit_info_1;
+   int opcode;
+
+   /* Both #GP cases have zero error_code */
+   if (error_code)
+   goto reinject;
+
+   /* Decode the instruction for usage later */
+   if (x86_emulate_decoded_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
+   goto reinject;
+
+   opcode = svm_instr_opcode(vcpu);
+   if (opcode)
+   return emulate_svm_instr(vcpu, opcode);
+   else
+   return kvm_emulate_instruction(vcpu,
+   EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE);
+
+reinject:
+   kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
+   return 1;
+}
+
 void svm_set_gif(struct vcpu_svm *svm, bool value)
 {
if (value) {
-- 
2.27.0



[PATCH v2 1/4] KVM: x86: Factor out x86 instruction emulation with decoding

2021-01-20 Thread Wei Huang
Move the instruction decode part out of x86_emulate_instruction() for it
to be used in other places. Also kvm_clear_exception_queue() is moved
inside the if-statement as it doesn't apply when KVM is coming back from
userspace.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
---
 arch/x86/kvm/x86.c | 63 +-
 arch/x86/kvm/x86.h |  2 ++
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a8969a6dd06..580883cee493 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7298,6 +7298,43 @@ static bool is_vmware_backdoor_opcode(struct 
x86_emulate_ctxt *ctxt)
return false;
 }
 
+/*
+ * Decode and emulate instruction. Return EMULATION_OK if success.
+ */
+int x86_emulate_decoded_instruction(struct kvm_vcpu *vcpu, int emulation_type,
+   void *insn, int insn_len)
+{
+   int r = EMULATION_OK;
+   struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+
+   init_emulate_ctxt(vcpu);
+
+   /*
+* We will reenter on the same instruction since
+* we do not set complete_userspace_io.  This does not
+* handle watchpoints yet, those would be handled in
+* the emulate_ops.
+*/
+   if (!(emulation_type & EMULTYPE_SKIP) &&
+   kvm_vcpu_check_breakpoint(vcpu, &r))
+   return r;
+
+   ctxt->interruptibility = 0;
+   ctxt->have_exception = false;
+   ctxt->exception.vector = -1;
+   ctxt->perm_ok = false;
+
+   ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
+
+   r = x86_decode_insn(ctxt, insn, insn_len);
+
+   trace_kvm_emulate_insn_start(vcpu);
+   ++vcpu->stat.insn_emulation;
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(x86_emulate_decoded_instruction);
+
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len)
 {
@@ -7317,32 +7354,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, 
gpa_t cr2_or_gpa,
 */
write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
vcpu->arch.write_fault_to_shadow_pgtable = false;
-   kvm_clear_exception_queue(vcpu);
 
if (!(emulation_type & EMULTYPE_NO_DECODE)) {
-   init_emulate_ctxt(vcpu);
-
-   /*
-* We will reenter on the same instruction since
-* we do not set complete_userspace_io.  This does not
-* handle watchpoints yet, those would be handled in
-* the emulate_ops.
-*/
-   if (!(emulation_type & EMULTYPE_SKIP) &&
-   kvm_vcpu_check_breakpoint(vcpu, &r))
-   return r;
-
-   ctxt->interruptibility = 0;
-   ctxt->have_exception = false;
-   ctxt->exception.vector = -1;
-   ctxt->perm_ok = false;
-
-   ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
-
-   r = x86_decode_insn(ctxt, insn, insn_len);
+   kvm_clear_exception_queue(vcpu);
 
-   trace_kvm_emulate_insn_start(vcpu);
-   ++vcpu->stat.insn_emulation;
+   r = x86_emulate_decoded_instruction(vcpu, emulation_type,
+   insn, insn_len);
if (r != EMULATION_OK)  {
if ((emulation_type & EMULTYPE_TRAP_UD) ||
(emulation_type & EMULTYPE_TRAP_UD_FORCED)) {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c5ee0f5ce0f1..fc42454a4c27 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -273,6 +273,8 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu 
*vcpu, gfn_t gfn,
  int page_num);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 
error_code);
+int x86_emulate_decoded_instruction(struct kvm_vcpu *vcpu, int emulation_type,
+   void *insn, int insn_len);
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
 fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
-- 
2.27.0



[PATCH v2 0/4] Handle #GP for SVM execution instructions

2021-01-20 Thread Wei Huang
While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before #VMEXIT. This causes unexpected #GP
under nested virtualization. To solve this problem, this patchset makes
KVM trap #GP and emulate these SVM instructions accordingly.

Also newer AMD CPUs will change this behavior by triggering #VMEXIT
before #GP. This change is indicated by CPUID_0x800A_EDX[28]. Under
this circumstance, #GP interception is not required. This patchset supports
the new feature.

This patchset has been verified with vmrun_errata_test and vmware_backdoors
tests of kvm_unit_test on the following configs:
  * Current CPU: nested, nested on nested
  * New CPU with X86_FEATURE_SVME_ADDR_CHK: nested, nested on nested

v1->v2:
  * Factor out instruction decode for sharing
  * Re-org gp_interception() handling for both #GP and vmware_backdoor
  * Use kvm_cpu_cap for X86_FEATURE_SVME_ADDR_CHK feature support
  * Add nested on nested support

Thanks,
-Wei

Wei Huang (4):
  KVM: x86: Factor out x86 instruction emulation with decoding
  KVM: SVM: Add emulation support for #GP triggered by SVM instructions
  KVM: SVM: Add support for VMCB address check change
  KVM: SVM: Support #GP handling for the case of nested on nested

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/kvm/svm/svm.c | 120 -
 arch/x86/kvm/x86.c |  63 +--
 arch/x86/kvm/x86.h |   2 +
 4 files changed, 145 insertions(+), 41 deletions(-)

-- 
2.27.0



Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-14 Thread Wei Huang



On 1/12/21 8:01 AM, Paolo Bonzini wrote:
> On 12/01/21 07:37, Wei Huang wrote:
>>   static int gp_interception(struct vcpu_svm *svm)
>>   {
>>   struct kvm_vcpu *vcpu = &svm->vcpu;
>>   u32 error_code = svm->vmcb->control.exit_info_1;
>> -
>> -    WARN_ON_ONCE(!enable_vmware_backdoor);
>> +    int rc;
>>     /*
>> - * VMware backdoor emulation on #GP interception only handles IN{S},
>> - * OUT{S}, and RDPMC, none of which generate a non-zero error code.
>> + * Only VMware backdoor and SVM VME errata are handled. Neither of
>> + * them has non-zero error codes.
>>    */
>>   if (error_code) {
>>   kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
>>   return 1;
>>   }
>> -    return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
>> +
>> +    rc = kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
>> +    if (rc > 1)
>> +    rc = svm_emulate_vm_instr(vcpu, rc);
>> +    return rc;
>>   }
>>   
> 
> Passing back the third byte is quite hacky.  Instead of this change to
> kvm_emulate_instruction, I'd rather check the instruction bytes in
> gp_interception before calling kvm_emulate_instruction.  That would be
> something like:
> 
> - move "kvm_clear_exception_queue(vcpu);" inside the "if
> (!(emulation_type & EMULTYPE_NO_DECODE))".  It doesn't apply when you
> are coming back from userspace.
> 
> - extract the "if (!(emulation_type & EMULTYPE_NO_DECODE))" body to a
> new function x86_emulate_decoded_instruction.  Call it from
> gp_interception, we know this is not a pagefault and therefore
> vcpu->arch.write_fault_to_shadow_pgtable must be false.

If the whole body inside the if-statement is moved out, do you expect the
interface of x86_emulate_decoded_instruction to be something like:

int x86_emulate_decoded_instruction(struct kvm_vcpu *vcpu,
gpa_t cr2_or_gpa,
int emulation_type, void *insn,
int insn_len,
bool write_fault_to_spt)

And if so, what is the emulation type to use when calling this function
from svm.c? EMULTYPE_VMWARE_GP?

> 
> - check ctxt->insn_bytes for an SVM instruction
> 
> - if not an SVM instruction, call kvm_emulate_instruction(vcpu,
> EMULTYPE_VMWARE_GP|EMULTYPE_NO_DECODE).
> 
> Thanks,
> 
> Paolo
> 


Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-12 Thread Wei Huang




On 1/12/21 12:58 PM, Andy Lutomirski wrote:

Andrew Cooper points out that there may be a nicer workaround.  Make
sure that the SMRAM and HT region (FFFD - ) are
marked as reserved in the guest, too.


In theory this proposed solution can avoid intercepting #GP. But in 
reality SMRAM regions can differ between machines, so this solution can 
break after VM migration.


Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-12 Thread Wei Huang




On 1/12/21 11:59 AM, Sean Christopherson wrote:

On Tue, Jan 12, 2021, Sean Christopherson wrote:

On Tue, Jan 12, 2021, Wei Huang wrote:

From: Bandan Das 

While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept.


It would be very helpful to list exactly which CPUs are/aren't affected, even if
that just means stating something like "all CPUs before XYZ".  Given patch 2/2,
I assume it's all CPUs without the new CPUID flag?


This behavior dates back to fairly old CPUs. It is fair to assume 
that _most_ CPUs without this CPUID bit can demonstrate such behavior.




Ah, despite calling this an 'errata', the bad behavior is explicitly documented
in the APM, i.e. it's an architecture bug, not a silicon bug.

Can you reword the changelog to make it clear that the premature #GP is the
correct architectural behavior for CPUs without the new CPUID flag?


Sure, will do in the next version.





Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-12 Thread Wei Huang




On 1/12/21 11:56 AM, Sean Christopherson wrote:

On Tue, Jan 12, 2021, Andy Lutomirski wrote:



On Jan 12, 2021, at 7:46 AM, Bandan Das  wrote:

Andy Lutomirski  writes:
...

#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..c5c4aaf01a1a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "trace.h"

 extern bool itlb_multihit_kvm_mitigation;
@@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);

+bool kvm_is_host_reserved_region(u64 gpa)
+{
+   return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
+}

While _e820__mapped_any()'s doc says '..  checks if any part of
the range  is mapped ..' it seems to me that the real
check is [start, end) so we should use 'gpa' instead of 'gpa-1',
no?

Why do you need to check GPA at all?


To reduce the scope of the workaround.

The errata only happens when you use one of the SVM instructions in the
guest with an EAX value that happens to be inside one of the host reserved
memory regions (for example SMM).


This code reduces the scope of the workaround at the cost of
increasing the complexity of the workaround and adding a nonsensical
coupling between KVM and host details and adding an export that really
doesn’t deserve to be exported.

Is there an actual concrete benefit to this check?


Besides reducing the scope, my intention for the check was that we should
know if such exceptions occur for any other undiscovered reasons with other
memory types rather than hiding them under this workaround.


Ask AMD?


There are several checks before VMRUN launch. The function 
e820__mapped_raw_any() was definitely one of the easiest ways to figure 
out the problematic regions we had.




I would also believe that someone somewhere has a firmware that simply omits
the problematic region instead of listing it as reserved.


I agree with Andy, odds are very good that attempting to be precise will lead to
pain due to false negatives.

And, KVM's SVM instruction emulation needs to be be rock solid regardless of
this behavior since KVM unconditionally intercepts the instruction, i.e. there's
basically zero risk to KVM.



Are you saying that the instruction decode before 
kvm_is_host_reserved_region() already guarantees that the instructions 
hitting #GP are SVM execution instructions (see below)? If so, I think 
this argument is fair.


+   switch (ctxt->modrm) {
+   case 0xd8: /* VMRUN */
+   case 0xda: /* VMLOAD */
+   case 0xdb: /* VMSAVE */

Bandan: What are your thoughts about removing kvm_is_host_reserved_region()?



Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-12 Thread Wei Huang




On 1/12/21 6:15 AM, Vitaly Kuznetsov wrote:

Wei Huang  writes:


From: Bandan Das 

While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problem under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions; otherwise it re-injects #GP back to
guest VMs.

Signed-off-by: Bandan Das 
Co-developed-by: Wei Huang 
Signed-off-by: Wei Huang 
---
  arch/x86/include/asm/kvm_host.h |   8 +-
  arch/x86/kvm/mmu.h  |   1 +
  arch/x86/kvm/mmu/mmu.c  |   7 ++
  arch/x86/kvm/svm/svm.c  | 157 +++-
  arch/x86/kvm/svm/svm.h  |   8 ++
  arch/x86/kvm/vmx/vmx.c  |   2 +-
  arch/x86/kvm/x86.c  |  37 +++-
  7 files changed, 146 insertions(+), 74 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..0ddc309f5a14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
   * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
   * Used to test the full emulator from userspace.
   *
- * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
+ * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
   *backdoor emulation, which is opt in via module param.
   *VMware backdoor emulation handles select instructions
- * and reinjects the #GP for all other cases.
+ * and reinjects #GP for all other cases. This also
+ * handles other cases where #GP condition needs to be
+ * handled and emulated appropriately
   *
   * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in 
which
   * case the CR2/GPA value pass on the stack is valid.
@@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
  #define EMULTYPE_SKIP (1 << 2)
  #define EMULTYPE_ALLOW_RETRY_PF   (1 << 3)
  #define EMULTYPE_TRAP_UD_FORCED   (1 << 4)
-#define EMULTYPE_VMWARE_GP (1 << 5)
+#define EMULTYPE_PARAVIRT_GP   (1 << 5)
  #define EMULTYPE_PF   (1 << 6)
  
  int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 581925e476d6..1a2fff4e7140 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
  
  int kvm_mmu_post_init_vm(struct kvm *kvm);

  void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+bool kvm_is_host_reserved_region(u64 gpa);


Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?


Will do in v2.



  
  #endif

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..c5c4aaf01a1a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -50,6 +50,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "trace.h"
  
  extern bool itlb_multihit_kvm_mitigation;

@@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
  }
  EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
  
+bool kvm_is_host_reserved_region(u64 gpa)

+{
+   return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
+}


While _e820__mapped_any()'s doc says '..  checks if any part of the
range  is mapped ..' it seems to me that the real check is
[start, end) so we should use 'gpa' instead of 'gpa-1', no?


I think you are right. The statement "entry->addr >= end || 
entry->addr + entry->size <= start" shows the check is against the 
range [start, end).
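
A condensed restatement of that condition, just to make the half-open interval
explicit (a simplified sketch of the quoted check, not the actual e820 code):

/* An e820 entry [addr, addr + size) overlaps the query range [start, end)
 * unless it ends at or before 'start' or begins at or after 'end'. So
 * e820__mapped_raw_any(start, end, type) treats 'end' as exclusive.
 */
static bool entry_overlaps(u64 addr, u64 size, u64 start, u64 end)
{
	return !(addr >= end || addr + size <= start);
}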





+EXPORT_SYMBOL_GPL(kvm_is_host_reserved_region);
+
  void kvm_mmu_zap_all(struct kvm *kvm)
  {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..74620d32aa82 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -288,6 +288,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
if (!(efer & EFER_SVME)) {
svm_leave_nested(svm);
svm_set_gif(svm, true);
+   clr_exception_intercept(svm, GP_VECTOR);
  
  			/*

 * Free the nested guest state, unless we are in SMM.
@@ -309,6 +310,10 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
  
  	svm->vmcb->save.efer = efer | EFER_SVME;

vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+   /* Enable GP interception for SVM instructions if needed */
+   if (efer & EFER_SVM

Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-12 Thread Wei Huang




On 1/12/21 5:09 AM, Maxim Levitsky wrote:

On Tue, 2021-01-12 at 00:37 -0600, Wei Huang wrote:

From: Bandan Das 

While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problem under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions; otherwise it re-injects #GP back to
guest VMs.

Signed-off-by: Bandan Das 
Co-developed-by: Wei Huang 
Signed-off-by: Wei Huang 


This is the ultimate fix for this bug that I had in mind,
but I didn't dare to develop it, thinking it won't be accepted
due to the added complexity.

 From a cursory look this look all right, and I will review
and test this either today or tomorrow.


My tests mainly relied on the kvm-unit-test you developed (thanks BTW), 
on machines w/ and w/o CPUID_0x800A_EDX[28]=1. Both cases passed.






[PATCH 2/2] KVM: SVM: Add support for VMCB address check change

2021-01-11 Thread Wei Huang
New AMD CPUs have a change that checks VMEXIT intercept on special SVM
instructions before checking their EAX against reserved memory region.
This change is indicated by CPUID_0x800A_EDX[28]. If it is 1, KVM
doesn't need to intercept and emulate #GP faults for such instructions
because #GP isn't supposed to be triggered.

Co-developed-by: Bandan Das 
Signed-off-by: Bandan Das 
Signed-off-by: Wei Huang 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..ea89d6fdd79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -337,6 +337,7 @@
 #define X86_FEATURE_AVIC   (15*32+13) /* Virtual Interrupt 
Controller */
 #define X86_FEATURE_V_VMSAVE_VMLOAD(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF   (15*32+16) /* Virtual GIF */
+#define X86_FEATURE_SVME_ADDR_CHK  (15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit 
Manipulation instructions*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 74620d32aa82..451b82df2eab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -311,7 +311,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
/* Enable GP interception for SVM instructions if needed */
-   if (efer & EFER_SVME)
+   if ((efer & EFER_SVME) && !boot_cpu_has(X86_FEATURE_SVME_ADDR_CHK))
set_exception_intercept(svm, GP_VECTOR);
 
return 0;
-- 
2.27.0



[PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

2021-01-11 Thread Wei Huang
From: Bandan Das 

While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problems under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions; otherwise it re-injects #GP back to
guest VMs.

Signed-off-by: Bandan Das 
Co-developed-by: Wei Huang 
Signed-off-by: Wei Huang 
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.h  |   1 +
 arch/x86/kvm/mmu/mmu.c  |   7 ++
 arch/x86/kvm/svm/svm.c  | 157 +++-
 arch/x86/kvm/svm/svm.h  |   8 ++
 arch/x86/kvm/vmx/vmx.c  |   2 +-
 arch/x86/kvm/x86.c  |  37 +++-
 7 files changed, 146 insertions(+), 74 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..0ddc309f5a14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
  *  due to an intercepted #UD (see EMULTYPE_TRAP_UD).
  *  Used to test the full emulator from userspace.
  *
- * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
+ * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
  * backdoor emulation, which is opt in via module param.
  * VMware backdoor emulation handles select instructions
- * and reinjects the #GP for all other cases.
+ * and reinjects #GP for all other cases. This also
+ * handles other cases where #GP condition needs to be
+ * handled and emulated appropriately
  *
  * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
  *  case the CR2/GPA value pass on the stack is valid.
@@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
 #define EMULTYPE_SKIP  (1 << 2)
 #define EMULTYPE_ALLOW_RETRY_PF(1 << 3)
 #define EMULTYPE_TRAP_UD_FORCED(1 << 4)
-#define EMULTYPE_VMWARE_GP (1 << 5)
+#define EMULTYPE_PARAVIRT_GP   (1 << 5)
 #define EMULTYPE_PF(1 << 6)
 
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 581925e476d6..1a2fff4e7140 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+bool kvm_is_host_reserved_region(u64 gpa);
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..c5c4aaf01a1a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "trace.h"
 
 extern bool itlb_multihit_kvm_mitigation;
@@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
 
+bool kvm_is_host_reserved_region(u64 gpa)
+{
+   return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
+}
+EXPORT_SYMBOL_GPL(kvm_is_host_reserved_region);
+
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..74620d32aa82 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -288,6 +288,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
if (!(efer & EFER_SVME)) {
svm_leave_nested(svm);
svm_set_gif(svm, true);
+   clr_exception_intercept(svm, GP_VECTOR);
 
/*
 * Free the nested guest state, unless we are in SMM.
@@ -309,6 +310,10 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+   /* Enable GP interception for SVM instructions if needed */
+   if (efer & EFER_SVME)
+   set_exception_intercept(svm, GP_VECTOR);
+
return 0;
 }
 
@@ -1957,22 +1962,104 @@ static int ac_interception(struct vcpu_svm *svm)
return 1;
 }
 
+static int vmload_interception(struct vcpu_svm *svm)
+{
+   struct vmcb *nested_vmcb;
+   struct kvm_host_map map;
+   int ret;
+
+   if (nested_svm_check_permissions(svm))
+   return 1;
+
+   ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
+   if (ret) {
+   if (ret == -EINVAL)
+

Re: k10temp: ZEN3 readings are broken

2020-12-21 Thread Wei Huang




On 12/21/20 11:09 PM, Gabriel C wrote:

Am Di., 22. Dez. 2020 um 05:33 Uhr schrieb Wei Huang :




On 12/21/20 9:58 PM, Guenter Roeck wrote:

Hi,

On 12/21/20 5:45 PM, Gabriel C wrote:

Hello Guenter,

While trying to add ZEN3 support to the out-of-tree zenpower module, I found out
the in-kernel k10temp driver is broken with ZEN3 (and even partially ZEN2).

commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:

case 0x0 ... 0x1:   /* Zen3 */

however, this is wrong, we look for a model which is 0x21 for ZEN3,
these seem to
be steppings?


These are model numbers for server CPUs. I believe 0x21 is for desktop
CPUs. In other words, current upstream code doesn't support your CPUs.
You are welcome to add support for 0x21, but it is wrong to remove
support for 0x00/0x01.


I figured that myself after seeing what was committed to amd_energy driver.
It would be better for you, as the author of the patch, to have a clearer
commit message to start with.


commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e
Author: Wei Huang 
Date:   Mon Sep 14 15:07:15 2020 -0500

hwmon: (k10temp) Add support for Zen3 CPUs


Which you didn't. That should read:

"Added support for NOT yet released SP3 ZEN3 CPU"

Right?


Yes. The subject line could have been clearer with something like "Add 
support for Zen3 Server and TR CPUs".
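
For reference, the family/model/stepping that k10temp switches on come from
CPUID leaf 1. A minimal user-space sketch of the decoding, for illustration
only (this is not driver code):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	__get_cpuid(1, &eax, &ebx, &ecx, &edx);

	unsigned int stepping    = eax & 0xf;
	unsigned int base_model  = (eax >> 4) & 0xf;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int ext_model   = (eax >> 16) & 0xf;
	unsigned int ext_family  = (eax >> 20) & 0xff;

	/* AMD extends family and model with the ext fields when base family is 0xF */
	unsigned int family = base_family + (base_family == 0xf ? ext_family : 0);
	unsigned int model  = base_model | (base_family == 0xf ? ext_model << 4 : 0);

	/* e.g. Zen3 server: family 0x19, model 0x00/0x01; Zen3 Ryzen: model 0x21 */
	printf("family 0x%x model 0x%x stepping 0x%x\n", family, model, stepping);
	return 0;
}

So 0x21 is a model number, not a stepping.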








Also, PLANE0/1 are wrong too, Icore has zero readouts even when fixing
the model.

Looking at these ( there is something missing for 0x71 ZEN2 Ryzens
also ) that should be:

PLANE0  (ZEN_SVI_BASE + 0x10)
PLANE1  (ZEN_SVI_BASE + 0xc)


Same problem here with model 0x71. 0x31 is for server CPUs.


Yes, is why I split both in my 'guess what the eff is this about' patch.

0x31 is TR 3000/ Sp3 ZEN2 , while 0x71 is ZEN2 Desktop.




Which is the same as for ZEN2 >= 0x71. Since this is not really
documented and I have some
confirmations of these numbers from *somewhere* :-) I created a demo patch only.

I would like AMD people to really have a look at the driver and
confirm the changes, since
getting information from *somewhere* doesn't mean they are 100%
correct. However, the driver
is working with these changes.

In any way the model needs changing to 0x21 even if we let the other
readings broken.

There is my demo patch:

https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch


For family 19h, the patch should look like the following. But this might not
matter anymore, as suggested by Guenter below.

   /* F19h thermal registers through SMN */
#define F19H_M01_SVI_TEL_PLANE0 (ZEN_SVI_BASE + 0x14)
#define F19H_M01_SVI_TEL_PLANE1 (ZEN_SVI_BASE + 0x10)
+/* Zen3 Ryzen */
+#define F19H_M21H_SVI_TEL_PLANE0   (ZEN_SVI_BASE + 0x10)
+#define F19H_M21H_SVI_TEL_PLANE1   (ZEN_SVI_BASE + 0xc)

Then add the following change:

 switch (boot_cpu_data.x86_model) {
 case 0x0 ... 0x1:   /* Zen3 */
 data->show_current = true;
 data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
 data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
 data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
 data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
 k10temp_get_ccd_support(pdev, data, 8);
 break;
+   case 0x21:  /* Zen3 */
+   data->show_current = true;
+   data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
+   data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
+   data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
+   data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
+   k10temp_get_ccd_support(pdev, data, 8);
+   break;





You are a really funny guy.
After _all_ these are YOUR Company CPUs, and want me to fix these without docs?
Sure I can, but the confusion started with your wrong commit message.


Sorry for the confusion. The review comments above were merely to point 
out that server parts won't be supported if 0x0..0x1 is removed. I do 
appreciate the test results and bug report. The original commit 
unfortunately doesn't work on your CPUs. It was indeed a misfire on my 
side.




Besides, is that how AMD operates now?
Let the customer pay thousands of euros for HW and then tell
him to fix or add drivers support himself? Very interesting.

And yes it matters even after removing these.

case 0x0 ... 0x1:   /* Zen3 SP3 ( NOT YET RELEASED ) */
case 0x21:  /* Zen3 Ryzen Desktop  */


Right?





Re: k10temp: ZEN3 readings are broken

2020-12-21 Thread Wei Huang




On 12/21/20 9:58 PM, Guenter Roeck wrote:

Hi,

On 12/21/20 5:45 PM, Gabriel C wrote:

Hello Guenter,

while trying to add ZEN3 support for zenpower out of tree modules, I find out
the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).

commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:

case 0x0 ... 0x1:   /* Zen3 */

however, this is wrong, we look for a model which is 0x21 for ZEN3,
these seem to
be steppings?


These are model numbers for server CPUs. I believe 0x21 is for desktop 
CPUs. In other words, current upstream code doesn't support your CPUs. 
You are welcome to add support for 0x21, but it is wrong to remove 
support for 0x00/0x01.




Also, PLANE0/1 are wrong too, Icore has zero readouts even when fixing
the model.

Looking at these ( there is something missing for 0x71 ZEN2 Ryzens
also ) that should be:

PLANE0  (ZEN_SVI_BASE + 0x10)
PLANE1  (ZEN_SVI_BASE + 0xc)


Same problem here with model 0x71. 0x31 is for server CPUs.



Which is the same as for ZEN2 >= 0x71. Since this is not really
documented and I have some
confirmations of these numbers from *somewhere* :-) I created a demo patch only.

I would like AMD people to really have a look at the driver and
confirm the changes, since
getting information from *somewhere* doesn't mean they are 100%
correct. However, the driver
is working with these changes.

In any way the model needs changing to 0x21 even if we let the other
readings broken.

There is my demo patch:

https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch


For family 19h, the patch should look like the following. But this might not 
matter anymore, as suggested by Guenter below.


 /* F19h thermal registers through SMN */
#define F19H_M01_SVI_TEL_PLANE0 (ZEN_SVI_BASE + 0x14)
#define F19H_M01_SVI_TEL_PLANE1 (ZEN_SVI_BASE + 0x10)
+/* Zen3 Ryzen */
+#define F19H_M21H_SVI_TEL_PLANE0   (ZEN_SVI_BASE + 0x10)
+#define F19H_M21H_SVI_TEL_PLANE1   (ZEN_SVI_BASE + 0xc)

Then add the following change:

switch (boot_cpu_data.x86_model) {
case 0x0 ... 0x1:   /* Zen3 */
data->show_current = true;
data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
k10temp_get_ccd_support(pdev, data, 8);
break;
+   case 0x21:  /* Zen3 */
+   data->show_current = true;
+   data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
+   data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
+   data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
+   data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
+   k10temp_get_ccd_support(pdev, data, 8);
+   break;



Also, there is some discuss and testing for both drivers:

https://github.com/ocerman/zenpower/issues/39



Thanks for the information. However, since I do not have time to actively 
maintain
the driver, since each chip variant seems to use different addresses and scales,
and since the information about voltages and currents is unpublished by AMD,
I'll remove support for voltage/current readings from the upstream driver.
I plan to send the patch doing that to Linus shortly after the commit window
closes (or even before that).


I believe Guenter is talking about 
https://www.spinics.net/lists/linux-hwmon/msg10252.html.




Thanks,
Guenter



Re: [RFC PATCH 2/4] cpufreq: acpi-cpufreq: Add processor to the ignore PSD override list

2020-12-07 Thread Wei Huang



On 12/7/20 4:30 PM, Borislav Petkov wrote:
> On Mon, Dec 07, 2020 at 04:07:52PM -0600, Wei Huang wrote:
>> I think we shouldn't override zen2 if _PSD is correct. In my opinion,
>> there are two approaches:
>>
>> * Keep override_acpi_psd()
>> Let us keep the original quirk and override_acpi_psd() function. Over
>> the time, people may want to add new CPUs to override_acpi_psd(). The
>> maintainer may declare that only CPUs >= family 17h will be fixed, to
>> avoid exploding the check-list.
>>
>> * Remove the quirk completely
>> We can completely remove commit acd316248205 ("acpi-cpufreq: Add quirk
>> to disable _PSD usage on all AMD CPUs")? I am not sure what "AMD desktop
>> boards" was referring to in the original commit message of acd316248205.
>> Maybe such machines aren't in use anymore.
> 
> * Third option: do not do anything. Why?

I am fine with this option, unless Punit can prove me wrong
(i.e. some Zen2 system is broken because of acd316248205).

> 
> - Let sleeping dogs lie and leave the workaround acd316248205 for old
> machines.
> 
> - Make a clear cut that the override is not needed from Zen3 on, i.e.,
> your patch
> 
>5368512abe08 ("acpi-cpufreq: Honor _PSD table setting on new AMD CPUs")
> 
> 
> Punit's commit message reads "...indicates that the override is not
> required for Zen3 onwards, it seems that domain information can be
> trusted even on certain earlier systems."
> 
> That's not nearly a justification in my book to do this on anything < Zen3.
> 
> This way you have a clear cut, you don't need to deal with adding any
> more models to override_acpi_psd() and all is good.
> 
> Unless there's a better reason to skip the override on machines < Zen3
> but I haven't heard any so far...
> 
> Thx.
> 


Re: [RFC PATCH 2/4] cpufreq: acpi-cpufreq: Add processor to the ignore PSD override list

2020-12-07 Thread Wei Huang



On 12/7/20 2:26 PM, Borislav Petkov wrote:
> On Mon, Dec 07, 2020 at 02:20:55PM -0600, Wei Huang wrote:
>> In summary, this patch is fine if Punit already verified it. My only
>> concern is the list can potentially increase over the time, and we will
>> keep coming back to fix override_acpi_psd() function.
> 
> Can the detection be done by looking at those _PSD things instead of
> comparing f/m/s?

Not that I am aware of. I don't know of any correlation between the _PSD
configuration and the CPU's f/m/s.

> 
> And, alternatively, what is this fixing?
> 
> So what if some zen2 boxes have correct _PSD objects? Why do they need
> to ignore the override?

I think we shouldn't override zen2 if _PSD is correct. In my opinion,
there are two approaches:

* Keep override_acpi_psd()
Let us keep the original quirk and the override_acpi_psd() function. Over
time, people may want to add new CPUs to override_acpi_psd(). The
maintainer may declare that only CPUs >= family 17h will be fixed, to
avoid exploding the check-list (see the sketch below).

* Remove the quirk completely
We can completely remove commit acd316248205 ("acpi-cpufreq: Add quirk
to disable _PSD usage on all AMD CPUs")? I am not sure what "AMD desktop
boards" was referring to in the original commit message of acd316248205.
Maybe such machines aren't in use anymore.
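
A minimal sketch of what the first option could look like. The family 17h
floor below is only an illustration of "declare a cutoff", not a tested or
proposed patch:

static bool override_acpi_psd(unsigned int cpu_id)
{
	struct cpuinfo_x86 *c = &boot_cpu_data;

	if (c->x86_vendor != X86_VENDOR_AMD)
		return false;

	if (!check_amd_hwpstate_cpu(cpu_id))
		return false;

	/*
	 * Trust firmware _PSD from Zen (family 17h) onward; keep the
	 * historical override only for older parts.
	 */
	return c->x86 < 0x17;
}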

> 
> Hmmm?
> 


Re: [RFC PATCH 2/4] cpufreq: acpi-cpufreq: Add processor to the ignore PSD override list

2020-12-07 Thread Wei Huang



On 11/25/20 8:48 AM, Punit Agrawal wrote:
> Booting Linux on a Zen2 based processor (family: 0x17, model: 0x60,
> stepping: 0x1) shows the following message in the logs -
> 
> acpi_cpufreq: overriding BIOS provided _PSD data
> 
> Although commit 5368512abe08 ("acpi-cpufreq: Honor _PSD table setting
> on new AMD CPUs") indicates that the override is not required for Zen3
> onwards, it seems that domain information can be trusted even on

Given that the original quirk acd316248205 ("acpi-cpufreq: Add quirk to
disable _PSD usage on all AMD CPUs") was submitted 8 years ago, it is
not a surprise that some system firmware before family 19h might have been
fixed. Unfortunately, like Punit said, I didn't find any documentation
listing the existing, fixed CPUs.

In my commit 5368512abe ("acpi-cpufreq: Honor _PSD table setting on new
AMD CPUs"), family 19h was picked because 1) we know BIOS will fix this
problem for this specific generation of CPUs, and 2) without this
commit, it _might_ cause issues on certain CPUs.

In summary, this patch is fine if Punit has already verified it. My only
concern is that the list can potentially grow over time, and we will
keep coming back to fix the override_acpi_psd() function.

> certain earlier systems. Update the check, to skip the override for
> Zen2 processors known to work without the override.
> 
> Signed-off-by: Punit Agrawal 
> Cc: Wei Huang 
> ---
>  drivers/cpufreq/acpi-cpufreq.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index b1e7df96d428..29f1cd93541e 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -198,8 +198,13 @@ static int override_acpi_psd(unsigned int cpu_id)
>   if (c->x86_vendor == X86_VENDOR_AMD) {
>   if (!check_amd_hwpstate_cpu(cpu_id))
>   return false;
> -
> - return c->x86 < 0x19;
> + /*
> +  * CPU's before Zen3 (except some Zen2) need the
> +  * override.
> +  */
> + return (c->x86 < 0x19) &&
> + !(c->x86 == 0x17 && c->x86_model == 0x60 &&
> +   c->x86_stepping == 0x1);
>   }
>  
>   return false;
> 


Re: [RFC PATCH 1/4] cpufreq: acpi-cpufreq: Re-factor overriding ACPI PSD

2020-12-07 Thread Wei Huang



On 11/25/20 8:48 AM, Punit Agrawal wrote:
> Re-factor the code to override the firmware provided frequency domain
> information (via PSD) to localise the checks in one function.
> 
> No functional change intended.
> 
> Signed-off-by: Punit Agrawal 
> Cc: Wei Huang 
> ---
>  drivers/cpufreq/acpi-cpufreq.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index 1e4fbb002a31..b1e7df96d428 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -191,6 +191,20 @@ static int check_amd_hwpstate_cpu(unsigned int cpuid)
>   return cpu_has(cpu, X86_FEATURE_HW_PSTATE);
>  }
>  
> +static int override_acpi_psd(unsigned int cpu_id)
 ^
int is fine, but it might be better to use bool. Otherwise I don't see
any issues with this patch.

> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + if (c->x86_vendor == X86_VENDOR_AMD) {
> + if (!check_amd_hwpstate_cpu(cpu_id))
> + return false;
> +
> + return c->x86 < 0x19;
> + }
> +
> + return false;
> +}
> +
>  static unsigned extract_io(struct cpufreq_policy *policy, u32 value)
>  {
>   struct acpi_cpufreq_data *data = policy->driver_data;
> @@ -691,8 +705,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
> *policy)
>   cpumask_copy(policy->cpus, topology_core_cpumask(cpu));
>   }
>  
> - if (check_amd_hwpstate_cpu(cpu) && boot_cpu_data.x86 < 0x19 &&
> - !acpi_pstate_strict) {
> + if (override_acpi_psd(cpu) && !acpi_pstate_strict) {
>   cpumask_clear(policy->cpus);
>   cpumask_set_cpu(cpu, policy->cpus);
>   cpumask_copy(data->freqdomain_cpus,
> 


[PATCH v2 1/1] acpi-cpufreq: Honor _PSD table setting in CPU frequency control

2020-10-18 Thread Wei Huang
acpi-cpufreq has an old quirk that overrides the _PSD table supplied by
BIOS on AMD CPUs. However the _PSD table of new AMD CPUs (Family 19h+)
now accurately reports the P-state dependency of CPU cores. Hence this
quirk needs to be fixed in order to support new CPUs' frequency control.

Fixes: acd316248205 ("acpi-cpufreq: Add quirk to disable _PSD usage on all AMD 
CPUs")
Signed-off-by: Wei Huang 
---
 drivers/cpufreq/acpi-cpufreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index e4ff681f..1e4fbb002a31 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -691,7 +691,8 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
cpumask_copy(policy->cpus, topology_core_cpumask(cpu));
}
 
-   if (check_amd_hwpstate_cpu(cpu) && !acpi_pstate_strict) {
+   if (check_amd_hwpstate_cpu(cpu) && boot_cpu_data.x86 < 0x19 &&
+   !acpi_pstate_strict) {
cpumask_clear(policy->cpus);
cpumask_set_cpu(cpu, policy->cpus);
cpumask_copy(data->freqdomain_cpus,
-- 
2.24.1



Re: [PATCH 1/1] acpi-cpufreq: Honor _PSD table setting in CPU frequency control

2020-10-16 Thread Wei Huang
On 10/16 04:58, Rafael J. Wysocki wrote:
> On Wed, Oct 7, 2020 at 10:44 PM Wei Huang  wrote:
> >
> > acpi-cpufreq has a old quirk that overrides the _PSD table supplied by
> > BIOS on AMD CPUs. However the _PSD table of new AMD CPUs (Family 19h+)
> > now accurately reports the P-state dependency of CPU cores. Hence this
> > quirk needs to be fixed in order to support new CPUs' frequency control.
> >
> > Fixes: acd316248205 ("acpi-cpufreq: Add quirk to disable _PSD usage on all 
> > AMD CPUs")
> > Signed-off-by: Wei Huang 
> > ---
> >  drivers/cpufreq/acpi-cpufreq.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> > index e4ff681f..1e6e2abde428 100644
> > --- a/drivers/cpufreq/acpi-cpufreq.c
> > +++ b/drivers/cpufreq/acpi-cpufreq.c
> > @@ -691,7 +691,8 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
> > *policy)
> > cpumask_copy(policy->cpus, topology_core_cpumask(cpu));
> > }
> >
> > -   if (check_amd_hwpstate_cpu(cpu) && !acpi_pstate_strict) {
> > +   if (check_amd_hwpstate_cpu(cpu) && (c->x86 < 0x19) &&
> 
> Why don't you use boot_cpu_data instead of *c?

Thanks for your review. c->x86 contains the same level of information as
boot_cpu_data when acpi_cpufreq_cpu_init() starts to execute. But you are
right, it is better to use boot_cpu_data, consistent with the rest of the
code in the same function.

> 
> And why don't you do the extra check in check_amd_hwpstate_cpu()?

check_amd_hwpstate_cpu() is called at various locations. This _PSD fix
doesn't apply to the other callers.

> 
> Also the parens around it are not necessary here and is there any

I will remove it in the next rev.

> chance for having a proper symbol instead of the raw 0x19 in that
> check?

Unfortunately I didn't find a replacement. Only x86_vendor has an acronym. The 
rest
(fam/model/stepping) use numerical values, including in arch/x86 boot code.

> 
> > +   !acpi_pstate_strict) {
> > cpumask_clear(policy->cpus);
> > cpumask_set_cpu(cpu, policy->cpus);
> > cpumask_copy(data->freqdomain_cpus,
> > --
> > 2.26.2
> >


[PATCH 1/1] acpi-cpufreq: Honor _PSD table setting in CPU frequency control

2020-10-07 Thread Wei Huang
acpi-cpufreq has an old quirk that overrides the _PSD table supplied by
BIOS on AMD CPUs. However the _PSD table of new AMD CPUs (Family 19h+)
now accurately reports the P-state dependency of CPU cores. Hence this
quirk needs to be fixed in order to support new CPUs' frequency control.

Fixes: acd316248205 ("acpi-cpufreq: Add quirk to disable _PSD usage on all AMD 
CPUs")
Signed-off-by: Wei Huang 
---
 drivers/cpufreq/acpi-cpufreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index e4ff681f..1e6e2abde428 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -691,7 +691,8 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
cpumask_copy(policy->cpus, topology_core_cpumask(cpu));
}
 
-   if (check_amd_hwpstate_cpu(cpu) && !acpi_pstate_strict) {
+   if (check_amd_hwpstate_cpu(cpu) && (c->x86 < 0x19) &&
+   !acpi_pstate_strict) {
cpumask_clear(policy->cpus);
cpumask_set_cpu(cpu, policy->cpus);
cpumask_copy(data->freqdomain_cpus,
-- 
2.26.2



Re: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest

2020-09-18 Thread Wei Huang
On 09/17 03:23, Cathy Avery wrote:
> svm->vmcb will now point to either a separate vmcb L1 ( not nested ) or L2 
> vmcb ( nested ).
> 
> Issues:
> 
> 1) There is some wholesale copying of vmcb.save and vmcb.contol
>areas which will need to be refined.
> 
> 2) There is a workaround in nested_svm_vmexit() where
> 
>if (svm->vmcb01->control.asid == 0)
>svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> 
>This was done as a result of the kvm selftest 'state_test'. In that
>test svm_set_nested_state() is called before svm_vcpu_run().
>The asid is assigned by svm_vcpu_run -> pre_svm_run for the current
>vmcb which is now vmcb02 as we are in nested mode subsequently
>vmcb01.control.asid is never set as it should be.
> 
> Tested:
> kvm-unit-tests
> kvm self tests

I was able to run some basic nested SVM tests using this patch. A full L2 VM
(Fedora) had some problems booting after grub loaded the kernel.

Comments below.

> 
> Signed-off-by: Cathy Avery 
> ---
>  arch/x86/kvm/svm/nested.c | 116 ++
>  arch/x86/kvm/svm/svm.c|  41 +++---
>  arch/x86/kvm/svm/svm.h|  10 ++--
>  3 files changed, 81 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e90bc436f584..0a06e62010d8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -75,12 +75,12 @@ static unsigned long nested_svm_get_tdp_cr3(struct 
> kvm_vcpu *vcpu)
>  static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
> - struct vmcb *hsave = svm->nested.hsave;
>  
>   WARN_ON(mmu_is_nested(vcpu));
>  
>   vcpu->arch.mmu = &vcpu->arch.guest_mmu;
> - kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, hsave->save.cr4, 
> hsave->save.efer,
> + kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01->save.cr4,
> + svm->vmcb01->save.efer,
>   svm->nested.ctl.nested_cr3);
>   vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
>   vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
> @@ -105,7 +105,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
>   return;
>  
>   c = &svm->vmcb->control;
> - h = &svm->nested.hsave->control;
> + h = &svm->vmcb01->control;
>   g = &svm->nested.ctl;
>  
>   svm->nested.host_intercept_exceptions = h->intercept_exceptions;
> @@ -403,7 +403,7 @@ static void nested_prepare_vmcb_control(struct vcpu_svm 
> *svm)
>  
>   svm->vmcb->control.int_ctl =
>   (svm->nested.ctl.int_ctl & ~mask) |
> - (svm->nested.hsave->control.int_ctl & mask);
> + (svm->vmcb01->control.int_ctl & mask);
>  
>   svm->vmcb->control.virt_ext= svm->nested.ctl.virt_ext;
>   svm->vmcb->control.int_vector  = svm->nested.ctl.int_vector;
> @@ -432,6 +432,12 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 
> vmcb_gpa,
>   int ret;
>  
>   svm->nested.vmcb = vmcb_gpa;
> +
> + WARN_ON(svm->vmcb == svm->nested.vmcb02);
> +
> + svm->nested.vmcb02->control = svm->vmcb01->control;

This part is a bit confusing. svm->vmcb01->control contains the control
info from the L0 hypervisor to the L1 VM. Shouldn't vmcb02->control be built
from the control info contained in nested_vmcb?

> + svm->vmcb = svm->nested.vmcb02;
> + svm->vmcb_pa = svm->nested.vmcb02_pa;
> + load_nested_vmcb_control(svm, &nested_vmcb->control);
>   nested_prepare_vmcb_save(svm, nested_vmcb);
>   nested_prepare_vmcb_control(svm);
> @@ -450,8 +456,6 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>  {
>   int ret;
>   struct vmcb *nested_vmcb;
> - struct vmcb *hsave = svm->nested.hsave;
> - struct vmcb *vmcb = svm->vmcb;
>   struct kvm_host_map map;
>   u64 vmcb_gpa;
>  
> @@ -496,29 +500,17 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
>   kvm_clear_exception_queue(&svm->vcpu);
>   kvm_clear_interrupt_queue(&svm->vcpu);
>  
> - /*
> -  * Save the old vmcb, so we don't need to pick what we save, but can
> -  * restore everything when a VMEXIT occurs
> -  */
> - hsave->save.es = vmcb->save.es;
> - hsave->save.cs = vmcb->save.cs;
> - hsave->save.ss = vmcb->save.ss;
> - hsave->save.ds = vmcb->save.ds;
> - hsave->save.gdtr   = vmcb->save.gdtr;
> - hsave->save.idtr   = vmcb->save.idtr;
> - hsave->save.efer   = svm->vcpu.arch.efer;
> - hsave->save.cr0= kvm_read_cr0(&svm->vcpu);
> - hsave->save.cr4= svm->vcpu.arch.cr4;
> - hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
> - hsave->save.rip= kvm_rip_read(&svm->vcpu);
> - hsave->save.rsp= vmcb->save.rsp;
> - hsave->save.rax= vmcb->save.rax;
> - if (npt_enabled)
> - hsave->save.cr3= vmcb->save.cr3;
> - else
> - hsave->save.cr3= kvm_read_cr3(&svm->vcpu);
> -
> - copy_vmcb_control_area(&hsave->control, 

Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

2020-09-16 Thread Wei Huang
On 09/16 09:33, Dr. David Alan Gilbert wrote:
> * Wei Huang (wei.hua...@amd.com) wrote:
> > On 09/15 05:51, Dr. David Alan Gilbert wrote:
> > > * Vitaly Kuznetsov (vkuzn...@redhat.com) wrote:
> > > > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
> > 
> > Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions 
> > which
> > include Fn_, Fn4000_ and Fn8000_.
> 
> On my 7302P the output of:
> cpuid -1 -r | wc -l
> 
> is 61, there is one line of header in there.
> 
> However in a guest I see more; and I think that's because KVM  tends to
> list the CPUID entries for a lot of disabled Intel features, even on
> AMD, e.g. 0x11-0x1f which AMD doesn't have, are listed in a KVM guest.
> Then you add the KVM CPUIDs at 4...0 and 41.
>

It is indeed a mixed bag. Some are added even though the AMD CPU doesn't define
them. BTW I also believe that the cpuid command lists more CPUID leaves than the
real value of cpuid->nent in kvm_vcpu_ioctl_set_cpuid(2).
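
For anyone who wants to double-check the count against what KVM itself
advertises, a minimal sketch using the KVM_GET_SUPPORTED_CPUID system ioctl
(error handling trimmed; the 256-entry buffer is an arbitrary size I picked,
not a KVM limit):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	struct kvm_cpuid2 *cpuid = calloc(1, sizeof(*cpuid) +
					  256 * sizeof(struct kvm_cpuid_entry2));

	if (kvm < 0 || !cpuid)
		return 1;

	cpuid->nent = 256;
	if (ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
		return 1;

	/* This is roughly the nent userspace ends up passing to KVM_SET_CPUID2. */
	printf("host-supported CPUID entries: %u\n", cpuid->nent);
	return 0;
}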

Anyway I don't have objection to this patchset.

> IMHO we should be filtering those out for at least two reasons:
>   a) They're wrong
>   b) We're probably not keeping the set of visible CPUID fields the same
> when we move between host kernels, and that can't be good for
> migration.
> 
> Still, those are separate problems.
> 
> Dave
> 
> > > > KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> > > > from '40' in 2010. We can, of course, just bump it a little bit to fix
> > > > the immediate issue but the report made me wonder why we need to pre-
> > > > allocate vcpu->arch.cpuid_entries array instead of sizing it 
> > > > dynamically.
> > > > This RFC is intended to feed my curiosity.
> > > > 
> > > > Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> > > > break. I also don't have access to the system where the original issue
> > > > was reported but chances we're fixing it are very good IMO as just the
> > > > second patch alone was reported to be sufficient.
> > > > 
> > > > Reported-by: Dr. David Alan Gilbert 
> > > 
> > > Oh nice, I was just going to bump the magic number :-)
> > > 
> > > Anyway, this seems to work for me, so:
> > > 
> > > Tested-by: Dr. David Alan Gilbert 
> > > 
> > 
> > I tested on two platforms and the patches worked fine. So no objection on 
> > the
> > design.
> > 
> > Tested-by: Wei Huang 
> > 
> > > > Vitaly Kuznetsov (2):
> > > >   KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> > > >   KVM: x86: bump KVM_MAX_CPUID_ENTRIES
> > > > 
> > > >  arch/x86/include/asm/kvm_host.h |  4 +--
> > > >  arch/x86/kvm/cpuid.c| 55 -
> > > >  arch/x86/kvm/x86.c  |  1 +
> > > >  3 files changed, 43 insertions(+), 17 deletions(-)
> > > > 
> > > > -- 
> > > > 2.25.4
> > > > 
> > > -- 
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > > 
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 


Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

2020-09-15 Thread Wei Huang
On 09/15 05:51, Dr. David Alan Gilbert wrote:
> * Vitaly Kuznetsov (vkuzn...@redhat.com) wrote:
> > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for

Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions which
include Fn_, Fn4000_ and Fn8000_.

> > KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> > from '40' in 2010. We can, of course, just bump it a little bit to fix
> > the immediate issue but the report made me wonder why we need to pre-
> > allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
> > This RFC is intended to feed my curiosity.
> > 
> > Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> > break. I also don't have access to the system where the original issue
> > was reported but chances we're fixing it are very good IMO as just the
> > second patch alone was reported to be sufficient.
> > 
> > Reported-by: Dr. David Alan Gilbert 
> 
> Oh nice, I was just going to bump the magic number :-)
> 
> Anyway, this seems to work for me, so:
> 
> Tested-by: Dr. David Alan Gilbert 
> 

I tested on two platforms and the patches worked fine. So no objection on the
design.

Tested-by: Wei Huang 

> > Vitaly Kuznetsov (2):
> >   KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> >   KVM: x86: bump KVM_MAX_CPUID_ENTRIES
> > 
> >  arch/x86/include/asm/kvm_host.h |  4 +--
> >  arch/x86/kvm/cpuid.c| 55 -
> >  arch/x86/kvm/x86.c  |  1 +
> >  3 files changed, 43 insertions(+), 17 deletions(-)
> > 
> > -- 
> > 2.25.4
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 


[PATCH 1/1] hwmon: (k10temp) Add support for Zen3 CPUs

2020-09-14 Thread Wei Huang
Zen3 thermal info is supported via a new PCI device ID. Also the voltage
telemetry registers and the current factors need to be defined. k10temp
driver then searches for CPU family 0x19 and configures k10temp_data
accordingly.

Signed-off-by: Wei Huang 
---
 drivers/hwmon/k10temp.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 49e8ebf8da32..a250481b5a97 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -95,6 +95,13 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
#define F17H_M31H_CFACTOR_ICORE    100 /* 1A / LSB    */
 #define F17H_M31H_CFACTOR_ISOC 31  /* 0.31A / LSB  */
 
+/* F19h thermal registers through SMN */
+#define F19H_M01_SVI_TEL_PLANE0(ZEN_SVI_BASE + 0x14)
+#define F19H_M01_SVI_TEL_PLANE1(ZEN_SVI_BASE + 0x10)
+
+#define F19H_M01H_CFACTOR_ICORE    100 /* 1A / LSB    */
+#define F19H_M01H_CFACTOR_ISOC 31  /* 0.31A / LSB  */
+
 struct k10temp_data {
struct pci_dev *pdev;
void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
@@ -527,6 +534,22 @@ static int k10temp_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
k10temp_get_ccd_support(pdev, data, 8);
break;
}
+   } else if (boot_cpu_data.x86 == 0x19) {
+   data->temp_adjust_mask = ZEN_CUR_TEMP_RANGE_SEL_MASK;
+   data->read_tempreg = read_tempreg_nb_zen;
+   data->show_temp |= BIT(TDIE_BIT);
+   data->is_zen = true;
+
+   switch (boot_cpu_data.x86_model) {
+   case 0x0 ... 0x1:   /* Zen3 */
+   data->show_current = true;
+   data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
+   data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
+   data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
+   data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
+   k10temp_get_ccd_support(pdev, data, 8);
+   break;
+   }
} else {
data->read_htcreg = read_htcreg_pci;
data->read_tempreg = read_tempreg_pci;
@@ -564,6 +587,7 @@ static const struct pci_device_id k10temp_id_table[] = {
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M30H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
+   { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
{ PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
{}
 };
-- 
2.24.1



[PATCH 1/2] hwmon: (k10temp) Create common functions and macros for Zen CPU families

2020-08-26 Thread Wei Huang
Many SMN thermal registers in Zen CPU families are common across different
generations. For long-term code maintenance, it is better to rename these
macros and functions to use a common Zen prefix.

Signed-off-by: Wei Huang 
---
 drivers/hwmon/k10temp.c | 56 +
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 8f12995ec133..f3addb97b021 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -73,22 +73,24 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
 #define F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET0xd8200c64
 #define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET0xd8200ca4
 
-/* F17h M01h Access througn SMN */
-#define F17H_M01H_REPORTED_TEMP_CTRL_OFFSET0x00059800
+/* Common for Zen CPU families (Family 17h and 18h) */
+#define ZEN_REPORTED_TEMP_CTRL_OFFSET  0x00059800
 
-#define F17H_M70H_CCD_TEMP(x)  (0x00059954 + ((x) * 4))
-#define F17H_M70H_CCD_TEMP_VALID   BIT(11)
-#define F17H_M70H_CCD_TEMP_MASKGENMASK(10, 0)
+#define ZEN_CCD_TEMP(x)(0x00059954 + ((x) * 4))
+#define ZEN_CCD_TEMP_VALID BIT(11)
+#define ZEN_CCD_TEMP_MASK  GENMASK(10, 0)
 
-#define F17H_M01H_SVI  0x0005A000
-#define F17H_M01H_SVI_TEL_PLANE0   (F17H_M01H_SVI + 0xc)
-#define F17H_M01H_SVI_TEL_PLANE1   (F17H_M01H_SVI + 0x10)
+#define ZEN_CUR_TEMP_SHIFT 21
+#define ZEN_CUR_TEMP_RANGE_SEL_MASKBIT(19)
 
-#define CUR_TEMP_SHIFT 21
-#define CUR_TEMP_RANGE_SEL_MASKBIT(19)
+#define ZEN_SVI_BASE   0x0005A000
 
-#define CFACTOR_ICORE  100 /* 1A / LSB */
-#define CFACTOR_ISOC   25  /* 0.25A / LSB  */
+/* F17h thermal registers through SMN */
+#define F17H_M01H_SVI_TEL_PLANE0   (ZEN_SVI_BASE + 0xc)
+#define F17H_M01H_SVI_TEL_PLANE1   (ZEN_SVI_BASE + 0x10)
+
+#define F17H_CFACTOR_ICORE 100 /* 1A / LSB */
+#define F17H_CFACTOR_ISOC  25  /* 0.25A / LSB  */
 
 struct k10temp_data {
struct pci_dev *pdev;
@@ -168,10 +170,10 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 
*regval)
  F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
 }
 
-static void read_tempreg_nb_f17(struct pci_dev *pdev, u32 *regval)
+static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
 {
amd_smn_read(amd_pci_dev_to_node_id(pdev),
-F17H_M01H_REPORTED_TEMP_CTRL_OFFSET, regval);
+ZEN_REPORTED_TEMP_CTRL_OFFSET, regval);
 }
 
 static long get_raw_temp(struct k10temp_data *data)
@@ -180,7 +182,7 @@ static long get_raw_temp(struct k10temp_data *data)
long temp;
 
data->read_tempreg(data->pdev, &regval);
-   temp = (regval >> CUR_TEMP_SHIFT) * 125;
+   temp = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
if (regval & data->temp_adjust_mask)
temp -= 49000;
return temp;
@@ -288,8 +290,8 @@ static int k10temp_read_temp(struct device *dev, u32 attr, 
int channel,
break;
case 2 ... 9:   /* Tccd{1-8} */
amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
-F17H_M70H_CCD_TEMP(channel - 2), &regval);
-   *val = (regval & F17H_M70H_CCD_TEMP_MASK) * 125 - 49000;
+ZEN_CCD_TEMP(channel - 2), &regval);
+   *val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
break;
default:
return -EOPNOTSUPP;
@@ -438,7 +440,7 @@ static int svi_show(struct seq_file *s, void *unused)
 {
struct k10temp_data *data = s->private;
 
-   k10temp_smn_regs_show(s, data->pdev, F17H_M01H_SVI, 32);
+   k10temp_smn_regs_show(s, data->pdev, ZEN_SVI_BASE, 32);
return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(svi);
@@ -448,7 +450,7 @@ static int thm_show(struct seq_file *s, void *unused)
struct k10temp_data *data = s->private;
 
k10temp_smn_regs_show(s, data->pdev,
- F17H_M01H_REPORTED_TEMP_CTRL_OFFSET, 256);
+ ZEN_REPORTED_TEMP_CTRL_OFFSET, 256);
return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(thm);
@@ -528,8 +530,8 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
 
for (i = 0; i < limit; i++) {
amd_smn_read(amd_pci_dev_to_node_id(pdev),
-F17H_M70H_CCD_TEMP(i), &regval);
-   if (regval & F17H_M70H_CCD_TEMP_VALID)
+ZEN_CCD_TEMP(i), &regval);
+   if (regval & ZEN_CCD_TEMP_VALID)
data->show_temp

[PATCH 2/2] hwmon: (k10temp) Define SVI telemetry and current factors for Zen2 CPUs

2020-08-26 Thread Wei Huang
The voltage telemetry registers for Zen2 are different from Zen1. Also
the factors for the CPU current values changed on Zen2. Add new definitions
for these registers.

Signed-off-by: Wei Huang 
---
 drivers/hwmon/k10temp.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index f3addb97b021..de9f68570a4f 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -88,9 +88,13 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
 /* F17h thermal registers through SMN */
 #define F17H_M01H_SVI_TEL_PLANE0   (ZEN_SVI_BASE + 0xc)
 #define F17H_M01H_SVI_TEL_PLANE1   (ZEN_SVI_BASE + 0x10)
+#define F17H_M31H_SVI_TEL_PLANE0   (ZEN_SVI_BASE + 0x14)
+#define F17H_M31H_SVI_TEL_PLANE1   (ZEN_SVI_BASE + 0x10)
 
-#define F17H_CFACTOR_ICORE 100 /* 1A / LSB */
-#define F17H_CFACTOR_ISOC  25  /* 0.25A / LSB  */
+#define F17H_M01H_CFACTOR_ICORE    100 /* 1A / LSB    */
+#define F17H_M01H_CFACTOR_ISOC 25  /* 0.25A / LSB  */
+#define F17H_M31H_CFACTOR_ICORE    100 /* 1A / LSB    */
+#define F17H_M31H_CFACTOR_ISOC 31  /* 0.31A / LSB  */
 
 struct k10temp_data {
struct pci_dev *pdev;
@@ -580,17 +584,17 @@ static int k10temp_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
data->show_current = !is_threadripper() && !is_epyc();
data->svi_addr[0] = F17H_M01H_SVI_TEL_PLANE0;
data->svi_addr[1] = F17H_M01H_SVI_TEL_PLANE1;
-   data->cfactor[0] = F17H_CFACTOR_ICORE;
-   data->cfactor[1] = F17H_CFACTOR_ISOC;
+   data->cfactor[0] = F17H_M01H_CFACTOR_ICORE;
+   data->cfactor[1] = F17H_M01H_CFACTOR_ISOC;
k10temp_get_ccd_support(pdev, data, 4);
break;
case 0x31:  /* Zen2 Threadripper */
case 0x71:  /* Zen2 */
data->show_current = !is_threadripper() && !is_epyc();
-   data->cfactor[0] = F17H_CFACTOR_ICORE;
-   data->cfactor[1] = F17H_CFACTOR_ISOC;
-   data->svi_addr[0] = F17H_M01H_SVI_TEL_PLANE1;
-   data->svi_addr[1] = F17H_M01H_SVI_TEL_PLANE0;
+   data->cfactor[0] = F17H_M31H_CFACTOR_ICORE;
+   data->cfactor[1] = F17H_M31H_CFACTOR_ISOC;
+   data->svi_addr[0] = F17H_M31H_SVI_TEL_PLANE0;
+   data->svi_addr[1] = F17H_M31H_SVI_TEL_PLANE1;
k10temp_get_ccd_support(pdev, data, 8);
break;
}
-- 
2.25.2



[PATCH] platform/x86: acerhdf: replace space by * in modalias

2020-05-06 Thread Chih-Wei Huang
Using a space in a module alias makes it harder to parse modules.alias.
Replace it with a star (*).

Reviewed-by: Peter Kästle 
Signed-off-by: Chih-Wei Huang 
---
 drivers/platform/x86/acerhdf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 505224225378..306ea92d5b10 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -837,7 +837,7 @@ MODULE_ALIAS("dmi:*:*Packard*Bell*:pnDOTMU*:");
 MODULE_ALIAS("dmi:*:*Packard*Bell*:pnENBFT*:");
 MODULE_ALIAS("dmi:*:*Packard*Bell*:pnDOTMA*:");
 MODULE_ALIAS("dmi:*:*Packard*Bell*:pnDOTVR46*:");
-MODULE_ALIAS("dmi:*:*Acer*:pnExtensa 5420*:");
+MODULE_ALIAS("dmi:*:*Acer*:pnExtensa*5420*:");
 
 module_init(acerhdf_init);
 module_exit(acerhdf_exit);
-- 
2.21.1



Re: [PATCH] [v3] kvm: x86: support APERF/MPERF registers

2020-05-01 Thread Wei Huang
On 04/30 06:45, Li RongQing wrote:
> Guest kernel reports a fixed cpu frequency in /proc/cpuinfo,
> this is confused to user when turbo is enable, and aperf/mperf
> can be used to show current cpu frequency after 7d5905dc14a
> "(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
> so guest should support aperf/mperf capability
> 
> this patch implements aperf/mperf by three mode: none, software
  
  This

> emulation, and pass-through
> 
> none: default mode, guest does not support aperf/mperf
> 
> software emulation: the period of aperf/mperf in guest mode are
> accumulated as emulated value
> 
> pass-though: it is only suitable for KVM_HINTS_REALTIME, Because
> that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed
> no over-commit.

If we save/restore the values of aperf/mperf properly during vcpu migration
among different cores, is pinning still required?
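
As background for why a guest wants these MSRs at all: the effective
frequency is base_freq * delta(APERF) / delta(MPERF) over a sampling window.
A minimal host-side sketch using the msr driver; MSR 0xe7 is IA32_MPERF and
0xe8 is IA32_APERF, while the 2000 MHz base clock below is purely an
assumption for illustration:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static uint64_t rdmsr_cpu0(int fd, uint32_t msr)
{
	uint64_t val = 0;

	/* /dev/cpu/0/msr: the file offset is the MSR index, reads are 8 bytes */
	pread(fd, &val, sizeof(val), msr);
	return val;
}

int main(void)
{
	int fd = open("/dev/cpu/0/msr", O_RDONLY);	/* needs the msr module */
	if (fd < 0)
		return 1;

	uint64_t mperf0 = rdmsr_cpu0(fd, 0xe7), aperf0 = rdmsr_cpu0(fd, 0xe8);
	sleep(1);
	uint64_t mperf1 = rdmsr_cpu0(fd, 0xe7), aperf1 = rdmsr_cpu0(fd, 0xe8);

	double base_mhz = 2000.0;	/* assumed base frequency, illustration only */
	double eff_mhz = base_mhz * (double)(aperf1 - aperf0) /
			 (double)(mperf1 - mperf0);

	printf("effective frequency ~ %.0f MHz\n", eff_mhz);
	close(fd);
	return 0;
}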

> 
> and a per-VM capability is added to configure aperfmperf mode
> 
> Signed-off-by: Li RongQing 
> Signed-off-by: Chai Wen 
> Signed-off-by: Jia Lina 
> ---
> diff v2:
> support aperfmperf pass though
> move common codes to kvm_get_msr_common
> 
> diff v1:
> 1. support AMD, but not test

pt-mode doesn't work on AMD. See below.

> 2. support per-vm capability to enable
>  Documentation/virt/kvm/api.rst  | 10 ++
>  arch/x86/include/asm/kvm_host.h | 11 +++
>  arch/x86/kvm/cpuid.c| 13 -
>  arch/x86/kvm/svm.c  |  8 
>  arch/x86/kvm/vmx/vmx.c  |  6 ++
>  arch/x86/kvm/x86.c  | 42 
> +
>  arch/x86/kvm/x86.h  | 15 +++
>  include/uapi/linux/kvm.h|  1 +
>  8 files changed, 105 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index efbbe570aa9b..c3be3b6a1717 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6109,3 +6109,13 @@ KVM can therefore start protected VMs.
>  This capability governs the KVM_S390_PV_COMMAND ioctl and the
>  KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
>  guests when the state change is invalid.
> +
> +8.23 KVM_CAP_APERFMPERF
> +
> +
> +:Architectures: x86
> +:Parameters: args[0] is aperfmperf mode;
> + 0 for not support, 1 for software emulation, 2 for pass-through
> +:Returns: 0 on success; -1 on error
> +
> +This capability indicates that KVM supports APERF and MPERF MSR registers
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 42a2d0d3984a..81477f676f60 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -820,6 +820,9 @@ struct kvm_vcpu_arch {
>  
>   /* AMD MSRC001_0015 Hardware Configuration */
>   u64 msr_hwcr;
> +
> + u64 v_mperf;
> + u64 v_aperf;
>  };
>  
>  struct kvm_lpage_info {
> @@ -885,6 +888,12 @@ enum kvm_irqchip_mode {
>   KVM_IRQCHIP_SPLIT,/* created with KVM_CAP_SPLIT_IRQCHIP */
>  };
>  
> +enum kvm_aperfmperf_mode {
> + KVM_APERFMPERF_NONE,
> + KVM_APERFMPERF_SOFT,  /* software emulate aperfmperf */
> + KVM_APERFMPERF_PT,/* pass-through aperfmperf to guest */
> +};
> +
>  #define APICV_INHIBIT_REASON_DISABLE0
>  #define APICV_INHIBIT_REASON_HYPERV 1
>  #define APICV_INHIBIT_REASON_NESTED 2
> @@ -982,6 +991,8 @@ struct kvm_arch {
>  
>   struct kvm_pmu_event_filter *pmu_event_filter;
>   struct task_struct *nx_lpage_recovery_thread;
> +
> + enum kvm_aperfmperf_mode aperfmperf_mode;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 901cd1fdecd9..7a64ea2c3eef 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -124,6 +124,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>  MSR_IA32_MISC_ENABLE_MWAIT);
>   }
>  
> + best = kvm_find_cpuid_entry(vcpu, 6, 0);
> + if (best) {
> + if (guest_has_aperfmperf(vcpu->kvm) &&
> + boot_cpu_has(X86_FEATURE_APERFMPERF))
> + best->ecx |= 1;
> + else
> + best->ecx &= ~1;
> + }
>   /* Update physical-address width */
>   vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>   kvm_mmu_reset_context(vcpu);
> @@ -558,7 +566,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array 
> *array, u32 function)
>   case 6: /* Thermal management */
>   entry->eax = 0x4; /* allow ARAT */
>   entry->ebx = 0;
> - entry->ecx = 0;
> + if (boot_cpu_has(X86_FEATURE_APERFMPERF))
> + entry->ecx = 0x1;
> + else
> + entry->ecx = 0x0;
>   entry->edx = 0;
>   break;
>   /* function 7 has additional index. */
> diff --git 

Re: [PATCH] x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting

2019-02-06 Thread Wei Huang



On 2/6/19 5:52 AM, Kirill A. Shutemov wrote:
> RDMSR in the trampoline code overrides EDX, but we use the register to
> indicate if 5-level paging has to enabled. It leads to failure to boot
> on a 5-level paging machine.
> 
> Preserve EDX on the stack while we are dealing with EFER.
> 
> Signed-off-by: Kirill A. Shutemov 
> Fixes: b677dfae5aa1 ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit 
> trampoline before returning to long mode")
> Reported-by: Kyle D Pelton 
> Cc: Wei Huang 
> ---
>  arch/x86/boot/compressed/head_64.S | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S 
> b/arch/x86/boot/compressed/head_64.S
> index f105ae8651c9..f62e347862cc 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -602,10 +602,12 @@ ENTRY(trampoline_32bit_src)
>  3:
>   /* Set EFER.LME=1 as a precaution in case hypervsior pulls the rug */
>   pushl   %ecx
> + pushl   %edx
>   movl$MSR_EFER, %ecx
>   rdmsr
>   btsl$_EFER_LME, %eax
>   wrmsr
> + popl%edx
>   popl%ecx
>  
>   /* Enable PAE and LA57 (if required) paging modes */
> 

Oops, rdmsr indeed corrupts EDX.

Acked-by: Wei Huang 



[tip:x86/urgent] x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode

2019-01-29 Thread tip-bot for Wei Huang
Commit-ID:  b677dfae5aa197afc5191755a76a8727ffca538a
Gitweb: https://git.kernel.org/tip/b677dfae5aa197afc5191755a76a8727ffca538a
Author: Wei Huang 
AuthorDate: Thu, 3 Jan 2019 23:44:11 -0600
Committer:  Thomas Gleixner 
CommitDate: Tue, 29 Jan 2019 21:58:59 +0100

x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to 
long mode

In some old AMD KVM implementations, the guest's EFER.LME bit is cleared by KVM
when the hypervisor detects that the guest sets CR0.PG to 0. This causes
the guest OS to reboot when it tries to return from 32-bit trampoline code
because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but
EFER.LME=0.  As a precaution, set EFER.LME=1 as part of long mode
activation procedure. This extra step won't cause any harm when Linux is
booted on a bare-metal machine.

Signed-off-by: Wei Huang 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Cc: b...@alien8.de
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20190104054411.12489-1-...@redhat.com

---
 arch/x86/boot/compressed/head_64.S | 8 
 arch/x86/boot/compressed/pgtable.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 64037895b085..f105ae8651c9 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -600,6 +600,14 @@ ENTRY(trampoline_32bit_src)
lealTRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
movl%eax, %cr3
 3:
+   /* Set EFER.LME=1 as a precaution in case hypervsior pulls the rug */
+   pushl   %ecx
+   movl$MSR_EFER, %ecx
+   rdmsr
+   btsl$_EFER_LME, %eax
+   wrmsr
+   popl%ecx
+
/* Enable PAE and LA57 (if required) paging modes */
movl$X86_CR4_PAE, %eax
cmpl$0, %edx
diff --git a/arch/x86/boot/compressed/pgtable.h 
b/arch/x86/boot/compressed/pgtable.h
index 91f75638f6e6..6ff7e81b5628 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET0
 
 #define TRAMPOLINE_32BIT_CODE_OFFSET   PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE 0x60
+#define TRAMPOLINE_32BIT_CODE_SIZE 0x70
 
 #define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
 


Re: [PATCH 1/1] x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline code before returning to long mode

2019-01-07 Thread Wei Huang



On 1/7/19 3:53 PM, Benjamin Gilbert wrote:
> On Mon, Jan 07, 2019 at 02:03:15PM -0600, Wei Huang wrote:
>> On 1/7/19 2:25 AM, Kirill A. Shutemov wrote:
>>> On Fri, Jan 04, 2019 at 05:44:11AM +, Wei Huang wrote:
>>>> In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM
>>>> when the hypervsior detects guest sets CR0.PG to 0. This causes guest OS
>>>> to reboot when it tries to return from 32-bit trampoline code because CPU
>>>> is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but EFER.LME=0.
>>>> As a precaution, this patch sets EFER.LME=1 as part of long mode
>>>> activation procedure. This extra step won't cause any harm when Linux is
>>>> booting on bare-metal machine.
>>>>
>>>> Signed-off-by: Wei Huang 
>>>
>>> Thanks for tracking this down.
>>
>> BTW I think this patch _might_ be related the recent reboot issue
>> reported in https://lkml.org/lkml/2018/7/1/836 since the symptoms are
>> exactly the same.
> 
> The problem in that case turned out to be https://lkml.org/lkml/2018/7/4/723
> which was fixed by d503ac531a.

OK, then it is a different problem. For this specific patch, without it,
the latest kernel can't boot on RHEL6 (and other old KVM distros) as a
guest VM on an AMD box.

> 
> --Benjamin Gilbert
> 


Re: [PATCH 1/1] x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline code before returning to long mode

2019-01-07 Thread Wei Huang
[adding lkml and linux-x86_64]

On 1/7/19 2:25 AM, Kirill A. Shutemov wrote:
> On Fri, Jan 04, 2019 at 05:44:11AM +0000, Wei Huang wrote:
>> In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM
>> when the hypervsior detects guest sets CR0.PG to 0. This causes guest OS
>> to reboot when it tries to return from 32-bit trampoline code because CPU
>> is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but EFER.LME=0.
>> As a precaution, this patch sets EFER.LME=1 as part of long mode
>> activation procedure. This extra step won't cause any harm when Linux is
>> booting on bare-metal machine.
>>
>> Signed-off-by: Wei Huang 
> 
> Thanks for tracking this down.

BTW I think this patch _might_ be related to the recent reboot issue
reported in https://lkml.org/lkml/2018/7/1/836 since the symptoms are
exactly the same.

> 
> Acked-by: Kirill A. Shutemov 
> Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during 
> decompression stage")
> 


[PATCH 1/1] kvm: selftests: add cr4_cpuid_sync_test

2018-06-25 Thread Wei Huang
KVM is supposed to update some guest VM's CPUID bits (e.g. OSXSAVE) when
CR4 is changed. A bug was found in KVM recently and it was fixed by
Commit c4d2188206ba ("KVM: x86: Update cpuid properly when CR4.OSXAVE or
CR4.PKE is changed"). This patch adds a test to verify the synchronization
between guest VM's CR4 and CPUID bits.

Signed-off-by: Wei Huang 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 tools/testing/selftests/kvm/cr4_cpuid_sync_test.c | 129 ++
 2 files changed, 130 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/cr4_cpuid_sync_test.c

diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index d9d0031..65bda4f 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -9,6 +9,7 @@ LIBKVM_x86_64 = lib/x86.c lib/vmx.c
 TEST_GEN_PROGS_x86_64 = set_sregs_test
 TEST_GEN_PROGS_x86_64 += sync_regs_test
 TEST_GEN_PROGS_x86_64 += vmx_tsc_adjust_test
+TEST_GEN_PROGS_x86_64 += cr4_cpuid_sync_test
 
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/cr4_cpuid_sync_test.c 
b/tools/testing/selftests/kvm/cr4_cpuid_sync_test.c
new file mode 100644
index 000..dbbaf3c
--- /dev/null
+++ b/tools/testing/selftests/kvm/cr4_cpuid_sync_test.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CR4 and CPUID sync test
+ *
+ * Copyright 2018, Red Hat, Inc. and/or its affiliates.
+ *
+ * Author:
+ *   Wei Huang 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "x86.h"
+
+#define X86_FEATURE_XSAVE  (1<<26)
+#define X86_FEATURE_OSXSAVE(1<<27)
+#define VCPU_ID1
+
+enum {
+   GUEST_UPDATE_CR4 = 0x1000,
+   GUEST_FAILED,
+   GUEST_DONE,
+};
+
+static void exit_to_hv(uint16_t port)
+{
+   __asm__ __volatile__("in %[port], %%al"
+:
+: [port]"d"(port)
+: "rax");
+}
+
+static inline bool cr4_cpuid_is_sync(void)
+{
+   int func, subfunc;
+   uint32_t eax, ebx, ecx, edx;
+   uint64_t cr4;
+
+   func = 0x1;
+   subfunc = 0x0;
+   __asm__ __volatile__("cpuid"
+: "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
+: "a"(func), "c"(subfunc));
+
+   cr4 = get_cr4();
+
+   return (!!(ecx & X86_FEATURE_OSXSAVE)) == (!!(cr4 & X86_CR4_OSXSAVE));
+}
+
+static void guest_code(void)
+{
+   uint64_t cr4;
+
+   /* turn on CR4.OSXSAVE */
+   cr4 = get_cr4();
+   cr4 |= X86_CR4_OSXSAVE;
+   set_cr4(cr4);
+
+   /* verify CR4.OSXSAVE == CPUID.OSXSAVE */
+   if (!cr4_cpuid_is_sync())
+   exit_to_hv(GUEST_FAILED);
+
+   /* notify hypervisor to change CR4 */
+   exit_to_hv(GUEST_UPDATE_CR4);
+
+   /* check again */
+   if (!cr4_cpuid_is_sync())
+   exit_to_hv(GUEST_FAILED);
+
+   exit_to_hv(GUEST_DONE);
+}
+
+int main(int argc, char *argv[])
+{
+   struct kvm_run *run;
+   struct kvm_vm *vm;
+   struct kvm_sregs sregs;
+   struct kvm_cpuid_entry2 *entry;
+   int rc;
+
+   entry = kvm_get_supported_cpuid_entry(1);
+   if (!(entry->ecx & X86_FEATURE_XSAVE)) {
+   printf("XSAVE feature not supported, skipping test\n");
+   return 0;
+   }
+
+   /* Tell stdout not to buffer its content */
+   setbuf(stdout, NULL);
+
+   /* Create VM */
+   vm = vm_create_default_vmx(VCPU_ID, guest_code);
+   vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+   run = vcpu_state(vm, VCPU_ID);
+
+   while (1) {
+   rc = _vcpu_run(vm, VCPU_ID);
+
+   if (run->exit_reason == KVM_EXIT_IO) {
+   switch (run->io.port) {
+   case GUEST_UPDATE_CR4:
+   /* emulate hypervisor clearing CR4.OSXSAVE */
+   vcpu_sregs_get(vm, VCPU_ID, &sregs);
+   sregs.cr4 &= ~X86_CR4_OSXSAVE;
+   vcpu_sregs_set(vm, VCPU_ID, &sregs);
+   break;
+   case GUEST_FAILED:
+   TEST_ASSERT(false, "Guest CR4 bit (OSXSAVE) 
unsynchronized with CPUID bit.");
+   break;
+   case GUEST_DONE:
+   goto done;
+   default:
+   TEST_ASSERT(false, "Unknown port 0x%x.",
+   run->io.port);
+   }
+   }
+   }
+
+   kvm_vm_free(vm);
+
+done:
+   return 0;
+}
-- 
1.8.3.1





Re: [PATCH 1/1] drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off

2017-05-25 Thread Wei Huang


On 05/25/2017 10:28 AM, Will Deacon wrote:
> Hi Wei,
> 
> On Wed, May 24, 2017 at 09:36:41AM -0500, Wei Huang wrote:
>> We saw perf IRQ init failures when running Linux kernel in an ACPI
>> guest without PMU (i.e. pmu=off). This is because perf IRQ is not
>> present when pmu=off, but arm_pmu_acpi still tries to register
>> or unregister GSI. This patch addresses the problem by checking
>> gicc->performance_interrupt. If it is 0, which is the value set
>> by qemu when pmu=off, we skip the IRQ register/unregister process.
>>
>> [4.069470] bc00: 00040b00 089db190
>> [4.070267] [] enable_percpu_irq+0xdc/0xe4
>> [4.071192] [] arm_perf_starting_cpu+0x108/0x10c
>> [4.072200] [] cpuhp_invoke_callback+0x14c/0x4ac
>> [4.073210] [] cpuhp_thread_fun+0xd4/0x11c
>> [4.074132] [] smpboot_thread_fn+0x1b4/0x1c4
>> [4.075081] [] kthread+0x10c/0x138
>> [4.075921] [] ret_from_fork+0x10/0x50
>> [    4.076947] genirq: Setting trigger mode 4 for irq 43 failed
>> (gic_set_type+0x0/0x74)
>>
>> Signed-off-by: Wei Huang <w...@redhat.com>
>> ---
>>  drivers/perf/arm_pmu_acpi.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
>> index 34c862f..d6bb75d 100644
>> --- a/drivers/perf/arm_pmu_acpi.c
>> +++ b/drivers/perf/arm_pmu_acpi.c
>> @@ -29,6 +29,9 @@ static int arm_pmu_acpi_register_irq(int cpu)
>>  return -EINVAL;
>>  
>>  gsi = gicc->performance_interrupt;
>> +if (!gsi)
>> +return 0;
> 
> So a GSI of zero means we return an IRQ of zero, which correctly gets
> treated as "No ACPI PMU"...

Yes, returning 0 here, so that dmesg prints "No ACPI PMU IRQ for CPU", is
acceptable.

> 
>>  if (gicc->flags & ACPI_MADT_PERFORMANCE_IRQ_MODE)
>>  trigger = ACPI_EDGE_SENSITIVE;
>>  else
>> @@ -58,7 +61,8 @@ static void arm_pmu_acpi_unregister_irq(int cpu)
>>  return;
>>  
>>  gsi = gicc->performance_interrupt;
>> -acpi_unregister_gsi(gsi);
>> +if (gsi)
>> +acpi_unregister_gsi(gsi);
> 
> ... but then I don't see how we can get here, so I'll drop this hunk.

I am OK to drop it. It was added just to be cautious... Do you need
another version from me, or will you remove this hunk?

> 
> Will
> 
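
For illustration, a hypothetical caller-side sketch (not the actual
arm_pmu_acpi.c code) of how the zero return discussed above folds into the
existing "No ACPI PMU" handling:

/*
 * Hypothetical sketch: with the hunk above, a pmu=off guest (GSI == 0) makes
 * arm_pmu_acpi_register_irq() return 0, and a zero IRQ is simply treated as
 * "this CPU has no PMU interrupt" instead of attempting a GSI registration
 * that would fail.
 */
static int parse_pmu_irqs_sketch(int nr_cpus)
{
    int cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        int irq = arm_pmu_acpi_register_irq(cpu);

        if (irq < 0)
            return irq;                 /* malformed MADT entry, etc. */
        if (!irq) {
            pr_debug("No ACPI PMU IRQ for CPU%d\n", cpu);
            continue;                   /* pmu=off: nothing to set up */
        }
        /* otherwise record the per-CPU PMU IRQ as before */
    }
    return 0;
}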


[PATCH 1/1] drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off

2017-05-24 Thread Wei Huang
We saw perf IRQ init failures when running Linux kernel in an ACPI
guest without PMU (i.e. pmu=off). This is because perf IRQ is not
present when pmu=off, but arm_pmu_acpi still tries to register
or unregister GSI. This patch addresses the problem by checking
gicc->performance_interrupt. If it is 0, which is the value set
by qemu when pmu=off, we skip the IRQ register/unregister process.

[4.069470] bc00: 00040b00 089db190
[4.070267] [] enable_percpu_irq+0xdc/0xe4
[4.071192] [] arm_perf_starting_cpu+0x108/0x10c
[4.072200] [] cpuhp_invoke_callback+0x14c/0x4ac
[4.073210] [] cpuhp_thread_fun+0xd4/0x11c
[4.074132] [] smpboot_thread_fn+0x1b4/0x1c4
[4.075081] [] kthread+0x10c/0x138
[4.075921] [] ret_from_fork+0x10/0x50
[4.076947] genirq: Setting trigger mode 4 for irq 43 failed
(gic_set_type+0x0/0x74)

Signed-off-by: Wei Huang <w...@redhat.com>
---
 drivers/perf/arm_pmu_acpi.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
index 34c862f..d6bb75d 100644
--- a/drivers/perf/arm_pmu_acpi.c
+++ b/drivers/perf/arm_pmu_acpi.c
@@ -29,6 +29,9 @@ static int arm_pmu_acpi_register_irq(int cpu)
return -EINVAL;
 
gsi = gicc->performance_interrupt;
+   if (!gsi)
+   return 0;
+
if (gicc->flags & ACPI_MADT_PERFORMANCE_IRQ_MODE)
trigger = ACPI_EDGE_SENSITIVE;
else
@@ -58,7 +61,8 @@ static void arm_pmu_acpi_unregister_irq(int cpu)
return;
 
gsi = gicc->performance_interrupt;
-   acpi_unregister_gsi(gsi);
+   if (gsi)
+   acpi_unregister_gsi(gsi);
 }
 
 static int arm_pmu_acpi_parse_irqs(void)
-- 
2.7.4



Re: [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking

2016-11-21 Thread Wei Huang


On 11/21/2016 03:40 PM, Christopher Covington wrote:
> Hi Wei,
> 
> On 11/21/2016 03:24 PM, Wei Huang wrote:
>> From: Christopher Covington <c...@codeaurora.org>
> 
> I really appreciate your work on these patches. If for any or all of these
> you have more lines added/modified than me (or using any other better
> metric), please make sure to change the author to be you with
> `git commit --amend --reset-author` or equivalent.

Sure, I will if needed. Regarding your comments below, I will fix the
patch series after Drew's comments, if any.

> 
>> Calculate the numbers of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode in the configuration file.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> Signed-off-by: Wei Huang <w...@redhat.com>
>> ---
>>  arm/pmu.c | 119 
>> +-
>>  arm/unittests.cfg |  14 +++
>>  2 files changed, 132 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 176b070..129ef1e 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
>>  asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
>>  return val;
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to 
>> compensate
>> + * for, so hand assemble everything between, and including, the PMCR 
>> accesses
>> + * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 
>> 2*loop.
   
I will change the comment above to "Total instrs".

>> + */
>> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)
> 
> Nit: I would call this precise_instrs_loop. How many cycles it takes is
> IMPLEMENTATION DEFINED.

You are right. The cycle count indeed depends on the design. Will fix.

> 
>> +{
>> +asm volatile(
>> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>> +"   isb\n"
>> +"1: subs%[loop], %[loop], #1\n"
>> +"   bgt 1b\n"
> 
> Is there any chance we might need an isb here, to prevent the stop from 
> happening
> before or during the loop? Where ISBs are required, the Linux best practice 
> is to

In theory, I think this can happen when the mcr is executed before all loop
instructions have completed, causing pmccntr_read() to miss some cycles. But
QEMU TCG mode doesn't support out-of-order execution, so the test condition,
"cpi > 0 && cycles != i * cpi", will never be TRUE there. Because cpi==0 under
KVM, the same test condition won't be TRUE in KVM mode either.

> diligently comment why they are needed. Perhaps it would be a good habit to
> carry over into kvm-unit-tests.

Agreed. Most isb() instructions were added following CP15 writes (not
all CP15 writes, but at a limited set of locations). We tried to follow what
the Linux kernel does in perf_event.c. If you feel that any isb() location
needs a special comment, I will be more than happy to add it.
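
As a concrete illustration of the barrier question above, this is roughly
where such an isb would sit in the arm64 variant (a sketch only; the posted
patch deliberately leaves it out):

/*
 * Sketch only: an extra isb between the loop and the stopping write would
 * force the loop to complete before counting is disabled. Note that it
 * would also add one instruction to the counted window, so the "2 + 2*loop"
 * budget in the patch would need adjusting.
 */
static inline void precise_instrs_loop_with_barrier(int loop, uint32_t pmcr)
{
    asm volatile(
    "   msr pmcr_el0, %[pmcr]\n"
    "   isb\n"
    "1: subs    %[loop], %[loop], #1\n"
    "   b.gt    1b\n"
    "   isb\n"
    "   msr pmcr_el0, xzr\n"
    "   isb\n"
    : [loop] "+r" (loop)
    : [pmcr] "r" (pmcr)
    : "cc");
}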




[kvm-unit-tests PATCH v10 0/3] ARM PMU tests

2016-11-21 Thread Wei Huang
Changes from v9:
* Move PMCCNTR related configuration from pmu_init() to sub-tests
* Change the name of the loop test function to precise_cycles_loop()
* Print out error details for each test case in check_cpi()
* Fix cpi conversion from argv
* Change the loop calculation in measure_instrs() after cpi is fixed

Note:
1) Current KVM code has bugs in handling PMCCFILTR write. A fix (see
below) is required for this unit testing code to work correctly under
KVM mode.
https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html.

Thanks,
-Wei

Christopher Covington (3):
  arm: Add PMU test
  arm: pmu: Check cycle count increases
  arm: pmu: Add CPI checking

 arm/Makefile.common |   3 +-
 arm/pmu.c   | 347 
 arm/unittests.cfg   |  19 +++
 3 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

-- 
1.8.3.1



[kvm-unit-tests PATCH v10 1/3] arm: Add PMU test

2016-11-21 Thread Wei Huang
From: Christopher Covington <c...@codeaurora.org>

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Signed-off-by: Wei Huang <w...@redhat.com>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/Makefile.common |  3 ++-
 arm/pmu.c   | 74 +
 arm/unittests.cfg   |  5 
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f37b5c2..5da2fdd 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -12,7 +12,8 @@ endif
 tests-common = \
$(TEST_DIR)/selftest.flat \
$(TEST_DIR)/spinlock-test.flat \
-   $(TEST_DIR)/pci-test.flat
+   $(TEST_DIR)/pci-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..9d9c53b
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,74 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+#include "asm/barrier.h"
+
+#define PMU_PMCR_N_SHIFT   11
+#define PMU_PMCR_N_MASK    0x1f
+#define PMU_PMCR_ID_SHIFT  16
+#define PMU_PMCR_ID_MASK   0xff
+#define PMU_PMCR_IMP_SHIFT 24
+#define PMU_PMCR_IMP_MASK  0xff
+
+#if defined(__arm__)
+static inline uint32_t pmcr_read(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t pmcr_read(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   uint32_t pmcr;
+
+   pmcr = pmcr_read();
+
+   printf("PMU implementer: %c\n",
+  (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK);
+   printf("Identification code: 0x%x\n",
+  (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK);
+   printf("Event counters:  %d\n",
+  (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
+
+   return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ae32a42..816f494 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -58,3 +58,8 @@ groups = selftest
 [pci-test]
 file = pci-test.flat
 groups = pci
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
-- 
1.8.3.1
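
As a worked example of the field decoding done by check_pmcr() in the patch
above (the register value is made up for illustration):

/* Standalone worked example of the PMCR_EL0 field layout used above. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t pmcr = 0x41023000; /* hypothetical PMCR_EL0 value */

    /* bits [31:24] implementer: 0x41 prints as 'A' (Arm Ltd.) */
    printf("PMU implementer:     %c\n", (pmcr >> 24) & 0xff);
    /* bits [23:16] identification code: 0x02 */
    printf("Identification code: 0x%x\n", (pmcr >> 16) & 0xff);
    /* bits [15:11] number of event counters: 0b00110 = 6 */
    printf("Event counters:      %d\n", (pmcr >> 11) & 0x1f);
    return 0;
}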



[kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking

2016-11-21 Thread Wei Huang
From: Christopher Covington 

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode in the configuration file.

Signed-off-by: Christopher Covington 
Signed-off-by: Wei Huang 
---
 arm/pmu.c | 119 +-
 arm/unittests.cfg |  14 +++
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 176b070..129ef1e 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
return val;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_cycles_loop(int loop, uint32_t pmcr)
+{
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "   isb\n"
+   "1: subs    %[loop], %[loop], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   "   isb\n"
+   : [loop] "+r" (loop)
+   : [pmcr] "r" (pmcr), [z] "r" (0)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -150,6 +169,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
return id;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. Total cycles = isb + msr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_cycles_loop(int loop, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "   isb\n"
+   "1: subs    %[loop], %[loop], #1\n"
+   "   b.gt    1b\n"
+   "   msr pmcr_el0, xzr\n"
+   "   isb\n"
+   : [loop] "+r" (loop)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 /*
@@ -208,6 +246,79 @@ static bool check_cycles_increase(void)
return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only even instruction counts
+ * greater than or equal to 4 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int loop = (num - 2) / 2;
+
+   assert(num >= 4 && ((num - 2) % 2 == 0));
+   precise_cycles_loop(loop, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   uint32_t pmcr = pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+
+   /* init before event access, this test only cares about cycle count */
+   pmcntenset_write(1 << PMU_CYCLE_IDX);
+   pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (unsigned int i = 4; i < 300; i += 32) {
+   uint64_t avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   uint64_t cycles;
+
+   pmccntr_write(0);
+   measure_instrs(i, pmcr);
+   cycles = pmccntr_read();
+   printf(" %"PRId64"", cycles);
+
+   if (!cycles) {
+   printf("\ncycles not incrementing!\n");
+   return false;
+   } else if (cpi > 0 && cycles != i * cpi) {
+   printf("\nunexpected cycle count received!\n");
+   return false;
+  
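
For reference, the instruction accounting behind measure_instrs() above works
out as follows (a sketch; the exact cpi used for strict checking is whatever
the unittests configuration passes for -icount runs):

/*
 * Worked example of the v10 instruction budget: measure_instrs(36, pmcr)
 * uses loop = (36 - 2) / 2 = 17, and the counted window is
 * (pmcr write + isb) + 17 * (subs + branch) = 2 + 34 = 36 instructions,
 * i.e. exactly num. Under strict checking, check_cpi() then expects
 * pmccntr_read() == num * cpi for every sample.
 */
static inline int expected_window_instrs(int num)
{
    int loop = (num - 2) / 2;   /* same formula as measure_instrs() */

    return 2 + 2 * loop;        /* equals num for even num >= 4 */
}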

[kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases

2016-11-21 Thread Wei Huang
From: Christopher Covington 

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington 
Signed-off-by: Wei Huang 
---
 arm/pmu.c | 156 ++
 1 file changed, 156 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 9d9c53b..176b070 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -15,6 +15,9 @@
 #include "libcflat.h"
 #include "asm/barrier.h"
 
+#define PMU_PMCR_E (1 << 0)
+#define PMU_PMCR_C (1 << 2)
+#define PMU_PMCR_LC    (1 << 6)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -22,6 +25,14 @@
 #define PMU_PMCR_IMP_SHIFT 24
 #define PMU_PMCR_IMP_MASK  0xff
 
+#define ID_DFR0_PERFMON_SHIFT 24
+#define ID_DFR0_PERFMON_MASK  0xf
+
+#define PMU_CYCLE_IDX 31
+
+#define NR_SAMPLES 10
+
+static unsigned int pmu_version;
 #if defined(__arm__)
 static inline uint32_t pmcr_read(void)
 {
@@ -30,6 +41,69 @@ static inline uint32_t pmcr_read(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
+   isb();
+}
+
+static inline void pmselr_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
+   isb();
+}
+
+static inline void pmxevtyper_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+   uint32_t lo, hi = 0;
+
+   if (pmu_version == 0x3)
+   asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
+   else
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (lo));
+
+   return ((uint64_t)hi << 32) | lo;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+   uint32_t lo, hi;
+
+   lo = value & 0xffffffff;
+   hi = (value >> 32) & 0xffffffff;
+
+   if (pmu_version == 0x3)
+   asm volatile("mcrr p15, 0, %0, %1, c9" : : "r" (lo), "r" (hi));
+   else
+   asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (lo));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
+}
+
+/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
+static inline void pmccfiltr_write(uint32_t value)
+{
+   pmselr_write(PMU_CYCLE_IDX);
+   pmxevtyper_write(value);
+   isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+   uint32_t val;
+
+   asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
+   return val;
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -38,6 +112,44 @@ static inline uint32_t pmcr_read(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (value));
+   isb();
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+   uint64_t cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+   asm volatile("msr pmccntr_el0, %0" : : "r" (value));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+   asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
+}
+
+static inline void pmccfiltr_write(uint32_t value)
+{
+   asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
+   isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+   uint32_t id;
+
+   asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
+   return id;
+}
 #endif
 
 /*
@@ -64,11 +176,55 @@ static bool check_pmcr(void)
return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   bool success = true;
+
+   /* init before event access, this test only cares about cycle count */
+   pmcntenset_write(1 << PMU_CYCLE_IDX);
+   pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
+   pmccntr_write(0);
+
+   pmcr_write(pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   uint64_t a, b;
+
+   a = pmccntr_read();
+   b = pmccntr_read();
+
+  

[kvm-unit-tests PATCH v9 1/3] arm: Add PMU test

2016-11-18 Thread Wei Huang
From: Christopher Covington <c...@codeaurora.org>

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Signed-off-by: Wei Huang <w...@redhat.com>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/Makefile.common |  3 ++-
 arm/pmu.c   | 74 +
 arm/unittests.cfg   |  5 
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index ccb554d..f98f422 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..9d9c53b
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,74 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+#include "asm/barrier.h"
+
+#define PMU_PMCR_N_SHIFT   11
+#define PMU_PMCR_N_MASK    0x1f
+#define PMU_PMCR_ID_SHIFT  16
+#define PMU_PMCR_ID_MASK   0xff
+#define PMU_PMCR_IMP_SHIFT 24
+#define PMU_PMCR_IMP_MASK  0xff
+
+#if defined(__arm__)
+static inline uint32_t pmcr_read(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t pmcr_read(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   uint32_t pmcr;
+
+   pmcr = pmcr_read();
+
+   printf("PMU implementer: %c\n",
+  (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK);
+   printf("Identification code: 0x%x\n",
+  (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK);
+   printf("Event counters:  %d\n",
+  (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
+
+   return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 3f6fa45..7645180 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -54,3 +54,8 @@ file = selftest.flat
 smp = $MAX_SMP
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
-- 
1.8.3.1



[kvm-unit-tests PATCH v9 0/3] ARM PMU tests

2016-11-18 Thread Wei Huang
Changes from v8:
* Probe PMU version based on ID_DFR0
* pmccntr_read() now returns 64bit and can handle both 32bit and 64bit
  PMCCNTR based on PMU version.
* Add pmccntr_write() support
* Use a common printf format, PRId64, to support 64-bit variables smoothly in
  test functions (see the small example after this message)
* Add barriers to several PMU write functions
* Verified on different execution modes

Note:
1) Current KVM code has bugs in handling PMCCFILTR write. A fix (see
below) is required for this unit testing code to work correctly under
KVM mode.
https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html.

Thanks,
-Wei

Wei Huang (3):
  arm: Add PMU test
  arm: pmu: Check cycle count increases
  arm: pmu: Add CPI checking

 arm/Makefile.common |   3 +-
 arm/pmu.c   | 339 
 arm/unittests.cfg   |  19 +++
 3 files changed, 360 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

-- 
1.8.3.1
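
A minimal standalone example of the PRId64 usage mentioned in the changelog
above (not part of the patches):

/* Printing a 64-bit counter value portably on both arm and arm64 builds. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t cycles = 123456789012345ULL;   /* stand-in for pmccntr_read() */

    printf("cycles: %"PRId64"\n", (int64_t)cycles);
    return 0;
}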



[kvm-unit-tests PATCH v9 2/3] arm: pmu: Check cycle count increases

2016-11-18 Thread Wei Huang
From: Christopher Covington <c...@codeaurora.org>

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Signed-off-by: Wei Huang <w...@redhat.com>
---
 arm/pmu.c | 156 ++
 1 file changed, 156 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 9d9c53b..fa87de4 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -15,6 +15,9 @@
 #include "libcflat.h"
 #include "asm/barrier.h"
 
+#define PMU_PMCR_E (1 << 0)
+#define PMU_PMCR_C (1 << 2)
+#define PMU_PMCR_LC    (1 << 6)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -22,6 +25,14 @@
 #define PMU_PMCR_IMP_SHIFT 24
 #define PMU_PMCR_IMP_MASK  0xff
 
+#define ID_DFR0_PERFMON_SHIFT 24
+#define ID_DFR0_PERFMON_MASK  0xf
+
+#define PMU_CYCLE_IDX 31
+
+#define NR_SAMPLES 10
+
+static unsigned int pmu_version;
 #if defined(__arm__)
 static inline uint32_t pmcr_read(void)
 {
@@ -30,6 +41,69 @@ static inline uint32_t pmcr_read(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
+   isb();
+}
+
+static inline void pmselr_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
+   isb();
+}
+
+static inline void pmxevtyper_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+   uint32_t lo, hi = 0;
+
+   if (pmu_version == 0x3)
+   asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
+   else
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (lo));
+
+   return ((uint64_t)hi << 32) | lo;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+   uint32_t lo, hi;
+
+   lo = value & 0xffffffff;
+   hi = (value >> 32) & 0xffffffff;
+
+   if (pmu_version == 0x3)
+   asm volatile("mcrr p15, 0, %0, %1, c9" : : "r" (lo), "r" (hi));
+   else
+   asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (lo));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
+}
+
+/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
+static inline void pmccfiltr_write(uint32_t value)
+{
+   pmselr_write(PMU_CYCLE_IDX);
+   pmxevtyper_write(value);
+   isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+   uint32_t val;
+
+   asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
+   return val;
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -38,6 +112,44 @@ static inline uint32_t pmcr_read(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (value));
+   isb();
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+   uint64_t cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+   asm volatile("msr pmccntr_el0, %0" : : "r" (value));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+   asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
+}
+
+static inline void pmccfiltr_write(uint32_t value)
+{
+   asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
+   isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+   uint32_t id;
+
+   asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
+   return id;
+}
 #endif
 
 /*
@@ -64,11 +176,55 @@ static bool check_pmcr(void)
return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   bool success = true;
+
+   pmccntr_write(0);
+   pmcr_write(pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   uint64_t a, b;
+
+   a = pmccntr_read();
+   b = pmccntr_read();
+
+   if (a >= b) {
+   printf("Read %"PRId64" then %"PRId64".\n", a, b);
+   

[kvm-unit-tests PATCH v9 3/3] arm: pmu: Add CPI checking

2016-11-18 Thread Wei Huang
From: Christopher Covington <c...@codeaurora.org>

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode in the configuration file.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Signed-off-by: Wei Huang <w...@redhat.com>
---
 arm/pmu.c | 111 +-
 arm/unittests.cfg |  14 +++
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index fa87de4..b36c4fb 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
return val;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "   isb\n"
+   "1: subs    %[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   "   isb\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr), [z] "r" (0)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -150,6 +169,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
return id;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "   isb\n"
+   "1: subs    %[i], %[i], #1\n"
+   "   b.gt    1b\n"
+   "   msr pmcr_el0, xzr\n"
+   "   isb\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 /*
@@ -204,6 +242,71 @@ static bool check_cycles_increase(void)
return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   uint32_t pmcr = pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+   
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (unsigned int i = 3; i < 300; i += 32) {
+   uint64_t avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   uint64_t cycles;
+
+   pmccntr_write(0);
+   measure_instrs(i, pmcr);
+   cycles = pmccntr_read();
+   printf(" %"PRId64"", cycles);
+
+   /*
+* The cycles taken by the loop above should fit in
+* 32 bits easily. We check the upper 32 bits of the
+* cycle counter to make sure there is no surprise.
+*/
+   if (!cycles || (cpi > 0 && cycles != i * cpi) ||
+   (cycles & 0xffffffff00000000)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%"PRId64" a

[kvm-unit-tests PATCH v9 3/3] arm: pmu: Add CPI checking

2016-11-18 Thread Wei Huang
From: Christopher Covington 

Calculate the number of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility,
enabled from the configuration file, intended for the -icount option in
TCG mode.

Signed-off-by: Christopher Covington 
Signed-off-by: Wei Huang 
---
 arm/pmu.c | 111 +-
 arm/unittests.cfg |  14 +++
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index fa87de4..b36c4fb 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
return val;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "   isb\n"
+   "1: subs %[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   "   isb\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr), [z] "r" (0)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -150,6 +169,25 @@ static inline uint32_t id_dfr0_read(void)
asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
return id;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "   isb\n"
+   "1: subs %[i], %[i], #1\n"
+   "   b.gt 1b\n"
+   "   msr pmcr_el0, xzr\n"
+   "   isb\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 /*
@@ -204,6 +242,71 @@ static bool check_cycles_increase(void)
return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   uint32_t pmcr = pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+   
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (unsigned int i = 3; i < 300; i += 32) {
+   uint64_t avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   uint64_t cycles;
+
+   pmccntr_write(0);
+   measure_instrs(i, pmcr);
+   cycles = pmccntr_read();
+   printf(" %"PRId64"", cycles);
+
+   /*
+* The cycles taken by the loop above should fit in
+* 32 bits easily. We check the upper 32 bits of the
+* cycle counter to make sure there is no surprise.
+*/
+   if (!cycles || (cpi > 0 && cycles != i * cpi) ||
+   (cycles & 0x)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
+  
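
(A minimal worked example of the strict check in check_cpi() above; the
helper below is invented for illustration and is not part of the patch.
Which CPI value pairs with which -icount shift is left to the
configuration file.)

/* Illustrative only: the value the strict mode compares against. */
static int expected_cycles(int num_instrs, int cpi)
{
        return num_instrs * cpi;
}
/* e.g. with cpi == 256, a 35-instruction run must read exactly 8960 cycles. */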

[PATCH v3 1/1] KVM: ARM64: Fix the issues when guest PMCCFILTR is configured

2016-11-16 Thread Wei Huang
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
But this function can't deal with PMCCFILTR correctly because the evtCount
bits of PMCCFILTR, which are reserved as 0, conflict with the SW_INCR event
type of other PMXEVTYPER registers. To fix it, when eventsel == 0, this
function shouldn't return immediately; instead it needs to check further
if select_idx is ARMV8_PMU_CYCLE_IDX.

Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTR
blindly to attr.config. Instead it ought to convert the request to the
"cpu cycle" event type (i.e. 0x11).

To support this patch and to prevent duplicated definitions, a limited
set of ARMv8 perf event types was relocated from perf_event.c to
asm/perf_event.h.

Signed-off-by: Wei Huang <w...@redhat.com>
---
 arch/arm64/include/asm/perf_event.h | 10 +-
 arch/arm64/kernel/perf_event.c  | 10 +-
 virt/kvm/arm/pmu.c  |  8 +---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2065f46..38b6a2b 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -46,7 +46,15 @@
 #defineARMV8_PMU_EVTYPE_MASK   0xc800  /* Mask for writable 
bits */
 #defineARMV8_PMU_EVTYPE_EVENT  0x  /* Mask for EVENT bits 
*/
 
-#define ARMV8_PMU_EVTYPE_EVENT_SW_INCR 0   /* Software increment event */
+/*
+ * PMUv3 event types: required events
+ */
+#define ARMV8_PMUV3_PERFCTR_SW_INCR    0x00
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL   0x03
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE  0x04
+#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED    0x10
+#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES 0x11
+#define ARMV8_PMUV3_PERFCTR_BR_PRED    0x12
 
 /*
  * Event filters for PMUv3
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index a9310a6..57ae9d9 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -31,17 +31,9 @@
 
 /*
  * ARMv8 PMUv3 Performance Events handling code.
- * Common event types.
+ * Common event types (some are defined in asm/perf_event.h).
  */
 
-/* Required events. */
-#define ARMV8_PMUV3_PERFCTR_SW_INCR    0x00
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL   0x03
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE  0x04
-#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED    0x10
-#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES 0x11
-#define ARMV8_PMUV3_PERFCTR_BR_PRED    0x12
-
 /* At least one of the following is required. */
 #define ARMV8_PMUV3_PERFCTR_INST_RETIRED   0x08
 #define ARMV8_PMUV3_PERFCTR_INST_SPEC  0x1B
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 6e9c40e..69ccce3 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -305,7 +305,7 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
val)
continue;
type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
   & ARMV8_PMU_EVTYPE_EVENT;
-   if ((type == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+   if ((type == ARMV8_PMUV3_PERFCTR_SW_INCR)
&& (enable & BIT(i))) {
reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
reg = lower_32_bits(reg);
@@ -379,7 +379,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
 
/* Software increment event does't need to be backed by a perf event */
-   if (eventsel == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+   if (eventsel == ARMV8_PMUV3_PERFCTR_SW_INCR &&
+   select_idx != ARMV8_PMU_CYCLE_IDX)
return;
 
memset(&attr, 0, sizeof(struct perf_event_attr));
@@ -391,7 +392,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
attr.exclude_kernel = data & ARMV8_PMU_EXCLUDE_EL1 ? 1 : 0;
attr.exclude_hv = 1; /* Don't count EL2 events */
attr.exclude_host = 1; /* Don't count host events */
-   attr.config = eventsel;
+   attr.config = (select_idx == ARMV8_PMU_CYCLE_IDX) ?
+   ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel;
 
counter = kvm_pmu_get_counter_value(vcpu, select_idx);
/* The initial sample period (overflow count) of an event. */
-- 
2.7.4
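
(To make the ambiguity described in the commit message concrete, here is an
illustrative sketch, not the kernel code itself; the cycle-counter index 31
is taken from how ARMV8_PMU_CYCLE_IDX is used in the diff above.)

/*
 * PMXEVTYPERn has an evtCount field, so a raw value of 0 really means
 * SW_INCR (0x00).  PMCCFILTR has no evtCount field at all -- those bits
 * are reserved as 0 -- so 0 there must not be treated as SW_INCR.
 */
static unsigned int sketch_pick_event(unsigned int eventsel,
                                      unsigned int select_idx)
{
        if (select_idx == 31)           /* ARMV8_PMU_CYCLE_IDX */
                return 0x11;            /* CPU_CYCLES, never SW_INCR */
        return eventsel;
}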



Re: [PATCH 1/2] arm64: perf: Move ARMv8 PMU perf event definitions to asm/perf_event.h

2016-11-10 Thread Wei Huang


On 11/10/2016 11:17 AM, Will Deacon wrote:
> On Thu, Nov 10, 2016 at 03:32:12PM +, Marc Zyngier wrote:
>> On 10/11/16 15:12, Wei Huang wrote:
>>>
>>>
>>> On 11/10/2016 03:10 AM, Marc Zyngier wrote:
>>>> Hi Wei,
>>>>
>>>> On 09/11/16 19:57, Wei Huang wrote:
>>>>> This patch moves ARMv8-related perf event definitions from perf_event.c
>>>>> to asm/perf_event.h; so KVM code can use them directly. This also help
>>>>> remove a duplicated definition of SW_INCR in perf_event.h.
>>>>>
>>>>> Signed-off-by: Wei Huang <w...@redhat.com>
>>>>> ---
>>>>>  arch/arm64/include/asm/perf_event.h | 161 
>>>>> +++-
>>>>>  arch/arm64/kernel/perf_event.c  | 161 
>>>>> 
>>>>>  2 files changed, 160 insertions(+), 162 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/perf_event.h 
>>>>> b/arch/arm64/include/asm/perf_event.h
>>>>> index 2065f46..6c7b18b 100644
>>>>> --- a/arch/arm64/include/asm/perf_event.h
>>>>> +++ b/arch/arm64/include/asm/perf_event.h
>>>>> @@ -46,7 +46,166 @@
>>>>>  #define  ARMV8_PMU_EVTYPE_MASK   0xc800  /* Mask for writable 
>>>>> bits */
>>>>>  #define  ARMV8_PMU_EVTYPE_EVENT  0x  /* Mask for EVENT bits 
>>>>> */
>>>>>  
>>>>> -#define ARMV8_PMU_EVTYPE_EVENT_SW_INCR   0   /* Software increment 
>>>>> event */
>>>>> +/*
>>>>> + * ARMv8 PMUv3 Performance Events handling code.
>>>>> + * Common event types.
>>>>> + */
>>>>> +
>>>>> +/* Required events. */
>>>>> +#define ARMV8_PMUV3_PERFCTR_SW_INCR  0x00
>>>>> +#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL 0x03
>>>>> +#define ARMV8_PMUV3_PERFCTR_L1D_CACHE0x04
>>>>> +#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED  0x10
>>>>> +#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES   0x11
>>>>> +#define ARMV8_PMUV3_PERFCTR_BR_PRED  0x12
>>>>
>>>> In my initial review, I asked for the "required" events to be moved to a
>>>> shared location. What's the rational for moving absolutely everything?
>>>
>>> I did notice the phrase "required" in the original email. However I
think it is weird to have two places for the same set of PMU definitions.
>>> Other developers might think these two are missing if they don't search
>>> kernel files carefully.
>>>
>>> If Will Deacon and you insist, I can move only two defs to perf_event.h,
>>> consolidated with the 2nd patch into a single one.
>>
>> My personal feeling is that only architected events should be in a
>> public header. The CPU-specific ones are probably better kept private,
>> as it is doubtful that other users would appear).
>>
>> I'll leave it up to Will to decide, as all I want to avoid is the
>> duplication of constants between the PMU and KVM code bases.
> 
> Yeah, just take the sets that you need (i.e. the architected events).

Hi Will,

Just to clarify what "architected" means:

We need two for KVM: SW_INCR (architectural) and CPU_CYCLES
(micro-architectural). Looking at the perf_event.c file, I can either
relocate the "Required events" (6 events) or the whole set of
ARMV8_PMUV3_PERFCTR_* (~50 events) to perf_event.h. Which way do you prefer?

Thanks,
-Wei

> 
> Also, check that it builds.
> 
> Will
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [PATCH 1/2] arm64: perf: Move ARMv8 PMU perf event definitions to asm/perf_event.h

2016-11-10 Thread Wei Huang


On 11/10/2016 03:10 AM, Marc Zyngier wrote:
> Hi Wei,
> 
> On 09/11/16 19:57, Wei Huang wrote:
>> This patch moves ARMv8-related perf event definitions from perf_event.c
>> to asm/perf_event.h; so KVM code can use them directly. This also help
>> remove a duplicated definition of SW_INCR in perf_event.h.
>>
>> Signed-off-by: Wei Huang <w...@redhat.com>
>> ---
>>  arch/arm64/include/asm/perf_event.h | 161 
>> +++-
>>  arch/arm64/kernel/perf_event.c  | 161 
>> 
>>  2 files changed, 160 insertions(+), 162 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/perf_event.h 
>> b/arch/arm64/include/asm/perf_event.h
>> index 2065f46..6c7b18b 100644
>> --- a/arch/arm64/include/asm/perf_event.h
>> +++ b/arch/arm64/include/asm/perf_event.h
>> @@ -46,7 +46,166 @@
>>  #define ARMV8_PMU_EVTYPE_MASK   0xc800  /* Mask for writable 
>> bits */
>>  #define ARMV8_PMU_EVTYPE_EVENT  0x  /* Mask for EVENT bits 
>> */
>>  
>> -#define ARMV8_PMU_EVTYPE_EVENT_SW_INCR  0   /* Software increment 
>> event */
>> +/*
>> + * ARMv8 PMUv3 Performance Events handling code.
>> + * Common event types.
>> + */
>> +
>> +/* Required events. */
>> +#define ARMV8_PMUV3_PERFCTR_SW_INCR 0x00
>> +#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL0x03
>> +#define ARMV8_PMUV3_PERFCTR_L1D_CACHE   0x04
>> +#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED 0x10
>> +#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES  0x11
>> +#define ARMV8_PMUV3_PERFCTR_BR_PRED 0x12
> 
> In my initial review, I asked for the "required" events to be moved to a
> shared location. What's the rational for moving absolutely everything?

I did notice the phrase "required" in the original email. However I
think it is weird to have two places for the same set of PMU definitions.
Other developers might think these two are missing if they don't search
kernel files carefully.

If Will Deacon and you insist, I can move only two defs to perf_event.h,
consolidated with the 2nd patch into a single one.

> KVM only needs to know about ARMV8_PMUV3_PERFCTR_SW_INCR and
> ARMV8_PMUV3_PERFCTR_CPU_CYCLES, so I thought that moving the above six
> events (and maybe the following two) would be enough.
> 
> Also, you've now broken the build by dropping
> ARMV8_PMU_EVTYPE_EVENT_SW_INCR without amending it use in the KVM PMU
> code (see the kbuild report).
> 

My bad. I tested compilation only after the two patches were applied. Will fix it.



>> +
>>  /* PMUv3 HW events mapping. */
>>  
>>  /*
>>
> 
> Thanks,
> 
>   M.
> 


[PATCH 1/2] arm64: perf: Move ARMv8 PMU perf event definitions to asm/perf_event.h

2016-11-09 Thread Wei Huang
This patch moves ARMv8-related perf event definitions from perf_event.c
to asm/perf_event.h so that KVM code can use them directly. This also helps
remove a duplicated definition of SW_INCR in perf_event.h.

Signed-off-by: Wei Huang <w...@redhat.com>
---
 arch/arm64/include/asm/perf_event.h | 161 +++-
 arch/arm64/kernel/perf_event.c  | 161 
 2 files changed, 160 insertions(+), 162 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 2065f46..6c7b18b 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -46,7 +46,166 @@
 #defineARMV8_PMU_EVTYPE_MASK   0xc800  /* Mask for writable 
bits */
 #defineARMV8_PMU_EVTYPE_EVENT  0x  /* Mask for EVENT bits 
*/
 
-#define ARMV8_PMU_EVTYPE_EVENT_SW_INCR 0   /* Software increment event */
+/*
+ * ARMv8 PMUv3 Performance Events handling code.
+ * Common event types.
+ */
+
+/* Required events. */
+#define ARMV8_PMUV3_PERFCTR_SW_INCR    0x00
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL   0x03
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE  0x04
+#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED    0x10
+#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES 0x11
+#define ARMV8_PMUV3_PERFCTR_BR_PRED    0x12
+
+/* At least one of the following is required. */
+#define ARMV8_PMUV3_PERFCTR_INST_RETIRED   0x08
+#define ARMV8_PMUV3_PERFCTR_INST_SPEC  0x1B
+
+/* Common architectural events. */
+#define ARMV8_PMUV3_PERFCTR_LD_RETIRED 0x06
+#define ARMV8_PMUV3_PERFCTR_ST_RETIRED 0x07
+#define ARMV8_PMUV3_PERFCTR_EXC_TAKEN  0x09
+#define ARMV8_PMUV3_PERFCTR_EXC_RETURN 0x0A
+#define ARMV8_PMUV3_PERFCTR_CID_WRITE_RETIRED  0x0B
+#define ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED   0x0C
+#define ARMV8_PMUV3_PERFCTR_BR_IMMED_RETIRED   0x0D
+#define ARMV8_PMUV3_PERFCTR_BR_RETURN_RETIRED  0x0E
+#define ARMV8_PMUV3_PERFCTR_UNALIGNED_LDST_RETIRED 0x0F
+#define ARMV8_PMUV3_PERFCTR_TTBR_WRITE_RETIRED 0x1C
+#define ARMV8_PMUV3_PERFCTR_CHAIN  0x1E
+#define ARMV8_PMUV3_PERFCTR_BR_RETIRED 0x21
+
+/* Common microarchitectural events. */
+#define ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL   0x01
+#define ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL 0x02
+#define ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL 0x05
+#define ARMV8_PMUV3_PERFCTR_MEM_ACCESS 0x13
+#define ARMV8_PMUV3_PERFCTR_L1I_CACHE  0x14
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_WB   0x15
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE  0x16
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL   0x17
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_WB   0x18
+#define ARMV8_PMUV3_PERFCTR_BUS_ACCESS 0x19
+#define ARMV8_PMUV3_PERFCTR_MEMORY_ERROR   0x1A
+#define ARMV8_PMUV3_PERFCTR_BUS_CYCLES 0x1D
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_ALLOCATE 0x1F
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_ALLOCATE 0x20
+#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED_RETIRED    0x22
+#define ARMV8_PMUV3_PERFCTR_STALL_FRONTEND 0x23
+#define ARMV8_PMUV3_PERFCTR_STALL_BACKEND  0x24
+#define ARMV8_PMUV3_PERFCTR_L1D_TLB    0x25
+#define ARMV8_PMUV3_PERFCTR_L1I_TLB    0x26
+#define ARMV8_PMUV3_PERFCTR_L2I_CACHE  0x27
+#define ARMV8_PMUV3_PERFCTR_L2I_CACHE_REFILL   0x28
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_ALLOCATE 0x29
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_REFILL   0x2A
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE  0x2B
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_WB   0x2C
+#define ARMV8_PMUV3_PERFCTR_L2D_TLB_REFILL 0x2D
+#define ARMV8_PMUV3_PERFCTR_L2I_TLB_REFILL 0x2E
+#define ARMV8_PMUV3_PERFCTR_L2D_TLB    0x2F
+#define ARMV8_PMUV3_PERFCTR_L2I_TLB    0x30
+
+/* ARMv8 recommended implementation defined event types */
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_RD  0x40
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WR  0x41
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_RD   0x42
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_WR   0x43
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_INNER    0x44
+#

[PATCH 2/2] KVM: ARM64: Fix the issues when guest PMCCFILTR is configured

2016-11-09 Thread Wei Huang
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
But this function can't deal with PMCCFILTR correctly because the evtCount
bit of PMCCFILTR, which is reserved as 0, conflicts with the SW_INCR event
type of other PMXEVTYPER registers. To fix it, when eventsel == 0, this
function shouldn't return immediately; instead it needs to check further
if select_idx is ARMV8_PMU_CYCLE_IDX.

Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTR
blindly to attr.config. Instead it ought to convert the request to the
"cpu cycle" event type (i.e. 0x11).

Signed-off-by: Wei Huang <w...@redhat.com>
---
 virt/kvm/arm/pmu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 6e9c40e..69ccce3 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -305,7 +305,7 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
val)
continue;
type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
   & ARMV8_PMU_EVTYPE_EVENT;
-   if ((type == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+   if ((type == ARMV8_PMUV3_PERFCTR_SW_INCR)
&& (enable & BIT(i))) {
reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
reg = lower_32_bits(reg);
@@ -379,7 +379,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
 
/* Software increment event does't need to be backed by a perf event */
-   if (eventsel == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+   if (eventsel == ARMV8_PMUV3_PERFCTR_SW_INCR &&
+   select_idx != ARMV8_PMU_CYCLE_IDX)
return;
 
memset(&attr, 0, sizeof(struct perf_event_attr));
@@ -391,7 +392,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
attr.exclude_kernel = data & ARMV8_PMU_EXCLUDE_EL1 ? 1 : 0;
attr.exclude_hv = 1; /* Don't count EL2 events */
attr.exclude_host = 1; /* Don't count host events */
-   attr.config = eventsel;
+   attr.config = (select_idx == ARMV8_PMU_CYCLE_IDX) ?
+   ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel;
 
counter = kvm_pmu_get_counter_value(vcpu, select_idx);
/* The initial sample period (overflow count) of an event. */
-- 
2.7.4



Re: [PATCH v2 1/6] KVM: arm/arm64: arch_timer: Gather KVM specific information in a structure

2016-02-18 Thread Wei Huang


On 02/11/2016 09:33 AM, Julien Grall wrote:
> Introduce a structure which are filled up by the arch timer driver and
> used by the virtual timer in KVM.
> 
> The first member of this structure will be the timecounter. More members
> will be added later.
> 
> This is also dropping arch_timer_get_timecounter as it was only used by
> the KVM code. Furthermore, a stub for the new helper hasn't been
> introduced because KVM is requiring the arch timer for both ARM64 and
> ARM32.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> Cc: Daniel Lezcano 
> Cc: Thomas Gleixner 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Gleb Natapov 
> Cc: Paolo Bonzini 
> ---
>  drivers/clocksource/arm_arch_timer.c |  9 +
>  include/clocksource/arm_arch_timer.h | 12 ++--
>  virt/kvm/arm/arch_timer.c|  6 +++---
>  3 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index c64d543..6eb2c5d 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -447,11 +447,11 @@ static struct cyclecounter cyclecounter = {
>   .mask   = CLOCKSOURCE_MASK(56),
>  };
>  
> -static struct timecounter timecounter;
> +static struct arch_timer_kvm_info arch_timer_kvm_info;
>  
> -struct timecounter *arch_timer_get_timecounter(void)
> +struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
>  {
> - return 
> + return _timer_kvm_info;
>  }
>  
>  static void __init arch_counter_register(unsigned type)
> @@ -479,7 +479,8 @@ static void __init arch_counter_register(unsigned type)
>   clocksource_register_hz(&clocksource_counter, arch_timer_rate);
>   cyclecounter.mult = clocksource_counter.mult;
>   cyclecounter.shift = clocksource_counter.shift;
> - timecounter_init(&timecounter, &cyclecounter, start_count);
> + timecounter_init(&arch_timer_kvm_info.timecounter,
> +  &cyclecounter, start_count);
>  
>   /* 56 bits minimum, so we assume worst case rollover */
>   sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate);
> diff --git a/include/clocksource/arm_arch_timer.h 
> b/include/clocksource/arm_arch_timer.h
> index 25d0914..4d487f8 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -49,11 +49,16 @@ enum arch_timer_reg {
>  
>  #define ARCH_TIMER_EVT_STREAM_FREQ   1   /* 100us */
>  
> +struct arch_timer_kvm_info {
> + struct timecounter timecounter;
> +};
> +
>  #ifdef CONFIG_ARM_ARCH_TIMER
>  
>  extern u32 arch_timer_get_rate(void);
>  extern u64 (*arch_timer_read_counter)(void);
> -extern struct timecounter *arch_timer_get_timecounter(void);
> +
> +extern struct arch_timer_kvm_info *arch_timer_get_kvm_info(void);
>  
>  #else
>  
> @@ -67,11 +72,6 @@ static inline u64 arch_timer_read_counter(void)
>   return 0;
>  }
>  
> -static inline struct timecounter *arch_timer_get_timecounter(void)
> -{
> - return NULL;
> -}
> -

Most parts are OK. Regarding removing this function from the #else area,
is there a possibility of having CONFIG_ARM_ARCH_TIMER=n and CONFIG_KVM=y?
If so, will the compilation fail here?

-Wei

>  #endif
>  
>  #endif
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 69bca18..a669c6a 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -385,11 +385,11 @@ int kvm_timer_hyp_init(void)
>  {
>   struct device_node *np;
>   unsigned int ppi;
> + struct arch_timer_kvm_info *info;
>   int err;
>  
> - timecounter = arch_timer_get_timecounter();
> - if (!timecounter)
> - return -ENODEV;
> + info = arch_timer_get_kvm_info();
> + timecounter = &info->timecounter;
>  
>   np = of_find_matching_node(NULL, arch_timer_of_match);
>   if (!np) {
> 
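
(Hypothetical sketch, not part of the posted patch: if a configuration with
CONFIG_ARM_ARCH_TIMER=n and CONFIG_KVM=y were possible, a stub like the one
being removed would still be needed under the #else branch so the KVM code
keeps compiling.)

static inline struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
{
        return NULL;
}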


Re: [PATCH v2 0/6] arm64: Add support of KVM with ACPI

2016-02-18 Thread Wei Huang


On 02/11/2016 09:33 AM, Julien Grall wrote:
> Hello,
> 
> This small series allows an ARM64 ACPI based platform to use KVM.
> 
> Currently the KVM code has to parse the firmware table to get the necessary
> information to setup the virtual timer and virtual GIC.
> 
> However the parsing of those tables are already done in the GIC and arch
> timer drivers.
> 
> This patch series introduces different helpers to retrieve the information
> from different drivers avoiding to duplicate the parsing code.
> 
> Note there is patch series ([1] and [2]) adding support of KVM on ACPI,
> although the approach chosen is completely different. The code to parse
> the firmware tables are duplicated which I think make more complex to
> support new firmware tables.

I backported these patches to my internal tree. It booted on an ARM64
machine. Even though I haven't had the chance to test it on a GICv3
machine (will update later), I think you can add my name as Tested-by if
needed.

-Wei

> 
> See the changes since v1 in the different patches.
> 
> Regards,
> 
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2016-February/018482.html
> [2] https://lists.cs.columbia.edu/pipermail/kvmarm/2016-February/018355.html
> 
> Julien Grall (6):
>   KVM: arm/arm64: arch_timer: Gather KVM specific information in a
> structure
>   KVM: arm/arm64: arch_timer: Rely on the arch timer to parse the
> firmware tables
>   irqchip/gic-v2: Gather ACPI specific data in a single structure
>   irqchip/gic-v2: Parse and export virtual GIC information
>   irqchip/gic-v3: Parse and export virtual GIC information
>   KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware
> tables
> 
>  drivers/clocksource/arm_arch_timer.c   | 11 ++--
>  drivers/irqchip/irq-gic-common.c   | 13 +
>  drivers/irqchip/irq-gic-common.h   |  3 ++
>  drivers/irqchip/irq-gic-v3.c   | 36 ++
>  drivers/irqchip/irq-gic.c  | 91 
> --
>  include/clocksource/arm_arch_timer.h   | 13 ++---
>  include/kvm/arm_vgic.h |  7 +--
>  include/linux/irqchip/arm-gic-common.h | 34 +
>  virt/kvm/arm/arch_timer.c  | 39 ---
>  virt/kvm/arm/vgic-v2.c | 67 +
>  virt/kvm/arm/vgic-v3.c | 45 +
>  virt/kvm/arm/vgic.c| 50 ++-
>  12 files changed, 264 insertions(+), 145 deletions(-)
>  create mode 100644 include/linux/irqchip/arm-gic-common.h
> 


Re: [PATCH 3/5] irqchip/gic-v2: Parse and export virtual GIC information

2016-02-09 Thread Wei Huang


On 02/09/2016 02:49 PM, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 04:47:27PM +, Julien Grall wrote:
>> For now, the firmware tables are parsed 2 times: once in the GIC
>> drivers, the other timer when initializing the vGIC. It means code
>> duplication and make more tedious to add the support for another
>> firmware table (like ACPI).
>>
>> Introduce a new structure and set of helpers to get/set the virtual GIC
>> information. Also fill up the structure for GICv2.
>>
>> Signed-off-by: Julien Grall 
>> ---
>>
>> Cc: Thomas Gleixner 
>> Cc: Jason Cooper 
>> Cc: Marc Zyngier 
>>
>>  drivers/irqchip/irq-gic-common.c   | 13 ++
>>  drivers/irqchip/irq-gic-common.h   |  3 ++
>>  drivers/irqchip/irq-gic.c  | 78 
>> +-
>>  include/linux/irqchip/arm-gic-common.h | 34 +++
>>  4 files changed, 127 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/irqchip/arm-gic-common.h
>>
>> diff --git a/drivers/irqchip/irq-gic-common.c 
>> b/drivers/irqchip/irq-gic-common.c
>> index f174ce0..704caf4 100644
>> --- a/drivers/irqchip/irq-gic-common.c
>> +++ b/drivers/irqchip/irq-gic-common.c
>> @@ -21,6 +21,19 @@
>>  
>>  #include "irq-gic-common.h"
>>  
>> +static const struct gic_kvm_info *gic_kvm_info;
>> +
>> +const struct gic_kvm_info *gic_get_kvm_info(void)
>> +{
>> +return gic_kvm_info;
>> +}
>> +
>> +void gic_set_kvm_info(const struct gic_kvm_info *info)
>> +{
>> +WARN(gic_kvm_info != NULL, "gic_kvm_info already set\n");
>> +gic_kvm_info = info;
>> +}
>> +
>>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>>  void *data)
>>  {
>> diff --git a/drivers/irqchip/irq-gic-common.h 
>> b/drivers/irqchip/irq-gic-common.h
>> index fff697d..205e5fd 100644
>> --- a/drivers/irqchip/irq-gic-common.h
>> +++ b/drivers/irqchip/irq-gic-common.h
>> @@ -19,6 +19,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
>>  
>>  struct gic_quirk {
>>  const char *desc;
>> @@ -35,4 +36,6 @@ void gic_cpu_config(void __iomem *base, void 
>> (*sync_access)(void));
>>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>>  void *data);
>>  
>> +void gic_set_kvm_info(const struct gic_kvm_info *info);
>> +
>>  #endif /* _IRQ_GIC_COMMON_H */
>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>> index 911758c..d3a09a4 100644
>> --- a/drivers/irqchip/irq-gic.c
>> +++ b/drivers/irqchip/irq-gic.c
>> @@ -102,6 +102,8 @@ static struct static_key supports_deactivate = 
>> STATIC_KEY_INIT_TRUE;
>>  
>>  static struct gic_chip_data gic_data[CONFIG_ARM_GIC_MAX_NR] __read_mostly;
>>  
>> +static struct gic_kvm_info gic_v2_kvm_info;
>> +
>>  #ifdef CONFIG_GIC_NON_BANKED
>>  static void __iomem *gic_get_percpu_base(union gic_base *base)
>>  {
>> @@ -1190,6 +1192,44 @@ static bool gic_check_eoimode(struct device_node 
>> *node, void __iomem **base)
>>  return true;
>>  }
>>  
>> +static void __init gic_of_setup_kvm_info(struct device_node *node)
>> +{
>> +int ret;
>> +struct resource r;
>> +unsigned int irq;
>> +
>> +gic_v2_kvm_info.type = GIC_V2;
>> +
>> +irq = irq_of_parse_and_map(node, 0);
>> +if (!irq)
>> +gic_v2_kvm_info.maint_irq = -1;
>> +else
>> +gic_v2_kvm_info.maint_irq = irq;
>> +
>> +ret = of_address_to_resource(node, 2, &r);
>> +if (!ret) {
>> +gic_v2_kvm_info.vctrl_base = r.start;
>> +gic_v2_kvm_info.vctrl_size = resource_size(&r);
>> +}
>> +
>> +ret = of_address_to_resource(node, 3, &r);
>> +if (!ret) {
>> +if (!PAGE_ALIGNED(r.start))
>> +pr_warn("GICV physical address 0x%llx not page 
>> aligned\n",
>> +(unsigned long long)r.start);
>> +else if (!PAGE_ALIGNED(resource_size(&r)))
>> +pr_warn("GICV size 0x%llx not a multiple of page size 
>> 0x%lx\n",
>> +(unsigned long long)resource_size(&r),
>> +PAGE_SIZE);
>> +else {
>> +gic_v2_kvm_info.vcpu_base = r.start;
>> +gic_v2_kvm_info.vcpu_size = resource_size(&r);
>> +}
>> +}
>> +
>> +gic_set_kvm_info(&gic_v2_kvm_info);
>> +}
>> +
>>  int __init
>>  gic_of_init(struct device_node *node, struct device_node *parent)
>>  {
>> @@ -1219,8 +1259,10 @@ gic_of_init(struct device_node *node, struct 
>> device_node *parent)
>>  
>>  __gic_init_bases(gic_cnt, -1, dist_base, cpu_base, percpu_offset,
>>   &node->fwnode);
>> -if (!gic_cnt)
>> +if (!gic_cnt) {
>>  gic_init_physaddr(node);
>> +gic_of_setup_kvm_info(node);
>> +}
>>  
>>  if (parent) {
>>  irq = irq_of_parse_and_map(node, 0);
>> @@ -1247,6 +1289,32 @@ IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
>>  
>>  #ifdef CONFIG_ACPI
>>  static phys_addr_t cpu_phy_base __initdata;
>> +static struct
>> +{
>> +u32 
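
(The layout of the new struct gic_kvm_info is truncated in the archived
message above; the following sketch is reconstructed only from the field
accesses visible in the quoted hunks (.type, .maint_irq, .vctrl_base/size,
.vcpu_base/size) and is an approximation, not the posted header.)

enum gic_type {
        GIC_V2,
        GIC_V3,
};

struct gic_kvm_info {
        enum gic_type   type;        /* which GIC the host provides */
        int             maint_irq;   /* maintenance IRQ, -1 if not found */
        phys_addr_t     vctrl_base;  /* GICH/vctrl region (GICv2) */
        resource_size_t vctrl_size;
        phys_addr_t     vcpu_base;   /* GICV region to map into guests */
        resource_size_t vcpu_size;
};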

Re: [PATCH 3/5] irqchip/gic-v2: Parse and export virtual GIC information

2016-02-09 Thread Wei Huang


On 02/09/2016 02:49 PM, Christoffer Dall wrote:
> On Mon, Feb 08, 2016 at 04:47:27PM +, Julien Grall wrote:
>> For now, the firmware tables are parsed 2 times: once in the GIC
>> drivers, the other timer when initializing the vGIC. It means code
>> duplication and make more tedious to add the support for another
>> firmware table (like ACPI).
>>
>> Introduce a new structure and set of helpers to get/set the virtual GIC
>> information. Also fill up the structure for GICv2.
>>
>> Signed-off-by: Julien Grall 
>> ---
>>
>> Cc: Thomas Gleixner 
>> Cc: Jason Cooper 
>> Cc: Marc Zyngier 
>>
>>  drivers/irqchip/irq-gic-common.c   | 13 ++
>>  drivers/irqchip/irq-gic-common.h   |  3 ++
>>  drivers/irqchip/irq-gic.c  | 78 
>> +-
>>  include/linux/irqchip/arm-gic-common.h | 34 +++
>>  4 files changed, 127 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/irqchip/arm-gic-common.h
>>
>> diff --git a/drivers/irqchip/irq-gic-common.c 
>> b/drivers/irqchip/irq-gic-common.c
>> index f174ce0..704caf4 100644
>> --- a/drivers/irqchip/irq-gic-common.c
>> +++ b/drivers/irqchip/irq-gic-common.c
>> @@ -21,6 +21,19 @@
>>  
>>  #include "irq-gic-common.h"
>>  
>> +static const struct gic_kvm_info *gic_kvm_info;
>> +
>> +const struct gic_kvm_info *gic_get_kvm_info(void)
>> +{
>> +return gic_kvm_info;
>> +}
>> +
>> +void gic_set_kvm_info(const struct gic_kvm_info *info)
>> +{
>> +WARN(gic_kvm_info != NULL, "gic_kvm_info already set\n");
>> +gic_kvm_info = info;
>> +}
>> +
>>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>>  void *data)
>>  {
>> diff --git a/drivers/irqchip/irq-gic-common.h 
>> b/drivers/irqchip/irq-gic-common.h
>> index fff697d..205e5fd 100644
>> --- a/drivers/irqchip/irq-gic-common.h
>> +++ b/drivers/irqchip/irq-gic-common.h
>> @@ -19,6 +19,7 @@
>>  
>>  #include 
>>  #include 
>> +#include <linux/irqchip/arm-gic-common.h>
>>  
>>  struct gic_quirk {
>>  const char *desc;
>> @@ -35,4 +36,6 @@ void gic_cpu_config(void __iomem *base, void 
>> (*sync_access)(void));
>>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>>  void *data);
>>  
>> +void gic_set_kvm_info(const struct gic_kvm_info *info);
>> +
>>  #endif /* _IRQ_GIC_COMMON_H */
>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>> index 911758c..d3a09a4 100644
>> --- a/drivers/irqchip/irq-gic.c
>> +++ b/drivers/irqchip/irq-gic.c
>> @@ -102,6 +102,8 @@ static struct static_key supports_deactivate = 
>> STATIC_KEY_INIT_TRUE;
>>  
>>  static struct gic_chip_data gic_data[CONFIG_ARM_GIC_MAX_NR] __read_mostly;
>>  
>> +static struct gic_kvm_info gic_v2_kvm_info;
>> +
>>  #ifdef CONFIG_GIC_NON_BANKED
>>  static void __iomem *gic_get_percpu_base(union gic_base *base)
>>  {
>> @@ -1190,6 +1192,44 @@ static bool gic_check_eoimode(struct device_node 
>> *node, void __iomem **base)
>>  return true;
>>  }
>>  
>> +static void __init gic_of_setup_kvm_info(struct device_node *node)
>> +{
>> +int ret;
>> +struct resource r;
>> +unsigned int irq;
>> +
>> +gic_v2_kvm_info.type = GIC_V2;
>> +
>> +irq = irq_of_parse_and_map(node, 0);
>> +if (!irq)
>> +gic_v2_kvm_info.maint_irq = -1;
>> +else
>> +gic_v2_kvm_info.maint_irq = irq;
>> +
>> +ret = of_address_to_resource(node, 2, &r);
>> +if (!ret) {
>> +gic_v2_kvm_info.vctrl_base = r.start;
>> +gic_v2_kvm_info.vctrl_size = resource_size(&r);
>> +}
>> +
>> +ret = of_address_to_resource(node, 3, &r);
>> +if (!ret) {
>> +if (!PAGE_ALIGNED(r.start))
>> +pr_warn("GICV physical address 0x%llx not page aligned\n",
>> +(unsigned long long)r.start);
>> +else if (!PAGE_ALIGNED(resource_size(&r)))
>> +pr_warn("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>> +(unsigned long long)resource_size(&r),
>> +PAGE_SIZE);
>> +else {
>> +gic_v2_kvm_info.vcpu_base = r.start;
>> +gic_v2_kvm_info.vcpu_size = resource_size(&r);
>> +}
>> +}
>> +
>> +gic_set_kvm_info(&gic_v2_kvm_info);
>> +}
>> +
>>  int __init
>>  gic_of_init(struct device_node *node, struct device_node *parent)
>>  {
>> @@ -1219,8 +1259,10 @@ gic_of_init(struct device_node *node, struct 
>> device_node *parent)
>>  
>>  __gic_init_bases(gic_cnt, -1, dist_base, cpu_base, percpu_offset,
>>   &node->fwnode);
>> -if (!gic_cnt)
>> +if (!gic_cnt) {
>>  gic_init_physaddr(node);
>> +gic_of_setup_kvm_info(node);
>> +}
>>  
>>  if (parent) {
>>  irq = irq_of_parse_and_map(node, 0);
>> @@ -1247,6 +1289,32 @@ IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
>>  
>>  #ifdef CONFIG_ACPI
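
For context, the consumer side of these helpers (for example the KVM vGIC
probe path) would use the exported information roughly as in the sketch
below. This is illustrative only, based solely on the fields set above,
and is not the actual kvm-arm code:

#include <linux/irqchip/arm-gic-common.h>

static int __init example_vgic_probe(void)
{
	const struct gic_kvm_info *info = gic_get_kvm_info();

	if (!info || info->type != GIC_V2)
		return -ENODEV;		/* no GICv2 information was registered */

	if (!info->vctrl_size || !info->vcpu_size)
		return -ENXIO;		/* firmware did not describe GICH/GICV */

	/*
	 * From here on: ioremap info->vctrl_base, hand info->vcpu_base to
	 * guests, and request info->maint_irq as the maintenance interrupt.
	 */
	return 0;
}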

Re: [PATCH v3 1/7] acpi: Add early device probing infrastructure

2015-10-05 Thread Wei Huang


On 10/03/2015 05:04 AM, Marc Zyngier wrote:
> On Fri, 2 Oct 2015 16:06:05 -0500
> Wei Huang  wrote:
> 
> Hi Wei,
> 
>> Hi Marc,
> 
> [...]
> 
>>> +struct acpi_probe_entry {
>>> +   __u8 id[ACPI_TABLE_ID_LEN];
>>> +   __u8 type;
>>> +   acpi_probe_entry_validate_subtbl subtable_valid;
>>> +   union {
>>> +   acpi_tbl_table_handler probe_table;
>>> +   acpi_tbl_entry_handler probe_subtbl;
>>> +   };
>>
>> Could we avoid using union for probe_table & probe_subtbl? The benefit is 
>> that we don't need to do function casting below and compiler can 
>> automatically check the correctness.
>>
>>> +   kernel_ulong_t driver_data;
>>> +};
>>> +
>>> +#define ACPI_DECLARE_PROBE_ENTRY(table, name, table_id, subtable, valid, 
>>> data, fn) \
>>> +   static const struct acpi_probe_entry __acpi_probe_##name\
>>> +   __used __section(__##table##_acpi_probe_table)  \
>>> += {\
>>> +   .id = table_id, \
>>> +   .type = subtable,   \
>>> +   .subtable_valid = valid,\
>>> +   .probe_table = (acpi_tbl_table_handler)fn,  \
>>> +   .driver_data = data,\
>>> +  }
>>> +
>>
>> Something like: 
>>
>> #define ACPI_DECLARE_PROBE_ENTRY(table, name, table_id, subtable, valid, 
>> data, fn, subfn)\
>>  static const struct acpi_probe_entry __acpi_probe_##name\
>>  __used __section(__##table##_acpi_probe_table)  \
>>   = {\
>>  .id = table_id, \
>>  .type = subtable,   \
>>  .subtable_valid = valid,\
>>  .probe_table = fn,  \
>>  .probe_subtbl = subfn,  \
>>  .driver_data = data,\
>> }
>>
>> Then in patch 3, you can define new entries as:
>>
>> IRQCHIP_ACPI_DECLARE(gic_v2, ACPI_MADT_TYPE_GENERIC_DISTRIBUTOR,
>>   gic_validate_dist, ACPI_MADT_GIC_VERSION_V2,
>>   NULL, gic_v2_acpi_init);
>> IRQCHIP_ACPI_DECLARE(gic_v2_maybe, ACPI_MADT_TYPE_GENERIC_DISTRIBUTOR,
>>   gic_validate_dist, ACPI_MADT_GIC_VERSION_NONE,
>>   NULL, gic_v2_acpi_init);
>>
> 
> That's exactly what I was trying to avoid. If you want to do that, do
> it in the IRQCHIP_ACPI_DECLARE macro, as there is strictly no need for
> this NULL to appear here (MADT always matches by subtable).
> 
> Or even better, have two ACPI_DECLARE* that populate the probe entry in
> a mutually exclusive way (either probe_table is set and both
> valid/subtbl are NULL, or probe_table is NULL and the two other fields
> are set).

Yes, this approach would be sufficient. So users can clearly tell them
apart in terms of usage cases.
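
Roughly, the two mutually exclusive declarations could look like the sketch
below (field usage copied from the patch above; the macro names and the
split of the argument lists are only placeholders):

#define ACPI_DECLARE_PROBE_ENTRY(table, name, table_id, data, fn)	\
	static const struct acpi_probe_entry __acpi_probe_##name	\
		__used __section(__##table##_acpi_probe_table) = {	\
			.id = table_id,					\
			.probe_table = fn,				\
			.driver_data = data,				\
		}

#define ACPI_DECLARE_SUBTABLE_PROBE_ENTRY(table, name, table_id,	\
					  subtable, valid, data, fn)	\
	static const struct acpi_probe_entry __acpi_probe_##name	\
		__used __section(__##table##_acpi_probe_table) = {	\
			.id = table_id,					\
			.type = subtable,				\
			.subtable_valid = valid,			\
			.probe_subtbl = fn,				\
			.driver_data = data,				\
		}

That way probe_table and probe_subtbl never have to be set from the same
macro, and no function cast is needed.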

Thanks,
-Wei

> 
> Thanks,
> 
>   M.
> 

Re: [PATCH v3 1/7] acpi: Add early device probing infrastructure

2015-10-02 Thread Wei Huang
Hi Marc,

On 09/28/2015 09:49 AM, Marc Zyngier wrote:
> IRQ controllers and timers are the two types of device the kernel
> requires before being able to use the device driver model.
> 
> ACPI so far lacks a proper probing infrastructure similar to the one
> we have with DT, where we're able to declare IRQ chips and
> clocksources inside the driver code, and let the core code pick it up
> and call us back on a match. This leads to all kinds of really ugly
> hacks all over the arm64 code and even in the ACPI layer.
> 
> In order to allow some basic probing based on the ACPI tables,
> introduce "struct acpi_probe_entry" which contains just enough
> data and callbacks to match a table, an optional subtable, and
> call a probe function. A driver can, at build time, register itself
> and expect being called if the right entry exists in the ACPI
> table.
> 
> An acpi_probe_device_table() is provided, taking an identifier for
> a set of acpi_probe_entries, and iterating over the registered
> entries.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/acpi/scan.c   | 39 +++
>  include/asm-generic/vmlinux.lds.h | 10 ++
>  include/linux/acpi.h  | 66 
> +++
>  3 files changed, 115 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index f834b8c..daf9fc8 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1913,3 +1913,42 @@ int __init acpi_scan_init(void)
>   mutex_unlock(&acpi_scan_lock);
>   return result;
>  }
> +
> +static struct acpi_probe_entry *ape;
> +static int acpi_probe_count;
> +static DEFINE_SPINLOCK(acpi_probe_lock);
> +
> +static int __init acpi_match_madt(struct acpi_subtable_header *header,
> +   const unsigned long end)
> +{
> + if (!ape->subtable_valid || ape->subtable_valid(header, ape))
> + if (!ape->probe_subtbl(header, end))
> + acpi_probe_count++;
> +
> + return 0;
> +}
> +
> +int __init __acpi_probe_device_table(struct acpi_probe_entry *ap_head, int 
> nr)
> +{
> + int count = 0;
> +
> + if (acpi_disabled)
> + return 0;
> +
> + spin_lock(&acpi_probe_lock);
> + for (ape = ap_head; nr; ape++, nr--) {
> + if (ACPI_COMPARE_NAME(ACPI_SIG_MADT, ape->id)) {
> + acpi_probe_count = 0;
> + acpi_table_parse_madt(ape->type, acpi_match_madt, 0);
> + count += acpi_probe_count;
> + } else {
> + int res;
> + res = acpi_table_parse(ape->id, ape->probe_table);
> + if (!res)
> + count++;
> + }
> + }
> + spin_unlock(&acpi_probe_lock);
> +
> + return count;
> +}
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 1781e54..efd7ed1 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -181,6 +181,16 @@
>  #define CPUIDLE_METHOD_OF_TABLES() OF_TABLE(CONFIG_CPU_IDLE, cpuidle_method)
>  #define EARLYCON_OF_TABLES() OF_TABLE(CONFIG_SERIAL_EARLYCON, earlycon)
>  
> +#ifdef CONFIG_ACPI
> +#define ACPI_PROBE_TABLE(name)   
> \
> + . = ALIGN(8);   \
> + VMLINUX_SYMBOL(__##name##_acpi_probe_table) = .;\
> + *(__##name##_acpi_probe_table)  \
> + VMLINUX_SYMBOL(__##name##_acpi_probe_table_end) = .;
> +#else
> +#define ACPI_PROBE_TABLE(name)
> +#endif
> +
>  #define KERNEL_DTB() \
>   STRUCT_ALIGN(); \
>   VMLINUX_SYMBOL(__dtb_start) = .;\
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 84e7055..51a96a8 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -787,6 +787,61 @@ int acpi_dev_prop_read(struct acpi_device *adev, const 
> char *propname,
>  
>  struct fwnode_handle *acpi_get_next_subnode(struct device *dev,
>   struct fwnode_handle *subnode);
> +
> +struct acpi_probe_entry;
> +typedef bool (*acpi_probe_entry_validate_subtbl)(struct acpi_subtable_header 
> *,
> +  struct acpi_probe_entry *);
> +
> +#define ACPI_TABLE_ID_LEN5
> +
> +/**
> + * struct acpi_probe_entry - boot-time probing entry
> + * @id:  ACPI table name
> + * @type:Optional subtable type to match
> + *   (if @id contains subtables)
> + * @subtable_valid:  Optional callback to check the validity of
> + *   the subtable
> + * @probe_table: Callback to the driver being probed when table
> + *   match is successful
> + * @probe_subtbl:Callback to the driver being probed when 
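
For context, usage of the infrastructure quoted above would look roughly
like the sketch below. The "irqchip" table identifier and the GICv2 names
are taken from the review discussion earlier in this thread and are
illustrative only:

/* In a driver, registered at build time into the probe-table section: */
ACPI_DECLARE_PROBE_ENTRY(irqchip, gic_v2, ACPI_SIG_MADT,
			 ACPI_MADT_TYPE_GENERIC_DISTRIBUTOR,
			 gic_validate_dist, ACPI_MADT_GIC_VERSION_V2,
			 gic_v2_acpi_init);

/* In early boot code, every registered "irqchip" entry is then tried: */
acpi_probe_device_table(irqchip);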

Re: [KVM x86 vPMU Patch 0/2] Two vPMU Trivial Patches

2015-08-11 Thread Wei Huang



On 8/11/15 08:21, Paolo Bonzini wrote:



On 07/08/2015 21:53, Wei Huang wrote:

These two trivial patches are related to x86 vPMU code. They were
actually suggested by Andrew Jones while he was reviewing the last
big vPMU patch set.

These patches have been compiled and tested on AMD system using
a 64-bit guest VM with various perf commands (e.g. bench, test, top,
stat). No obvious problems were found.

Thanks,
-Wei

Wei Huang (2):
   KVM: x86/vPMU: Move the definition of kvm_pmu_ops to arch-specific
 files
   KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn

  arch/x86/kvm/pmu.h | 2 --
  arch/x86/kvm/pmu_amd.c | 2 --
  arch/x86/kvm/svm.c | 1 +
  arch/x86/kvm/vmx.c | 1 +
  4 files changed, 2 insertions(+), 4 deletions(-)



Applied patch 2.  For patch 1 I'm not sure, because I do not really like
1) externs in .c files; 2) globals with no declarations in a .h file.
So I'm leaving it out while I think more about it.

Thanks. The first one is minor anyway. I won't complain about it. :-)

-Wei
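
(For reference, the header-based arrangement Paolo prefers keeps both
declarations next to the other kvm_pmu_* prototypes, so svm.c and vmx.c
pick them up with full type checking; a minimal sketch, not an actual
patch:)

/* arch/x86/kvm/pmu.h */
extern struct kvm_pmu_ops intel_pmu_ops;	/* defined in the Intel vPMU code */
extern struct kvm_pmu_ops amd_pmu_ops;		/* defined in pmu_amd.c */

/* svm.c and vmx.c then only need #include "pmu.h" before referencing them. */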



Paolo


[KVM x86 vPMU Patch 1/2] KVM: x86/vPMU: Move the definition of kvm_pmu_ops to arch-specific files

2015-08-07 Thread Wei Huang
Instead of being defined in a common header file, the kvm_pmu_ops struct
is arch (vmx/svm) specific. This trivial patch relocates two extern
variable definition to their arch-specific files.

Signed-off-by: Wei Huang 
---
 arch/x86/kvm/pmu.h | 2 --
 arch/x86/kvm/svm.c | 1 +
 arch/x86/kvm/vmx.c | 1 +
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index f96e1f9..95184fd 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -113,6 +113,4 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 
-extern struct kvm_pmu_ops intel_pmu_ops;
-extern struct kvm_pmu_ops amd_pmu_ops;
 #endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8e0c084..8abf980 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4452,6 +4452,7 @@ static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
 }
 
+extern struct kvm_pmu_ops amd_pmu_ops;
 static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..6b2419d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10302,6 +10302,7 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm 
*kvm,
kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
 }
 
+extern struct kvm_pmu_ops intel_pmu_ops;
 static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
-- 
1.8.3.1



[KVM x86 vPMU Patch 2/2] KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn

2015-08-07 Thread Wei Huang
According to the AMD programmer's manual, AMD PERFCTRn is a 64-bit MSR which,
unlike Intel perf counters, doesn't require signed extension. This
patch removes the unnecessary conversion in SVM vPMU code when PERFCTRn
is being updated.

Signed-off-by: Wei Huang 
---
 arch/x86/kvm/pmu_amd.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 886aa25..39b9112 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -133,8 +133,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
/* MSR_K7_PERFCTRn */
pmc = get_gp_pmc(pmu, msr, MSR_K7_PERFCTR0);
if (pmc) {
-   if (!msr_info->host_initiated)
-   data = (s64)data;
pmc->counter += data - pmc_read_counter(pmc);
return 0;
}
-- 
1.8.3.1



[KVM x86 vPMU Patch 0/2] Two vPMU Trivial Patches

2015-08-07 Thread Wei Huang
These two trivial patches are related to x86 vPMU code. They were
actually suggested by Andrew Jones while he was reviewing the last
big vPMU patch set.

These patches have been compiled and tested on AMD system using
a 64-bit guest VM with various perf commands (e.g. bench, test, top, 
stat). No obvious problems were found.

Thanks,
-Wei

Wei Huang (2):
  KVM: x86/vPMU: Move the definition of kvm_pmu_ops to arch-specific
files
  KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn

 arch/x86/kvm/pmu.h | 2 --
 arch/x86/kvm/pmu_amd.c | 2 --
 arch/x86/kvm/svm.c | 1 +
 arch/x86/kvm/vmx.c | 1 +
 4 files changed, 2 insertions(+), 4 deletions(-)

-- 
1.8.3.1


Re: linux-next: manual merge of the kvm-arm tree with the arm64 tree

2015-01-22 Thread Wei Huang
On 01/22/2015 02:51 AM, Marc Zyngier wrote:
> On Thu, Jan 22 2015 at  5:07:04 am GMT, Stephen Rothwell 
>  wrote:
> 
> Hi Stephen,
> 
>> Today's linux-next merge of the kvm-arm tree got a conflict in
>> arch/arm64/include/asm/kvm_arm.h between commit 6e53031ed840 ("arm64:
>> kvm: remove ESR_EL2_* macros") from the arm64 tree and commit
>> 0d97f8848104 ("arm/arm64: KVM: add tracing support for arm64 exit
>> handler") from the kvm-arm tree.
>>
>> I fixed it up (see below, but this probably requires more work) and can
>> carry the fix as necessary (no action is required).
> 
> Thanks for dealing with this. I think the following patch should be
> applied on top of your resolution, making the new macro part of the
> asm/esr.h file.
> 
> Mark, Wei: does it match your expectations?
Looks good to me. Thanks for handling this issue.

-Wei

> 
> Thanks,
> 
>   M.
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 6216709..92bbae3 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -96,6 +96,7 @@
>  #define ESR_ELx_COND_SHIFT   (20)
>  #define ESR_ELx_COND_MASK(UL(0xF) << ESR_ELx_COND_SHIFT)
>  #define ESR_ELx_WFx_ISS_WFE  (UL(1) << 0)
> +#define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>  
>  #ifndef __ASSEMBLY__
>  #include <asm/types.h>
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 53fbc1e..94674eb 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -192,6 +192,4 @@
>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>  #define HPFAR_MASK   (~UL(0xf))
>  
> -#define ESR_EL2_HVC_IMM_MASK ((1UL << 16) - 1)
> -
>  #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index b861ff6..bbc17cd 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -133,7 +133,7 @@ static inline phys_addr_t kvm_vcpu_get_fault_ipa(const 
> struct kvm_vcpu *vcpu)
>  
>  static inline u32 kvm_vcpu_hvc_get_imm(const struct kvm_vcpu *vcpu)
>  {
> - return kvm_vcpu_get_hsr(vcpu) & ESR_EL2_HVC_IMM_MASK;
> + return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_xVC_IMM_MASK;
>  }
>  
>  static inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu)
> 

[tip:perf/core] perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment

2014-10-02 Thread tip-bot for Wei Huang
Commit-ID:  cc6cd47e7395bc05c5077009808b820633eb3f18
Gitweb: http://git.kernel.org/tip/cc6cd47e7395bc05c5077009808b820633eb3f18
Author: Wei Huang 
AuthorDate: Wed, 24 Sep 2014 22:55:14 -0500
Committer:  Ingo Molnar 
CommitDate: Fri, 3 Oct 2014 06:04:41 +0200

perf/x86: Tone down kernel messages when the PMU check fails in a virtual 
environment

PMU checking can fail for various reasons. On a native machine, this
is mostly caused by faulty hardware and it is reasonable to use
KERN_ERR in reporting. However, when the kernel is running in a
virtualized environment, this checking can fail if a virtual PMU is
not supported (e.g. KVM on an AMD host). It is annoying to see an error
message on the splash screen, even though we know such a failure is
benign in a virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting, which reduces the syslog
priority of these messages. This patch was tested successfully on KVM.

Signed-off-by: Wei Huang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/perf_event.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 918d75f..16c7302 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -243,7 +243,8 @@ static bool check_hw_exists(void)
 
 msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events 
only.\n");
-   printk(KERN_ERR "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, 
val_new);
+   printk(boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR
+  "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);
 
return false;
 }


Re: [PATCH 1/1] perf/x86: Use KERN_INFO when checking PMU fails on virtual environment

2014-09-29 Thread Wei Huang

Hi Ingo, tglx and hpa,

Any comment on this patch? Thanks.

-Wei

On 09/24/2014 10:55 PM, Wei Huang wrote:

PMU checking can fail for various reasons. On a native machine,
this is mostly caused by faulty hardware and it is reasonable to
use KERN_ERR in reporting. However, when the kernel is running in a
virtualized environment, this checking can fail if a virtual PMU is
not supported (e.g. KVM on an AMD host). It is annoying to see
an error message on the splash screen, even though we know such a
failure is benign in a virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting. This patch was tested
successfully on KVM.

Signed-off-by: Wei Huang 
---
  arch/x86/kernel/cpu/perf_event.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 918d75f..16c7302 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -243,7 +243,8 @@ static bool check_hw_exists(void)

  msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events 
only.\n");
-   printk(KERN_ERR "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, 
val_new);
+   printk(boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR
+  "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);

return false;
  }

