Hi,

Please add upstream commit 338b522ca43cfd32d11a370f4203bcd089c6c877
("perf/x86/intel: Protect LBR and extra_regs against KVM lying") to
-stable (mainly 3.14, but it affects every kernel from 3.12 to 3.15).

This commit fixes a kernel crash that happens very reliably inside a QEMU
guest when the host has an Intel CPU and "-cpu host" is given on the
command line. The relevant stack trace, originally reported by my
colleague Mohammed Gamal, is as follows:

====
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.26-1-pserver #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88013aa78000 ti: ffff88013aa54000 task.ti: ffff88013aa54000
RIP: 0010:[<ffffffff81ce444d>]  [<ffffffff81ce444d>] intel_pmu_init+0x2f1/0x921
RSP: 0000:ffff88013aa55e28  EFLAGS: 00000202
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000345
RDX: 0000000000000003 RSI: 0000000000000730 RDI: 0000ffffffffffff
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000001 R11: ffffffff81cbb160 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88013ffff000 CR3: 0000000001c0b000 CR4: 00000000001406f0
Stack:
 ffffffff81cbdd60 ffffffff81ce3437 0000000000000001 ffffffff81ce3466 
 0000000000000000 ffffffff81cdf291 ffffffff81ce3437 0000000000000001
 0000000000000000 0000000000000000 0000000000000000 ffffffff8100021a 
Call Trace:
 [<ffffffff81ce3437>] ? check_bugs+0x2e/0x2e  
 [<ffffffff81ce3466>] ? init_hw_perf_events+0x2f/0x4e1
 [<ffffffff81cdf291>] ? set_real_mode_permissions+0x93/0x9e
 [<ffffffff81ce3437>] ? check_bugs+0x2e/0x2e
 [<ffffffff8100021a>] ? do_one_initcall+0x4a/0x170
 [<ffffffff8109f46f>] ? clockevents_register_device+0xdf/0x170
 [<ffffffff81ce90a9>] ? native_smp_prepare_cpus+0x35d/0x389
 [<ffffffff81cdc8a3>] ? kernel_init_freeable+0x95/0x1c6
 [<ffffffff81709920>] ? rest_init+0x80/0x80
 [<ffffffff81709929>] ? kernel_init+0x9/0xf0
 [<ffffffff8171d27c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff81709920>] ? rest_init+0x80/0x80
Code: 6d fd ff 44 89 0d bc 6d fd ff 89 0d 76 6e fd ff 7e 2b 83 e2 1f b8 03 00 00 00 b9 45 03 00 00 83 fa 02 0f 4f c2 89 05 83 6d fd ff <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 89 15 21 6e fd ff e8 1c 67
 RIP  [<ffffffff81ce444d>] intel_pmu_init+0x2f1/0x921
 RSP <ffff88013aa55e28>
---[ end trace caccfda5c953b0c5 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
====

I'm already aware that this commit is longer than 100 lines, which is
not ideal for -stable. However, without this commit, a guest kernel
crashes every time. Given that it's actually a serious issue, please
consider taking this patch. I'm not aware of any alternative, such as
a simpler commit to fix the bug.
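
To make review easier, here is a rough user-space sketch of the
probe-and-verify approach that check_msr() in the patch below takes:
read the register, flip the bits selected by the mask, write, read back,
and only trust the MSR if the new value stuck. The sim_* names and the
three simulated behaviours are made up for illustration and are not
kernel APIs; only the order of the steps follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one MSR as seen by a guest. */
enum sim_kind {
	SIM_WORKING,	/* reads and writes behave like real hardware */
	SIM_FAULTING,	/* every access faults (#GP in a real kernel) */
	SIM_ZERO,	/* no fault, writes are dropped, reads return 0 */
};

struct sim_msr {
	enum sim_kind kind;
	uint64_t value;
};

/* Modeled on rdmsrl_safe(): returns 0 on success, nonzero on a fault. */
static int sim_rdmsr_safe(const struct sim_msr *m, uint64_t *val)
{
	if (m->kind == SIM_FAULTING)
		return -1;
	*val = (m->kind == SIM_ZERO) ? 0 : m->value;
	return 0;
}

/* Modeled on wrmsrl_safe(): returns 0 on success, nonzero on a fault. */
static int sim_wrmsr_safe(struct sim_msr *m, uint64_t val)
{
	if (m->kind == SIM_FAULTING)
		return -1;
	if (m->kind == SIM_WORKING)
		m->value = val;
	return 0;
}

/* Same sequence of steps as check_msr() in the patch. */
static bool sim_check_msr(struct sim_msr *m, uint64_t mask)
{
	uint64_t val_old, val_new, val_tmp;

	if (sim_rdmsr_safe(m, &val_old))
		return false;

	/* Only toggle the bits named in the mask. */
	val_tmp = val_old ^ mask;
	if (sim_wrmsr_safe(m, val_tmp) || sim_rdmsr_safe(m, &val_new))
		return false;

	/* A register that silently ignores writes is not usable. */
	if (val_new != val_tmp)
		return false;

	/* The register works; put the original value back. */
	sim_wrmsr_safe(m, val_old);
	return true;
}

/* Usage: exercise all three simulated behaviours. */
static void sim_selftest(void)
{
	struct sim_msr ok = { SIM_WORKING, 0x5 };
	struct sim_msr gp = { SIM_FAULTING, 0 };
	struct sim_msr zero = { SIM_ZERO, 0 };

	assert(sim_check_msr(&ok, 0x3));	/* usable ... */
	assert(ok.value == 0x5);		/* ... and value restored */
	assert(!sim_check_msr(&gp, 0x3));	/* faulting access rejected */
	assert(!sim_check_msr(&zero, 0x3));	/* dropped write rejected */
}
```

The SIM_ZERO case is why checking for a fault alone would not be enough:
as the comment in check_msr() notes, some emulators do not trap on the
access and simply return zeros, so the read-back comparison is needed.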

Side note: I also tried backporting this commit to 3.10, without luck:
the 3.10 kernel crashes whether or not this fix is included.

Thanks,
Dongsu


====
From 338b522ca43cfd32d11a370f4203bcd089c6c877 Mon Sep 17 00:00:00 2001
From: Kan Liang <[email protected]>
Date: Mon, 14 Jul 2014 12:25:56 -0700
Subject: [PATCH] perf/x86/intel: Protect LBR and extra_regs against KVM lying

With -cpu host, KVM reports LBR and extra_regs support, if the host has
support.

When the guest perf driver tries to access LBR or extra_regs MSR,
it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
So check the related MSRs access right once at initialization time to avoid
the error access at runtime.

For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y
(for host kernel).
And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
Start the guest with -cpu host.
Run perf record with --branch-any or --branch-filter in guest to trigger LBR
Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to
trigger offcore_rsp #GP

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Maria Dimakopoulou <[email protected]>
Cc: Mark Davies <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yan, Zheng <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
 arch/x86/kernel/cpu/perf_event.c       |  3 ++
 arch/x86/kernel/cpu/perf_event.h       | 12 ++++---
 arch/x86/kernel/cpu/perf_event_intel.c | 66 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2bdfbff..2879ecd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
                        continue;
                if (event->attr.config1 & ~er->valid_mask)
                        return -EINVAL;
+               /* Check if the extra msrs can be safely accessed*/
+               if (!er->extra_msr_access)
+                       return -ENXIO;
 
                reg->idx = er->idx;
                reg->config = event->attr.config1;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3b2f9bd..8ade931 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -295,14 +295,16 @@ struct extra_reg {
        u64                     config_mask;
        u64                     valid_mask;
        int                     idx;  /* per_xxx->regs[] reg index */
+       bool                    extra_msr_access;
 };
 
 #define EVENT_EXTRA_REG(e, ms, m, vm, i) {     \
-       .event = (e),           \
-       .msr = (ms),            \
-       .config_mask = (m),     \
-       .valid_mask = (vm),     \
-       .idx = EXTRA_REG_##i,   \
+       .event = (e),                   \
+       .msr = (ms),                    \
+       .config_mask = (m),             \
+       .valid_mask = (vm),             \
+       .idx = EXTRA_REG_##i,           \
+       .extra_msr_access = true,       \
        }
 
 #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx)     \
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c206815..2502d0d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2182,6 +2182,41 @@ static void intel_snb_check_microcode(void)
        }
 }
 
+/*
+ * Under certain circumstances, access certain MSR may cause #GP.
+ * The function tests if the input MSR can be safely accessed.
+ */
+static bool check_msr(unsigned long msr, u64 mask)
+{
+       u64 val_old, val_new, val_tmp;
+
+       /*
+        * Read the current value, change it and read it back to see if it
+        * matches, this is needed to detect certain hardware emulators
+        * (qemu/kvm) that don't trap on the MSR access and always return 0s.
+        */
+       if (rdmsrl_safe(msr, &val_old))
+               return false;
+
+       /*
+        * Only change the bits which can be updated by wrmsrl.
+        */
+       val_tmp = val_old ^ mask;
+       if (wrmsrl_safe(msr, val_tmp) ||
+           rdmsrl_safe(msr, &val_new))
+               return false;
+
+       if (val_new != val_tmp)
+               return false;
+
+       /* Here it's sure that the MSR can be safely accessed.
+        * Restore the old value and return.
+        */
+       wrmsrl(msr, val_old);
+
+       return true;
+}
+
 static __init void intel_sandybridge_quirk(void)
 {
        x86_pmu.check_microcode = intel_snb_check_microcode;
@@ -2271,7 +2306,8 @@ __init int intel_pmu_init(void)
        union cpuid10_ebx ebx;
        struct event_constraint *c;
        unsigned int unused;
-       int version;
+       struct extra_reg *er;
+       int version, i;
 
        if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
                switch (boot_cpu_data.x86) {
@@ -2577,5 +2613,33 @@ __init int intel_pmu_init(void)
                }
        }
 
+       /*
+        * Access LBR MSR may cause #GP under certain circumstances.
+        * E.g. KVM doesn't support LBR MSR
+        * Check all LBT MSR here.
+        * Disable LBR access if any LBR MSRs can not be accessed.
+        */
+       if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+               x86_pmu.lbr_nr = 0;
+       for (i = 0; i < x86_pmu.lbr_nr; i++) {
+               if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
+                     check_msr(x86_pmu.lbr_to + i, 0xffffUL)))
+                       x86_pmu.lbr_nr = 0;
+       }
+
+       /*
+        * Access extra MSR may cause #GP under certain circumstances.
+        * E.g. KVM doesn't support offcore event
+        * Check all extra_regs here.
+        */
+       if (x86_pmu.extra_regs) {
+               for (er = x86_pmu.extra_regs; er->msr; er++) {
+                       er->extra_msr_access = check_msr(er->msr, 0x1ffUL);
+                       /* Disable LBR select mapping */
+                       if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
+                               x86_pmu.lbr_sel_map = NULL;
+               }
+       }
+
        return 0;
 }
-- 
1.9.3
