Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
Maybe no contract ... but a bunch of places (many of them in Intel specific code) that check for it Interestingly, quite a few of them are actually checking for HYPERVISOR not because of missing hypervisor features, but rather because hypervisors don't have to work around certain errata. :) Full analysis after my sig, but tl;dr: the only case of using HYPERVISOR before using MSRs are in arch/x86/events/intel/cstate.c and arch/x86/events/intel/uncore.c. There are some workarounds in drivers/gpu that might fall into a similar bucket. But as far as MSRs go, the way to go overwhelmingly seems to be {rd,wr}msrl_safe. Thanks, Paolo On 10/11/20 00:36, Luck, Tony wrote: arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { Print a slightly less frightening warning. arch/x86/events/intel/core.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) Working around KVM's ignore_msrs=1 option (and quite ugly; shows that the option shouldn't be enabled by default). arch/x86/events/intel/core.c: int assume = 3 * !boot_cpu_has(X86_FEATURE_HYPERVISOR); Seems unnecessary. arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) Too complicated. :) arch/x86/kernel/apic/apic.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) Hypervisors don't have errata. arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/intel.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) Print different vulnerability status in sysfs. arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), vmware_hypercall_mode arch/x86/kernel/cpu/vmware.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/jailhouse.c:!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/kvm.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/paravirt.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) Obviously needed before using paravirt features of the hypervisor. arch/x86/kernel/tsc.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || Disables ART in VMs. Probably the idea is that ART does not have an offset field similar to the TSC's, but it's not necessary. Looking at the hypervisor-provided CPUID should be enough. arch/x86/mm/init_64.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) { Tweaks hotplug heuristics, no MSRs involved. drivers/acpi/processor_idle.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) Avoids pointless hypervisor exit on idle (i.e. just an optimization). drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h: return boot_cpu_has(X86_FEATURE_HYPERVISOR); Work around SR-IOV bugs. drivers/gpu/drm/i915/i915_memcpy.c: !boot_cpu_has(X86_FEATURE_HYPERVISOR)) Work around KVM deficiency. drivers/gpu/drm/radeon/radeon_device.c: return boot_cpu_has(X86_FEATURE_HYPERVISOR); Work around SR-IOV bugs. drivers/visorbus/visorchipset.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { Needed before using paravirt features of the hypervisor.
RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
> I wouldn't expect all hypervisors to necessarily set CPUID.01H:ECX[bit > 31]. Architecturally, on Intel CPUs, that bit is simply defined as > "not used." There is no documented contract between Intel and > hypervisor vendors regarding the use of that bit. (AMD, on the other > hand, *does* document that bit as "reserved for use by hypervisor to > indicate guest status.") Maybe no contract ... but a bunch of places (many of them in Intel specific code) that check for it. Perhaps I should poke the owners of the Intel SDM to accept the inevitable. $ git grep "boot_cpu_has(X86_FEATURE_HYPERVISOR" arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/events/intel/core.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/events/intel/core.c: int assume = 3 * !boot_cpu_has(X86_FEATURE_HYPERVISOR); arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/apic/apic.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/intel.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), vmware_hypercall_mode arch/x86/kernel/cpu/vmware.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/jailhouse.c:!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/kvm.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/paravirt.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/tsc.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || arch/x86/mm/init_64.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) { drivers/acpi/processor_idle.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h: return boot_cpu_has(X86_FEATURE_HYPERVISOR); drivers/gpu/drm/i915/i915_memcpy.c: !boot_cpu_has(X86_FEATURE_HYPERVISOR)) drivers/gpu/drm/radeon/radeon_device.c: return boot_cpu_has(X86_FEATURE_HYPERVISOR); drivers/visorbus/visorchipset.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { -Tony
Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
On Mon, Nov 9, 2020 at 2:57 PM Luck, Tony wrote: > > > I thought Linux had long ago gone the route of turning rdmsr/wrmsr > > into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on > > writes and return zero to the caller for #GPs on reads. > > Linux just switched that around for the machine check banks ... if they #GP > fault, then something is seriously wrong. > > Maybe that isn't a general change of direction though. Perhaps I > should either use rdmsrl_safe() in this code. Or (better?) add > > if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) > return; > > to the start of intel_imc_init(). I wouldn't expect all hypervisors to necessarily set CPUID.01H:ECX[bit 31]. Architecturally, on Intel CPUs, that bit is simply defined as "not used." There is no documented contract between Intel and hypervisor vendors regarding the use of that bit. (AMD, on the other hand, *does* document that bit as "reserved for use by hypervisor to indicate guest status.")
RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
> I thought Linux had long ago gone the route of turning rdmsr/wrmsr > into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on > writes and return zero to the caller for #GPs on reads. Linux just switched that around for the machine check banks ... if they #GP fault, then something is seriously wrong. Maybe that isn't a general change of direction though. Perhaps I should either use rdmsrl_safe() in this code. Or (better?) add if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) return; to the start of intel_imc_init(). -Tony
Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
On Mon, Nov 9, 2020 at 2:09 PM Luck, Tony wrote: > > What does KVM do with model specific MSRs? "Model specific model-specific registers?" :-) KVM only implements a small subset of MSRs. By default, any access to the rest raises #GP. > Looks like you let the guest believe it was running on one of Sandy Bridge, > Ivy Bridge, Haswell (Xeon). > > So, the core MCE code tried to enable extended error reporting. > > If there is a mode to have KVM let the guest think that it read/wrote MSR > 0x17F, > but actually, doesn't do it ... that would seem to be a reasonable thing to > do here. There is an 'ignore_msrs' module parameter, to sink writes and return zero on reads for unknown MSRs, but I don't think it's commonly used. I thought Linux had long ago gone the route of turning rdmsr/wrmsr into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on writes and return zero to the caller for #GPs on reads.
RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
What does KVM do with model specific MSRs? Looks like you let the guest believe it was running on one of Sandy Bridge, Ivy Bridge, Haswell (Xeon). So, the core MCE code tried to enable extended error reporting. If there is a mode to have KVM let the guest think that it read/wrote MSR 0x17F, but actually, doesn't do it ... that would seem to be a reasonable thing to do here. -Tony
Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
On Mon, 2020-11-02 at 11:18 +, tip-bot2 for Tony Luck wrote: > The following commit has been merged into the ras/core branch of tip: > > Commit-ID: 68299a42f84288537ee3420c431ac0115ccb90b1 > Gitweb: > https://git.kernel.org/tip/68299a42f84288537ee3420c431ac0115ccb90b1 > Author:Tony Luck > AuthorDate:Fri, 30 Oct 2020 12:04:00 -07:00 > Committer: Borislav Petkov > CommitterDate: Mon, 02 Nov 2020 11:15:59 +01:00 > > x86/mce: Enable additional error logging on certain Intel CPUs > > The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an > optional additional error logging mode which is enabled by an MSR. > > Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu, > but userspace should not be poking at MSRs. So move the enabling into > the kernel. > > [ bp: Correct the explanation why this is done. ] > > Suggested-by: Boris Petkov > Signed-off-by: Tony Luck > Signed-off-by: Borislav Petkov Booting a simple KVM guest using today's linux-next is now generating those errors below inside the guest due to this patch. Are those expected? # qemu-kvm -name kata -cpu host -smp 48 -m 48g -hda rhel-8.3-x86_64-kvm.img.qcow2 -cdrom kata.iso -nic user,hostfwd=tcp::-:22 -nographic guest .config (if ever matters): https://cailca.coding.net/public/linux/mm/git/files/master/x86.config [6.801741][T0] clocksource: tsc-early: mask: 0x max_cycles: 0x1e3bca858ab, max_idle_ns: 440795282452 ns [6.804371][T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 4194.90 BogoMIPS (lpj=20974530) [6.806956][T0] pid_max: default: 49152 minimum: 384 [6.814328][T0] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [6.814328][T0] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [6.814328][T0] x86/cpu: User Mode Instruction Prevention (UMIP) activated [6.814328][T0] unchecked MSR access error: RDMSR from 0x17f at rIP: 0x84483f16 (mce_intel_feature_init+0x156/0x270) [6.814328][T0] Call Trace: [6.814328][T0] __mcheck_cpu_init_vendor+0x105/0x250 __rdmsr at arch/x86/include/asm/msr.h:93 (inlined by) native_read_msr at arch/x86/include/asm/msr.h:127 (inlined by) intel_imc_init at arch/x86/kernel/cpu/mce/intel.c:524 (inlined by) mce_intel_feature_init at arch/x86/kernel/cpu/mce/intel.c:537 [6.814328][T0] mcheck_cpu_init+0x21f/0xb00 [6.814328][T0] identify_cpu+0xfcb/0x1980 [6.814328][T0] identify_boot_cpu+0xd/0xb5 [6.814328][T0] check_bugs+0x6c/0x1606 [6.814328][T0] ? _raw_spin_unlock+0x1a/0x30 [6.814328][T0] ? poking_init+0x2b5/0x2ea [6.814328][T0] ? l1tf_cmdline+0x11a/0x11a [6.814328][T0] ? lockdep_init_map_waits+0x267/0x6f0 [6.814328][T0] start_kernel+0x372/0x39f [6.814328][T0] secondary_startup_64_no_verify+0xc2/0xcb [6.814328][T0] unchecked MSR access error: WRMSR to 0x17f (tried to write 0x0002) at rIP: 0x84483f3a (mce_intel_feature_init+0x17a/0x270) [6.814328][T0] Call Trace: [6.814328][T0] __mcheck_cpu_init_vendor+0x105/0x250 [6.814328][T0] mcheck_cpu_init+0x21f/0xb00 [6.814328][T0] identify_cpu+0xfcb/0x1980 [6.814328][T0] identify_boot_cpu+0xd/0xb5 [6.814328][T0] check_bugs+0x6c/0x1606 [6.814328][T0] ? _raw_spin_unlock+0x1a/0x30 [6.814328][T0] ? poking_init+0x2b5/0x2ea [6.814328][T0] ? l1tf_cmdline+0x11a/0x11a [6.814328][T0] ? lockdep_init_map_waits+0x267/0x6f0 [6.814328][T0] start_kernel+0x372/0x39f [6.814328][T0] secondary_startup_64_no_verify+0xc2/0xcb [6.814328][T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [6.814328][T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 == host CPU == # lscpu Architecture:x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 48 On-line CPU(s) list: 0-47 Thread(s) per core: 1 Core(s) per socket: 12 Socket(s): 4 NUMA node(s):4 Vendor ID: GenuineIntel CPU family: 6 Model: 63 Model name: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz Stepping:2 CPU MHz: 1980.076 BogoMIPS:4195.25 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache:256K L3 cache:30720K NUMA node0 CPU(s): 0-5,24-29 NUMA node1 CPU(s): 6-11,30-35 NUMA node2 CPU(s): 12-17,36-41 NUMA node3 CPU(s): 18-23,42-47 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr
[tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs
The following commit has been merged into the ras/core branch of tip: Commit-ID: 68299a42f84288537ee3420c431ac0115ccb90b1 Gitweb: https://git.kernel.org/tip/68299a42f84288537ee3420c431ac0115ccb90b1 Author:Tony Luck AuthorDate:Fri, 30 Oct 2020 12:04:00 -07:00 Committer: Borislav Petkov CommitterDate: Mon, 02 Nov 2020 11:15:59 +01:00 x86/mce: Enable additional error logging on certain Intel CPUs The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an optional additional error logging mode which is enabled by an MSR. Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu, but userspace should not be poking at MSRs. So move the enabling into the kernel. [ bp: Correct the explanation why this is done. ] Suggested-by: Boris Petkov Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/20201030190807.ga13...@agluck-desk2.amr.corp.intel.com --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/mce/intel.c | 20 2 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 972a34d..b2dd264 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -139,6 +139,7 @@ #define MSR_IA32_MCG_CAP 0x0179 #define MSR_IA32_MCG_STATUS0x017a #define MSR_IA32_MCG_CTL 0x017b +#define MSR_ERROR_CONTROL 0x017f #define MSR_IA32_MCG_EXT_CTL 0x04d0 #define MSR_OFFCORE_RSP_0 0x01a6 diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index abe9fe0..b47883e 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -509,12 +509,32 @@ static void intel_ppin_init(struct cpuinfo_x86 *c) } } +/* + * Enable additional error logs from the integrated + * memory controller on processors that support this. + */ +static void intel_imc_init(struct cpuinfo_x86 *c) +{ + u64 error_control; + + switch (c->x86_model) { + case INTEL_FAM6_SANDYBRIDGE_X: + case INTEL_FAM6_IVYBRIDGE_X: + case INTEL_FAM6_HASWELL_X: + rdmsrl(MSR_ERROR_CONTROL, error_control); + error_control |= 2; + wrmsrl(MSR_ERROR_CONTROL, error_control); + break; + } +} + void mce_intel_feature_init(struct cpuinfo_x86 *c) { intel_init_thermal(c); intel_init_cmci(); intel_init_lmce(); intel_ppin_init(c); + intel_imc_init(c); } void mce_intel_feature_clear(struct cpuinfo_x86 *c)