Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-10 Thread Paolo Bonzini

Maybe no contract ... but a bunch of places (many of them in Intel
specific code) that check for it


Interestingly, quite a few of them are actually checking for HYPERVISOR 
not because of missing hypervisor features, but rather because 
hypervisors don't have to work around certain errata. :)


Full analysis after my sig, but tl;dr: the only case of using HYPERVISOR 
before using MSRs are in arch/x86/events/intel/cstate.c and 
arch/x86/events/intel/uncore.c.  There are some workarounds in 
drivers/gpu that might fall into a similar bucket.  But as far as MSRs 
go, the way to go  overwhelmingly seems to be {rd,wr}msrl_safe.


Thanks,

Paolo

On 10/11/20 00:36, Luck, Tony wrote:

arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {


Print a slightly less frightening warning.


arch/x86/events/intel/core.c:   if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))


Working around KVM's ignore_msrs=1 option (and quite ugly; shows that 
the option shouldn't be enabled by default).



arch/x86/events/intel/core.c:   int assume = 3 * 
!boot_cpu_has(X86_FEATURE_HYPERVISOR);


Seems unnecessary.


arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))


Too complicated. :)


arch/x86/kernel/apic/apic.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR))


Hypervisors don't have errata.


arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/cpu/intel.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR))


Print different vulnerability status in sysfs.


arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), 
vmware_hypercall_mode
arch/x86/kernel/cpu/vmware.c:   if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/jailhouse.c:!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/kvm.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/paravirt.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))


Obviously needed before using paravirt features of the hypervisor.


arch/x86/kernel/tsc.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||


Disables ART in VMs.  Probably the idea is that ART does not have an 
offset field similar to the TSC's, but it's not necessary.  Looking at 
the hypervisor-provided CPUID should be enough.



arch/x86/mm/init_64.c:  if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {


Tweaks hotplug heuristics, no MSRs involved.


drivers/acpi/processor_idle.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))


Avoids pointless hypervisor exit on idle (i.e. just an optimization).


drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h:   return 
boot_cpu_has(X86_FEATURE_HYPERVISOR);


Work around SR-IOV bugs.


drivers/gpu/drm/i915/i915_memcpy.c: 
!boot_cpu_has(X86_FEATURE_HYPERVISOR))


Work around KVM deficiency.


drivers/gpu/drm/radeon/radeon_device.c: return 
boot_cpu_has(X86_FEATURE_HYPERVISOR);


Work around SR-IOV bugs.


drivers/visorbus/visorchipset.c:if 
(boot_cpu_has(X86_FEATURE_HYPERVISOR)) {


Needed before using paravirt features of the hypervisor.



RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Luck, Tony
> I wouldn't expect all hypervisors to necessarily set CPUID.01H:ECX[bit
> 31]. Architecturally, on Intel CPUs, that bit is simply defined as
> "not used." There is no documented contract between Intel and
> hypervisor vendors regarding the use of that bit. (AMD, on the other
> hand, *does* document that bit as "reserved for use by hypervisor to
> indicate guest status.")

Maybe no contract ... but a bunch of places (many of them in Intel
specific code) that check for it.  Perhaps I should poke the owners
of the Intel SDM to accept the inevitable.

$ git grep "boot_cpu_has(X86_FEATURE_HYPERVISOR"
arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/events/intel/core.c:   if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/events/intel/core.c:   int assume = 3 * 
!boot_cpu_has(X86_FEATURE_HYPERVISOR);
arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/apic/apic.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/cpu/intel.c:if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), 
vmware_hypercall_mode
arch/x86/kernel/cpu/vmware.c:   if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
arch/x86/kernel/jailhouse.c:!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/kvm.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/paravirt.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
arch/x86/kernel/tsc.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
arch/x86/mm/init_64.c:  if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
drivers/acpi/processor_idle.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h:   return 
boot_cpu_has(X86_FEATURE_HYPERVISOR);
drivers/gpu/drm/i915/i915_memcpy.c: 
!boot_cpu_has(X86_FEATURE_HYPERVISOR))
drivers/gpu/drm/radeon/radeon_device.c: return 
boot_cpu_has(X86_FEATURE_HYPERVISOR);
drivers/visorbus/visorchipset.c:if 
(boot_cpu_has(X86_FEATURE_HYPERVISOR)) {

-Tony


Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Jim Mattson
On Mon, Nov 9, 2020 at 2:57 PM Luck, Tony  wrote:
>
> > I thought Linux had long ago gone the route of turning rdmsr/wrmsr
> > into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on
> > writes and return zero to the caller for #GPs on reads.
>
> Linux just switched that around for the machine check banks ... if they #GP
> fault, then something is seriously wrong.
>
> Maybe that isn't a general change of direction though. Perhaps I
> should either use rdmsrl_safe() in this code. Or (better?) add
>
> if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> return;
>
> to the start of intel_imc_init().

I wouldn't expect all hypervisors to necessarily set CPUID.01H:ECX[bit
31]. Architecturally, on Intel CPUs, that bit is simply defined as
"not used." There is no documented contract between Intel and
hypervisor vendors regarding the use of that bit. (AMD, on the other
hand, *does* document that bit as "reserved for use by hypervisor to
indicate guest status.")


RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Luck, Tony
> I thought Linux had long ago gone the route of turning rdmsr/wrmsr
> into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on
> writes and return zero to the caller for #GPs on reads.

Linux just switched that around for the machine check banks ... if they #GP
fault, then something is seriously wrong.

Maybe that isn't a general change of direction though. Perhaps I
should either use rdmsrl_safe() in this code. Or (better?) add

if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
return;

to the start of intel_imc_init().

-Tony


Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Jim Mattson
On Mon, Nov 9, 2020 at 2:09 PM Luck, Tony  wrote:
>
> What does KVM do with model specific MSRs?

"Model specific model-specific registers?" :-)

KVM only implements a small subset of MSRs. By default, any access to
the rest raises #GP.

> Looks like you let the guest believe it was running on one of Sandy Bridge, 
> Ivy Bridge, Haswell (Xeon).
>
> So, the core MCE code tried to enable extended error reporting.
>
> If there is a mode to have KVM let the guest think that it read/wrote MSR 
> 0x17F,
> but actually, doesn't do it ... that would seem to be a reasonable thing to 
> do here.

There is an 'ignore_msrs' module parameter, to sink writes and return
zero on reads for unknown MSRs, but I don't think it's commonly used.

I thought Linux had long ago gone the route of turning rdmsr/wrmsr
into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on
writes and return zero to the caller for #GPs on reads.


RE: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Luck, Tony
What does KVM do with model specific MSRs?

Looks like you let the guest believe it was running on one of Sandy Bridge, Ivy 
Bridge, Haswell (Xeon).

So, the core MCE code tried to enable extended error reporting.

If there is a mode to have KVM let the guest think that it read/wrote MSR 0x17F,
but actually, doesn't do it ... that would seem to be a reasonable thing to do 
here.

-Tony


Re: [tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-09 Thread Qian Cai
On Mon, 2020-11-02 at 11:18 +, tip-bot2 for Tony Luck wrote:
> The following commit has been merged into the ras/core branch of tip:
> 
> Commit-ID: 68299a42f84288537ee3420c431ac0115ccb90b1
> Gitweb:
> https://git.kernel.org/tip/68299a42f84288537ee3420c431ac0115ccb90b1
> Author:Tony Luck 
> AuthorDate:Fri, 30 Oct 2020 12:04:00 -07:00
> Committer: Borislav Petkov 
> CommitterDate: Mon, 02 Nov 2020 11:15:59 +01:00
> 
> x86/mce: Enable additional error logging on certain Intel CPUs
> 
> The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
> optional additional error logging mode which is enabled by an MSR.
> 
> Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu,
> but userspace should not be poking at MSRs. So move the enabling into
> the kernel.
> 
>  [ bp: Correct the explanation why this is done. ]
> 
> Suggested-by: Boris Petkov 
> Signed-off-by: Tony Luck 
> Signed-off-by: Borislav Petkov 

Booting a simple KVM guest using today's linux-next is now generating those
errors below inside the guest due to this patch. Are those expected?

# qemu-kvm -name kata -cpu host -smp 48 -m 48g -hda 
rhel-8.3-x86_64-kvm.img.qcow2 -cdrom kata.iso -nic user,hostfwd=tcp::-:22 
-nographic

guest .config (if ever matters): 
https://cailca.coding.net/public/linux/mm/git/files/master/x86.config

[6.801741][T0] clocksource: tsc-early: mask: 0x 
max_cycles: 0x1e3bca858ab, max_idle_ns: 440795282452 ns
[6.804371][T0] Calibrating delay loop (skipped), value calculated using 
timer frequency.. 4194.90 BogoMIPS (lpj=20974530)
[6.806956][T0] pid_max: default: 49152 minimum: 384
[6.814328][T0] Mount-cache hash table entries: 131072 (order: 8, 
1048576 bytes, linear)
[6.814328][T0] Mountpoint-cache hash table entries: 131072 (order: 8, 
1048576 bytes, linear)
[6.814328][T0] x86/cpu: User Mode Instruction Prevention (UMIP) 
activated
[6.814328][T0] unchecked MSR access error: RDMSR from 0x17f at rIP: 
0x84483f16 (mce_intel_feature_init+0x156/0x270)
[6.814328][T0] Call Trace:
[6.814328][T0]  __mcheck_cpu_init_vendor+0x105/0x250
__rdmsr at arch/x86/include/asm/msr.h:93
(inlined by) native_read_msr at arch/x86/include/asm/msr.h:127
(inlined by) intel_imc_init at arch/x86/kernel/cpu/mce/intel.c:524
(inlined by) mce_intel_feature_init at arch/x86/kernel/cpu/mce/intel.c:537
[6.814328][T0]  mcheck_cpu_init+0x21f/0xb00
[6.814328][T0]  identify_cpu+0xfcb/0x1980
[6.814328][T0]  identify_boot_cpu+0xd/0xb5
[6.814328][T0]  check_bugs+0x6c/0x1606
[6.814328][T0]  ? _raw_spin_unlock+0x1a/0x30
[6.814328][T0]  ? poking_init+0x2b5/0x2ea
[6.814328][T0]  ? l1tf_cmdline+0x11a/0x11a
[6.814328][T0]  ? lockdep_init_map_waits+0x267/0x6f0
[6.814328][T0]  start_kernel+0x372/0x39f
[6.814328][T0]  secondary_startup_64_no_verify+0xc2/0xcb
[6.814328][T0] unchecked MSR access error: WRMSR to 0x17f (tried to 
write 0x0002) at rIP: 0x84483f3a 
(mce_intel_feature_init+0x17a/0x270)
[6.814328][T0] Call Trace:
[6.814328][T0]  __mcheck_cpu_init_vendor+0x105/0x250
[6.814328][T0]  mcheck_cpu_init+0x21f/0xb00
[6.814328][T0]  identify_cpu+0xfcb/0x1980
[6.814328][T0]  identify_boot_cpu+0xd/0xb5
[6.814328][T0]  check_bugs+0x6c/0x1606
[6.814328][T0]  ? _raw_spin_unlock+0x1a/0x30
[6.814328][T0]  ? poking_init+0x2b5/0x2ea
[6.814328][T0]  ? l1tf_cmdline+0x11a/0x11a
[6.814328][T0]  ? lockdep_init_map_waits+0x267/0x6f0
[6.814328][T0]  start_kernel+0x372/0x39f
[6.814328][T0]  secondary_startup_64_no_verify+0xc2/0xcb
[6.814328][T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[6.814328][T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0

== host CPU ==
# lscpu
Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
CPU(s):  48
On-line CPU(s) list: 0-47
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):   4
NUMA node(s):4
Vendor ID:   GenuineIntel
CPU family:  6
Model:   63
Model name:  Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz
Stepping:2
CPU MHz: 1980.076
BogoMIPS:4195.25
Virtualization:  VT-x
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:30720K
NUMA node0 CPU(s):   0-5,24-29
NUMA node1 CPU(s):   6-11,30-35
NUMA node2 CPU(s):   12-17,36-41
NUMA node3 CPU(s):   18-23,42-47
Flags:   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 sdbg fma cx16 xtpr 

[tip: ras/core] x86/mce: Enable additional error logging on certain Intel CPUs

2020-11-02 Thread tip-bot2 for Tony Luck
The following commit has been merged into the ras/core branch of tip:

Commit-ID: 68299a42f84288537ee3420c431ac0115ccb90b1
Gitweb:
https://git.kernel.org/tip/68299a42f84288537ee3420c431ac0115ccb90b1
Author:Tony Luck 
AuthorDate:Fri, 30 Oct 2020 12:04:00 -07:00
Committer: Borislav Petkov 
CommitterDate: Mon, 02 Nov 2020 11:15:59 +01:00

x86/mce: Enable additional error logging on certain Intel CPUs

The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
optional additional error logging mode which is enabled by an MSR.

Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu,
but userspace should not be poking at MSRs. So move the enabling into
the kernel.

 [ bp: Correct the explanation why this is done. ]

Suggested-by: Boris Petkov 
Signed-off-by: Tony Luck 
Signed-off-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/20201030190807.ga13...@agluck-desk2.amr.corp.intel.com
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kernel/cpu/mce/intel.c  | 20 
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d..b2dd264 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -139,6 +139,7 @@
 #define MSR_IA32_MCG_CAP   0x0179
 #define MSR_IA32_MCG_STATUS0x017a
 #define MSR_IA32_MCG_CTL   0x017b
+#define MSR_ERROR_CONTROL  0x017f
 #define MSR_IA32_MCG_EXT_CTL   0x04d0
 
 #define MSR_OFFCORE_RSP_0  0x01a6
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index abe9fe0..b47883e 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -509,12 +509,32 @@ static void intel_ppin_init(struct cpuinfo_x86 *c)
}
 }
 
+/*
+ * Enable additional error logs from the integrated
+ * memory controller on processors that support this.
+ */
+static void intel_imc_init(struct cpuinfo_x86 *c)
+{
+   u64 error_control;
+
+   switch (c->x86_model) {
+   case INTEL_FAM6_SANDYBRIDGE_X:
+   case INTEL_FAM6_IVYBRIDGE_X:
+   case INTEL_FAM6_HASWELL_X:
+   rdmsrl(MSR_ERROR_CONTROL, error_control);
+   error_control |= 2;
+   wrmsrl(MSR_ERROR_CONTROL, error_control);
+   break;
+   }
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
intel_init_thermal(c);
intel_init_cmci();
intel_init_lmce();
intel_ppin_init(c);
+   intel_imc_init(c);
 }
 
 void mce_intel_feature_clear(struct cpuinfo_x86 *c)