Re: Fwd: [WARNING AND ERROR] may be system slow and audio and video breaking
On Mon, Oct 19, 2020 at 03:27:48AM +0530, Jeffrin Jose T wrote:
> On Sun, 2020-10-18 at 23:03 +0200, Borislav Petkov wrote:
> > On Mon, Oct 19, 2020 at 01:51:34AM +0530, Jeffrin Jose T wrote:
> > > On Sun, 2020-10-18 at 19:49 +0200, Borislav Petkov wrote:
> > > > On Sun, Oct 18, 2020 at 10:42:39PM +0530, Jeffrin Jose T wrote:
> > > > > smpboot: Scheduler frequency invariance went wobbly, disabling!
> > > > > [ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xb5c9a184 (native_read_msr+0x4/0x30)
> >
> > Ok, you forgot to say in your initial mail that this happens when you
> > suspend your laptop.
> >
> > Now, this unchecked MSR error thing happens only once because that early
> > during resume the microcode on CPU1 is not updated yet - and that needs
> > to be debugged separately and I'll try to reproduce that on my machine -
> > so the microcode is not updated yet and therefore the 0x123 MSR is not
> > "emulated" by the microcode, so to speak, thus the warning.
> >
> > That warning doesn't happen anymore, though, once the microcode is
> > updated.
> >
> > But what happens after that is you get a flood of correctable PCIe
> > errors about a transaction to a device timing out:
> >
> > pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
> > pcieport :00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> > pcieport :00:1c.5: device [8086:9d15] error status/mask=1000/2000
> > pcieport :00:1c.5:[12] Timeout
> >
> > and it looks like that flood is slowing down the machine because it is
> > busy logging them.
> >
> > Do
> >
> > # lspci -nn -xxx
> >
> > as root. It'll show us which device that 8086:9d15 is.
> >
> > Thx.
>
> $ sudo lspci -nn -xxx | grep 9d15
> 00:1c.5 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 [8086:9d15] (rev f1)

Hm, looks like a builtin PCI Express port can't stomach suspend/resume
and starts throwing AER errors.

Adding Bjorn for a comment and leaving in the rest for reference.

> file lspci.txt is attached
>
> --
> software engineer
> rajagiri school of engineering and technology

> 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 03)
> 00: 86 80 04 59 06 00 90 20 03 00 00 06 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 11 13
> 30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
> 40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
> 50: c1 02 00 00 b1 00 00 00 47 00 f0 9f 01 00 00 9b
> 60: 03 00 00 f0 00 00 00 00 01 80 d1 fe 00 00 00 00
> 70: 00 00 00 ff 01 00 00 00 00 0c 00 ff 7f 00 00 00
> 80: 01 00 00 00 00 00 00 00 1a 00 00 00 00 00 00 00
> 90: 01 00 00 ff 01 00 00 00 01 00 f0 5e 02 00 00 00
> a0: 01 00 00 00 02 00 00 00 01 00 00 5f 02 00 00 00
> b0: 01 00 00 9c 01 00 80 9b 01 00 00 9b 01 00 00 a0
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 09 00 10 01 31 60 61 7a dc 80 15 94 00 c0 06 00
> f0: 00 00 00 00 c8 0f 09 00 00 00 00 00 00 00 00 00
>
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5921] (rev 06)
> 00: 86 80 21 59 07 04 10 00 06 00 00 03 10 00 00 00
> 10: 04 00 00 ee 00 00 00 00 0c 00 00 d0 00 00 00 00
> 20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 10 11 13
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 40: 09 70 0c 01 31 60 61 7a dc 80 15 94 00 00 00 00
> 50: c1 02 00 00 b1 00 00 00 00 00 00 00 01 00 00 9c
> 60: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 10 ac 92 00 00 80 00 10 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 05 d0 01 00
> b0: 18 00 e0 fe 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 18 50 90 9a
>
> 00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 03)
> 00: 86 80 03 19 02 00 90 00 03 00 80 11 00 00 00 00
> 10: 04 80 1a ef 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 11 13
> 30: 00 00 00 00 90 00 00 00 00 00 00 00 ff 01 00 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 b1 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 05 d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Re: Fwd: [WARNING AND ERROR] may be system slow and audio and video breaking
On Sun, 2020-10-18 at 23:03 +0200, Borislav Petkov wrote:
> On Mon, Oct 19, 2020 at 01:51:34AM +0530, Jeffrin Jose T wrote:
> > On Sun, 2020-10-18 at 19:49 +0200, Borislav Petkov wrote:
> > > On Sun, Oct 18, 2020 at 10:42:39PM +0530, Jeffrin Jose T wrote:
> > > > smpboot: Scheduler frequency invariance went wobbly, disabling!
> > > > [ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xb5c9a184 (native_read_msr+0x4/0x30)
>
> Ok, you forgot to say in your initial mail that this happens when you
> suspend your laptop.
>
> Now, this unchecked MSR error thing happens only once because that early
> during resume the microcode on CPU1 is not updated yet - and that needs
> to be debugged separately and I'll try to reproduce that on my machine -
> so the microcode is not updated yet and therefore the 0x123 MSR is not
> "emulated" by the microcode, so to speak, thus the warning.
>
> That warning doesn't happen anymore, though, once the microcode is
> updated.
>
> But what happens after that is you get a flood of correctable PCIe
> errors about a transaction to a device timing out:
>
> pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
> pcieport :00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> pcieport :00:1c.5: device [8086:9d15] error status/mask=1000/2000
> pcieport :00:1c.5:[12] Timeout
>
> and it looks like that flood is slowing down the machine because it is
> busy logging them.
>
> Do
>
> # lspci -nn -xxx
>
> as root. It'll show us which device that 8086:9d15 is.
>
> Thx.

$ sudo lspci -nn -xxx | grep 9d15
00:1c.5 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 [8086:9d15] (rev f1)

The file lspci.txt is attached.

--
software engineer
rajagiri school of engineering and technology

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 03)
00: 86 80 04 59 06 00 90 20 03 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 11 13
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
50: c1 02 00 00 b1 00 00 00 47 00 f0 9f 01 00 00 9b
60: 03 00 00 f0 00 00 00 00 01 80 d1 fe 00 00 00 00
70: 00 00 00 ff 01 00 00 00 00 0c 00 ff 7f 00 00 00
80: 01 00 00 00 00 00 00 00 1a 00 00 00 00 00 00 00
90: 01 00 00 ff 01 00 00 00 01 00 f0 5e 02 00 00 00
a0: 01 00 00 00 02 00 00 00 01 00 00 5f 02 00 00 00
b0: 01 00 00 9c 01 00 80 9b 01 00 00 9b 01 00 00 a0
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 09 00 10 01 31 60 61 7a dc 80 15 94 00 c0 06 00
f0: 00 00 00 00 c8 0f 09 00 00 00 00 00 00 00 00 00

00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5921] (rev 06)
00: 86 80 21 59 07 04 10 00 06 00 00 03 10 00 00 00
10: 04 00 00 ee 00 00 00 00 0c 00 00 d0 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 10 11 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 09 70 0c 01 31 60 61 7a dc 80 15 94 00 00 00 00
50: c1 02 00 00 b1 00 00 00 00 00 00 00 01 00 00 9c
60: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 ac 92 00 00 80 00 10 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 05 d0 01 00
b0: 18 00 e0 fe 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 18 50 90 9a

00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 03)
00: 86 80 03 19 02 00 90 00 03 00 80 11 00 00 00 00
10: 04 80 1a ef 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 11 13
30: 00 00 00 00 90 00 00 00 00 00 00 00 ff 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 b1 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 05 d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 e0 03 00 08 00 00 00 00 00 00 00 00 00 00 00
e0: 09 00 0c 01 31 60 61 7a dc 80 15 94 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
00: 86 80 2f 9d 02 04 90 02 21 30 03 0c 00 00 80 00
10: 04 00 19 ef 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00
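
The bracketed IDs that lspci -nn prints can be cross-checked against the raw dump, since the standard PCI configuration header stores the vendor ID at offset 0x00, the device ID at 0x02, the revision at 0x08 and the class code at 0x0a-0x0b, all little-endian. A minimal Python sketch of that decoding follows; the function name decode_header is hypothetical and the input is assumed to be one "00:" row exactly as lspci -x prints it.

```python
def decode_header(row00):
    """Decode IDs from the first 16 config-space bytes of an lspci -x dump.

    row00 is assumed to look like "00: 86 80 04 59 ..." (offset, colon,
    then 16 space-separated hex bytes).
    """
    # Drop the "00:" offset label and parse the hex bytes.
    data = bytes.fromhex(row00.split(":", 1)[1])
    return {
        # Multi-byte fields in PCI config space are little-endian.
        "vendor": int.from_bytes(data[0:2], "little"),
        "device": int.from_bytes(data[2:4], "little"),
        "revision": data[8],
        # Base class at offset 0x0b, subclass at offset 0x0a.
        "class": (data[11] << 8) | data[10],
    }


info = decode_header("00: 86 80 04 59 06 00 90 20 03 00 00 06 00 00 00 00")
print("[%04x] [%04x:%04x] (rev %02x)"
      % (info["class"], info["vendor"], info["device"], info["revision"]))
```

For the host bridge row above this yields class 0600, ID 8086:5904 and revision 03, matching the header line lspci printed for 00:00.0.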
Re: Fwd: [WARNING AND ERROR] may be system slow and audio and video breaking
On Mon, Oct 19, 2020 at 01:51:34AM +0530, Jeffrin Jose T wrote:
> On Sun, 2020-10-18 at 19:49 +0200, Borislav Petkov wrote:
> > On Sun, Oct 18, 2020 at 10:42:39PM +0530, Jeffrin Jose T wrote:
> > > smpboot: Scheduler frequency invariance went wobbly, disabling!
> > > [ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xb5c9a184 (native_read_msr+0x4/0x30)

Ok, you forgot to say in your initial mail that this happens when you
suspend your laptop.

Now, this unchecked MSR error thing happens only once because that early
during resume the microcode on CPU1 is not updated yet - and that needs
to be debugged separately and I'll try to reproduce that on my machine -
so the microcode is not updated yet and therefore the 0x123 MSR is not
"emulated" by the microcode, so to speak, thus the warning.

That warning doesn't happen anymore, though, once the microcode is
updated.

But what happens after that is you get a flood of correctable PCIe
errors about a transaction to a device timing out:

pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
pcieport :00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
pcieport :00:1c.5: device [8086:9d15] error status/mask=1000/2000
pcieport :00:1c.5:[12] Timeout

and it looks like that flood is slowing down the machine because it is
busy logging them.

Do

# lspci -nn -xxx

as root. It'll show us which device that 8086:9d15 is.

Thx.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
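
To get a rough idea of how large such a flood is, the "Corrected error received" lines can be tallied per reporting port. The following is only a sketch: the helper name count_aer_corrected is hypothetical, and it assumes the log lines have the same shape as the ones quoted in this thread (the device address is taken as written, with or without a domain prefix).

```python
import re
from collections import Counter

# Matches e.g. "pcieport :00:1c.5: AER: Corrected error received: :00:1c.5"
# and captures the address of the port that reported the error.
AER_RE = re.compile(r"pcieport (\S+): AER: Corrected error received")


def count_aer_corrected(dmesg_text):
    """Return a Counter mapping port address -> number of corrected-error reports."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = AER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
pcieport :00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
"""
for port, n in count_aer_corrected(sample).items():
    print(port, n)
```

On a live system the text would come from the kernel log, for example by piping the output of dmesg into the function.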
Re: Fwd: [WARNING AND ERROR] may be system slow and audio and video breaking
On Sun, Oct 18, 2020 at 10:42:39PM +0530, Jeffrin Jose T wrote:
> smpboot: Scheduler frequency invariance went wobbly, disabling!
> [ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xb5c9a184 (native_read_msr+0x4/0x30)
> [ 1112.592869] Call Trace:
> [ 1112.592876] update_srbds_msr+0x6f/0xb0
> [ 1112.592880] smp_store_cpu_info+0x8e/0xb0
> [ 1112.592883] start_secondary+0x93/0x200
> [ 1112.592887] ? set_cpu_sibling_map+0xcb0/0xcb0
> [ 1112.592891] secondary_startup_64+0xa4/0xb0
> [ 1112.592898] unchecked MSR access error: WRMSR to 0x123 (tried to write 0x) at rIP: 0xb5c9a264 (native_write_msr+0x4/0x20)
> [ 1112.592899] Call Trace:
> [ 1112.592902] update_srbds_msr+0x98/0xb0
> [ 1112.592904] smp_store_cpu_info+0x8e/0xb0
> [ 1112.592907] start_secondary+0x93/0x200
> [ 1112.592911] ? set_cpu_sibling_map+0xcb0/0xcb0
> [ 1112.592914] secondary_startup_64+0xa4/0xb0
> [ 2915.106879] show_signal: 6 callbacks suppressed
> [ 6089.209343] WARNING: stack going in the wrong direction? at i915_gem_close_object+0x2fb/0x560 [i915]

This looks strange. Please send

- full dmesg
- output from the "grep -r . /sys/devices/system/cpu/vulnerabilities/" command
- /proc/cpuinfo
- .config

Privately is fine too.

> -x---xx
> Linux debian 5.9.1-rc1+ #4 SMP Fri Oct 16 16:48:04 IST 2020 x86_64

What kernel is that exactly? Can you reproduce with plain v5.9 too?

Thx.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
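
The grep command requested above prints each file under the vulnerabilities directory together with its one-line status. A rough Python equivalent is sketched below for cases where the output should be collected programmatically; the function name read_vulnerabilities is hypothetical, and the default path is simply the directory named in the request.

```python
from pathlib import Path


def read_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    """Return {file name: status line} for every regular file under base."""
    result = {}
    for entry in sorted(Path(base).iterdir()):
        if entry.is_file():
            # Each sysfs entry holds a single status line, e.g. "Mitigation: PTI".
            result[entry.name] = entry.read_text().strip()
    return result


def format_like_grep(base="/sys/devices/system/cpu/vulnerabilities"):
    """Format the statuses as "path:status" lines, similar to grep -r output."""
    return "\n".join(
        f"{Path(base) / name}:{status}"
        for name, status in read_vulnerabilities(base).items()
    )
```

Calling print(format_like_grep()) on a Linux machine should list one line per known vulnerability; the base argument only exists so the function can be pointed at a copy of the directory.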
Fwd: [WARNING AND ERROR] may be system slow and audio and video breaking
--- Begin Message ---
hello,

The system was slow, and audio and video were breaking up. I was
compiling a kernel and also playing a YouTube video in Firefox, and
maybe running Evolution.

The meminfo.txt and lscpu.txt files are attached.

The following is part of the output of "dmesg -l warn":

---x--x-x
smpboot: Scheduler frequency invariance went wobbly, disabling!
[ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xb5c9a184 (native_read_msr+0x4/0x30)
[ 1112.592869] Call Trace:
[ 1112.592876] update_srbds_msr+0x6f/0xb0
[ 1112.592880] smp_store_cpu_info+0x8e/0xb0
[ 1112.592883] start_secondary+0x93/0x200
[ 1112.592887] ? set_cpu_sibling_map+0xcb0/0xcb0
[ 1112.592891] secondary_startup_64+0xa4/0xb0
[ 1112.592898] unchecked MSR access error: WRMSR to 0x123 (tried to write 0x) at rIP: 0xb5c9a264 (native_write_msr+0x4/0x20)
[ 1112.592899] Call Trace:
[ 1112.592902] update_srbds_msr+0x98/0xb0
[ 1112.592904] smp_store_cpu_info+0x8e/0xb0
[ 1112.592907] start_secondary+0x93/0x200
[ 1112.592911] ? set_cpu_sibling_map+0xcb0/0xcb0
[ 1112.592914] secondary_startup_64+0xa4/0xb0
[ 2915.106879] show_signal: 6 callbacks suppressed
[ 6089.209343] WARNING: stack going in the wrong direction? at i915_gem_close_object+0x2fb/0x560 [i915]
$
-x---xx

Linux debian 5.9.1-rc1+ #4 SMP Fri Oct 16 16:48:04 IST 2020 x86_64 GNU/Linux

GNU Make              4.3
Binutils              2.35.1
Util-linux            2.36
Mount                 2.36
Bison                 3.7.2
Flex                  2.6.4
Dynamic linker (ldd)  2.30
Procps                3.3.16
Kbd                   2.3.0
Console-tools         2.3.0
Sh-utils              8.32
Udev                  246

Reported-by: Jeffrin Jose T

MemTotal:           6952576 kB
MemFree:            299000 kB
MemAvailable:       2964748 kB
Buffers:            169024 kB
Cached:             3447336 kB
SwapCached:         108 kB
Active:             1996356 kB
Inactive:           3315820 kB
Active(anon):       643748 kB
Inactive(anon):     1950892 kB
Active(file):       1352608 kB
Inactive(file):     1364928 kB
Unevictable:        57668 kB
Mlocked:            160 kB
SwapTotal:          8263676 kB
SwapFree:           8261092 kB
Dirty:              16632 kB
Writeback:          0 kB
AnonPages:          1736772 kB
Mapped:             520572 kB
Shmem:              898760 kB
KReclaimable:       264048 kB
Slab:               1045968 kB
SReclaimable:       264048 kB
SUnreclaim:         781920 kB
KernelStack:        28320 kB
PageTables:         27856 kB
NFS_Unstable:       0 kB
Bounce:             0 kB
WritebackTmp:       0 kB
CommitLimit:        11739964 kB
Committed_AS:       7741848 kB
VmallocTotal:       34359738367 kB
VmallocUsed:        60544 kB
VmallocChunk:       0 kB
Percpu:             2672 kB
HardwareCorrupted:  0 kB
AnonHugePages:      763904 kB
ShmemHugePages:     0 kB
ShmemPmdMapped:     0 kB
FileHugePages:      0 kB
FilePmdMapped:      0 kB
HugePages_Total:    0
HugePages_Free:     0
HugePages_Rsvd:     0
HugePages_Surp:     0
Hugepagesize:       2048 kB
Hugetlb:            0 kB
DirectMap4k:        891756 kB
DirectMap2M:        7372800 kB
DirectMap1G:        1048576 kB

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
Stepping:                        9
CPU MHz:                         2299.999
CPU max MHz:                     2300.
CPU min MHz:                     400.
BogoMIPS:                        4599.93
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        3 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Mitigation; Microcode