Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
Dear Diederik,

the new kernel:

root@g6 /opt # uname -a
Linux g6.lan.dac 5.18.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.2-1 (2022-06-06) x86_64 GNU/Linux

in Debian/testing works again, in the sense that LIBVIRT KVM works again!

I most probably found the reason for the slow disk access on my machine. Please see this new bug report:

Debian Bug report logs - #1013260
coreutils: /bin/chown very slow in conjunction with storebackup
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013260

Thank you very much for your answer!

Sincerely,

Adrian Kieß

On Mon, 30 May 2022 14:29:29 +0200 Diederik de Haas wrote:
> Hi Adrian,
>
> On Monday, 30 May 2022 13:14:26 CEST Adrian Kieß wrote:
> > yes, it works with the kernel 5.16.0-6, but disk access is still slow.
>
> Ok, but that issue was also happening before 5.17 and is not a new problem.
> Do you have a(n old) kernel (still) installed which does NOT have this slow
> disk access issue? If it happens on all kernel versions, then a hardware
> issue becomes much more likely to be the real culprit.
>
> > For example, virt-manager/viewer sometimes needs a minute to connect to
> > the KVM instances on localhost. But not all applications are this slow;
> > for example the E-Mail client Sylpheed starts as fast as before and is
> > operating at fast speed.
>
> In your initial report I noticed the following:
> > Network:
> >   Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
> >   IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
> >   IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
> >     mac: fe:40:ab:83:94:4a
> >   IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
> >     mac: fe:54:00:c2:24:94
> >   IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
> >     mac: fe:54:00:bf:35:8b
> >   IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
> >     mac: fe:54:00:25:b0:8b
> >   IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
> >     mac: fe:54:00:4a:c8:69
>
> Is it normal/expected that IF-ID-[2-5] have "unknown speed: 10 Mbps duplex" ?
> If not, that may be worth looking into.
>
> > I assume there is also another bug now in the system, not only due to
> > the new kernel. There is also another bug in GDM3, which I also
> > reported: Loading GDM3 after bootup and logging in as normal user is
> > also very, very slow.
>
> Which sounds like the moment lots of files/data is read from disk to
> initialize the session, which does point to a disk issue.
> But if the initial boot isn't terribly slow as well, that would be odd.
> Or is /home mounted from another disk?
>
> > As you suggested, I installed the kernel 5.17.11 from Debian/unstable
> > and booted into this kernel.
> >
> > virt-manager and my KVM VM instances do work again, but one VM instance
> > failed to load after bootup. I restarted the VM instance, and it is now
> > also operating fine.
>
> Good, that sounds like major progress :) It looks to me that the KVM problem
> is now (mostly?) fixed.
>
> > When opening the virt-viewer instance from virt-manager, connecting to
> > the VM is still very slow with kernel 5.17.11. Something must be wrong
> > I/O wise.
>
> That still/all points to a disk problem
>
> > I attached the dmesg output, you requested, as TXT file to this E-mail.
>
> Couple of things I noticed in the dmesg output:
> 1) "[Firmware Warn]: HEST: Duplicated hardware error source ID: 9."
> https://lkml.org/lkml/2011/6/27/370 seems relevant for that as it provided
> the better warning, but it also points out that it *is* considered a
> firmware bug.
> I noticed your BIOS is from 2011. Is there a newer version available? If so,
> it may be worth trying that out to see if that improves things.
>
> 2) Several ACPI related warnings.
> No idea if or what should be done with that.
>
> 3) "kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using
> workaround" and "kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu"
> That looks like there are still KVM related issues (just not or less fatal)
> There have been other bug reports related to KVM.
>
> 4) BUG: kernel NULL pointer dereference, address: 000b
> That's never good. The dmesg output also contains a Call Trace and several
> mentions of KVM, so it looks like there's still something not right about it.
> I have no idea how to interpret those Call (or Stack) Traces, so hopefully
> someone else chimes in who is familiar with that.
>
> Cheers,
> Diederik

--
With many greetings from Leipzig, Germany.

Adrian Immanuel Kieß
Gothaer Straße 34
D-04155 Leipzig
< adr...@kiess.onl >

--SYSTEM--
echo "Your fortune cookie: " && /usr/games/fortune -c -s de
> (letzteworte)
%
Last words of a driving instructor: "The traffic light is red."
echo "g6.lan.dac uptime: " && /usr/bin/uptime
> 17:17:09 up 2 days, 1:21, 12 users, load average: 1,09, 0,90, 0,85
Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
That dmesg output sounds a lot like bugs 1010916 and 1011168, and a Xeon X3430 CPU would be another older one that predates XSAVE.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1011168#15

--
Jon
Doge Wrangler

X(7): A program for managing terminal windows. See also screen(1) and tmux(1).
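
Whether a given x86 machine exposes XSAVE can be read from the "flags" line of /proc/cpuinfo. The following is a minimal sketch of that check; the function name `cpu_flags` is made up for illustration, and it only parses text that is handed to it, so it does not touch the system by itself.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Only the first "flags" line is used; on x86 every logical CPU
    normally repeats the same line.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line at all (empty input or a non-x86 layout).
    return set()
```

On the affected host one would pass in the contents of /proc/cpuinfo and test for membership of "xsave" in the returned set. A missing flag would fit the "predates XSAVE" observation above, but it does not by itself prove that this is the same bug.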
Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
Hi Adrian,

On Monday, 30 May 2022 13:14:26 CEST Adrian Kieß wrote:
> yes, it works with the kernel 5.16.0-6, but disk access is still slow.

Ok, but that issue was also happening before 5.17 and is not a new problem.
Do you have a(n old) kernel (still) installed which does NOT have this slow disk access issue? If it happens on all kernel versions, then a hardware issue becomes much more likely to be the real culprit.

> For example, virt-manager/viewer sometimes needs a minute to connect to
> the KVM instances on localhost. But not all applications are this slow;
> for example the E-Mail client Sylpheed starts as fast as before and is
> operating at fast speed.

In your initial report I noticed the following:
> Network:
>   Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
>   IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
>   IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
>     mac: fe:40:ab:83:94:4a
>   IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
>     mac: fe:54:00:c2:24:94
>   IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
>     mac: fe:54:00:bf:35:8b
>   IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
>     mac: fe:54:00:25:b0:8b
>   IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
>     mac: fe:54:00:4a:c8:69

Is it normal/expected that IF-ID-[2-5] have "unknown speed: 10 Mbps duplex" ?
If not, that may be worth looking into.

> I assume there is also another bug now in the system, not only due to
> the new kernel. There is also another bug in GDM3, which I also
> reported: Loading GDM3 after bootup and logging in as normal user is
> also very, very slow.

Which sounds like the moment lots of files/data is read from disk to initialize the session, which does point to a disk issue.
But if the initial boot isn't terribly slow as well, that would be odd.
Or is /home mounted from another disk?

> As you suggested, I installed the kernel 5.17.11 from Debian/unstable
> and booted into this kernel.
>
> virt-manager and my KVM VM instances do work again, but one VM instance
> failed to load after bootup. I restarted the VM instance, and it is now
> also operating fine.

Good, that sounds like major progress :) It looks to me that the KVM problem is now (mostly?) fixed.

> When opening the virt-viewer instance from virt-manager, connecting to
> the VM is still very slow with kernel 5.17.11. Something must be wrong
> I/O wise.

That still/all points to a disk problem

> I attached the dmesg output, you requested, as TXT file to this E-mail.

Couple of things I noticed in the dmesg output:

1) "[Firmware Warn]: HEST: Duplicated hardware error source ID: 9."
https://lkml.org/lkml/2011/6/27/370 seems relevant for that as it provided the better warning, but it also points out that it *is* considered a firmware bug.
I noticed your BIOS is from 2011. Is there a newer version available? If so, it may be worth trying that out to see if that improves things.

2) Several ACPI related warnings.
No idea if or what should be done with that.

3) "kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround" and "kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu"
That looks like there are still KVM related issues (just not or less fatal)
There have been other bug reports related to KVM.

4) BUG: kernel NULL pointer dereference, address: 000b
That's never good. The dmesg output also contains a Call Trace and several mentions of KVM, so it looks like there's still something not right about it.
I have no idea how to interpret those Call (or Stack) Traces, so hopefully someone else chimes in who is familiar with that.

Cheers,
Diederik
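
The question of whether /home sits on a different disk than / can be answered from /proc/mounts, where each line starts with the device, the mount point and the filesystem type. Below is a minimal sketch of that lookup; the function name `mount_sources` is hypothetical, and it works on text passed in by the caller, not on the live system.

```python
def mount_sources(mounts_text):
    """Map each mount point to (device, fstype) from /proc/mounts-style text.

    Lines with fewer than three whitespace-separated fields are skipped.
    If a mount point appears more than once, the last entry wins, which
    matches the mount that is currently on top.
    """
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        table[mount_point] = (device, fstype)
    return table
```

For the machine in this report, the inxi output in the initial message already lists / on /dev/sda3 (ext4) and /home on /dev/sdc1 (btrfs), so a lookup of "/" and "/home" in the returned table would be expected to show two different devices.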
Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
Dear Diederik,

yes, it works with the kernel 5.16.0-6, but disk access is still slow.

For example, virt-manager/viewer sometimes needs a minute to connect to the KVM instances on localhost. But not all applications are this slow; for example the E-Mail client Sylpheed starts as fast as before and is operating at fast speed.

I assume there is also another bug now in the system, not only due to the new kernel. There is also another bug in GDM3, which I also reported: Loading GDM3 after bootup and logging in as normal user is also very, very slow.

As you suggested, I installed the kernel 5.17.11 from Debian/unstable and booted into this kernel.

virt-manager and my KVM VM instances do work again, but one VM instance failed to load after bootup. I restarted the VM instance, and it is now also operating fine.

When opening the virt-viewer instance from virt-manager, connecting to the VM is still very slow with kernel 5.17.11. Something must be wrong I/O wise.

I attached the dmesg output you requested as a TXT file to this E-mail.

Thank you very much for your answer!

Sincerely,

Adrian Kiess

On Mon, 30 May 2022 11:45:29 +0200 Diederik de Haas wrote:
> On Monday, 30 May 2022 09:59:06 CEST Adrian Immanuel Kiess wrote:
> > Package: src:linux
> > Version: 5.17.3-1
> > Debian Release: bookworm/sid
> > APT policy: (990, 'testing')
> >
> > Kernel: Linux 5.16.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)
>
> Does everything work correctly with kernel 5.16.0-6 ?
> Sid/Unstable currently has kernel 5.17.11 and it would be useful to know if
> the issue is still present in that version. Can you test that?
>
> If it is, then hopefully `dmesg` can give some clues. After you've noticed
> the described symptoms again, can you do
> `dmesg --level emerg,alert,crit,err,warn`
> and send that to this bug report?

[0.022299] ACPI: SPCR: Unexpected SPCR Access Width. Defaulting to byte size
[0.233929] core: CPUID marked event: 'bus cycles' unavailable
[0.281581] pmd_set_huge: Cannot satisfy [mem 0xe000-0xe020] with a huge-page mapping due to MTRR override.
[0.285753] [Firmware Warn]: HEST: Duplicated hardware error source ID: 9.
[8.897715] ERST: Can not request [mem 0xbf7ff000-0xbf7f] for ERST.
[9.537343] ACPI Warning: SystemIO range 0x1028-0x102F conflicts with OpRegion 0x1000-0x102F (\_SB.PCI0.LPC0.PMIO) (20211217/utaddress-204)
[9.549414] ACPI Warning: SystemIO range 0x1180-0x11AF conflicts with OpRegion 0x1180-0x11AF (\_SB.PCI0.LPC0.GPOX) (20211217/utaddress-204)
[9.556866] lpc_ich: Resource conflict(s) found affecting gpio_ich
[9.678746] resource sanity check: requesting [mem 0x000c-0x000d], which spans more than PCI Bus :00 [mem 0x000d4000-0x000dbfff window]
[9.678756] caller pci_map_rom+0x79/0x1d0 mapping multiple BARs
[ 13.853298] IPMI Watchdog: Unable to register misc device
[ 14.329981] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[ 24.691518] kauditd_printk_skb: 85 callbacks suppressed
[ 26.412447] kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu
[ 90.991726] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[ 95.092388] BUG: kernel NULL pointer dereference, address: 000b
[ 95.092396] #PF: supervisor write access in kernel mode
[ 95.092398] #PF: error_code(0x0002) - not-present page
[ 95.092404] Oops: 0002 [#1] PREEMPT SMP PTI
[ 95.092407] CPU: 2 PID: 4379 Comm: CPU 0/KVM Not tainted 5.17.0-3-amd64 #1 Debian 5.17.11-1
[ 95.092411] Hardware name: HP ProLiant ML110 G6/ProLiant ML110 G6, BIOS O27 08/26/2011
[ 95.092413] RIP: 0010:kvm_replace_memslot+0xcf/0x390 [kvm]
[ 95.092481] Code: 44 24 08 48 85 db 0f 84 3b 02 00 00 48 89 ea 48 c1 e2 04 48 01 da 48 8b 4a 08 48 85 c9 74 1e 48 8b 32 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 8d 54
[ 95.092484] RSP: 0018:a1fd87ffbd70 EFLAGS: 00010206
[ 95.092487] RAX: a1fd87fb5058 RBX: 955e10f13200 RCX: a1fd87fb5388
[ 95.092489] RDX: 955e10f13200 RSI: 0003 RDI: a1fd87fb5000
[ 95.092491] RBP: R08: R09:
[ 95.092493] R10: 0003 R11: 0004 R12:
[ 95.092495] R13: R14: R15: a1fd87fb5000
[ 95.092497] FS: 7f0e39639640() GS:95606fd0() knlGS:
[ 95.092500] CS: 0010 DS: ES: CR0: 80050033
[ 95.092503] CR2: 000b CR3: 0001b478 CR4: 26e0
[ 95.092505] Call Trace:
[ 95.092509]
[ 95.092513] ? _raw_read_unlock+0x18/0x30
[ 95.092519] kvm_set_memslot+0x3c2/0x4a0 [kvm]
[ 95.092564] kvm_vm_ioctl+0x2cb/0xd80 [kvm]
[ 95.092610] ?
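
When comparing an oops like the one above against other bug reports, the two most useful lines are usually the "BUG:" line and the function named on the "RIP:" line. Here is a minimal sketch that pulls both out of dmesg text; the function name `oops_summary` and the returned dictionary keys are made up for illustration, and the patterns are only assumed to fit the line layout shown in this log.

```python
import re

# Leading "[   95.092388] " style timestamp, as printed by dmesg.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# "RIP: 0010:kvm_replace_memslot+0xcf/0x390 [kvm]" -> function name.
_RIP = re.compile(r"RIP:\s*[0-9a-f]+:([\w.]+)\+")


def oops_summary(dmesg_text):
    """Return the first BUG: message and the first RIP symbol, if present."""
    bug = None
    rip_symbol = None
    for raw in dmesg_text.splitlines():
        line = _TIMESTAMP.sub("", raw.strip())
        if bug is None and line.startswith("BUG:"):
            bug = line[len("BUG:"):].strip()
        elif rip_symbol is None and line.startswith("RIP:"):
            match = _RIP.match(line)
            if match:
                rip_symbol = match.group(1)
    return {"bug": bug, "rip_symbol": rip_symbol}
```

Applied to the log in this message, the symbol would be kvm_replace_memslot in the kvm module, which gives a concrete search term for related reports.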
Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
On Monday, 30 May 2022 09:59:06 CEST Adrian Immanuel Kiess wrote:
> Package: src:linux
> Version: 5.17.3-1
> Debian Release: bookworm/sid
> APT policy: (990, 'testing')
>
> Kernel: Linux 5.16.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)

Does everything work correctly with kernel 5.16.0-6 ?
Sid/Unstable currently has kernel 5.17.11 and it would be useful to know if the issue is still present in that version. Can you test that?

If it is, then hopefully `dmesg` can give some clues. After you've noticed the described symptoms again, can you do
`dmesg --level emerg,alert,crit,err,warn`
and send that to this bug report?
Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430
Package: src:linux
Version: 5.17.3-1
Severity: important

Dear Maintainer,

* What led up to the situation?

Upgrading my Debian/testing installation with apt -u dist-upgrade

* What exactly did you do (or not do) that was effective (or ineffective)?

apt -u dist-upgrade

* What was the outcome of this action?

The new 5.17-1 kernel has several issues

* What outcome did you expect instead?

Working Linux kernel with no issues

Currently in Debian/testing, the new 5.17 kernel has several issues.

My LIBVIRT KVM VM instances fail to load with this new kernel.

The disk access is slow in some programs, as for example showing the KVM VM instances in virt-manager fails due to broken disk access.

After some hours of uptime, a kernel thread goes wild and uses 100% of the CPU.

I am also running this new kernel (5.17) on a Debian/testing VMWARE ESXI VPS instance, at my provider's place, where everything works fine. Thus this is most probably an issue with the hardware I use.

My hardware is the following:

adrian@g6 ~ % inxi -F
System:
  Host: g6.lan.dac Kernel: 5.16.0-6-amd64 arch: x86_64 bits: 64
    Desktop: Enlightenment v: 0.25.3 Distro: Debian GNU/Linux bookworm/sid
Machine:
  Type: Desktop System: HP product: ProLiant ML110 G6 v: N/A serial:
  Mobo: Wistron model: ProLiant ML110 G6 serial:
  BIOS: HP v: O27 date: 08/26/2011
CPU:
  Info: quad core model: Intel Xeon X3430 bits: 64 type: MCP cache: L2: 1024 KiB
  Speed (MHz): avg: 1635 min/max: 1197/2394 cores: 1: 1692 2: 1617 3: 1649 4: 1585
Graphics:
  Device-1: AMD Oland GL [FirePro W2100] driver: radeon v: kernel
  Display: x11 server: X.Org v: 1.21.1.3 with: Xwayland v: 22.1.0
    driver: X: loaded: radeon gpu: radeon resolution: 1920x1200
  OpenGL: renderer: AMD OLAND (DRM 2.50.0 5.16.0-6-amd64 LLVM 13.0.1)
    v: 4.5 Mesa 21.3.8
Audio:
  Device-1: AMD Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series]
    driver: snd_hda_intel
  Device-2: ASUSTek Xonar U1 Audio Station type: USB
    driver: hid-generic,snd-usb-audio,usbhid
  Sound Server-1: ALSA v: k5.16.0-6-amd64 running: yes
  Sound Server-2: PipeWire v: 0.3.51 running: yes
Network:
  Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
  IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
  IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
    mac: fe:40:ab:83:94:4a
  IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
    mac: fe:54:00:c2:24:94
  IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
    mac: fe:54:00:bf:35:8b
  IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
    mac: fe:54:00:25:b0:8b
  IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
    mac: fe:54:00:4a:c8:69
Drives:
  Local Storage: total: 11.79 TiB used: 5.87 TiB (49.8%)
  ID-1: /dev/sda vendor: Toshiba model: Q300 size: 447.13 GiB
  ID-2: /dev/sdb vendor: A-Data model: SP550 size: 447.13 GiB
  ID-3: /dev/sdc vendor: Toshiba model: HDWE140 size: 3.64 TiB
  ID-4: /dev/sdd vendor: Toshiba model: MG06ACA800E size: 7.28 TiB
Partition:
  ID-1: / size: 437.53 GiB used: 91.98 GiB (21.0%) fs: ext4 dev: /dev/sda3
  ID-2: /boot size: 1.2 GiB used: 272.2 MiB (22.2%) fs: ext4 dev: /dev/sda1
  ID-3: /home size: 3.64 TiB used: 1.76 TiB (48.4%) fs: btrfs dev: /dev/sdc1
Swap:
  ID-1: swap-1 type: partition size: 46.66 GiB used: 139.5 MiB (0.3%) dev: /dev/sdb2
Sensors:
  Permissions: Unable to run ipmi sensors. Root privileges required.
  System Temperatures: cpu: 49.0 C mobo: N/A gpu: radeon temp: 39.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 410 Uptime: 1h 1m Memory: 15.62 GiB used: 9.43 GiB (60.4%)
  Shell: Zsh inxi: 3.3.16

Thank you very much for your kind attention.

Sincerely,

Adrian Kiess

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: HP
product_name: ProLiant ML110 G6
product_version:
chassis_vendor: HP
chassis_version: N/A
bios_vendor: HP
bios_version: O27
board_vendor: Wistron Corporation
board_name: ProLiant ML110 G6
board_version:

** Network interface configuration:
*** /etc/network/interfaces:

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0
iface eth0 inet manual
iface eth0 inet6 manual

auto br0
allow-hotplug br0
iface br0 inet static
    bridge_ports eth0
    address 192.168.0.3/16
    broadcast 192.168.255.255
    gateway 192.168.0.1
    dns-nameservers 127.0.0.1
    dns-search lan.dac
    post-up /sbin/brctl setfd br0 0 addif br0 eth0

auto br0:0
allow-hotplug br0:0
iface br0:0 inet static
    address 192.168.122.1/16
    broadcast 192.168.255.255
    dns-nameservers 127.0.0.1
    dns-search v-zone.lan.dac

auto br0:1
allow-hotplug br0:1
iface br0:1 inet6 static
    address FC00:::::::0003
    dns-nameservers ::1
    netmask 64

auto br0:2
allow-hotplug