Bug#1012100: linux-image-5.17-1: KVM LIBVIRT fails to start, slow disk access, and a kernel thread goes wild on Intel Xeon X3430

2022-06-20 Thread Adrian Kieß
Dear Diederik,

the new kernel:

root@g6 /opt # uname -a
Linux g6.lan.dac 5.18.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.2-1
(2022-06-06) x86_64 GNU/Linux

in Debian/testing works: LIBVIRT KVM runs again!

I have most probably found the reason for the slow disk access on my machine:

Please see this new bug report:

Debian Bug report logs - #1013260
coreutils: /bin/chown very slow in conjunction with storebackup

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013260

Thank you very much for your answer!

Sincerely,

Adrian Kieß

On Mon, 30 May 2022 14:29:29 +0200
Diederik de Haas  wrote:

> Hi Adrian,
> 
> On Monday, 30 May 2022 13:14:26 CEST Adrian Kieß wrote:
> > yes, it works with the kernel 5.16.0-6, but disk access is still slow.
> 
> Ok, but that issue was also happening before 5.17 and is not a new problem.
> Do you have a(n old) kernel (still) installed which does NOT have this slow
> disk access issue? If it happens on all kernel versions, then a hardware issue
> becomes much more likely to be the real culprit.
> 
> > For example, virt-manager/viewer sometimes needs a minute to connect to
> > the KVM instances on localhost. But not all applications are this slow;
> > for example the E-Mail client Sylpheed starts as fast as before and is
> > operating at fast speed.
> 
> In your initial report I noticed the following:
> > Network:
> >   Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
> >   IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
> >   IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
> > mac: fe:40:ab:83:94:4a
> >   IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
> > mac: fe:54:00:c2:24:94
> >   IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
> > mac: fe:54:00:bf:35:8b
> >   IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
> > mac: fe:54:00:25:b0:8b
> >   IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
> > mac: fe:54:00:4a:c8:69
> 
> Is it normal/expected that IF-ID-[2-5] have "unknown speed: 10 Mbps duplex" ?
> If not, that may be worth looking into.
> 
> > I assume there is also another bug now in the system, not only due to
> > the new kernel. There is also another bug in GDM3, which I also
> > reported: Loading GDM3 after bootup and logging in as normal user is
> > also very, very slow.
> 
> Which sounds like the moment lots of files/data is read from disk to initialize
> the session, which does point to a disk issue.
> But if the initial boot isn't terribly slow as well, that would be odd.
> Or is /home mounted from another disk?
> 
> > As you suggested, I installed the kernel 5.17.11 from Debian/unstable
> > and booted into this kernel.
> > 
> > virt-manager and my KVM VM instances do work again, but one VM instance
> > failed to load after bootup. I restarted the VM instance, and it is now
> > also operating fine.
> 
> Good, that sounds like major progress :) It looks to me that the KVM problem 
> is now (mostly?) fixed.
> 
> > When opening the virt-viewer instance from virt-manager, connecting to
> > the VM is still very slow with kernel 5.17.11. Something must be wrong
> > I/O wise.
> 
> That still/all points to a disk problem
> 
> > I attached the dmesg output, you requested, as TXT file to this E-mail.
> 
> Couple of things I noticed in the dmesg output:
> 1) "[Firmware Warn]: HEST: Duplicated hardware error source ID: 9."
> https://lkml.org/lkml/2011/6/27/370 seems relevant for that as it provided the
> better warning, but it also points out that it *is* considered a firmware bug.
> I noticed your BIOS is from 2011. Is there a newer version available? If so, 
> it may be worth trying that out to see if that improves things.
> 
> 2) Several ACPI related warnings.
> No idea if or what should be done with that.
> 
> 3) "kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using 
> workaround" and "kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu"
> That looks like there are still KVM related issues (just not or less fatal)
> There have been other bug reports related to KVM.
> 
> 4) BUG: kernel NULL pointer dereference, address: 000b
> That's never good. The dmesg output also contains a Call Trace and several 
> mentions of KVM, so it looks like there's still something not right about it.
> I have no idea how to interpret those Call (or Stack) Traces, so hopefully 
> someone else chimes in who is familiar with that.
> 
> Cheers,
>   Diederik


-- 
With many greetings from Leipzig, Germany.
Adrian Immanuel Kieß 

Gothaer Straße 34
D-04155 Leipzig

 — < adr...@kiess.onl >

--SYSTEM--
echo "Your fortune cookie: " && /usr/games/fortune -c -s de
> (letzteworte) % Last words of a driving instructor: "The traffic light is red."

echo "g6.lan.dac uptime: " && /usr/bin/uptime
> 17:17:09 up 2 days, 1:21, 12 users, load average: 1,09, 0,90, 0,85



2022-05-30 Thread Jon DeVree
That dmesg output sounds a lot like bugs 1010916 and 1011168 and a Xeon
X3430 CPU would be another older one that predates XSAVE.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1011168#15
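Whether a CPU advertises XSAVE can be checked locally: on x86 Linux, the feature list in /proc/cpuinfo contains the flag `xsave` when the extension is available. The following is a minimal Python sketch under that assumption; `cpu_flags` and `has_xsave` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux prints the feature list as "flags\t\t: fpu vme ...";
        # the exact key match skips related lines such as "vmx flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_xsave(cpuinfo_text: str) -> bool:
    # Exact set membership, so "xsaveopt" alone does not count as "xsave".
    return "xsave" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    text = Path("/proc/cpuinfo").read_text()
    print("xsave supported:", has_xsave(text))
```

Running it on the affected machine (or simply `grep -w xsave /proc/cpuinfo`) shows whether the CPU belongs to the group of older processors described in bugs 1010916 and 1011168.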

-- 
Jon
Doge Wrangler
X(7): A program for managing terminal windows. See also screen(1) and tmux(1).



2022-05-30 Thread Diederik de Haas
Hi Adrian,

On Monday, 30 May 2022 13:14:26 CEST Adrian Kieß wrote:
> yes, it works with the kernel 5.16.0-6, but disk access is still slow.

Ok, but that issue was also happening before 5.17 and is not a new problem.
Do you have a(n old) kernel (still) installed which does NOT have this slow 
disk access issue? If it happens on all kernel versions, then a hardware issue 
becomes much more likely to be the real culprit.
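On Debian, the installed kernel images normally sit in /boot as `vmlinuz-<version>` files (`dpkg -l 'linux-image-*'` gives the package view of the same information). A small Python sketch that lists those versions in natural order, assuming the standard /boot layout; `installed_kernels` is a hypothetical helper name.

```python
import os
import re
from typing import Iterable, List, Optional


def _natural_key(version: str) -> list:
    # Split "5.16.0-6-amd64" into text and number chunks so that 5.9 sorts before 5.16.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", version)]


def installed_kernels(names: Optional[Iterable[str]] = None) -> List[str]:
    """Return the kernel versions that have a /boot/vmlinuz-<version> image."""
    if names is None:
        names = os.listdir("/boot")
    prefix = "vmlinuz-"
    versions = [name[len(prefix):] for name in names if name.startswith(prefix)]
    return sorted(versions, key=_natural_key)


if __name__ == "__main__":
    for version in installed_kernels():
        print(version)
```

Any older version in that list can then be selected from GRUB's "Advanced options" submenu to test whether the slow disk access also occurs there.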

> For example, virt-manager/viewer sometimes needs a minute to connect to
> the KVM instances on localhost. But not all applications are this slow;
> for example the E-Mail client Sylpheed starts as fast as before and is
> operating at fast speed.

In your initial report I noticed the following:
> Network:
>   Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
>   IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
>   IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
> mac: fe:40:ab:83:94:4a
>   IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
> mac: fe:54:00:c2:24:94
>   IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
> mac: fe:54:00:bf:35:8b
>   IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
> mac: fe:54:00:25:b0:8b
>   IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
> mac: fe:54:00:4a:c8:69

Is it normal/expected that IF-ID-[2-5] have "unknown speed: 10 Mbps duplex" ?
If not, that may be worth looking into.
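The link values inxi prints can be read directly from sysfs: /sys/class/net/<iface>/operstate, speed (in Mb/s) and duplex. A minimal Python sketch, assuming that standard sysfs layout; `LinkInfo`, `parse_speed` and `read_link` are hypothetical names. Comparing the vnet* interfaces with eth0 this way shows exactly what each device's driver reports.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LinkInfo:
    operstate: Optional[str]   # e.g. "up", "down", "unknown"
    speed_mbps: Optional[int]  # None when no speed is reported
    duplex: Optional[str]      # e.g. "full", "half", "unknown"


def _read(path: Path) -> Optional[str]:
    try:
        return path.read_text().strip()
    except OSError:
        # Reading "speed"/"duplex" can fail (e.g. with EINVAL) for some
        # interfaces, such as links that are down; treat that as "not reported".
        return None


def parse_speed(text: Optional[str]) -> Optional[int]:
    """Convert the sysfs speed string (Mb/s) to an int; a negative value means unknown."""
    if text is None or not text.lstrip("-").isdigit():
        return None
    value = int(text)
    return value if value >= 0 else None


def read_link(iface: str, root: str = "/sys/class/net") -> LinkInfo:
    base = Path(root) / iface
    return LinkInfo(
        operstate=_read(base / "operstate"),
        speed_mbps=parse_speed(_read(base / "speed")),
        duplex=_read(base / "duplex"),
    )


if __name__ == "__main__":
    for name in sorted(p.name for p in Path("/sys/class/net").iterdir()):
        print(name, read_link(name))
```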

> I assume there is also another bug now in the system, not only due to
> the new kernel. There is also another bug in GDM3, which I also
> reported: Loading GDM3 after bootup and logging in as normal user is
> also very, very slow.

Which sounds like the moment lots of files/data is read from disk to initialize 
the session, which does point to a disk issue.
But if the initial boot isn't terribly slow as well, that would be odd.
Or is /home mounted from another disk?
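Which device backs /home can be read from /proc/mounts (or asked directly with `findmnt --target /home` from util-linux). A minimal Python sketch, assuming the usual whitespace-separated /proc/mounts format with octal escapes such as `\040` for spaces; `mount_source` is a hypothetical helper that picks the deepest mount point containing the given path.

```python
import re
from pathlib import Path, PurePosixPath
from typing import Optional

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    # /proc/mounts writes spaces, tabs, newlines and backslashes in paths as
    # octal escapes such as "\040"; turn them back into characters.
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mount_source(path: str, mounts_text: str) -> Optional[str]:
    """Return the device backing the mount point that contains `path`."""
    target = PurePosixPath(path)
    best_device: Optional[str] = None
    best_depth = -1
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = _unescape(fields[0]), PurePosixPath(_unescape(fields[1]))
        # Component-wise check, so "/homework" does not count as being under "/home".
        if target == mount_point or mount_point in target.parents:
            depth = len(mount_point.parts)
            # ">=" lets a later entry for the same mount point (an over-mount) win.
            if depth >= best_depth:
                best_device, best_depth = device, depth
    return best_device


if __name__ == "__main__":
    print(mount_source("/home", Path("/proc/mounts").read_text()))
```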

> As you suggested, I installed the kernel 5.17.11 from Debian/unstable
> and booted into this kernel.
> 
> virt-manager and my KVM VM instances do work again, but one VM instance
> failed to load after bootup. I restarted the VM instance, and it is now
> also operating fine.

Good, that sounds like major progress :) It looks to me like the KVM problem
is now (mostly?) fixed.

> When opening the virt-viewer instance from virt-manager, connecting to
> the VM is still very slow with kernel 5.17.11. Something must be wrong
> I/O wise.

That still/all points to a disk problem.

> I attached the dmesg output, you requested, as TXT file to this E-mail.

Couple of things I noticed in the dmesg output:
1) "[Firmware Warn]: HEST: Duplicated hardware error source ID: 9."
https://lkml.org/lkml/2011/6/27/370 seems relevant for that as it provided the 
better warning, but it also points out that it *is* considered a firmware bug.
I noticed your BIOS is from 2011. Is there a newer version available? If so, 
it may be worth trying that out to see if that improves things.

2) Several ACPI related warnings.
No idea if or what should be done with that.

3) "kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using 
workaround" and "kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu"
That looks like there are still KVM-related issues (just not fatal ones, or less fatal ones).
There have been other bug reports related to KVM.

4) BUG: kernel NULL pointer dereference, address: 000b
That's never good. The dmesg output also contains a Call Trace and several 
mentions of KVM, so it looks like there's still something not right about it.
I have no idea how to interpret those Call (or Stack) Traces, so hopefully 
someone else chimes in who is familiar with that.
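Each Call Trace entry has the form `symbol+0xoffset/0xsize`, optionally followed by the module name in square brackets; a leading `?` marks an entry the stack unwinder could not confirm as part of the real call chain. A small Python sketch, assuming exactly that textual format, that splits such lines into fields, which makes it easier to see which functions (for example kvm_replace_memslot, kvm_set_memslot and kvm_vm_ioctl in the kvm module in the attached dmesg) are involved. `Frame` and `parse_frame` are hypothetical helpers; for mapping addresses back to source lines the kernel tree ships scripts/decode_stacktrace.sh.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "kvm_set_memslot+0x3c2/0x4a0 [kvm]" or "? _raw_read_unlock+0x18/0x30".
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


@dataclass
class Frame:
    symbol: str
    offset: int             # byte offset within the function
    size: int               # total size of the function in bytes
    module: Optional[str]   # None for code built into the kernel image
    reliable: bool          # False when the kernel marked the entry with "?"


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one dmesg line (timestamp prefix allowed); return None if no frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        reliable=m["unreliable"] is None,
    )
```

The same pattern also matches the `RIP:` line, which names the function where the fault happened.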

Cheers,
  Diederik



2022-05-30 Thread Adrian Kieß
Dear Diederik,

yes, it works with the kernel 5.16.0-6, but disk access is still slow.
For example, virt-manager/viewer sometimes needs a minute to connect to
the KVM instances on localhost. But not all applications are this slow;
for example the E-Mail client Sylpheed starts as fast as before and is
operating at fast speed. 

I assume there is also another bug now in the system, not only due to
the new kernel. There is also another bug in GDM3, which I also
reported: Loading GDM3 after bootup and logging in as normal user is
also very, very slow.

As you suggested, I installed the kernel 5.17.11 from Debian/unstable
and booted into this kernel.

virt-manager and my KVM VM instances do work again, but one VM instance
failed to load after bootup. I restarted the VM instance, and it is now
also operating fine.

When opening the virt-viewer instance from virt-manager, connecting to
the VM is still very slow with kernel 5.17.11. Something must be wrong
I/O wise.

I attached the dmesg output you requested as a TXT file to this e-mail.

Thank you very much for your answer!

Sincerely,

Adrian Kiess

On Mon, 30 May 2022 11:45:29 +0200 Diederik de Haas
 wrote:

> On Monday, 30 May 2022 09:59:06 CEST Adrian Immanuel Kiess wrote:
> > Package: src:linux
> > Version: 5.17.3-1
> > Debian Release: bookworm/sid
> >   APT policy: (990, 'testing')
> > 
> > Kernel: Linux 5.16.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)
> 
> Does everything work correctly with kernel 5.16.0-6 ?
> Sid/Unstable currently has kernel 5.17.11 and it would be useful to know if
> the issue is still present in that version. Can you test that?
> 
> If it is, then hopefully `dmesg` can give some clues. After you've noticed the
> described symptoms again, can you do `dmesg --level emerg,alert,crit,err,warn`
> and send that to this bug report?


[    0.022299] ACPI: SPCR: Unexpected SPCR Access Width.  Defaulting to byte size
[    0.233929] core: CPUID marked event: 'bus cycles' unavailable
[    0.281581] pmd_set_huge: Cannot satisfy [mem 0xe000-0xe020] with a huge-page mapping due to MTRR override.
[    0.285753] [Firmware Warn]: HEST: Duplicated hardware error source ID: 9.
[    8.897715] ERST: Can not request [mem 0xbf7ff000-0xbf7f] for ERST.
[    9.537343] ACPI Warning: SystemIO range 0x1028-0x102F conflicts with OpRegion 0x1000-0x102F (\_SB.PCI0.LPC0.PMIO) (20211217/utaddress-204)
[    9.549414] ACPI Warning: SystemIO range 0x1180-0x11AF conflicts with OpRegion 0x1180-0x11AF (\_SB.PCI0.LPC0.GPOX) (20211217/utaddress-204)
[    9.556866] lpc_ich: Resource conflict(s) found affecting gpio_ich
[    9.678746] resource sanity check: requesting [mem 0x000c-0x000d], which spans more than PCI Bus :00 [mem 0x000d4000-0x000dbfff window]
[    9.678756] caller pci_map_rom+0x79/0x1d0 mapping multiple BARs
[   13.853298] IPMI Watchdog: Unable to register misc device
[   14.329981] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[   24.691518] kauditd_printk_skb: 85 callbacks suppressed
[   26.412447] kvm: KVM_SET_TSS_ADDR need to be called before entering vcpu
[   90.991726] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[   95.092388] BUG: kernel NULL pointer dereference, address: 000b
[   95.092396] #PF: supervisor write access in kernel mode
[   95.092398] #PF: error_code(0x0002) - not-present page
[   95.092404] Oops: 0002 [#1] PREEMPT SMP PTI
[   95.092407] CPU: 2 PID: 4379 Comm: CPU 0/KVM Not tainted 5.17.0-3-amd64 #1  Debian 5.17.11-1
[   95.092411] Hardware name: HP ProLiant ML110 G6/ProLiant ML110 G6, BIOS O27    08/26/2011
[   95.092413] RIP: 0010:kvm_replace_memslot+0xcf/0x390 [kvm]
[   95.092481] Code: 44 24 08 48 85 db 0f 84 3b 02 00 00 48 89 ea 48 c1 e2 04 48 01 da 48 8b 4a 08 48 85 c9 74 1e 48 8b 32 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 8d 54
[   95.092484] RSP: 0018:a1fd87ffbd70 EFLAGS: 00010206
[   95.092487] RAX: a1fd87fb5058 RBX: 955e10f13200 RCX: a1fd87fb5388
[   95.092489] RDX: 955e10f13200 RSI: 0003 RDI: a1fd87fb5000
[   95.092491] RBP:  R08:  R09: 
[   95.092493] R10: 0003 R11: 0004 R12: 
[   95.092495] R13:  R14:  R15: a1fd87fb5000
[   95.092497] FS:  7f0e39639640() GS:95606fd0() knlGS:
[   95.092500] CS:  0010 DS:  ES:  CR0: 80050033
[   95.092503] CR2: 000b CR3: 0001b478 CR4: 26e0
[   95.092505] Call Trace:
[   95.092509]  
[   95.092513]  ? _raw_read_unlock+0x18/0x30
[   95.092519]  kvm_set_memslot+0x3c2/0x4a0 [kvm]
[   95.092564]  kvm_vm_ioctl+0x2cb/0xd80 [kvm]
[   95.092610]  ? 

2022-05-30 Thread Diederik de Haas
On Monday, 30 May 2022 09:59:06 CEST Adrian Immanuel Kiess wrote:
> Package: src:linux
> Version: 5.17.3-1
> Debian Release: bookworm/sid
>   APT policy: (990, 'testing')
> 
> Kernel: Linux 5.16.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)

Does everything work correctly with kernel 5.16.0-6 ?
Sid/Unstable currently has kernel 5.17.11 and it would be useful to know if 
the issue is still present in that version. Can you test that?

If it is, then hopefully `dmesg` can give some clues. After you've noticed the 
described symptoms again, can you do `dmesg --level emerg,alert,crit,err,warn` 
and send that to this bug report?
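The requested command can also be scripted so the output is easy to save and attach. A minimal Python sketch that runs exactly `dmesg --level emerg,alert,crit,err,warn` via `subprocess`; `dmesg_command`, `collect_dmesg` and the output file name are hypothetical, and on systems with `kernel.dmesg_restrict` enabled the command may need root.

```python
import subprocess
from typing import Callable, List, Sequence

# The priority levels requested above, most severe first.
DEFAULT_LEVELS: Sequence[str] = ("emerg", "alert", "crit", "err", "warn")


def dmesg_command(levels: Sequence[str] = DEFAULT_LEVELS) -> List[str]:
    """Build the argv for `dmesg --level emerg,alert,crit,err,warn`."""
    return ["dmesg", "--level", ",".join(levels)]


def collect_dmesg(levels: Sequence[str] = DEFAULT_LEVELS,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run dmesg and return its output; `run` can be swapped out for testing."""
    result = run(dmesg_command(levels), capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Save the filtered kernel log to a text file for attaching to the bug report.
    with open("dmesg-warnings.txt", "w") as fh:
        fh.write(collect_dmesg())
```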



2022-05-30 Thread Adrian Immanuel Kiess
Package: src:linux
Version: 5.17.3-1
Severity: important

Dear Maintainer,

   * What led up to the situation?
 Upgrading my Debian/testing installation with apt -u dist-upgrade
   * What exactly did you do (or not do) that was effective (or
 ineffective)?
 apt -u dist-upgrade
   * What was the outcome of this action?
 The new 5.17-1 kernel has several issues
   * What outcome did you expect instead?
 Working Linux kernel with no issues

Currently in Debian/testing, the new 5.17 kernel has several issues.

My LIBVIRT KVM VM instances fail to load with this new kernel.

Disk access is slow in some programs; for example, virt-manager fails to show
the KVM VM instances because of the broken disk access.

After some hours of uptime, a kernel thread goes wild and uses 100% of the CPU.

I am also running this new kernel (5.17) on a Debian/testing VMware ESXi VPS
instance at my provider's place, where everything works fine. Thus this is most
probably an issue with the hardware I use.

My hardware is the following:

adrian@g6 ~ % inxi -F
System:
  Host: g6.lan.dac Kernel: 5.16.0-6-amd64 arch: x86_64 bits: 64
Desktop: Enlightenment v: 0.25.3 Distro: Debian GNU/Linux bookworm/sid
Machine:
  Type: Desktop System: HP product: ProLiant ML110 G6 v: N/A
serial: 
  Mobo: Wistron model: ProLiant ML110 G6 serial: 
BIOS: HP v: O27 date: 08/26/2011
CPU:
  Info: quad core model: Intel Xeon X3430 bits: 64 type: MCP cache:
L2: 1024 KiB
  Speed (MHz): avg: 1635 min/max: 1197/2394 cores: 1: 1692 2: 1617 3: 1649
4: 1585
Graphics:
  Device-1: AMD Oland GL [FirePro W2100] driver: radeon v: kernel
  Display: x11 server: X.Org v: 1.21.1.3 with: Xwayland v: 22.1.0 driver:
X: loaded: radeon gpu: radeon resolution: 1920x1200
  OpenGL: renderer: AMD OLAND (DRM 2.50.0 5.16.0-6-amd64 LLVM 13.0.1)
v: 4.5 Mesa 21.3.8
Audio:
  Device-1: AMD Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000
  Series]
driver: snd_hda_intel
  Device-2: ASUSTek Xonar U1 Audio Station type: USB
driver: hid-generic,snd-usb-audio,usbhid
  Sound Server-1: ALSA v: k5.16.0-6-amd64 running: yes
  Sound Server-2: PipeWire v: 0.3.51 running: yes
Network:
  Device-1: Broadcom NetXtreme BCM5723 Gigabit Ethernet PCIe driver: tg3
  IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 64:31:50:d3:c0:f8
  IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown
mac: fe:40:ab:83:94:4a
  IF-ID-2: vnet0 state: unknown speed: 10 Mbps duplex: full
mac: fe:54:00:c2:24:94
  IF-ID-3: vnet1 state: unknown speed: 10 Mbps duplex: full
mac: fe:54:00:bf:35:8b
  IF-ID-4: vnet2 state: unknown speed: 10 Mbps duplex: full
mac: fe:54:00:25:b0:8b
  IF-ID-5: vnet3 state: unknown speed: 10 Mbps duplex: full
mac: fe:54:00:4a:c8:69
Drives:
  Local Storage: total: 11.79 TiB used: 5.87 TiB (49.8%)
  ID-1: /dev/sda vendor: Toshiba model: Q300 size: 447.13 GiB
  ID-2: /dev/sdb vendor: A-Data model: SP550 size: 447.13 GiB
  ID-3: /dev/sdc vendor: Toshiba model: HDWE140 size: 3.64 TiB
  ID-4: /dev/sdd vendor: Toshiba model: MG06ACA800E size: 7.28 TiB
Partition:
  ID-1: / size: 437.53 GiB used: 91.98 GiB (21.0%) fs: ext4 dev: /dev/sda3
  ID-2: /boot size: 1.2 GiB used: 272.2 MiB (22.2%) fs: ext4 dev: /dev/sda1
  ID-3: /home size: 3.64 TiB used: 1.76 TiB (48.4%) fs: btrfs
dev: /dev/sdc1
Swap:
  ID-1: swap-1 type: partition size: 46.66 GiB used: 139.5 MiB (0.3%)
dev: /dev/sdb2
Sensors:
  Permissions: Unable to run ipmi sensors. Root privileges required.
  System Temperatures: cpu: 49.0 C mobo: N/A gpu: radeon temp: 39.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 410 Uptime: 1h 1m Memory: 15.62 GiB used: 9.43 GiB (60.4%)
  Shell: Zsh inxi: 3.3.16

Thank you very much for your kind attention.

Sincerely,

Adrian Kiess


-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: HP
product_name: ProLiant ML110 G6
product_version: 
chassis_vendor: HP
chassis_version: N/A
bios_vendor: HP
bios_version: O27   
board_vendor: Wistron Corporation
board_name: ProLiant ML110 G6
board_version: 

** Network interface configuration:
*** /etc/network/interfaces:

auto lo
iface lo inet loopback


auto eth0
allow-hotplug eth0

iface eth0 inet manual
iface eth0 inet6 manual


auto br0
allow-hotplug br0

iface br0 inet static
bridge_ports eth0
address 192.168.0.3/16
broadcast 192.168.255.255
gateway 192.168.0.1
dns-nameservers 127.0.0.1
dns-search lan.dac

post-up /sbin/brctl setfd br0 0 addif br0 eth0

auto br0:0
allow-hotplug br0:0

iface br0:0 inet static
address 192.168.122.1/16
broadcast 192.168.255.255
dns-nameservers 127.0.0.1
dns-search v-zone.lan.dac

auto br0:1
allow-hotplug br0:1

iface br0:1 inet6 static
address FC00:::::::0003
dns-nameservers ::1
netmask 64



auto br0:2
allow-hotplug