Bug#1035779: linux-image-5.10.0-22: kvm/qemu kernel null pointer dereference, VM doesn't start

2023-05-09 Thread Salvatore Bonaccorso
Control: tags -1 - moreinfo
Control: tags -1 + upstream

Hi Jared,

On Wed, May 10, 2023 at 03:13:05AM +, Jared Epp wrote:
> Hi Salvatore,
> 
> Thanks for the quick reply!

Thanks to you for your quick testing :)

> > > 
> > > This sounds similar to the
> > > https://forum.proxmox.com/threads/with-latest-5-15-104-1-pve-windows-server-vm-freeze-stuck.125294/
> > > issue. Would you be able to verify two things:
> > > 
> > > Check how the Windows VM is configured and if you pass the
> > > '+hv-tlbflush' flag.
> > > 
> 
> You're right, I am passing this flag. In libvirt I use:
> 
> 
>   
> 
>   
> 
>   
> 
> 
> I tried this fix first and it works. If I reboot into
> 5.10.0-22-amd64, and instead set  above, the
> VM boots.

Great, thanks for confirming this temporary workaround.

> > > Additionally, would the attached patch make the issue go away?
> 
> Thanks for the patch; it does fix the issue.  I set this back:
> , applied your patch to 5.10.0-22-amd64 and
> booted my newly patched kernel, and the VM boots.

That's good, thanks for testing. I will ask upstream to cherry-pick
the patch as well for the 5.10.y stable series so it can go in the
next update.

> > 
> > 
> > Now with patch attached.
> > 
> > Regards,
> > Salvatore
> 
> Thanks for your help.  This is my first time using the BTS so I hope
> I've done everything correctly. If there's anything else I should
> do, or if you want me to test something, let me know.

All perfect :)

Regards,
Salvatore
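
For anyone checking their own guest for the flag discussed above: a minimal sketch, assuming the usual libvirt domain XML layout in which Hyper-V enlightenments sit under <features><hyperv> and the flag corresponds to a <tlbflush state='on'/> element. The sample domain and the function name hv_tlbflush_enabled are illustrative assumptions, not the reporter's actual configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain definition for illustration only; this is NOT the
# configuration from the bug report.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>win10-example</name>
  <features>
    <acpi/>
    <hyperv>
      <relaxed state='on'/>
      <tlbflush state='on'/>
    </hyperv>
  </features>
</domain>
"""


def hv_tlbflush_enabled(domain_xml: str) -> bool:
    """Return True if the domain XML turns on the Hyper-V tlbflush enlightenment."""
    root = ET.fromstring(domain_xml)
    # Assumed location of the element: <domain>/<features>/<hyperv>/<tlbflush>
    node = root.find("./features/hyperv/tlbflush")
    return node is not None and node.get("state") == "on"


if __name__ == "__main__":
    print(hv_tlbflush_enabled(SAMPLE_DOMAIN_XML))
```

The input could be the output of "virsh dumpxml" for the guest in question. A result of False for a domain that still crashes would suggest the flag is being passed some other way, for example through raw QEMU command-line arguments.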



Bug#1035779: linux-image-5.10.0-22: kvm/qemu kernel null pointer dereference, VM doesn't start

2023-05-09 Thread Jared Epp
Hi Salvatore,

Thanks for the quick reply!

> > 
> > This sounds similar to the
> > https://forum.proxmox.com/threads/with-latest-5-15-104-1-pve-windows-server-vm-freeze-stuck.125294/
> > issue. Would you be able to verify two things:
> > 
> > Check how the Windows VM is configured and if you pass the
> > '+hv-tlbflush' flag.
> > 

You're right, I am passing this flag. In libvirt I use:


  

  

  


I tried this fix first and it works. If I reboot into 5.10.0-22-amd64, and 
instead set  above, the VM boots.

> > Additionally, would the attached patch make the issue go away?

Thanks for the patch; it does fix the issue.  I set this back: , applied your patch to 5.10.0-22-amd64 and booted my newly patched 
kernel, and the VM boots.

> 
> 
> Now with patch attached.
> 
> Regards,
> Salvatore

Thanks for your help.  This is my first time using the BTS so I hope I've done 
everything correctly. If there's anything else I should do, or if you want me 
to test something, let me know.

Jared



Bug#1035779: linux-image-5.10.0-22: kvm/qemu kernel null pointer dereference, VM doesn't start

2023-05-09 Thread Salvatore Bonaccorso
On Tue, May 09, 2023 at 09:36:45PM +0200, Salvatore Bonaccorso wrote:
> Control: tags -1 + moreinfo
> 
> Hi Jared,
> 
> On Mon, May 08, 2023 at 11:50:21PM -0600, Jared Epp wrote:
> > Package: src:linux
> > Version: 5.10.178-3
> > Severity: normal
> > X-Debbugs-Cc: jared...@pm.me
> > 
> > Dear Maintainer,
> > 
> > After I updated my Debian 11 host kernel to 5.10.0-22, my VM guest
> > (Windows 10 using KVM / qemu / libvirt) no longer boots and there's
> > a kernel null pointer dereference along with a call trace, etc. in
> > the system log. If I reboot and choose 5.10.0-21 in grub, the VM
> > works as expected and there's no error in the log.
> > 
> > Below, reportbug included part of the kernel log, but it missed part
> > of the problem, so I pasted that in; I hope that's okay. If you need
> > any other information, let me know.
> > 
> > Thanks
> > 
> > Jared Epp
> > 
> > -- Package-specific info:
> > ** Version:
> > Linux version 5.10.0-22-amd64 (debian-ker...@lists.debian.org) (gcc-10 
> > (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) 
> > #1 SMP Debian 5.10.178-3 (2023-04-22)
> > 
> > ** Command line:
> > BOOT_IMAGE=/vmlinuz-5.10.0-22-amd64 root=/dev/mapper/panthro--vg-root ro 
> > quiet mem_sleep_default=s2idle default_hugepagesz=1G hugepages=8
> > 
> > ** Tainted: D (128)
> >  * kernel died recently, i.e. there was an OOPS or BUG
> > 
> > ** Kernel log:
> > [   51.576266] BUG: kernel NULL pointer dereference, address: 
> > 
> > [   51.576269] #PF: supervisor read access in kernel mode
> > [   51.576270] #PF: error_code(0x) - not-present page
> > [   51.576271] PGD 0 P4D 0 
> > [   51.576273] Oops:  [#1] SMP NOPTI
> > [   51.576275] CPU: 6 PID: 2209 Comm: CPU 0/KVM Not tainted 5.10.0-22-amd64 
> > #1 Debian 5.10.178-3
> > [   51.576276] Hardware name: ASUS System Product Name/CROSSHAIR VI HERO, 
> > BIOS 8701 02/08/2023
> > [   51.576280] RIP: 0010:find_first_bit+0x19/0x40
> > [   51.576281] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc cc cc cc 49 
> > 89 f0 48 85 f6 74 28 31 c0 eb 0d 48 83 c0 40 48 83 c7 08 4c 39 c0 73 17 
> > <48> 8b 17 48 85 d2 74 eb f3 48 0f bc d2 48 01 d0 49 39 c0 4c 0f 47
> > [   51.576282] RSP: 0018:a99ac3a23a30 EFLAGS: 00010246
> > [   51.576283] RAX:  RBX: a99ac38a5000 RCX: 
> > 
> > [   51.576283] RDX:  RSI: 0120 RDI: 
> > 
> > [   51.576284] RBP:  R08: 0120 R09: 
> > 94e2c1ae72a8
> > [   51.576284] R10: 000f R11:  R12: 
> > 94e2c1ae72a8
> > [   51.576285] R13: 0323 R14: 0003 R15: 
> > 0006
> > [   51.576286] FS:  (0053) GS:94e89e98(002b) 
> > knlGS:f8033f006000
> > [   51.576286] CS:  0010 DS:  ES:  CR0: 80050033
> > [   51.576287] CR2:  CR3: 00018e4ee000 CR4: 
> > 00750ee0
> > [   51.576287] PKRU: 5554
> > [   51.576288] Call Trace:
> > [   51.576307]  kvm_make_vcpus_request_mask+0x38/0xf0 [kvm]
> > [   51.576319]  kvm_hv_flush_tlb+0x147/0x370 [kvm]
> > [   51.576328]  ? kvm_page_track_is_active+0x12/0x50 [kvm]
> > [   51.576336]  ? make_spte+0x146/0x260 [kvm]
> > [   51.576344]  ? mmu_spte_update+0x11/0x1c0 [kvm]
> > [   51.576351]  ? set_spte+0xee/0x140 [kvm]
> > [   51.576358]  ? mmu_set_spte+0x327/0x4a0 [kvm]
> > [   51.576365]  ? kvm_release_pfn_clean+0x22/0x40 [kvm]
> > [   51.576372]  ? direct_page_fault+0x223/0xa20 [kvm]
> > [   51.576374]  ? svm_get_segment+0x18/0x100 [kvm_amd]
> > [   51.576382]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
> > [   51.576383]  ? svm_get_segment+0x18/0x100 [kvm_amd]
> > [   51.576390]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
> > [   51.576398]  kvm_hv_hypercall+0x176/0x580 [kvm]
> > [   51.576401]  ? get_cpu_vendor+0x40/0xa0
> > [   51.576403]  ? native_load_tr_desc+0x67/0x70
> > [   51.576411]  kvm_arch_vcpu_ioctl_run+0xbe8/0x1740 [kvm]
> > [   51.576419]  kvm_vcpu_ioctl+0x21e/0x5b0 [kvm]
> > [   51.576422]  __x64_sys_ioctl+0x8b/0xc0
> > [   51.576424]  do_syscall_64+0x33/0x80
> > [   51.576426]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > [   51.576428] RIP: 0033:0x7fad816f2237
> > [   51.576429] Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 
> > c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 
> > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
> > [   51.576429] RSP: 002b:7fad7ce65508 EFLAGS: 0246 ORIG_RAX: 
> > 0010
> > [   51.576430] RAX: ffda RBX: ae80 RCX: 
> > 7fad816f2237
> > [   51.576431] RDX:  RSI: ae80 RDI: 
> > 001c
> > [   51.576431] RBP: 55a3e17511c0 R08: 55a3df109848 R09: 
> > 55a3df5335c0
> > [   51.576432] R10:  R11: 0246 R12: 
> > 
> > [   51.576432] R13: 55a3df54fbc0 R14: 7fad7ce657c0 R15: 
> > 

Bug#1035779: linux-image-5.10.0-22: kvm/qemu kernel null pointer dereference, VM doesn't start

2023-05-09 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hi Jared,

On Mon, May 08, 2023 at 11:50:21PM -0600, Jared Epp wrote:
> Package: src:linux
> Version: 5.10.178-3
> Severity: normal
> X-Debbugs-Cc: jared...@pm.me
> 
> Dear Maintainer,
> 
> After I updated my Debian 11 host kernel to 5.10.0-22, my VM guest
> (Windows 10 using KVM / qemu / libvirt) no longer boots and there's
> a kernel null pointer dereference along with a call trace, etc. in
> the system log. If I reboot and choose 5.10.0-21 in grub, the VM
> works as expected and there's no error in the log.
> 
> Below, reportbug included part of the kernel log, but it missed part
> of the problem, so I pasted that in; I hope that's okay. If you need
> any other information, let me know.
> 
> Thanks
> 
> Jared Epp
> 
> -- Package-specific info:
> ** Version:
> Linux version 5.10.0-22-amd64 (debian-ker...@lists.debian.org) (gcc-10 
> (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) 
> #1 SMP Debian 5.10.178-3 (2023-04-22)
> 
> ** Command line:
> BOOT_IMAGE=/vmlinuz-5.10.0-22-amd64 root=/dev/mapper/panthro--vg-root ro 
> quiet mem_sleep_default=s2idle default_hugepagesz=1G hugepages=8
> 
> ** Tainted: D (128)
>  * kernel died recently, i.e. there was an OOPS or BUG
> 
> ** Kernel log:
> [   51.576266] BUG: kernel NULL pointer dereference, address: 
> [   51.576269] #PF: supervisor read access in kernel mode
> [   51.576270] #PF: error_code(0x) - not-present page
> [   51.576271] PGD 0 P4D 0 
> [   51.576273] Oops:  [#1] SMP NOPTI
> [   51.576275] CPU: 6 PID: 2209 Comm: CPU 0/KVM Not tainted 5.10.0-22-amd64 
> #1 Debian 5.10.178-3
> [   51.576276] Hardware name: ASUS System Product Name/CROSSHAIR VI HERO, 
> BIOS 8701 02/08/2023
> [   51.576280] RIP: 0010:find_first_bit+0x19/0x40
> [   51.576281] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc cc cc cc 49 89 
> f0 48 85 f6 74 28 31 c0 eb 0d 48 83 c0 40 48 83 c7 08 4c 39 c0 73 17 <48> 8b 
> 17 48 85 d2 74 eb f3 48 0f bc d2 48 01 d0 49 39 c0 4c 0f 47
> [   51.576282] RSP: 0018:a99ac3a23a30 EFLAGS: 00010246
> [   51.576283] RAX:  RBX: a99ac38a5000 RCX: 
> 
> [   51.576283] RDX:  RSI: 0120 RDI: 
> 
> [   51.576284] RBP:  R08: 0120 R09: 
> 94e2c1ae72a8
> [   51.576284] R10: 000f R11:  R12: 
> 94e2c1ae72a8
> [   51.576285] R13: 0323 R14: 0003 R15: 
> 0006
> [   51.576286] FS:  (0053) GS:94e89e98(002b) 
> knlGS:f8033f006000
> [   51.576286] CS:  0010 DS:  ES:  CR0: 80050033
> [   51.576287] CR2:  CR3: 00018e4ee000 CR4: 
> 00750ee0
> [   51.576287] PKRU: 5554
> [   51.576288] Call Trace:
> [   51.576307]  kvm_make_vcpus_request_mask+0x38/0xf0 [kvm]
> [   51.576319]  kvm_hv_flush_tlb+0x147/0x370 [kvm]
> [   51.576328]  ? kvm_page_track_is_active+0x12/0x50 [kvm]
> [   51.576336]  ? make_spte+0x146/0x260 [kvm]
> [   51.576344]  ? mmu_spte_update+0x11/0x1c0 [kvm]
> [   51.576351]  ? set_spte+0xee/0x140 [kvm]
> [   51.576358]  ? mmu_set_spte+0x327/0x4a0 [kvm]
> [   51.576365]  ? kvm_release_pfn_clean+0x22/0x40 [kvm]
> [   51.576372]  ? direct_page_fault+0x223/0xa20 [kvm]
> [   51.576374]  ? svm_get_segment+0x18/0x100 [kvm_amd]
> [   51.576382]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
> [   51.576383]  ? svm_get_segment+0x18/0x100 [kvm_amd]
> [   51.576390]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
> [   51.576398]  kvm_hv_hypercall+0x176/0x580 [kvm]
> [   51.576401]  ? get_cpu_vendor+0x40/0xa0
> [   51.576403]  ? native_load_tr_desc+0x67/0x70
> [   51.576411]  kvm_arch_vcpu_ioctl_run+0xbe8/0x1740 [kvm]
> [   51.576419]  kvm_vcpu_ioctl+0x21e/0x5b0 [kvm]
> [   51.576422]  __x64_sys_ioctl+0x8b/0xc0
> [   51.576424]  do_syscall_64+0x33/0x80
> [   51.576426]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> [   51.576428] RIP: 0033:0x7fad816f2237
> [   51.576429] Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 c7 
> c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 
> 01 f0 ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
> [   51.576429] RSP: 002b:7fad7ce65508 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [   51.576430] RAX: ffda RBX: ae80 RCX: 
> 7fad816f2237
> [   51.576431] RDX:  RSI: ae80 RDI: 
> 001c
> [   51.576431] RBP: 55a3e17511c0 R08: 55a3df109848 R09: 
> 55a3df5335c0
> [   51.576432] R10:  R11: 0246 R12: 
> 
> [   51.576432] R13: 55a3df54fbc0 R14: 7fad7ce657c0 R15: 
> 00802000
> [   51.576434] Modules linked in: xt_nat veth nft_chain_nat xt_MASQUERADE 
> nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter vhost_net vhost 
> vhost_iotlb tap tun bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 xt_hl 
> ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4

Bug#1035779: linux-image-5.10.0-22: kvm/qemu kernel null pointer dereference, VM doesn't start

2023-05-08 Thread Jared Epp
Package: src:linux
Version: 5.10.178-3
Severity: normal
X-Debbugs-Cc: jared...@pm.me

Dear Maintainer,

After I updated my Debian 11 host kernel to 5.10.0-22, my VM guest (Windows 10 
using KVM / qemu / libvirt) no longer boots and there's a kernel null pointer 
dereference along with a call trace, etc. in the system log. If I reboot and 
choose 5.10.0-21 in grub, the VM works as expected and there's no error in the 
log.

Below, reportbug included part of the kernel log, but it missed part of the 
problem, so I pasted that in; I hope that's okay. If you need any other 
information, let me know.

Thanks

Jared Epp

-- Package-specific info:
** Version:
Linux version 5.10.0-22-amd64 (debian-ker...@lists.debian.org) (gcc-10 (Debian 
10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP 
Debian 5.10.178-3 (2023-04-22)

** Command line:
BOOT_IMAGE=/vmlinuz-5.10.0-22-amd64 root=/dev/mapper/panthro--vg-root ro quiet 
mem_sleep_default=s2idle default_hugepagesz=1G hugepages=8

** Tainted: D (128)
 * kernel died recently, i.e. there was an OOPS or BUG

** Kernel log:
[   51.576266] BUG: kernel NULL pointer dereference, address: 
[   51.576269] #PF: supervisor read access in kernel mode
[   51.576270] #PF: error_code(0x) - not-present page
[   51.576271] PGD 0 P4D 0 
[   51.576273] Oops:  [#1] SMP NOPTI
[   51.576275] CPU: 6 PID: 2209 Comm: CPU 0/KVM Not tainted 5.10.0-22-amd64 #1 
Debian 5.10.178-3
[   51.576276] Hardware name: ASUS System Product Name/CROSSHAIR VI HERO, BIOS 
8701 02/08/2023
[   51.576280] RIP: 0010:find_first_bit+0x19/0x40
[   51.576281] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc cc cc cc 49 89 
f0 48 85 f6 74 28 31 c0 eb 0d 48 83 c0 40 48 83 c7 08 4c 39 c0 73 17 <48> 8b 17 
48 85 d2 74 eb f3 48 0f bc d2 48 01 d0 49 39 c0 4c 0f 47
[   51.576282] RSP: 0018:a99ac3a23a30 EFLAGS: 00010246
[   51.576283] RAX:  RBX: a99ac38a5000 RCX: 
[   51.576283] RDX:  RSI: 0120 RDI: 
[   51.576284] RBP:  R08: 0120 R09: 94e2c1ae72a8
[   51.576284] R10: 000f R11:  R12: 94e2c1ae72a8
[   51.576285] R13: 0323 R14: 0003 R15: 0006
[   51.576286] FS:  (0053) GS:94e89e98(002b) 
knlGS:f8033f006000
[   51.576286] CS:  0010 DS:  ES:  CR0: 80050033
[   51.576287] CR2:  CR3: 00018e4ee000 CR4: 00750ee0
[   51.576287] PKRU: 5554
[   51.576288] Call Trace:
[   51.576307]  kvm_make_vcpus_request_mask+0x38/0xf0 [kvm]
[   51.576319]  kvm_hv_flush_tlb+0x147/0x370 [kvm]
[   51.576328]  ? kvm_page_track_is_active+0x12/0x50 [kvm]
[   51.576336]  ? make_spte+0x146/0x260 [kvm]
[   51.576344]  ? mmu_spte_update+0x11/0x1c0 [kvm]
[   51.576351]  ? set_spte+0xee/0x140 [kvm]
[   51.576358]  ? mmu_set_spte+0x327/0x4a0 [kvm]
[   51.576365]  ? kvm_release_pfn_clean+0x22/0x40 [kvm]
[   51.576372]  ? direct_page_fault+0x223/0xa20 [kvm]
[   51.576374]  ? svm_get_segment+0x18/0x100 [kvm_amd]
[   51.576382]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
[   51.576383]  ? svm_get_segment+0x18/0x100 [kvm_amd]
[   51.576390]  ? kvm_get_cs_db_l_bits+0x35/0x70 [kvm]
[   51.576398]  kvm_hv_hypercall+0x176/0x580 [kvm]
[   51.576401]  ? get_cpu_vendor+0x40/0xa0
[   51.576403]  ? native_load_tr_desc+0x67/0x70
[   51.576411]  kvm_arch_vcpu_ioctl_run+0xbe8/0x1740 [kvm]
[   51.576419]  kvm_vcpu_ioctl+0x21e/0x5b0 [kvm]
[   51.576422]  __x64_sys_ioctl+0x8b/0xc0
[   51.576424]  do_syscall_64+0x33/0x80
[   51.576426]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[   51.576428] RIP: 0033:0x7fad816f2237
[   51.576429] Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 c7 
c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
[   51.576429] RSP: 002b:7fad7ce65508 EFLAGS: 0246 ORIG_RAX: 
0010
[   51.576430] RAX: ffda RBX: ae80 RCX: 7fad816f2237
[   51.576431] RDX:  RSI: ae80 RDI: 001c
[   51.576431] RBP: 55a3e17511c0 R08: 55a3df109848 R09: 55a3df5335c0
[   51.576432] R10:  R11: 0246 R12: 
[   51.576432] R13: 55a3df54fbc0 R14: 7fad7ce657c0 R15: 00802000
[   51.576434] Modules linked in: xt_nat veth nft_chain_nat xt_MASQUERADE 
nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter vhost_net vhost 
vhost_iotlb tap tun bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 xt_hl 
ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_multiport nft_limit 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi 
snd_hda_intel nls_ascii snd_intel_dspcfg nls_cp437 soundwire_intel vfat 
soundwire_generic_allocation fat snd_soc_core snd_compress soundwire_cadence 
snd_hda_codec edac_mce_amd xt_limit xt_addrtype kvm_amd snd_hda_