Thank you for the response and explanation. Would you like me to file a
Bugzilla entry for this, or is there already an existing bug ID that could
be used to track the issue?

Thanks,
Josh

From: Fiona Ebner <f.eb...@proxmox.com>
Date: Wednesday, September 4, 2024 at 5:59 AM
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Cc: Knight, Joshua <joshua.kni...@netscout.com>
Subject: Re: [PVE-User] QEMU crash with dpdk 22.11 app on Proxmox 8

Hi,

On 28.08.24 at 16:56, Knight, Joshua via pve-user wrote:
> We are seeing an issue on Proxmox 8 hosts where the underlying QEMU process 
> for a guest will crash while starting a DPDK application in the guest.
>
>
>   *   Proxmox 8.2.4 with QEMU 9.0.2-2
>   *   Guest running Ubuntu 22.04, application is dpdk 22.11 testpmd
>   *   Using virtio network interfaces that are up/connected
>   *   Binding interfaces with the (legacy) igb_uio driver
>
> When starting the application, the SSH connection to the VM drops and the
> VM shows as powered off in the UI.
>
> root@karma06:~/dpdk-22.11# python3 
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
> root@karma06:~/dpdk-22.11# python3 
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
> root@karma06:~/dpdk-22.11# python3 
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
> root@karma06:~/dpdk-22.11# python3 
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23
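>
> (The bindings can be sanity-checked before launching testpmd with the
> same script's --status option, e.g.:)
>
> root@karma06:~/dpdk-22.11# python3
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --status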
>
> root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- 
> -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
> EAL: Detected CPU lcores: 6
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:12.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
> eth_virtio_pci_init(): Failed to init PCI device
> EAL: Requested device 0000:06:13.0 cannot be used
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
> EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
> TELEMETRY: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> Warning: NUMA should be configured manually by using --port-numa-config and 
> --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
>
> client_loop: send disconnect: Broken pipe
>
>
>
> A QEMU assertion failure appears in the host's system log, and attaching
> GDB shows that QEMU aborted.
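>
> (For reference, we attached to the VM's QEMU process before reproducing,
> roughly as follows; the pidfile path is the standard Proxmox VE location,
> and <vmid> stands in for the VM's ID:)
>
> gdb -p "$(cat /var/run/qemu-server/<vmid>.pid)"
> (gdb) continue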
>
> karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: 
> kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
>
> Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7d999cc006c0 (LWP 36256)]
> __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
> no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> 44      ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (threadid=<optimized out>, 
> signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, 
> threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #3  0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: 
> %s%sAssertion `%s' failed.\n%n",
>     assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", 
> file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> 
> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
> #5  0x00007d99a1053eb2 in __GI___assert_fail 
> (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
>     file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", 
> line=line@entry=1836,
>     function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> 
> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
> #6  0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at 
> ../accel/kvm/kvm-all.c:1836
> #7  kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at 
> ../accel/kvm/kvm-all.c:1821
> #8  0x00005a9eb540bed2 in virtio_pci_one_vector_unmask 
> (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
>     vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at 
> ../hw/virtio/virtio-pci.c:991
> #9  0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, 
> vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
> #10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, 
> vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
> #11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, 
> was_masked=<optimized out>) at ../hw/pci/msix.c:140
> #12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, 
> addr=12, value=<optimized out>, size=4, shift=<optimized out>,
>     mask=<optimized out>, attrs=...) at ../system/memory.c:497
> #13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, 
> value=value@entry=0x7d999cbfae58, size=size@entry=4,
>     access_size_min=<optimized out>, access_size_max=<optimized out>, 
> access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
>     mr=<optimized out>, attrs=...) at ../system/memory.c:573
> #14 0x00005a9eb560403c in memory_region_dispatch_write 
> (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
>     op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
> #15 0x00005a9eb560b95f in flatview_write_continue_step 
> (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
>     l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at 
> ../system/physmem.c:2713
> #16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, 
> l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
>     attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
> #17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, 
> attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
>     at ../system/physmem.c:2774
> #18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, 
> attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
>     at ../system/physmem.c:2894
> #19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, 
> addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
>     is_write=<optimized out>) at ../system/physmem.c:2904
> #20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at 
> ../accel/kvm/kvm-all.c:2917
> #21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) 
> at ../accel/kvm/kvm-accel-ops.c:50
> #22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at 
> ../util/qemu-thread-posix.c:541
> #23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #24 0x00007d99a11287dc in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
>
> One thing that's interesting about this backtrace is that it exactly
> matches an existing QEMU issue that is claimed to be fixed, and that fix
> should be present in QEMU 9.0.2, the version running on this Proxmox
> host.
>
> https://gitlab.com/qemu-project/qemu/-/issues/1928
>
> We’ve found a workaround by switching from the deprecated igb_uio driver to 
> the vfio-pci driver when binding the interfaces for dpdk. In this case the VM 
> does not crash. But I’m wondering if anyone has hit this before or if it’s a 
> known issue.  I would certainly not expect any operation in the guest to 
> cause QEMU to crash. It’s also odd that the crash seen claims to be patched 
> in 9.0.2.
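>
> (A sketch of the workaround binding, using the same script and interface
> names as above, and likewise for the other ports. In a guest without a
> vIOMMU, vfio's no-IOMMU mode typically has to be enabled first:)
>
> root@karma06:~/dpdk-22.11# modprobe vfio enable_unsafe_noiommu_mode=1
> root@karma06:~/dpdk-22.11# modprobe vfio-pci
> root@karma06:~/dpdk-22.11# python3
> /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=vfio-pci enp6s20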
>
> We’ve been able to reproduce this on Proxmox 8.0, 8.1, 8.2 on both AMD and 
> Intel processors. The crash does not occur on earlier releases such as 
> Proxmox 6.4, and does not occur with earlier dpdk versions such as 20.08.
>
> Thanks,
> Josh
>

We currently carry a revert of that patch, because it caused some
regressions that sounded just as bad as the original issue [0].

A fix for the regressions has landed upstream now [1], and I'll take a
look at pulling it in and dropping the revert.
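
Once an updated pve-qemu-kvm package is available, you can check the
installed version with, e.g.:

pveversion -v | grep pve-qemu-kvm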

[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch;h=d2de6d11ba1e2a2bd2ea8dccf660ac6e66b047d4;hb=582fd47901356342b8e0bef19d7d8fdc324d2d96
[1]:
https://lore.kernel.org/qemu-devel/a8e63ff289d137197ad7a701a587cc432872d798.1724151593.git....@redhat.com/

Best Regards,
Fiona
