Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
@Jack Wang, maybe the four io_uring patches in 5.4.71 fix the issue for you as well?

Thanks,
Pankaj

> Hi Jens.
>
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe wrote:
> >
> > Would be great if you could try 5.4.71 and see if that helps for your
> > issue.
>
> Oh wow, yeah it did fix the issue.
>
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
>
> Looks like those 4 io_uring fixes helped.
>
> Thanks!
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
On 10/17/20 8:29 AM, Ju Hyung Park wrote:
> Hi Jens.
>
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe wrote:
>>
>> Would be great if you could try 5.4.71 and see if that helps for your
>> issue.
>
> Oh wow, yeah it did fix the issue.
>
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
>
> Looks like those 4 io_uring fixes helped.

Awesome, thanks for testing!

--
Jens Axboe
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
Hi Jens.

On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe wrote:
>
> Would be great if you could try 5.4.71 and see if that helps for your
> issue.

Oh wow, yeah it did fix the issue.

I'm able to reliably turn off and start the VM multiple times in a row.
Double checked by confirming QEMU is dynamically linked to liburing.so.1.

Looks like those 4 io_uring fixes helped.

Thanks!
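For reference, the linkage check above can be reproduced with ldd; a minimal sketch, assuming the distro-packaged binary path:

    # list the binary's dynamic dependencies and look for liburing
    ldd /usr/bin/qemu-system-x86_64 | grep liburing

If QEMU was built with io_uring support, this prints a line naming liburing.so.1.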
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
On 10/16/20 12:04 PM, Ju Hyung Park wrote:
> A small update:
>
> As per Stefano's suggestion, disabling io_uring support from QEMU at
> the configuration step did fix the problem and I'm no longer having
> hangs.
>
> Looks like it __is__ an io_uring issue :(

Would be great if you could try 5.4.71 and see if that helps for your
issue.

--
Jens Axboe
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
A small update:

As per Stefano's suggestion, disabling io_uring support from QEMU at
the configuration step did fix the problem and I'm no longer having
hangs.

Looks like it __is__ an io_uring issue :(

Btw, I used liburing fe50048 for linking QEMU.

Thanks.

On Fri, Oct 2, 2020 at 4:35 PM Stefano Garzarella wrote:
>
> Hi Ju,
>
> On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote:
> > Hi Stefano,
> >
> > On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella wrote:
> > > Please, can you share the qemu command line that you are using?
> > > This can be useful for the analysis.
> >
> > Sure.
>
> Thanks for sharing.
>
> The issue seems related to io_uring and the new io_uring fd monitoring
> implementation available from QEMU 5.0.
>
> I'll try to reproduce.
>
> For now, as a workaround, you can rebuild qemu by disabling io-uring support:
>
>     ../configure --disable-linux-io-uring ...
>
> Thanks,
> Stefano
>
> > [qemu command line and libvirt domain XML snipped; see the original
> > message further down the thread]
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
Hi Ju,

On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote:
> Hi Stefano,
>
> On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella wrote:
> > Please, can you share the qemu command line that you are using?
> > This can be useful for the analysis.
>
> Sure.

Thanks for sharing.

The issue seems related to io_uring and the new io_uring fd monitoring
implementation available from QEMU 5.0.

I'll try to reproduce.

For now, as a workaround, you can rebuild qemu by disabling io-uring support:

    ../configure --disable-linux-io-uring ...

Thanks,
Stefano

> [qemu command line and libvirt domain XML snipped; see the original
> message below]
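Spelled out, the workaround rebuild might look like this; a minimal sketch, assuming an out-of-tree build directory and only the x86_64 system emulation target:

    # from a qemu source checkout
    mkdir -p build && cd build
    ../configure --target-list=x86_64-softmmu --disable-linux-io-uring
    make -j"$(nproc)"

After installing the rebuilt binary, ldd on it should no longer list liburing.so.1.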
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
Hi Stefano,

On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella wrote:
> Please, can you share the qemu command line that you are using?
> This can be useful for the analysis.

Sure.

QEMU:
/usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes
-blockdev {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}
-blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}
-machine pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format
-cpu Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,fma=off,avx=off,f16c=off,rdrand=off,bmi1=off,hle=off,avx2=off,bmi2=off,rtm=off,rdseed=off,adx=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-synic,hv-stimer,hv-reset
-m 8192 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/1-win10
-overcommit mem-lock=off -smp 4,sockets=1,dies=1,cores=2,threads=2
-uuid 7ccc3031-1dab-4267-b72a-d60065b5ff7f -display none
-no-user-config -nodefaults
-chardev socket,id=charmonitor,fd=32,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=localtime,driftfix=slew
-global kvm-pit.lost_tick_policy=delay
-no-hpet -no-shutdown
-global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1
-boot menu=off,strict=on
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
-device pcie-pci-bridge,id=pci.5,bus=pci.2,addr=0x0
-device qemu-xhci,id=usb,bus=pci.1,addr=0x0
-blockdev {"driver":"host_device","filename":"/dev/disk/by-partuuid/05c3750b-060f-4703-95ea-6f5e546bf6e9","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"}
-device virtio-blk-pci,scsi=off,bus=pcie.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
-netdev tap,fd=34,id=hostnet0
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:c6:bb:bc,bus=pcie.0,addr=0x3
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x4
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
-device vfio-pci,host=:00:02.0,id=hostdev0,bus=pcie.0,addr=0x2,rombar=0
-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8
-object rng-random,id=objrng0,filename=/dev/urandom
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x9
-msg timestamp=on

And I use libvirt 6.3.0 to manage the VM. Here's an xml of my VM.

[libvirt domain XML for "win10" (uuid 7ccc3031-1dab-4267-b72a-d60065b5ff7f)
followed here, but its markup was stripped in archiving and only unreadable
fragments remain; it is omitted]
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
Stefano Garzarella wrote on Thu, Oct 1, 2020 at 10:59 AM:
>
> +Cc: qemu-devel@nongnu.org
>
> Hi,
>
> On Thu, Oct 01, 2020 at 01:26:51AM +0900, Ju Hyung Park wrote:
> > Hi everyone.
> >
> > I have recently switched to a setup running QEMU 5.0 (which supports
> > io_uring) for a Windows 10 guest on Linux v5.4.63.
> > QEMU exposes /dev/nvme0n1p3 to the guest via virtio-blk with
> > discard/unmap enabled.
>
> Please, can you share the qemu command line that you are using?
> This can be useful for the analysis.
>
> Thanks,
> Stefano
>
> > I've been having a weird issue where the system would randomly hang
> > whenever I turn on or shut down the guest. The host will stay up for
> > a bit and then just hang. No response on SSH, etc. Even ping doesn't
> > work.
> >
> > It's been hard to even get a log to debug the issue, but I've been
> > able to get a show-backtrace-all-active-cpus sysrq dmesg on the most
> > recent encounter with the issue and it's showing some io_uring
> > functions.
> >
> > Since I've been encountering the issue ever since I switched to QEMU
> > 5.0, I suspect io_uring may be the culprit.
> >
> > While I'd love to try out the mainline kernel, it's currently not
> > feasible as I have to stay on linux-5.4.y. Backporting mainline's
> > io_uring also seems to be a non-trivial job.
> >
> > Any tips would be appreciated. I can build my own kernel and I'm
> > willing to try out (backported) patches.
> >
> > Thanks.
> >
> > [kernel NMI backtraces and RCU stall log snipped; see the full
> > report at the bottom of the thread]
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
+Cc: qemu-devel@nongnu.org

Hi,

On Thu, Oct 01, 2020 at 01:26:51AM +0900, Ju Hyung Park wrote:
> Hi everyone.
>
> I have recently switched to a setup running QEMU 5.0 (which supports
> io_uring) for a Windows 10 guest on Linux v5.4.63.
> QEMU exposes /dev/nvme0n1p3 to the guest via virtio-blk with
> discard/unmap enabled.

Please, can you share the qemu command line that you are using?
This can be useful for the analysis.

Thanks,
Stefano

>
> I've been having a weird issue where the system would randomly hang
> whenever I turn on or shut down the guest. The host will stay up for
> a bit and then just hang. No response on SSH, etc. Even ping doesn't
> work.
>
> It's been hard to even get a log to debug the issue, but I've been
> able to get a show-backtrace-all-active-cpus sysrq dmesg on the most
> recent encounter with the issue and it's showing some io_uring
> functions.
>
> Since I've been encountering the issue ever since I switched to QEMU
> 5.0, I suspect io_uring may be the culprit.
>
> While I'd love to try out the mainline kernel, it's currently not
> feasible as I have to stay on linux-5.4.y. Backporting mainline's
> io_uring also seems to be a non-trivial job.
>
> Any tips would be appreciated. I can build my own kernel and I'm
> willing to try out (backported) patches.
>
> Thanks.
>
> [243683.539303] NMI backtrace for cpu 1
> [243683.539303] CPU: 1 PID: 1527 Comm: qemu-system-x86 Tainted: P        W  O      5.4.63+ #1
> [243683.539303] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2401 07/12/2019
> [243683.539304] RIP: 0010:io_uring_flush+0x98/0x140
> [243683.539304] Code: e4 74 70 48 8b 93 e8 02 00 00 48 8b 32 48 8b 4a 08 48 89 4e 08 48 89 31 48 89 12 48 89 52 08 48 8b 72 f8 81 4a a8 00 40 00 00 <48> 85 f6 74 15 4c 3b 62 c8 75 0f ba 01 00 00 00 bf 02 00 00 00 e8
> [243683.539304] RSP: 0018:8881f20c3e28 EFLAGS: 0006
> [243683.539305] RAX: 888419cd94e0 RBX: 88842ba49800 RCX: 888419cd94e0
> [243683.539305] RDX: 888419cd94e0 RSI: 888419cd94d0 RDI: 88842ba49af8
> [243683.539306] RBP: 88842ba49af8 R08: 0001 R09: 88840d17aaf8
> [243683.539306] R10: 0001 R11: ffec R12: 88843c68c080
> [243683.539306] R13: 88842ba49ae8 R14: 0001 R15:
> [243683.539307] FS: () GS:88843ea8() knlGS:
> [243683.539307] CS: 0010 DS: ES: CR0: 80050033
> [243683.539307] CR2: 7f3234b31f90 CR3: 02608001 CR4: 003726e0
> [243683.539307] Call Trace:
> [243683.539308]  ? filp_close+0x2a/0x60
> [243683.539308]  ? put_files_struct.part.0+0x57/0xb0
> [243683.539309]  ? do_exit+0x321/0xa70
> [243683.539309]  ? do_group_exit+0x35/0x90
> [243683.539309]  ? __x64_sys_exit_group+0xf/0x10
> [243683.539309]  ? do_syscall_64+0x41/0x160
> [243683.539309]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [243684.753272] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [243684.753278] rcu: 1-...0: (1 GPs behind) idle=a5e/1/0x4000 softirq=7893711/7893712 fqs=2955
> [243684.753280] (detected by 3, t=6002 jiffies, g=17109677, q=117817)
> [243684.753282] Sending NMI from CPU 3 to CPUs 1:
> [243684.754285] NMI backtrace for cpu 1
> [243684.754285] CPU: 1 PID: 1527 Comm: qemu-system-x86 Tainted: P        W  O      5.4.63+ #1
> [243684.754286] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2401 07/12/2019
> [243684.754286] RIP: 0010:io_uring_flush+0x83/0x140
> [243684.754287] Code: 89 ef e8 00 36 92 00 48 8b 83 e8 02 00 00 49 39 c5 74 52 4d 85 e4 74 70 48 8b 93 e8 02 00 00 48 8b 32 48 8b 4a 08 48 89 4e 08 <48> 89 31 48 89 12 48 89 52 08 48 8b 72 f8 81 4a a8 00 40 00 00 48
> [243684.754287] RSP: 0018:8881f20c3e28 EFLAGS: 0002
> [243684.754288] RAX: 888419cd94e0 RBX: 88842ba49800 RCX: 888419cd94e0
> [243684.754288] RDX: 888419cd94e0 RSI: 888419cd94e0 RDI: 88842ba49af8
> [243684.754289] RBP: 88842ba49af8 R08: 0001 R09: 88840d17aaf8
> [243684.754289] R10: 0001 R11: ffec R12: 88843c68c080
> [243684.754289] R13: 88842ba49ae8 R14: 0001 R15:
> [243684.754290] FS: () GS:88843ea8() knlGS:
> [243684.754290] CS: 0010 DS: ES: CR0: 80050033
> [243684.754291] CR2: 7f3234b31f90 CR3: 02608001 CR4: 003726e0
> [243684.754291] Call Trace:
> [243684.754291]  ? filp_close+0x2a/0x60
> [243684.754291]  ? put_files_struct.part.0+0x57/0xb0
> [243684.754292]  ? do_exit+0x321/0xa70
> [243684.754292]  ? do_group_exit+0x35/0x90
> [243684.754292]  ? __x64_sys_exit_group+0xf/0x10
> [243684.754293]  ? do_syscall_64+0x41/0x160
> [243684.754293]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
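For readers who want to capture the same kind of log: the backtraces above came from the show-backtrace-all-active-cpus sysrq. A minimal sketch of triggering it from a root shell, assuming the kernel was built with CONFIG_MAGIC_SYSRQ:

    # enable all sysrq functions (or set a suitable bitmask)
    echo 1 > /proc/sys/kernel/sysrq
    # 'l' dumps a backtrace for all active CPUs into dmesg
    echo l > /proc/sysrq-trigger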