Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-19 Thread Pankaj Gupta
@Jack Wang,
Maybe the four io_uring patches in 5.4.71 fix the issue for you as well?

Thanks,
Pankaj

> Hi Jens.
>
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe  wrote:
> >
> > Would be great if you could try 5.4.71 and see if that helps for your
> > issue.
> >
>
> Oh wow, yeah it did fix the issue.
>
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
>
> Looks like those 4 io_uring fixes helped.
>
> Thanks!
>



Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-17 Thread Jens Axboe
On 10/17/20 8:29 AM, Ju Hyung Park wrote:
> Hi Jens.
> 
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe  wrote:
>>
>> Would be great if you could try 5.4.71 and see if that helps for your
>> issue.
>>
> 
> Oh wow, yeah it did fix the issue.
> 
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
> 
> Looks like those 4 io_uring fixes helped.

Awesome, thanks for testing!

-- 
Jens Axboe




Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-17 Thread Ju Hyung Park
Hi Jens.

On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe  wrote:
>
> Would be great if you could try 5.4.71 and see if that helps for your
> issue.
>

Oh wow, yeah it did fix the issue.

I'm able to reliably turn off and start the VM multiple times in a row.
Double checked by confirming QEMU is dynamically linked to liburing.so.1.
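
A quick way to make that check, with an illustrative binary path and output:

  $ ldd /usr/bin/qemu-system-x86_64 | grep liburing
        liburing.so.1 => /usr/lib/x86_64-linux-gnu/liburing.so.1 (0x00007f...)

If the grep prints nothing, that particular QEMU build has no io_uring support
compiled in.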

Looks like those 4 io_uring fixes helped.

Thanks!



Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-16 Thread Jens Axboe
On 10/16/20 12:04 PM, Ju Hyung Park wrote:
> A small update:
> 
> As per Stefano's suggestion, disabling io_uring support in QEMU at the
> configuration step did fix the problem, and I'm no longer seeing hangs.
> 
> Looks like it __is__ an io_uring issue :(

Would be great if you could try 5.4.71 and see if that helps for your
issue.

-- 
Jens Axboe




Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-16 Thread Ju Hyung Park
A small update:

As per Stefano's suggestion, disabling io_uring support in QEMU at the
configuration step did fix the problem, and I'm no longer seeing hangs.

Looks like it __is__ an io_uring issue :(

Btw, I used liburing fe50048 for linking QEMU.
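
In case anyone wants to link against the same liburing revision, a build along
these lines should work (the install prefix and layout are only illustrative;
QEMU's configure finds liburing via pkg-config):

  git clone https://github.com/axboe/liburing.git
  cd liburing && git checkout fe50048
  ./configure --prefix="$HOME/liburing-inst" && make && make install
  # then, when configuring QEMU:
  PKG_CONFIG_PATH="$HOME/liburing-inst/lib/pkgconfig" ../configure --enable-linux-io-uring ...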

Thanks.


On Fri, Oct 2, 2020 at 4:35 PM Stefano Garzarella  wrote:
>
> Hi Ju,
>
> On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote:
> > Hi Stefano,
> >
> > On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella wrote:
> > > Please, can you share the qemu command line that you are using?
> > > This can be useful for the analysis.
> >
> > Sure.
>
> Thanks for sharing.
>
> The issue seems related to io_uring and the new io_uring fd monitoring
> implementation available from QEMU 5.0.
>
> I'll try to reproduce.
>
> For now, as a workaround, you can rebuild qemu by disabling io-uring support:
>
>   ../configure --disable-linux-io-uring ...
>
>
> Thanks,
> Stefano
>
> >
> > [quoted QEMU command line and libvirt domain XML snipped; see Ju Hyung
> > Park's original message further down the thread for the full command line]

Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-02 Thread Stefano Garzarella
Hi Ju,

On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote:
> Hi Stefano,
> 
> On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella  wrote:
> > Please, can you share the qemu command line that you are using?
> > This can be useful for the analysis.
> 
> Sure.

Thanks for sharing.

The issue seems related to io_uring and the new io_uring fd monitoring
implementation available from QEMU 5.0.

I'll try to reproduce.

For now, as a workaround, you can rebuild qemu by disabling io-uring support:

  ../configure --disable-linux-io-uring ...
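
A complete rebuild along those lines might look like the following, assuming a
QEMU 5.0 source tree with an out-of-tree build directory (the target list is
only an example):

  mkdir -p build && cd build
  ../configure --target-list=x86_64-softmmu --disable-linux-io-uring
  make -j"$(nproc)"
  # the resulting x86_64-softmmu/qemu-system-x86_64 should no longer be
  # linked against liburing.so.1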


Thanks,
Stefano

> 
> [quoted QEMU command line and libvirt domain XML snipped; see Ju Hyung
> Park's original message below for the full command line]

Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-01 Thread Ju Hyung Park
Hi Stefano,

On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella  wrote:
> Please, can you share the qemu command line that you are using?
> This can be useful for the analysis.

Sure.

QEMU:
/usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes
-blockdev 
{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}
-blockdev 
{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}
-blockdev 
{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}
-blockdev 
{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}
-machine 
pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format
-cpu 
Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,fma=off,avx=off,f16c=off,rdrand=off,bmi1=off,hle=off,avx2=off,bmi2=off,rtm=off,rdseed=off,adx=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-synic,hv-stimer,hv-reset
-m 8192 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/1-win10
-overcommit mem-lock=off -smp 4,sockets=1,dies=1,cores=2,threads=2
-uuid 7ccc3031-1dab-4267-b72a-d60065b5ff7f -display none
-no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=32,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc
base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay
-no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global
ICH9-LPC.disable_s4=1 -boot menu=off,strict=on -device
pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
-device pcie-pci-bridge,id=pci.5,bus=pci.2,addr=0x0 -device
qemu-xhci,id=usb,bus=pci.1,addr=0x0 -blockdev
{"driver":"host_device","filename":"/dev/disk/by-partuuid/05c3750b-060f-4703-95ea-6f5e546bf6e9","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"}
-blockdev 
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"}
-device 
virtio-blk-pci,scsi=off,bus=pcie.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
-netdev tap,fd=34,id=hostnet0 -device
e1000,netdev=hostnet0,id=net0,mac=52:54:00:c6:bb:bc,bus=pcie.0,addr=0x3
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
vfio-pci,host=:00:02.0,id=hostdev0,bus=pcie.0,addr=0x2,rombar=0
-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8 -object
rng-random,id=objrng0,filename=/dev/urandom -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x9 -msg
timestamp=on

And I use libvirt 6.3.0 to manage the VM. Here's the XML of my VM.

[libvirt domain XML omitted: the list archive stripped the XML tags, leaving
only element text. Recoverable details: domain name win10, UUID
7ccc3031-1dab-4267-b72a-d60065b5ff7f, 8388608 KiB (8 GiB) of memory, 4 vCPUs,
hvm OS type with OVMF pflash firmware /usr/share/OVMF/OVMF_CODE.fd and NVRAM
/var/lib/libvirt/qemu/nvram/win10_VARS.fd, lifecycle actions
destroy/restart/destroy, emulator /usr/bin/qemu-system-x86_64, and a
virtio-rng device backed by /dev/urandom.]

Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-01 Thread Jack Wang
Stefano Garzarella wrote on Thu, Oct 1, 2020 at 10:59 AM:
>
> +Cc: qemu-devel@nongnu.org
>
> Hi,
>
> On Thu, Oct 01, 2020 at 01:26:51AM +0900, Ju Hyung Park wrote:
> > Hi everyone.
> >
> > I have recently switched to a setup running QEMU 5.0(which supports
> > io_uring) for a Windows 10 guest on Linux v5.4.63.
> > The QEMU hosts /dev/nvme0n1p3 to the guest with virtio-blk with
> > discard/unmap enabled.
>
> Please, can you share the qemu command line that you are using?
> This can be useful for the analysis.
>
> Thanks,
> Stefano
>
> >
> > I've been having a weird issue where the system would randomly hang
> > whenever I turn on or shutdown the guest. The host will stay up for a
> > bit and then just hang. No response on SSH, etc. Even ping doesn't
> > work.
> >
> > It's been hard to even get a log to debug the issue, but I've been
> > able to get a show-backtrace-all-active-cpus sysrq dmesg on the most
> > recent encounter with the issue and it's showing some io_uring
> > functions.
> >
> > Since I've been encountering the issue ever since I switched to QEMU
> > 5.0, I suspect io_uring may be the culprit to the issue.
> >
> > While I'd love to try out the mainline kernel, it's currently not
> > feasible at the moment as I have to stay in linux-5.4.y. Backporting
> > mainline's io_uring also seems to be a non-trivial job.
> >
> > Any tips would be appreciated. I can build my own kernel and I'm
> > willing to try out (backported) patches.
> >
> > Thanks.
> >
> > [kernel NMI backtrace and RCU stall log snipped; the full trace is in
> > Stefano Garzarella's message below]

Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-01 Thread Stefano Garzarella
+Cc: qemu-devel@nongnu.org

Hi,

On Thu, Oct 01, 2020 at 01:26:51AM +0900, Ju Hyung Park wrote:
> Hi everyone.
> 
> I have recently switched to a setup running QEMU 5.0(which supports
> io_uring) for a Windows 10 guest on Linux v5.4.63.
> The QEMU hosts /dev/nvme0n1p3 to the guest with virtio-blk with
> discard/unmap enabled.

Please, can you share the qemu command line that you are using?
This can be useful for the analysis.

Thanks,
Stefano

> 
> I've been having a weird issue where the system would randomly hang
> whenever I turn on or shutdown the guest. The host will stay up for a
> bit and then just hang. No response on SSH, etc. Even ping doesn't
> work.
> 
> It's been hard to even get a log to debug the issue, but I've been
> able to get a show-backtrace-all-active-cpus sysrq dmesg on the most
> recent encounter with the issue and it's showing some io_uring
> functions.
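
Aside: the sysrq referred to above is SysRq 'l'. Assuming the sysrq interface
is enabled, it can be triggered from a root shell, and the resulting backtrace
lands in dmesg:

  echo l > /proc/sysrq-trigger   # 'l' = show backtrace of all active CPUs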
> 
> Since I've been encountering the issue ever since I switched to QEMU
> 5.0, I suspect io_uring may be the culprit to the issue.
> 
> While I'd love to try out the mainline kernel, it's currently not
> feasible at the moment as I have to stay in linux-5.4.y. Backporting
> mainline's io_uring also seems to be a non-trivial job.
> 
> Any tips would be appreciated. I can build my own kernel and I'm
> willing to try out (backported) patches.
> 
> Thanks.
> 
> [243683.539303] NMI backtrace for cpu 1
> [243683.539303] CPU: 1 PID: 1527 Comm: qemu-system-x86 Tainted: P
>   W  O  5.4.63+ #1
> [243683.539303] Hardware name: System manufacturer System Product
> Name/PRIME Z370-A, BIOS 2401 07/12/2019
> [243683.539304] RIP: 0010:io_uring_flush+0x98/0x140
> [243683.539304] Code: e4 74 70 48 8b 93 e8 02 00 00 48 8b 32 48 8b 4a
> 08 48 89 4e 08 48 89 31 48 89 12 48 89 52 08 48 8b 72 f8 81 4a a8 00
> 40 00 00 <48> 85 f6 74 15 4c 3b 62 c8 75 0f ba 01 00 00 00 bf 02 00 00
> 00 e8
> [243683.539304] RSP: 0018:8881f20c3e28 EFLAGS: 0006
> [243683.539305] RAX: 888419cd94e0 RBX: 88842ba49800 RCX:
> 888419cd94e0
> [243683.539305] RDX: 888419cd94e0 RSI: 888419cd94d0 RDI:
> 88842ba49af8
> [243683.539306] RBP: 88842ba49af8 R08: 0001 R09:
> 88840d17aaf8
> [243683.539306] R10: 0001 R11: ffec R12:
> 88843c68c080
> [243683.539306] R13: 88842ba49ae8 R14: 0001 R15:
> 
> [243683.539307] FS:  () GS:88843ea8()
> knlGS:
> [243683.539307] CS:  0010 DS:  ES:  CR0: 80050033
> [243683.539307] CR2: 7f3234b31f90 CR3: 02608001 CR4:
> 003726e0
> [243683.539307] Call Trace:
> [243683.539308]  ? filp_close+0x2a/0x60
> [243683.539308]  ? put_files_struct.part.0+0x57/0xb0
> [243683.539309]  ? do_exit+0x321/0xa70
> [243683.539309]  ? do_group_exit+0x35/0x90
> [243683.539309]  ? __x64_sys_exit_group+0xf/0x10
> [243683.539309]  ? do_syscall_64+0x41/0x160
> [243683.539309]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [243684.753272] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [243684.753278] rcu: 1-...0: (1 GPs behind)
> idle=a5e/1/0x4000 softirq=7893711/7893712 fqs=2955
> [243684.753280] (detected by 3, t=6002 jiffies, g=17109677, q=117817)
> [243684.753282] Sending NMI from CPU 3 to CPUs 1:
> [243684.754285] NMI backtrace for cpu 1
> [243684.754285] CPU: 1 PID: 1527 Comm: qemu-system-x86 Tainted: P
>   W  O  5.4.63+ #1
> [243684.754286] Hardware name: System manufacturer System Product
> Name/PRIME Z370-A, BIOS 2401 07/12/2019
> [243684.754286] RIP: 0010:io_uring_flush+0x83/0x140
> [243684.754287] Code: 89 ef e8 00 36 92 00 48 8b 83 e8 02 00 00 49 39
> c5 74 52 4d 85 e4 74 70 48 8b 93 e8 02 00 00 48 8b 32 48 8b 4a 08 48
> 89 4e 08 <48> 89 31 48 89 12 48 89 52 08 48 8b 72 f8 81 4a a8 00 40 00
> 00 48
> [243684.754287] RSP: 0018:8881f20c3e28 EFLAGS: 0002
> [243684.754288] RAX: 888419cd94e0 RBX: 88842ba49800 RCX:
> 888419cd94e0
> [243684.754288] RDX: 888419cd94e0 RSI: 888419cd94e0 RDI:
> 88842ba49af8
> [243684.754289] RBP: 88842ba49af8 R08: 0001 R09:
> 88840d17aaf8
> [243684.754289] R10: 0001 R11: ffec R12:
> 88843c68c080
> [243684.754289] R13: 88842ba49ae8 R14: 0001 R15:
> 
> [243684.754290] FS:  () GS:88843ea8()
> knlGS:
> [243684.754290] CS:  0010 DS:  ES:  CR0: 80050033
> [243684.754291] CR2: 7f3234b31f90 CR3: 02608001 CR4:
> 003726e0
> [243684.754291] Call Trace:
> [243684.754291]  ? filp_close+0x2a/0x60
> [243684.754291]  ? put_files_struct.part.0+0x57/0xb0
> [243684.754292]  ? do_exit+0x321/0xa70
> [243684.754292]  ? do_group_exit+0x35/0x90
> [243684.754292]  ? __x64_sys_exit_group+0xf/0x10
> [243684.754293]  ? do_syscall_64+0x41/0x160
> [243684.754293]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>