Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
I managed to upgrade to qemu 4.1 on a test KVM host and I can confirm that I can't repro the issue on this version. Great news that it is fixed in 4.1. Thanks everyone for your input and the fast replies.

Kind regards,
Fernando

On vie, oct 25, 2019 at 12:28 PM, Fernando Casas Schössow wrote:
Thanks for the reply Kevin. I will do my best to upgrade to 4.1, test again and report back whether this is fixed in that version. Hopefully it is. Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf wrote:
On 23.10.2019 at 19:28, Fernando Casas Schössow wrote:
Hi John, Thanks for looking into this. I can quickly repro the problem with the qemu 4.0 binary with debugging symbols enabled, as I have it available already. Doing the same with qemu 4.1 or the development head may be too much hassle, but if it's really the only way, I can give it a try.

We had a lot of iothread-related fixes in 4.1, so this would really be the only way to tell if it's a bug that still exists. I suspect that it's already fixed (and to be more precise, I assume that commit d0ee0204f fixed it).

Kevin
Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
Thanks for the reply Kevin. I will do my best to upgrade to 4.1, test again and report back whether this is fixed in that version. Hopefully it is.

Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf wrote:
On 23.10.2019 at 19:28, Fernando Casas Schössow wrote:
Hi John, Thanks for looking into this. I can quickly repro the problem with the qemu 4.0 binary with debugging symbols enabled, as I have it available already. Doing the same with qemu 4.1 or the development head may be too much hassle, but if it's really the only way, I can give it a try.

We had a lot of iothread-related fixes in 4.1, so this would really be the only way to tell if it's a bug that still exists. I suspect that it's already fixed (and to be more precise, I assume that commit d0ee0204f fixed it).

Kevin
Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
BTW just to be clear, qemu is crashing in this scenario *only* if iothread is enabled for the guest. Without iothread enabled, the operation completes without any problems.

On jue, oct 24, 2019 at 11:07 PM, Fernando Casas Schössow wrote:
> Today I updated to qemu 4.0.1, since this was the latest version
> available for Alpine, and I can confirm that I can repro the issue
> with this version as well.
> Not sure if relevant, but I can also confirm that the problem happens
> with Windows Server 2012 R2 but also with Linux guests (it doesn't
> matter whether the guest uses UEFI or BIOS firmware). I ran these tests
> just to rule things out.
>
> Also, as discussed, I compiled qemu with debug symbols, reproduced the
> problem, collected a core dump and ran both through gdb. This is the
> result:
>
> (gdb) thread apply all bt
>
> Thread 42 (LWP 33704):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fee02380b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 41 (LWP 33837):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1ad5b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 40 (LWP 33719):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fee02266b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 39 (LWP 33696):
> #0 0x7fee04233171 in syscall () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee02be8b64 in ?? ()
> #2 0x0030 in ?? ()
> #3 0x7fee02be2540 in ?? ()
> #4 0x7fee02be2500 in ?? ()
> #5 0x7fee02be2548 in ?? ()
> #6 0x55d7e4987f28 in rcu_gp_event ()
> #7 0x in ?? ()
>
> Thread 38 (LWP 33839):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1a83b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 37 (LWP 33841):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1737b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 36 (LWP 33863):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8c83b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 35 (LWP 33842):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc170eb64 in ?? ()
> #3 0x in ?? ()
>
> Thread 34 (LWP 33862):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8cacb64 in ?? ()
> #3 0x in ?? ()
>
> Thread 33 (LWP 33843):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc16e5b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 32 (LWP 33861):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8cd5b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 31 (LWP 33844):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc16bcb64 in ?? ()
> #3 0x in ?? ()
>
> Thread 30 (LWP 33858):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8e83b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 29 (LWP 33845):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1693b64 in ?? ()
> #3 0x in ?? ()
>
> Thread 28 (LWP 33857):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8eacb64 in ?? ()
> #3 0x in ?? ()
>
> Thread 27 (LWP 33846):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc166ab64 in ?? ()
> #3 0x in ?? ()
>
> Thread 26 (LWP 33856):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8ed5b6
in ?? ()
#3 0x in ?? ()

Thread 21 (LWP 33849):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd0d5b64 in ?? ()
#3 0x in ?? ()

Thread 20 (LWP 33852):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd05ab64 in ?? ()
#3 0x in ?? ()

Thread 19 (LWP 33850):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd0acb64 in ?? ()
#3 0x in ?? ()

Thread 18 (LWP 33851):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd083b64 in ?? ()
#3 0x in ?? ()

Thread 17 (LWP 33836):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1afeb64 in ?? ()
#3 0x in ?? ()

Thread 16 (LWP 33835):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1c5ab64 in ?? ()
#3 0x in ?? ()

Thread 15 (LWP 33834):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1c83b64 in ?? ()
#3 0x in ?? ()

Thread 14 (LWP 33833):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cacb64 in ?? ()
#3 0x in ?? ()

Thread 13 (LWP 33677):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x55d7e516656c in ?? ()
#3 0x in ?? ()

Thread 12 (LWP 33832):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cd5b64 in ?? ()
#3 0x in ?? ()

Thread 11 (LWP 33831):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cfeb64 in ?? ()
#3 0x in ?? ()

Thread 10 (LWP 33829):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1e67b64 in ?? ()
#3 0x in ?? ()

Thread 9 (LWP 33828):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1e90b64 in ?? ()
#3 0x in ?? ()

Thread 8 (LWP 33827):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fee02a95b64 in ?? ()
#3 0x in ?? ()

Thread 7 (LWP 33732):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fee0223db64 in ?? ()
#3 0x in ?? ()

Thread 6 (LWP 33706):
#0 0x7fee0423263d in ioctl () from /lib/ld-musl-x86_64.so.1
#1 0x0001 in ?? ()
#2 0x7fee0010 in ?? ()
#3 0x7fee02351440 in ?? ()
#4 0x7fee02351400 in ?? ()
#5 0x in ?? ()

Thread 5 (LWP 33838):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1aacb64 in ?? ()
#3 0x in ?? ()

Thread 4 (LWP 33860):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedb8cfeb64 in ?? ()
#3 0x in ?? ()

Thread 3 (LWP 33859):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedb8e5ab64 in ?? ()
#3 0x in ?? ()

Thread 2 (LWP 33840):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1a5ab64 in ?? ()
#3 0x in ?? ()

Thread 1 (LWP 33701):
#0 0x7fee0421b7a1 in abort () from /lib/ld-musl-x86_64.so.1
#1 0x55d7e6012b70 in ?? ()
#2 0x0020 in ?? ()
#3 0x in ?? ()

(gdb)

I'm not a developer nor skilled with gdb, but if you provide me with the debugging commands I can execute them and reply back with the results. I can also provide the binary and the core dump for analysis if needed. While waiting for replies I will check if I can upgrade to qemu 4.1.0, try to repro and provide the results. Thanks.

Fernando

On mié, oct 23, 2019 at 7:57 PM, Fernando Casas Schössow wrote:
> In virsh I would do this while the guest is running:
>
> virsh attach-disk --type cdrom --mode readonly
>
> Following the example for guest f
Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
Hi John,

Thanks for looking into this. I can quickly repro the problem with the qemu 4.0 binary with debugging symbols enabled, as I have it available already. Doing the same with qemu 4.1 or the development head may be too much hassle, but if it's really the only way, I can give it a try. Would it be worth trying with 4.0 first to get the stack trace, or will that not help and the only way to go is 4.1 (or dev)?

Thanks,
Fernando

On mié, oct 23, 2019 at 5:34 PM, John Snow wrote:
On 10/18/19 5:41 PM, Fernando Casas Schössow wrote: Hi,

Hi! Thanks for the report.

Today while working with two different Windows Server 2012 R2 guests I found that when I try to attach an ISO file to a SCSI CD-ROM device through libvirt (virsh or virt-manager) while the guest is running, qemu crashes and the following message is logged:

Assertion failed: blk_get_aio_context(d->conf.blk) == s->ctx (/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: virtio_scsi_ctx_check: 246)

I can repro this at will. All I have to do is try to attach an ISO file to the SCSI CD-ROM while the guest is running. The SCSI controller model is virtio-scsi with iothread enabled. Please find below all the details about my setup that I considered relevant, but if I missed something please don't hesitate to let me know:

Looks like we got aio_context management wrong with iothread for the media change events somewhere. Should be easy enough to fix if we figure out where the bad assumption is.

Host arch: x86_64
Distro: Alpine Linux 3.10.2
qemu version: 4.0

Do you have the ability to try 4.1, or the latest development head with debugging symbols enabled?
Linux kernel version: 4.19.67 libvirt: 5.5.0 Emulated SCSI controller: virtio-scsi (with iothread enabled) Guest firmware: OVMF-EFI Guest OS: Window Server 2012 R2 Guest virtio drivers version: 171 (current stable) qemu command line: /usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1 -m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid f06978ad-2734-44ab-a518-5dfcf71d625e -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 -chardev socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=45,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spiceport,id=charchannel2,name=org.spice-space.webdav.0 -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.spice-space.webdav.0 -device virtio-tablet-pci,id=input2,bus=pci.0,addr=0x7 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestam
Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
In virsh I would do this while the guest is running:

virsh attach-disk --type cdrom --mode readonly

Following the example for the guest from my first email:

virsh attach-disk DCHOMENET01 /resources/virtio-win-0.1.171-stable.iso sdb --type cdrom --mode readonly

Right after executing this, qemu crashes and logs the assertion. I can repro this also from virt-manager by selecting the cdrom device -> Connect -> selecting the ISO file -> Choose volume -> Ok (basically the same but in the GUI).

I may be able to try 4.1. Will look into it and report back.

On mié, oct 23, 2019 at 7:33 PM, John Snow wrote:
On 10/23/19 1:28 PM, Fernando Casas Schössow wrote:
Hi John, Thanks for looking into this. I can quickly repro the problem with the qemu 4.0 binary with debugging symbols enabled, as I have it available already. Doing the same with qemu 4.1 or the development head may be too much hassle, but if it's really the only way, I can give it a try. Would it be worth trying with 4.0 first to get the stack trace, or will that not help and the only way to go is 4.1 (or dev)? Thanks, Fernando

If 4.0 is what you have access to, having the stack trace for that is better than not, but confirming it happens on the latest release would be nice. Can you share your workflow for virsh that reproduces the failure?

--js

On mié, oct 23, 2019 at 5:34 PM, John Snow <js...@redhat.com> wrote:
On 10/18/19 5:41 PM, Fernando Casas Schössow wrote: Hi,

Hi! Thanks for the report.

Today while working with two different Windows Server 2012 R2 guests I found that when I try to attach an ISO file to a SCSI CD-ROM device through libvirt (virsh or virt-manager) while the guest is running, qemu crashes and the following message is logged:

Assertion failed: blk_get_aio_context(d->conf.blk) == s->ctx (/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: virtio_scsi_ctx_check: 246)

I can repro this at will. All I have to do is try to attach an ISO file to the SCSI CD-ROM while the guest is running.
The SCSI controller model is virtio-scsi with iothread enabled. Please find below all the details about my setup that I considered relevant but I missed something please don't hesitate to let me know: Looks like we got aio_context management wrong with iothread for the media change events somewhere. Should be easy enough to fix if we figure out where the bad assumption is. Host arch: x86_64 Distro: Alpine Linux 3.10.2 qemu version: 4.0 Do you have the ability to try 4.1, or the latest development head with debugging symbols enabled? Linux kernel version: 4.19.67 libvirt: 5.5.0 Emulated SCSI controller: virtio-scsi (with iothread enabled) Guest firmware: OVMF-EFI Guest OS: Window Server 2012 R2 Guest virtio drivers version: 171 (current stable) qemu command line: /usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1 -m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid f06978ad-2734-44ab-a518-5dfcf71d625e -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 -chardev socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=
qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt
Hi,

Today while working with two different Windows Server 2012 R2 guests I found that when I try to attach an ISO file to a SCSI CD-ROM device through libvirt (virsh or virt-manager) while the guest is running, qemu crashes and the following message is logged:

Assertion failed: blk_get_aio_context(d->conf.blk) == s->ctx (/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: virtio_scsi_ctx_check: 246)

I can repro this at will. All I have to do is try to attach an ISO file to the SCSI CD-ROM while the guest is running. The SCSI controller model is virtio-scsi with iothread enabled. Please find below all the details about my setup that I considered relevant, but if I missed something please don't hesitate to let me know:

Host arch: x86_64
Distro: Alpine Linux 3.10.2
qemu version: 4.0
Linux kernel version: 4.19.67
libvirt: 5.5.0
Emulated SCSI controller: virtio-scsi (with iothread enabled)
Guest firmware: OVMF-EFI
Guest OS: Windows Server 2012 R2
Guest virtio drivers version: 171 (current stable)

qemu command line:

/usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1 -m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid f06978ad-2734-44ab-a518-5dfcf71d625e -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 -chardev socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=45,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spiceport,id=charchannel2,name=org.spice-space.webdav.0 -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.spice-space.webdav.0 -device virtio-tablet-pci,id=input2,bus=pci.0,addr=0x7 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on I can provide a core dump of the process if needed for debugging and the guest XML as well. Thanks. Fernando
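The assertion in the report compares the AioContext of the medium's BlockBackend (`blk_get_aio_context(d->conf.blk)`) with the context the virtio-scsi controller runs requests in (`s->ctx`, the iothread). As a rough illustration of why hot-plugging media can trip it when an iothread is in use, here is a toy Python model — this is not QEMU code, and all class and variable names are illustrative only:

```python
class AioContext:
    """Stand-in for QEMU's AioContext: the event loop a backend is bound to."""
    def __init__(self, name):
        self.name = name

MAIN_LOOP = AioContext("main-loop")
IOTHREAD1 = AioContext("iothread1")

class BlockBackend:
    """Stand-in for the BlockBackend behind d->conf.blk."""
    def __init__(self, ctx):
        self.ctx = ctx

class VirtioScsiController:
    """The controller remembers its own context (s->ctx = the iothread)."""
    def __init__(self, ctx):
        self.ctx = ctx

    def ctx_check(self, blk):
        # Mirrors: assert(blk_get_aio_context(d->conf.blk) == s->ctx)
        assert blk.ctx is self.ctx, "blk AioContext != controller's s->ctx"

    def attach_medium(self, blk, move_to_ctx):
        # A hot-plugged medium starts out in the main-loop context; unless
        # something moves it into the controller's context first, the next
        # request hits the assertion.
        if move_to_ctx:
            blk.ctx = self.ctx
        self.ctx_check(blk)

scsi0 = VirtioScsiController(IOTHREAD1)

# Buggy path: medium left in the main-loop context -> assertion fires.
try:
    scsi0.attach_medium(BlockBackend(MAIN_LOOP), move_to_ctx=False)
except AssertionError as e:
    print("Assertion failed:", e)

# Fixed path: backend moved into the iothread context before first use.
scsi0.attach_medium(BlockBackend(MAIN_LOOP), move_to_ctx=True)
print("attach ok")
```

Without an iothread, `s->ctx` is the main-loop context as well, both sides of the comparison match, and the attach succeeds — consistent with the observation later in the thread that the crash only happens with iothread enabled.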
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Hi Stefan,

I can confirm that the symbols are included in the binary using gdb. I will send you and Paolo an email with the link and credentials (if needed) so you can download everything. Thanks!

On jue, feb 21, 2019 at 12:11 PM, Stefan Hajnoczi wrote:
On Wed, Feb 20, 2019 at 06:56:04PM +, Fernando Casas Schössow wrote:
Regarding the dumps, I have three of them including guest memory (2 for virtio-scsi, 1 for virtio-blk), in case a comparison may help to confirm which is the problem. I can upload them to a server you indicate, or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB.

Hi Fernando,

It would be great if you could make a compressed coredump and the corresponding QEMU executable (hopefully with debug symbols) available on a server so Paolo and/or I can download them.

If you're wondering about the debug symbols: since you built QEMU from source, the binary in x86_64-softmmu/qemu-system-x86_64 should have the debug symbols. "gdb path/to/qemu-system-x86_64" will print "Reading symbols from qemu-system-x86_64...done." if symbols are available. Otherwise it will say "Reading symbols from qemu-system-x86_64...(no debugging symbols found)...done.".

Thanks,
Stefan
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Hi Paolo,

This is Fernando, the one who reported the issue. Regarding the dumps, I have three of them including guest memory (2 for virtio-scsi, 1 for virtio-blk), in case a comparison may help to confirm which is the problem. I can upload them to a server you indicate, or I can also put them on a server so you can download them as you see fit. Each dump, compressed, is around 500MB.

If it's more convenient for you, I can try to get the requested information from gdb, but I will need some guidance since I'm not skilled enough with the debugger. Another option, if you provide me with the right patch, is for me to patch, rebuild QEMU and repro the problem again. With virtio-scsi I'm able to repro this in a matter of hours most of the time; with virtio-blk it will take a couple of days.

Just let me know how you prefer to move forward. Thanks a lot for helping with this!

Kind regards,
Fernando

On mié, feb 20, 2019 at 6:53 PM, Paolo Bonzini wrote:
On 20/02/19 17:58, Stefan Hajnoczi wrote:
On Mon, Feb 18, 2019 at 07:21:25AM +, Fernando Casas Schössow wrote:
It took a few days, but last night the problem was reproduced. This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625

Hi Paolo,
Are you aware of any recent MemoryRegionCache issues? The avail_idx value 58874 was read via the cache while a non-cached read produces 58625! I suspect that 58625 is correct, since the vring is already full and the driver wouldn't bump avail_idx any further until requests complete. Fernando also hits this issue with virtio-scsi, so it's not a virtio_blk.ko driver bug or a virtio-blk device emulation issue.

No, I am not aware of any issues. How can I get the core dump (and the corresponding executable to get the symbols)?
Alternatively, it should be enough to print the vq->vring.caches->avail.mrs from the debugger. Also, one possibility is to add in vring_avail_idx an assertion like

assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));

and try to catch the error earlier. Paolo

A QEMU core dump is available for debugging. Here is the patch that produced this debug output:

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }

Stefan

used_idx 58497
2019-02-18 03:20:08.605+: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and dump with this setup as well. How should we proceed? Thanks.
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Problem reproduced with virtio-scsi as well on the same guest, this time it took less than a day. Information from the log file:

vdev 0x55823f119f90 ("virtio-scsi")
vq 0x55823f122e80 (idx 2)
inuse 128 vring.num 128
old_shadow_avail_idx 58367 last_avail_idx 58113 avail_idx 58367
avail 0x3de8a800 avail_idx (cache bypassed) 58113
used_idx 57985
2019-02-19 04:20:43.291+: shutting down, reason=crashed

Got the dump file as well, including guest memory. Size is around 486MB after compression. Is there any other information I should collect to progress the investigation? Thanks.

On lun, feb 18, 2019 at 8:21 AM, Fernando Casas Schössow wrote:

It took a few days but last night the problem was reproduced. This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
used_idx 58497
2019-02-18 03:20:08.605+: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB). I switched the guest now to virtio-scsi to get the information and dump with this setup as well. How should we proceed? Thanks.

On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi wrote:

Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2.
QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses.

Here is an expanded version of the debug patch which might help identify which of these scenarios is likely. Sorry, it requires running the guest again! This time let's make QEMU dump core so both QEMU state and guest RAM are captured for further debugging. That way it will be possible to extract more information using gdb without rerunning. Stefan

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Thanks for looking into this Stefan. I rebuilt Qemu with the new patch and got a couple of guests running with the new build. Two of them using virtio-scsi and another one using virtio-blk. Now I'm waiting for any of them to crash. I also set libvirt to include the guest memory in the qemu dumps as I understood you will want to look at both (qemu dump and guest memory dump). I will reply to this thread once I have any news. Kind regards. Fernando On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi wrote: On Wed, Feb 06, 2019 at 04:47:19PM +, Fernando Casas Schössow wrote: I could also repro the same with virtio-scsi on the same guest a couple of hours later: 2019-02-06 07:10:37.672+: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on 2019-02-06 07:10:37.672+: Domain id=32 is tainted: high-privileges char device redirected to /dev/pts/5 (label charserial0) vdev 0x5585456ef6b0 ("virtio-scsi") vq 0x5585456f90a0 
(idx 2) inuse 128 vring.num 128 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue. Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi failed suggests it's not a virtqueue element leak in the virtio-blk or virtio-scsi device emulation code. The hung task error messages from inside the guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to process further requests after the error, causing tasks inside the guest to get stuck on I/O. I don't have a good theory regarding the root cause. Two ideas: 1. The guest is corrupting the vring or submitting more requests than will fit into the ring. Somewhat unlikely because it happens with both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used for fast guest RAM accesses.
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
I could also repro the same with virtio-scsi on the same guest a couple of hours later: 2019-02-06 07:10:37.672+: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev 
tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on 2019-02-06 07:10:37.672+: Domain id=32 is tainted: high-privileges char device redirected to /dev/pts/5 (label charserial0) vdev 0x5585456ef6b0 ("virtio-scsi") vq 0x5585456f90a0 (idx 2) inuse 128 vring.num 128 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue. Thanks. On mié, feb 6, 2019 at 8:15 AM, Fernando Casas Schössow wrote: I can now confirm that the same happens with virtio-blk and virtio-scsi. 
Please find below the qemu log enhanced with the new information added by the patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded

I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back. Thanks.

On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow wrote:

I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours sometimes 2 days until it happens. Hopefully the code below will add more light on to this problem. Thanks, Fernando

On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi wrote:
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
I can now confirm that the same happens with virtio-blk and virtio-scsi. Please find below the qemu log enhanced with the new information added by the patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded

I just changed the disk back to virtio-scsi so I can repro this again with the patched qemu and report back. Thanks.

On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow wrote:

I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours sometimes 2 days until it happens. Hopefully the code below will add more light on to this problem. Thanks, Fernando

On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi wrote:

Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows:

$ git clone https://git.qemu.org/git/qemu.git
$ cd qemu
$ patch -p1
...paste the patch here...
^D
# For info on build dependencies see https://wiki.qemu.org/Hosts/Linux
$ ./configure --target-list=x86_64-softmmu
$ make -j4

You can configure a libvirt domain to use your custom QEMU binary by changing the <emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path.
---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 22bd1ac34e..aa44bffa1f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
I can test again with qemu 3.1 but with previous versions yes, it was happening the same with both virtio-blk and virtio-scsi. For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can test with virtio-blk again if that will add value to the investigation. Also I'm attaching a guest console screenshot showing the errors displayed by the guest when it goes unresponsive in case it can help. Thanks for the patch. I will build the custom qemu binary and reproduce the issue. This may take a couple of days since I cannot reproduce it at will. Sometimes it takes 12 hours sometimes 2 days until it happens. Hopefully the code below will add more light on to this problem. Thanks, Fernando

On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi wrote:

Are you sure this happens with both virtio-blk and virtio-scsi? The following patch adds more debug output. You can build as follows:

$ git clone https://git.qemu.org/git/qemu.git
$ cd qemu
$ patch -p1
...paste the patch here...
^D
# For info on build dependencies see https://wiki.qemu.org/Hosts/Linux
$ ./configure --target-list=x86_64-softmmu
$ make -j4

You can configure a libvirt domain to use your custom QEMU binary by changing the <emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path.

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 22bd1ac34e..aa44bffa1f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
-drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=46,id=hostnet0,vhost=on,vhostfd=48 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=49,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on Thanks. Kind regards, Fernando On vie, feb 1, 2019 at 6:48 AM, Stefan Hajnoczi wrote: On Thu, Jan 31, 2019 at 11:32:32AM +, Fernando Casas Schössow wrote: Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. At the moment I cannot use virtio disks (virtio-blk nor virtio-scsi) with my guests in order to avoid this issue so as a workaround I'm using SATA emulated storage which is not ideal but is perfectly stable. 
Do you have any suggestions on how I can progress troubleshooting? Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again. It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome.

Hi Fernando, Please post your QEMU command-line (ps aux | grep qemu) and the details of the guest operating system and version. Stefan
Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Hi, Sorry for resurrecting this thread after so long but I just upgraded the host to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. At the moment I cannot use virtio disks (virtio-blk nor virtio-scsi) with my guests in order to avoid this issue so as a workaround I'm using SATA emulated storage which is not ideal but is perfectly stable. Do you have any suggestions on how I can progress troubleshooting? Qemu is not crashing so I don't have any dumps that can be analyzed. The guest is just "stuck" and all I can do is destroy it and start it again. It's really frustrating that after all this time I couldn't find the cause for this issue so any ideas are welcome. Thanks. Fernando

____
From: Fernando Casas Schössow
Sent: Saturday, June 24, 2017 10:34 AM
To: Ladi Prosek
Cc: qemu-de...@nongnu.org
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

Hi Ladi, After running for about 15hrs two different guests (one Windows, one Linux) crashed with around 1 hour difference and the same error in the qemu log: "Virtqueue size exceeded". The Linux guest was already running on virtio_scsi and without virtio_balloon. :( I compiled and attached gdbserver to the qemu process for this guest but when I did this I got the following warning in gdbserver:

warning: Cannot call inferior functions, Linux kernel PaX protection forbids return to non-executable pages!

The default Alpine kernel is a grsec kernel. Not sure if this will interfere with debugging or not but I suspect yes. If you need me to replace the grsec kernel with a vanilla one (also available as an option in Alpine) let me know and I will do so. Otherwise send me an email directly so I can share with you the host:port details so you can connect to gdbserver. Thanks, Fer

On vie, jun 23, 2017 at 8:29 , Fernando Casas Schössow wrote:

Hi Ladi, Small update. Memtest86+ was running on the host for more than 54 hours. 8 passes were completed and no memory errors found.
For now I think we can assume that the host memory is ok. I just started all the guests one hour ago. I will monitor them and once one fails I will attach the debugger and let you know. Thanks. Fer

On jue, jun 22, 2017 at 9:43 , Ladi Prosek wrote:

Hi Fernando, On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <casasferna...@hotmail.com> wrote: Hi Ladi, Sorry for the delay in my reply. I will leave the host kernel alone for now then. For the last 15 hours or so I'm running memtest86+ on the host. So far so good. Two passes, no errors so far. I will try to leave it running for at least another 24hr and report back the results. Hopefully we can discard the memory issue at hardware level. Regarding KSM, that will be the next thing I will disable if after removing the balloon device guests still crash. About leaving a guest in a failed state for you to debug it remotely, that's absolutely an option. We just need to coordinate so I can give you remote access to the host and so on. Let me know if any preparation is needed in advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough. When the VM gets into the broken state please do something like:

gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example). Once again I would like to thank you for all your help and your great disposition! You're absolutely welcome, I don't think I've done anything helpful so far :) Cheers, Fer

On mar, jun 20, 2017 at 9:52 , Ladi Prosek <lpro...@redhat.com> wrote:

The host kernel is less likely to be responsible for this, in my opinion. I'd hold off on that for now. And last but not least KSM is enabled on the host. Should I disable it? Could be worth the try. Following your advice I will run memtest on the host and report back. Just as a side comment, the host is running on ECC memory. I see.
Would it be possible for you, once a guest is in the broken state, to make it available for debugging? By attaching gdb to the QEMU process for example and letting me poke around it remotely? Thanks!
Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)
Hi again, I hit another occurrence of this issue and collected the qemu dump as well. So at the moment I have two qemu dumps for this issue happening on two different guests. I wish I knew how to analyze them myself but that is not the case, so if anyone on the list is willing to take a look I'm more than willing to share them. Thanks in advance. Fernando

On jue, oct 18, 2018 at 3:40 PM, Fernando Casas Schössow wrote:

Hi Kevin, Not at the moment. This is a production system and pretty much up to date but can't upgrade to 3.0 yet. If the dump can be of any use, I can upload it somewhere for analysis. BR, Fernando

On jue, oct 18, 2018 at 2:38 PM, Kevin Wolf wrote:

Hi Fernando, Am 18.10.2018 um 14:25 hat Fernando Casas Schössow geschrieben: I hope this email finds you well and I apologize in advance for resurrecting this thread. I'm currently running on qemu 2.12.1 and I'm still having this issue every few days but now I managed to get a core dump generated (without including the guest memory). Would you take a look at the dump? Please let me know how you prefer me to share it. The file is around 290MB as is but I can try to compress it.

Would it be possible for you to test whether the bug still exists in QEMU 3.0? There were a few fixes related to request draining in that release, so maybe it's already solved now. Kevin
Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)
Hi Kevin, Not at the moment. This is a production system and pretty much up to date but can't upgrade to 3.0 yet. If the dump can be of any use, I can upload it somewhere for analysis. BR, Fernando

On jue, oct 18, 2018 at 2:38 PM, Kevin Wolf wrote:

Hi Fernando, Am 18.10.2018 um 14:25 hat Fernando Casas Schössow geschrieben: I hope this email finds you well and I apologize in advance for resurrecting this thread. I'm currently running on qemu 2.12.1 and I'm still having this issue every few days but now I managed to get a core dump generated (without including the guest memory). Would you take a look at the dump? Please let me know how you prefer me to share it. The file is around 290MB as is but I can try to compress it.

Would it be possible for you to test whether the bug still exists in QEMU 3.0? There were a few fixes related to request draining in that release, so maybe it's already solved now. Kevin
Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)
Hi Stefan, I hope this email finds you well and I apologize in advance for resurrecting this thread. I'm currently running on qemu 2.12.1 and I'm still having this issue every few days but now I managed to get a core dump generated (without including the guest memory). Would you take a look at the dump? Please let me know how you prefer me to share it. The file is around 290MB as is but I can try to compress it. Looking forward to your reply. Thanks Kind regards, Fernando

From: Stefan Hajnoczi
Sent: Monday, January 29, 2018 5:01 PM
To: Fernando Casas Schössow
Cc: qemu-devel; qemu-block@nongnu.org; Jeff Cody
Subject: Re: [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

On Wed, Dec 13, 2017 at 03:33:01PM +0000, Fernando Casas Schössow wrote:
> Maybe I’m missing something here but, if I recall correctly, the qemu process
> for the guest is terminated when this problem happens. So how a debugger will
> be attached to a process that is gone?

Sorry, this got lost in my inbox. assert(false) sends SIGABRT to the process. The default behavior is to dump a core file that can be analyzed later with GDB. Your system must have core dumps enabled (i.e. ulimit -c unlimited). Also, various pieces of software like systemd's coredumpctl can influence where and how core dumps are stored. But in short, an assertion failure produces a core dump. Stefan
Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)
Thanks for the explanation Stefan. Maybe I’m missing something here but, if I recall correctly, the qemu process for the guest is terminated when this problem happens. So how will a debugger be attached to a process that is gone? Also I don’t think the Alpine packages for qemu include the debug info. :( But I’m not 100% sure at the moment. Sent from my iPhone

On 13 Dec 2017, at 12:23, Stefan Hajnoczi <stefa...@gmail.com> wrote:

On Mon, Dec 11, 2017 at 01:42:38PM +0000, Fernando Casas Schössow wrote: Hello Stefan, Thanks for your reply. Fortunately I didn’t have the problem again and it’s not clear how it can be consistently reproduced. Daily backups are running as usual at the moment. If there is anything I can do from my side or if you have any ideas to try to reproduce it let me know.

Okay, thanks. In case it happens again, here is some information about the assertion failure. It will probably be necessary to use GDB to inspect the failed process (core dump). The starting point for debugging is mirror_run() in block/mirror.c:

        if (cnt == 0 && should_complete) {
            /* The dirty bitmap is not updated while operations are pending.
             * If we're about to exit, wait for pending operations before
             * calling bdrv_get_dirty_count(bs), or we may exit while the
             * source has dirty data to copy!
             *
             * Note that I/O can be submitted by the guest while
             * mirror_populate runs, so pause it now. Before deciding
             * whether to switch to target check one last time if I/O has
             * come in the meanwhile, and if not flush the data to disk.
             */
            trace_mirror_before_drain(s, cnt);

            bdrv_drained_begin(bs);
            cnt = bdrv_get_dirty_count(s->dirty_bitmap);
            if (cnt > 0 || mirror_flush(s) < 0) {
                bdrv_drained_end(bs);
                continue;
            }

            /* The two disks are in sync. Exit and report successful
             * completion.
             */
            assert(QLIST_EMPTY(&bs->tracked_requests));

The reason why this assertion makes sense is because bdrv_drained_begin(bs) starts a region where I/O is quiesced (all requests should have been completed). Either there is a bug with blockjobs vs bdrv_drained_begin(), or maybe mirror_flush() submitted new I/O requests. It would be interesting to print the bs->tracked_requests linked list in the GDB debugger. Those requests might give a clue about the root cause. Stefan
Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)
Hello Stefan,

Thanks for your reply. Fortunately I didn't have the problem again and it's not clear how it can be consistently reproduced. Daily backups are running as usual at the moment. If there is anything I can do from my side or if you have any ideas to try to reproduce it let me know.

Thanks.

Fer

Sent from my iPhone

On 11 Dec 2017, at 12:56, Stefan Hajnoczi <stefa...@gmail.com> wrote:

On Thu, Dec 07, 2017 at 10:18:52AM +0000, Fernando Casas Schössow wrote:
Hi there,

Last night, while doing a backup of a guest using the live snapshot mechanism, the qemu process for the guest seems to have crashed. The snapshot succeeded, then the backup of the VM disk took place and also succeeded, but the commit back to the original disk after the backup seems to have failed.

The command I use in the script to take the snapshot is:

virsh snapshot-create-as --domain $VM backup-job.qcow2 --disk-only --atomic --quiesce --no-metadata

And then to commit back:

virsh blockcommit $VM $TARGETDISK --base $DISKFILE --top $SNAPFILE --active --pivot

In the qemu log for the guest I found the following while the commit back was taking place:

Assertion failed: QLIST_EMPTY(&bs->tracked_requests) (/home/buildozer/aports/main/qemu/src/qemu-2.10.1/block/mirror.c: mirror_run: 884)

I'm running qemu 2.10.1 with libvirt 3.9.0 and kernel 4.9.65 on Alpine Linux 3.7.
This is the complete guest info from the logs: LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-DOCKER01/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-DOCKER01/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x9 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-sata0-0-0,cache=none,aio=threads -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-DOCKER01/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on

I was running qemu 2.8.1 for months and didn't have any problems with the backups, but yesterday I updated to qemu 2.10.1 and hit this problem last night. Is this a bug? Any ideas will be appreciated.

Thanks for reporting this bug. Can you reproduce it reliably?

Stefan