Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-25 Thread Fernando Casas Schössow
I managed to upgrade to qemu 4.1 on a test KVM host and I can confirm that I 
can't reproduce the issue in this version.
Great news that it's fixed in 4.1.

Thanks everyone for your inputs and the fast replies.

Kind regards,

Fernando

On vie, oct 25, 2019 at 12:28 PM, Fernando Casas Schössow 
 wrote:
Thanks for the reply, Kevin.

I will do my best to upgrade to 4.1, test again and report back if this is 
fixed or not in that version.
Hopefully it is.

Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf  wrote:
Am 23.10.2019 um 19:28 hat Fernando Casas Schössow geschrieben:
Hi John, Thanks for looking into this. I can quickly repro the problem with the 
qemu 4.0 binary with debugging symbols enabled as I have it available already. 
Doing the same with qemu 4.1 or the development head may be too much hassle, but 
if it's really the only way I can give it a try.
We had a lot of iothread related fixes in 4.1, so this would really be the only 
way to tell if it's a bug that still exists. I suspect that it's already fixed 
(and to be more precise, I assume that commit d0ee0204f fixed it). Kevin






Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-25 Thread Fernando Casas Schössow
Thanks for the reply, Kevin.

I will do my best to upgrade to 4.1, test again and report back if this is 
fixed or not in that version.
Hopefully it is.

Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf  wrote:
Am 23.10.2019 um 19:28 hat Fernando Casas Schössow geschrieben:
Hi John, Thanks for looking into this. I can quickly repro the problem with the 
qemu 4.0 binary with debugging symbols enabled as I have it available already. 
Doing the same with qemu 4.1 or the development head may be too much hassle, but 
if it's really the only way I can give it a try.
We had a lot of iothread related fixes in 4.1, so this would really be the only 
way to tell if it's a bug that still exists. I suspect that it's already fixed 
(and to be more precise, I assume that commit d0ee0204f fixed it). Kevin




Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-24 Thread Fernando Casas Schössow
BTW, just to be clear, qemu is crashing in this scenario *only* if 
iothread is enabled for the guest.
Without iothread enabled, the operation completes without any 
problems.

On jue, oct 24, 2019 at 11:07 PM, Fernando Casas Schössow 
 wrote:
> Today I updated to qemu 4.0.1, since this was the latest version 
> available for Alpine, and I can confirm that I can repro the issue 
> with this version as well.
> Not sure if it's relevant, but I can also confirm that the problem happens 
> not only with Windows Server 2012 R2 but also with Linux guests (it doesn't 
> matter whether the guest uses UEFI or BIOS firmware). I ran these tests 
> just to rule things out.
> 
> Also, as discussed, I compiled qemu with debug symbols, reproduced the 
> problem, collected a core dump and ran both through gdb. This is the 
> result:
> 
> (gdb) thread apply all bt
> 
> Thread 42 (LWP 33704):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fee02380b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 41 (LWP 33837):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1ad5b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 40 (LWP 33719):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fee02266b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 39 (LWP 33696):
> #0 0x7fee04233171 in syscall () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee02be8b64 in ?? ()
> #2 0x0030 in ?? ()
> #3 0x7fee02be2540 in ?? ()
> #4 0x7fee02be2500 in ?? ()
> #5 0x7fee02be2548 in ?? ()
> #6 0x55d7e4987f28 in rcu_gp_event ()
> #7 0x in ?? ()
> 
> Thread 38 (LWP 33839):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1a83b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 37 (LWP 33841):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1737b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 36 (LWP 33863):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8c83b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 35 (LWP 33842):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc170eb64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 34 (LWP 33862):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8cacb64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 33 (LWP 33843):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc16e5b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 32 (LWP 33861):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8cd5b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 31 (LWP 33844):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc16bcb64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 30 (LWP 33858):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8e83b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 29 (LWP 33845):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc1693b64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 28 (LWP 33857):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8eacb64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 27 (LWP 33846):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedc166ab64 in ?? ()
> #3 0x in ?? ()
> 
> Thread 26 (LWP 33856):
> #0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
> #1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
> #2 0x7fedb8ed5b6

Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-24 Thread Fernando Casas Schössow
 in ?? ()
#3 0x in ?? ()

Thread 21 (LWP 33849):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd0d5b64 in ?? ()
#3 0x in ?? ()

Thread 20 (LWP 33852):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd05ab64 in ?? ()
#3 0x in ?? ()

Thread 19 (LWP 33850):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd0acb64 in ?? ()
#3 0x in ?? ()

Thread 18 (LWP 33851):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedbd083b64 in ?? ()
#3 0x in ?? ()

Thread 17 (LWP 33836):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1afeb64 in ?? ()
#3 0x in ?? ()

Thread 16 (LWP 33835):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1c5ab64 in ?? ()
#3 0x in ?? ()

Thread 15 (LWP 33834):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1c83b64 in ?? ()
#3 0x in ?? ()

Thread 14 (LWP 33833):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cacb64 in ?? ()
#3 0x in ?? ()

Thread 13 (LWP 33677):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x55d7e516656c in ?? ()
#3 0x in ?? ()

Thread 12 (LWP 33832):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cd5b64 in ?? ()
#3 0x in ?? ()

Thread 11 (LWP 33831):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1cfeb64 in ?? ()
#3 0x in ?? ()

Thread 10 (LWP 33829):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1e67b64 in ?? ()
#3 0x in ?? ()

Thread 9 (LWP 33828):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1e90b64 in ?? ()
#3 0x in ?? ()

Thread 8 (LWP 33827):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fee02a95b64 in ?? ()
#3 0x in ?? ()

Thread 7 (LWP 33732):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fee0223db64 in ?? ()
#3 0x in ?? ()

Thread 6 (LWP 33706):
#0 0x7fee0423263d in ioctl () from /lib/ld-musl-x86_64.so.1
#1 0x0001 in ?? ()
#2 0x7fee0010 in ?? ()
#3 0x7fee02351440 in ?? ()
#4 0x7fee02351400 in ?? ()
#5 0x in ?? ()

Thread 5 (LWP 33838):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1aacb64 in ?? ()
#3 0x in ?? ()

Thread 4 (LWP 33860):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedb8cfeb64 in ?? ()
#3 0x in ?? ()

Thread 3 (LWP 33859):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedb8e5ab64 in ?? ()
#3 0x in ?? ()

Thread 2 (LWP 33840):
#0 0x7fee04252080 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x7fee0424f4b6 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x7fedc1a5ab64 in ?? ()
#3 0x in ?? ()

Thread 1 (LWP 33701):
#0 0x7fee0421b7a1 in abort () from /lib/ld-musl-x86_64.so.1
#1 0x55d7e6012b70 in ?? ()
#2 0x0020 in ?? ()
#3 0x in ?? ()
(gdb)


I'm neither a developer nor skilled with gdb, but if you provide the 
debugging commands I can execute them and reply back with the results.
I can also provide the binary and the core dump for analysis if needed.

While waiting for replies I will check if I can upgrade to qemu 4.1.0, 
try to repro and provide the results.

Thanks.

Fernando

On mié, oct 23, 2019 at 7:57 PM, Fernando Casas Schössow 
 wrote:
> In virsh I would do this while the guest is running:
> 
> virsh attach-disk <domain> <source-iso> <target-dev> --type cdrom --mode readonly
> 
> Following the example for guest f

Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-23 Thread Fernando Casas Schössow
Hi John,

Thanks for looking into this.
I can quickly repro the problem with the qemu 4.0 binary with debugging symbols 
enabled as I have it available already.
Doing the same with qemu 4.1 or the development head may be too much hassle, but 
if it's really the only way I can give it a try.

Would it be worth trying with 4.0 first and getting the stack trace, or will it 
not help and the only way to go is with 4.1 (or dev)?

Thanks,

Fernando

On mié, oct 23, 2019 at 5:34 PM, John Snow  wrote:
On 10/18/19 5:41 PM, Fernando Casas Schössow wrote:
Hi,
Hi! Thanks for the report.
Today while working with two different Windows Server 2012 R2 guests I found 
that when I try to attach an ISO file to a SCSI CD-ROM device through libvirt 
(virsh or virt-manager) while the guest is running, qemu crashes and the 
following message is logged: Assertion failed: blk_get_aio_context(d->conf.blk) 
== s->ctx 
(/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: 
virtio_scsi_ctx_check: 246) I can repro this at will. All I have to do is to 
try to attach an ISO file to the SCSI CDROM while the guest is running. The 
SCSI controller model is virtio-scsi with iothread enabled. Please find below 
all the details about my setup that I considered relevant, but if I missed 
something please don't hesitate to let me know:
Looks like we got aio_context management wrong with iothread for the media 
change events somewhere. Should be easy enough to fix if we figure out where 
the bad assumption is.
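For reference, the check that fires here is virtio_scsi_ctx_check() in 
hw/scsi/virtio-scsi.c; a rough sketch of its shape (paraphrased rather than 
copied verbatim from the 4.0 tree, so treat the details as approximate):

static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
{
    /* Once the iothread (dataplane) owns the virtqueues, every attached
     * BlockBackend is expected to live in the same AioContext as the
     * controller (s->ctx). */
    if (s->dataplane_started && d && blk_is_available(d->conf.blk)) {
        assert(blk_get_aio_context(d->conf.blk) == s->ctx);
    }
}

So the crash suggests the newly inserted medium's BlockBackend was left in the 
main-loop AioContext instead of being moved to the iothread's, which would also 
explain why it only reproduces with iothread enabled.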
Host arch: x86_64 Distro: Alpine Linux 3.10.2 qemu version: 4.0
Do you have the ability to try 4.1, or the latest development head with 
debugging symbols enabled?
Linux kernel version: 4.19.67 libvirt: 5.5.0 Emulated SCSI controller: 
virtio-scsi (with iothread enabled) Guest firmware: OVMF-EFI Guest OS: Windows 
Server 2012 R2 Guest virtio drivers version: 171 (current stable) qemu command 
line: /usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S 
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes
 -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu 
IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff
 -drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1
 -m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -object 
iothread,id=iothread1 -uuid f06978ad-2734-44ab-a518-5dfcf71d625e 
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc 
base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet 
-no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot 
strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device 
virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 
-chardev socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait 
-device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -chardev socket,id=charchannel1,fd=45,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
 -chardev spiceport,id=charchannel2,name=org.spice-space.webdav.0 -device 
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.spice-space.webdav.0
 -device virtio-tablet-pci,id=input2,bus=pci.0,addr=0x7 -spice 
port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
timestam

Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-23 Thread Fernando Casas Schössow
In virsh I would do this while the guest is running:

virsh attach-disk <domain> <source-iso> <target-dev> --type cdrom --mode readonly

Following the example for guest from my first email:

virsh attach-disk DCHOMENET01 /resources/virtio-win-0.1.171-stable.iso sdb 
--type cdrom --mode readonly

Right after executing this, qemu crashes and logs the assertion.
I can also repro this from virt-manager by selecting the CD-ROM device -> 
Connect -> selecting the ISO file -> Choose volume -> Ok (basically the same 
but in the GUI).

I may be able to try 4.1. Will look into it and report back.

On mié, oct 23, 2019 at 7:33 PM, John Snow  wrote:
On 10/23/19 1:28 PM, Fernando Casas Schössow wrote:
Hi John, Thanks for looking into this. I can quickly repro the problem with the 
qemu 4.0 binary with debugging symbols enabled as I have it available already. 
Doing the same with qemu 4.1 or the development head may be too much hassle, but 
if it's really the only way I can give it a try. Would it be worth trying with 
4.0 first and getting the stack trace, or will it not help and the only way to 
go is with 4.1 (or dev)? Thanks, Fernando
If 4.0 is what you have access to, having the stack trace for that is better 
than not, but confirming it happens on the latest release would be nice. Can 
you share your workflow for virsh that reproduces the failure? --js
On mié, oct 23, 2019 at 5:34 PM, John Snow 
<js...@redhat.com> wrote:
On 10/18/19 5:41 PM, Fernando Casas Schössow wrote: Hi, Hi! Thanks for the 
report. Today while working with two different Windows Server 2012 R2 guests I 
found that when I try to attach an ISO file to a SCSI CD-ROM device through 
libvirt (virsh or virt-manager) while the guest is running, qemu crashes and 
the following message is logged: Assertion failed: 
blk_get_aio_context(d->conf.blk) == s->ctx 
(/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: 
virtio_scsi_ctx_check: 246) I can repro this at will. All I have to do is to 
try to attach an ISO file to the SCSI CDROM while the guest is running. The 
SCSI controller model is virtio-scsi with iothread enabled. Please find below 
all the details about my setup that I considered relevant, but if I missed 
something please don't hesitate to let me know: Looks like we got aio_context 
management wrong with iothread for the media change events somewhere. Should be 
easy enough to fix if we figure out where the bad assumption is. Host arch: 
x86_64 Distro: Alpine Linux 3.10.2 qemu version: 4.0 Do you have the ability to 
try 4.1, or the latest development head with debugging symbols enabled? Linux 
kernel version: 4.19.67 libvirt: 5.5.0 Emulated SCSI controller: virtio-scsi 
(with iothread enabled) Guest firmware: OVMF-EFI Guest OS: Windows Server 2012 
R2 Guest virtio drivers version: 171 (current stable) qemu command line: 
/usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes
 -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu 
IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff
 -drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1
 -m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -object 
iothread,id=iothread1 -uuid f06978ad-2734-44ab-a518-5dfcf71d625e 
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc 
base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet 
-no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot 
strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device 
virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 
-chardev socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait 
-device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -chardev socket,id=charchannel1,fd=

qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-18 Thread Fernando Casas Schössow
Hi,

Today while working with two different Windows Server 2012 R2 guests I 
found that when I try to attach an ISO file to a SCSI CD-ROM device 
through libvirt (virsh or virt-manager) while the guest is running, 
qemu crashes and the following message is logged:

Assertion failed: blk_get_aio_context(d->conf.blk) == s->ctx 
(/home/buildozer/aports/main/qemu/src/qemu-4.0.0/hw/scsi/virtio-scsi.c: 
virtio_scsi_ctx_check: 246)

I can repro this at will. All I have to do is to try to attach an ISO 
file to the SCSI CDROM while the guest is running.
The SCSI controller model is virtio-scsi with iothread enabled.
Please find below all the details about my setup that I considered 
relevant, but if I missed something please don't hesitate to let me know:

Host arch: x86_64
Distro: Alpine Linux 3.10.2
qemu version: 4.0
Linux kernel version: 4.19.67
libvirt: 5.5.0
Emulated SCSI controller: virtio-scsi (with iothread enabled)
Guest firmware: OVMF-EFI
Guest OS: Windows Server 2012 R2
Guest virtio drivers version: 171 (current stable)

qemu command line:

/usr/bin/qemu-system-x86_64 -name guest=DCHOMENET01,debug-threads=on -S 
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-78-DCHOMENET01/master-key.aes
 
-machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=on -cpu 
IvyBridge,ss=on,vmx=off,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff
 
-drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 
-drive 
file=/var/lib/libvirt/qemu/nvram/DCHOMENET01_VARS.fd,if=pflash,format=raw,unit=1
 
-m 1536 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 
-object iothread,id=iothread1 -uuid 
f06978ad-2734-44ab-a518-5dfcf71d625e -no-user-config -nodefaults 
-chardev socket,id=charmonitor,fd=33,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc 
base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay 
-no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global 
PIIX4_PM.disable_s4=1 -boot strict=on -device 
qemu-xhci,id=usb,bus=pci.0,addr=0x4 -device 
virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=1,bus=pci.0,addr=0x5 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/storage/storage-hdd-vms/virtual_machines_hdd/dchomenet01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,discard=unmap,aio=threads
 
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 
-drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
 
-netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=43 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:99:b5:62,bus=pci.0,addr=0x3 
-chardev 
socket,id=charserial0,host=127.0.0.1,port=4900,telnet,server,nowait 
-device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 
-chardev socket,id=charchannel1,fd=45,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
 
-chardev spiceport,id=charchannel2,name=org.spice-space.webdav.0 
-device 
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.spice-space.webdav.0
 
-device virtio-tablet-pci,id=input2,bus=pci.0,addr=0x7 -spice 
port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 
-chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny 
-msg timestamp=on

I can provide a core dump of the process if needed for debugging and 
the guest XML as well.

Thanks.

Fernando





Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-21 Thread Fernando Casas Schössow
Hi Stefan,

I can confirm that the symbols are included in the binary using gdb.

I will send you and Paolo an email with the link and credentials (if needed) so 
you can download everything.

Thanks!

On jue, feb 21, 2019 at 12:11 PM, Stefan Hajnoczi  wrote:
On Wed, Feb 20, 2019 at 06:56:04PM +, Fernando Casas Schössow wrote:
Regarding the dumps, I have three of them including guest memory (2 for 
virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which 
is the problem). I can upload them to a server you indicate, or I can also put 
them on a server so you can download them as you see fit. Each dump, 
compressed, is around 500MB.
Hi Fernando, It would be great if you could make a compressed coredump and the 
corresponding QEMU executable (hopefully with debug symbols) available on a 
server so Paolo and/or I can download them. If you're wondering about the debug 
symbols, since you built QEMU from source the binary in 
x86_64-softmmu/qemu-system-x86_64 should have the debug symbols. "gdb 
path/to/qemu-system-x86_64" will print "Reading symbols from 
qemu-system-x86_64...done." if symbols are available. Otherwise it will say 
"Reading symbols from qemu-system-x86_64...(no debugging symbols 
found)...done.". Thanks, Stefan




Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-20 Thread Fernando Casas Schössow
Hi Paolo,

This is Fernando, the one who reported the issue.
Regarding the dumps, I have three of them including guest memory (2 for 
virtio-scsi, 1 for virtio-blk, in case a comparison may help to confirm which 
is the problem). I can upload them to a server you indicate, or I can also put 
them on a server so you can download them as you see fit. Each dump, 
compressed, is around 500MB.

If it's more convenient for you I can try to get the requested information from 
gdb. But I will need some guidance since I'm not skilled enough with the 
debugger.

Another option, if you provide me with the right patch, is for me to patch and 
rebuild QEMU and repro the problem again. With virtio-scsi I'm able to repro 
this in a matter of hours most of the time; with virtio-blk it will take a 
couple of days.

Just let me know how you prefer to move forward.

Thanks a lot for helping with this!

Kind regards,

Fernando

On mié, feb 20, 2019 at 6:53 PM, Paolo Bonzini  wrote:
On 20/02/19 17:58, Stefan Hajnoczi wrote:
On Mon, Feb 18, 2019 at 07:21:25AM +, Fernando Casas Schössow wrote:
It took a few days but last night the problem was reproduced. This is the 
information from the log: vdev 0x55f261d940f0 ("virtio-blk") vq 0x55f261d9ee40 
(idx 0) inuse 128 vring.num 128 old_shadow_avail_idx 58874 last_avail_idx 58625 
avail_idx 58874 avail 0x3d87a800 avail_idx (cache bypassed) 58625
Hi Paolo, Are you aware of any recent MemoryRegionCache issues? The avail_idx 
value 58874 was read via the cache while a non-cached read produces 58625! I 
suspect that 58625 is correct since the vring is already full and the driver 
wouldn't bump avail_idx any further until requests complete. Fernando also hits 
this issue with virtio-scsi so it's not a virtio_blk.ko driver bug or a 
virtio-blk device emulation issue.
No, I am not aware of any issues. How can I get the core dump (and the 
corresponding executable to get the symbols)? Alternatively, it should be 
enough to print the vq->vring.caches->avail.mrs from the debugger. Also, one 
possibility is to add in vring_avail_idx an assertion like 
assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev, vq->vring.avail + 
offsetof(VRingAvail, idx))); and try to catch the error earlier. Paolo
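A sketch of how that assertion could be folded into vring_avail_idx() in 
hw/virtio/virtio.c, assuming the function still has roughly the shape it had in 
the 3.x tree (approximate, not an exact copy of the sources):

/* Called within rcu_read_lock(). */
static inline uint16_t vring_avail_idx(VirtQueue *vq)
{
    VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
    hwaddr pa = offsetof(VRingAvail, idx);

    vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);

    /* Paolo's suggested cross-check: compare the value read through the
     * MemoryRegionCache with a direct, uncached read of avail->idx, so a
     * stale cache is caught here instead of later as "Virtqueue size
     * exceeded". */
    assert(vq->shadow_avail_idx ==
           virtio_lduw_phys(vq->vdev, vq->vring.avail + offsetof(VRingAvail, idx)));

    return vq->shadow_avail_idx;
}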
A QEMU core dump is available for debugging. Here is the patch that produced 
this debug output:

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }

Stefan
used_idx 58497
2019-02-18 03:20:08.605+: shutting down, reason=crashed
The dump file, including guest memory, was generated successfully (after gzip 
the file is around 492MB). I switched the guest now to virtio-scsi to get the 
information and dump with this setup as well. How should we proceed? Thanks.



Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-18 Thread Fernando Casas Schössow
Problem reproduced with virtio-scsi as well on the same guest, this time it 
took less than a day.
Information from the log file:

vdev 0x55823f119f90 ("virtio-scsi")
vq 0x55823f122e80 (idx 2)
inuse 128 vring.num 128
old_shadow_avail_idx 58367 last_avail_idx 58113 avail_idx 58367
avail 0x3de8a800 avail_idx (cache bypassed) 58113
used_idx 57985
2019-02-19 04:20:43.291+: shutting down, reason=crashed

Got the dump file as well, including guest memory. Size is around 486MB after 
compression.
Is there any other information I should collect to progress the investigation?

Thanks.

On lun, feb 18, 2019 at 8:21 AM, Fernando Casas Schössow 
 wrote:
It took a few days but last night the problem was reproduced.
This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
used_idx 58497
2019-02-18 03:20:08.605+: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip 
the file is around 492MB).
I switched the guest now to virtio-scsi to get the information and dump with 
this setup as well.

How should we proceed?

Thanks.

On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi  wrote:
Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi 
failed suggests it's not a virtqueue element leak in the virtio-blk or 
virtio-scsi device emulation code. The hung task error messages from inside the 
guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. 
QEMU refuses to process further requests after the error, causing tasks inside 
the guest to get stuck on I/O. I don't have a good theory regarding the root 
cause. Two ideas:
1. The guest is corrupting the vring or submitting more requests than will fit 
into the ring. Somewhat unlikely because it happens with both Windows and Linux 
guests.
2. QEMU's virtqueue code is buggy, maybe the memory region cache which is used 
for fast guest RAM accesses.
Here is an expanded version of the debug patch which might help identify which 
of these scenarios is likely. Sorry, it requires running the guest again! This 
time let's make QEMU dump core so both QEMU state and guest RAM are captured 
for further debugging. That way it will be possible to extract more information 
using gdb without rerunning.

Stefan

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
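
To make the "QEMU refuses to process further requests" point above concrete: 
virtio_error() marks the device broken and asks the guest for a reset, roughly 
like this (an approximate sketch of hw/virtio/virtio.c, not an exact copy):

void virtio_error(VirtIODevice *vdev, const char *fmt, ...)
{
    va_list ap;

    /* Log the error message ("Virtqueue size exceeded" in this case). */
    va_start(ap, fmt);
    error_vreport(fmt, ap);
    va_end(ap);

    /* For VIRTIO 1.0 devices, tell the guest the device needs a reset. */
    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
        virtio_notify_config(vdev);
    }

    /* Either way, mark the device broken so no further requests are popped. */
    vdev->broken = true;
}

Since the in-flight requests are never completed after this point, the 
hung-task messages inside the guest follow naturally.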






Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-11 Thread Fernando Casas Schössow
Thanks for looking into this Stefan.

I rebuilt Qemu with the new patch and got a few guests running with the 
new build: two of them using virtio-scsi and another one using virtio-blk. Now 
I'm waiting for any of them to crash.
I also set libvirt to include the guest memory in the qemu dumps, as I 
understand you will want to look at both (the qemu dump and the guest memory dump).

I will reply to this thread once I have any news.

Kind regards.

Fernando

On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi  wrote:
On Wed, Feb 06, 2019 at 04:47:19PM +, Fernando Casas Schössow wrote:
I could also repro the same with virtio-scsi on the same guest a couple of 
hours later: 2019-02-06 07:10:37.672+: starting up libvirt version: 4.10.0, 
qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local 
LC_ALL=C 
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
 HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 
-name guest=DOCKER01,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes
 -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu 
IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on
 -drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
socket,id=charmonitor,fd=46,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global 
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev socket,id=charchannel0,fd=51,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
timestamp=on 2019-02-06 07:10:37.672+: Domain id=32 is tainted: 
high-privileges char device redirected to /dev/pts/5 (label charserial0) vdev 
0x5585456ef6b0 ("virtio-scsi") vq 0x5585456f90a0 (idx 2) inuse 128 vring.num 
128 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded I'm 
open to any tests or suggestions that can move the investigation forward and 
find the cause of this issue.
Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi 
failed suggests it's not a virtqueue element leak in the virtio-blk or 
virtio-scsi device emulation code. The hung task error messages from inside the 
guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. 
QEMU refuses to process further requests after the error, causing tasks inside 
the guest to get stuck on I/O. I don't have a good theory regarding the root 
cause. Two ideas: 1. The guest is corrupting the vring or submitting more 
requests than will fit into the ring. Somewhat unlikely because it happens with 
both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the 
memory region cache which is used for fast gues

Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-06 Thread Fernando Casas Schössow
I could also repro the same with virtio-scsi on the same guest a couple of 
hours later:

2019-02-06 07:10:37.672+: starting up libvirt version: 4.10.0, qemu 
version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
LC_ALL=C 
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
 HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 
-name guest=DOCKER01,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes
 -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu 
IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on
 -drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
socket,id=charmonitor,fd=46,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global 
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev socket,id=charchannel0,fd=51,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
timestamp=on
2019-02-06 07:10:37.672+: Domain id=32 is tainted: high-privileges
char device redirected to /dev/pts/5 (label charserial0)
vdev 0x5585456ef6b0 ("virtio-scsi")
vq 0x5585456f90a0 (idx 2)
inuse 128 vring.num 128
2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded


I'm open to any tests or suggestions that can move the investigation forward 
and find the cause of this issue.
Thanks.

On mié, feb 6, 2019 at 8:15 AM, Fernando Casas Schössow 
 wrote:
I can now confirm that the same happens with virtio-blk and virtio-scsi.
Please find below the qemu log enhanced with the new information added by the 
patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded

I just changed the disk back to virtio-scsi so I can repro this again with the 
patched qemu and report back.

Thanks.

On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow 
 wrote:

I can test again with qemu 3.1 but with previous versions yes, it was happening 
the same with both virtio-blk and virtio-scsi.
For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can 
test with virtio-blk again if that will add value to the investigation.
Also I'm attaching a guest console screenshot showing the errors displayed by 
the guest when it goes unresponsive in case it can help.

Thanks for the patch. I will build the custom qemu binary and reproduce the 
issue. This may take a couple of days since I cannot reproduce it at will. 
Sometimes it takes 12 hours, sometimes 2 days, until it happens.
Hopefully the code below will shed more light on this problem.

Thanks,

Fernando

On lun, feb 4, 2019 at 7:06 AM, St

Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-05 Thread Fernando Casas Schössow
I can now confirm that the same happens with virtio-blk and virtio-scsi.
Please find below the qemu log enhanced with the new information added by the 
patch provided by Stefan:

vdev 0x55d22b8e10f0 ("virtio-blk")
vq 0x55d22b8ebe40 (idx 0)
inuse 128 vring.num 128
2019-02-06T00:40:41.742552Z qemu-system-x86_64: Virtqueue size exceeded
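
For context on what these numbers mean: the message comes from the guard in 
virtqueue_pop() shown in Stefan's debug patch (quoted below), which fires when 
the device is already holding a full ring's worth of buffers that were never 
returned to the guest; roughly:

    /* In virtqueue_pop(): if every descriptor in the ring is still
     * "in flight" inside QEMU, popping one more is an error. */
    if (vq->inuse >= vq->vring.num) {
        virtio_error(vdev, "Virtqueue size exceeded");
        goto done;
    }

Here inuse equals vring.num (128), i.e. the ring looks completely full of 
requests that QEMU believes it has not yet completed.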

I just changed the disk back to virtio-scsi so I can repro this again with the 
patched qemu and report back.

Thanks.

On lun, feb 4, 2019 at 8:24 AM, Fernando Casas Schössow 
 wrote:

I can test again with qemu 3.1 but with previous versions yes, it was happening 
the same with both virtio-blk and virtio-scsi.
For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can 
test with virtio-blk again if that will add value to the investigation.
Also I'm attaching a guest console screenshot showing the errors displayed by 
the guest when it goes unresponsive in case it can help.

Thanks for the patch. I will build the custom qemu binary and reproduce the 
issue. This may take a couple of days since I cannot reproduce it at will. 
Sometimes it takes 12 hours, sometimes 2 days, until it happens.
Hopefully the code below will shed more light on this problem.

Thanks,

Fernando

On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi  wrote:
Are you sure this happens with both virtio-blk and virtio-scsi? The following 
patch adds more debug output. You can build as follows:

$ git clone https://git.qemu.org/git/qemu.git
$ cd qemu
$ patch -p1
...paste the patch here...
^D
# For info on build dependencies see https://wiki.qemu.org/Hosts/Linux
$ ./configure --target-list=x86_64-softmmu
$ make -j4

You can configure a libvirt domain to use your custom QEMU binary by changing 
the <emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path.

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 22bd1ac34e..aa44bffa1f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }






Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-03 Thread Fernando Casas Schössow

I can test again with qemu 3.1 but with previous versions yes, it was happening 
the same with both virtio-blk and virtio-scsi.
For 3.1 I can confirm it happens for virtio-scsi (already tested it) and I can 
test with virtio-blk again if that will add value to the investigation.
Also I'm attaching a guest console screenshot showing the errors displayed by 
the guest when it goes unresponsive in case it can help.

Thanks for the patch. I will build the custom qemu binary and reproduce the 
issue. This may take a couple of days since I cannot reproduce it at will. 
Sometimes it takes 12 hours, sometimes 2 days, until it happens.
Hopefully the code below will shed more light on this problem.

Thanks,

Fernando

On lun, feb 4, 2019 at 7:06 AM, Stefan Hajnoczi  wrote:
Are you sure this happens with both virtio-blk and virtio-scsi? The following 
patch adds more debug output. You can build as follows:

$ git clone https://git.qemu.org/git/qemu.git
$ cd qemu
$ patch -p1
...paste the patch here...
^D
# For info on build dependencies see https://wiki.qemu.org/Hosts/Linux
$ ./configure --target-list=x86_64-softmmu
$ make -j4

You can configure a libvirt domain to use your custom QEMU binary by changing 
the <emulator> tag to the qemu/x86_64-softmmu/qemu-system-x86_64 path.

---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 22bd1ac34e..aa44bffa1f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -879,6 +879,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }




Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-01 Thread Fernando Casas Schössow
 -drive 
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads
 -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
 -netdev tap,fd=46,id=hostnet0,vhost=on,vhostfd=48 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev socket,id=charchannel0,fd=49,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
timestamp=on

Thanks.
Kind regards,

Fernando

On vie, feb 1, 2019 at 6:48 AM, Stefan Hajnoczi  wrote:
On Thu, Jan 31, 2019 at 11:32:32AM +, Fernando Casas Schössow wrote:
Sorry for resurrecting this thread after so long but I just upgraded the host 
to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem. At the moment I 
cannot use virtio disks (virtio-blk nor virtio-scsi) with my guests in order to 
avoid this issue so as a workaround I'm using SATA emulated storage which is 
not ideal but is perfectly stable. Do you have any suggestions on how I can 
make progress with troubleshooting? Qemu is not crashing so I don't have any dumps that 
can be analyzed. The guest is just "stuck" and all I can do is destroy it and 
start it again. It's really frustrating that after all this time I couldn't 
find the cause for this issue so any ideas are welcome.
Hi Fernando, Please post your QEMU command-line (ps aux | grep qemu) and the 
details of the guest operating system and version. Stefan




Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-01-31 Thread Fernando Casas Schössow
Hi,

Sorry for resurrecting this thread after so long but I just upgraded the host 
to Qemu 3.1 and libvirt 4.10 and I'm still facing this problem.
At the moment I cannot use virtio disks (virtio-blk or virtio-scsi) with my 
guests in order to avoid this issue, so as a workaround I'm using SATA emulated 
storage, which is not ideal but is perfectly stable.

Do you have any suggestions on how I can make progress with troubleshooting?
Qemu is not crashing so I don't have any dumps that can be analyzed. The guest 
is just "stuck" and all I can do is destroy it and start it again.
It's really frustrating that after all this time I couldn't find the cause for 
this issue so any ideas are welcome.

Thanks.

Fernando

____
From: Fernando Casas Schössow 
Sent: Saturday, June 24, 2017 10:34 AM
To: Ladi Prosek
Cc: qemu-de...@nongnu.org
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

Hi Ladi,

After running for about 15 hours, two different guests (one Windows, one Linux) 
crashed about an hour apart with the same error in the qemu log: "Virtqueue 
size exceeded".

The Linux guest was already running on virtio_scsi and without virtio_balloon. 
:(
I compiled and attached gdbserver to the qemu process for this guest but when I 
did this I got the following warning in gdbserver:

warning: Cannot call inferior functions, Linux kernel PaX protection forbids 
return to non-executable pages!

The default Alpine kernel is a grsec kernel. I'm not sure whether this will 
interfere with debugging, but I suspect it will.
If you need me to replace the grsec kernel with a vanilla one (also available 
as an option in Alpine) let me know and I will do so.
Otherwise send me an email directly so I can share with you the host:port 
details so you can connect to gdbserver.

Thanks,

Fer

On vie, jun 23, 2017 at 8:29 , Fernando Casas Schössow 
 wrote:
Hi Ladi,

Small update. Memtest86+ was running on the host for more than 54 hours. 8 
passes were completed and no memory errors found. For now I think we can assume 
that the host memory is ok.

I just started all the guests one hour ago. I will monitor them and once one 
fails I will attach the debugger and let you know.

Thanks.

Fer

On jue, jun 22, 2017 at 9:43 , Ladi Prosek  wrote:
Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <casasferna...@hotmail.com> wrote:
> Hi Ladi, Sorry for the delay in my reply. I will leave the host kernel alone
> for now then. For the last 15 hours or so I'm running memtest86+ on the host.
> So far so good. Two passes, no errors so far. I will try to leave it running
> for at least another 24hr and report back the results. Hopefully we can
> discard the memory issue at hardware level. Regarding KSM, that will be the
> next thing I will disable if after removing the balloon device guests still
> crash. About leaving a guest in a failed state for you to debug it remotely,
> that's absolutely an option. We just need to coordinate so I can give you
> remote access to the host and so on. Let me know if any preparation is
> needed in advance and which tools you need installed on the host.
I think that gdbserver attached to the QEMU process should be enough. When the 
VM gets into the broken state please do something like:

  gdbserver --attach host:12345 <pid of the QEMU process>

and let me know the host name and port (12345 in the above example).
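
A more concrete version of that workflow might look like the following; the host name, port and pgrep pattern are placeholders, and the symbols are loaded from the same qemu binary that runs the guest:

  # on the KVM host, attach gdbserver to the stuck guest's QEMU process
  gdbserver --attach :12345 $(pgrep -f 'guest=DOCKER01')

  # on the machine doing the debugging
  gdb /usr/bin/qemu-system-x86_64
  (gdb) target remote kvmhost.example.com:12345
  (gdb) thread apply all bt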
> Once again I would like to thank you for all your help and your great
> disposition!
You're absolutely welcome, I don't think I've done anything helpful so far :)
Cheers, Fer

On mar, jun 20, 2017 at 9:52 , Ladi Prosek <lpro...@redhat.com> wrote:

The host kernel is less likely to be responsible for this, in my opinion. I'd 
hold off on that for now.

> And last but not least KSM is enabled on the host. Should I disable it?

Could be worth the try.

> Following your advice I will run memtest on the host and report back. Just as
> a side comment, the host is running on ECC memory.

I see. Would it be possible for you, once a guest is in the broken state, to 
make it available for debugging? By attaching gdb to the QEMU process for 
example and letting me poke around it remotely? Thanks!






Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2018-11-16 Thread Fernando Casas Schössow
Hi again,

I hit another occurrence of this issue and collected the qemu dump as well. So 
at the moment I have two qemu dumps for this issue happening on two different 
guests.
I wish I knew how to analyze them myself, but that is not the case, so if anyone 
on the list is willing to take a look I'm more than willing to share them.

Thanks in advance.

Fernando

On jue, oct 18, 2018 at 3:40 PM, Fernando Casas Schössow 
 wrote:
Hi Kevin,

Not at the moment. This is a production system and pretty much up to date but 
can't upgrade to 3.0 yet.
If the dump can be of any use, I can upload it somewhere for analysis.

BR,

Fernando

On jue, oct 18, 2018 at 2:38 PM, Kevin Wolf  wrote:
Hi Fernando, Am 18.10.2018 um 14:25 hat Fernando Casas Schössow geschrieben:
I hope this email finds you well and I apologize in advance for resurrecting 
this thread. I'm currently running on qemu 2.12.1 and I'm still having this 
issue every few days but now I managed to get a core dump generated (without 
including the guest memory). Would you take a look at the dump? Please let me 
know how you prefer me to share it. The file is around 290MB as is but I can 
try to compress it.
would it be possible for you to test whether the bug still exists in QEMU 3.0? 
There were a few fixes related to request draining in that release, so maybe 
it's already solved now. Kevin






Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2018-10-18 Thread Fernando Casas Schössow
Hi Kevin,

Not at the moment. This is a production system and pretty much up to date but 
can't upgrade to 3.0 yet.
If the dump can be of any use, I can upload it somewhere for analysis.

BR,

Fernando

On jue, oct 18, 2018 at 2:38 PM, Kevin Wolf  wrote:
Hi Fernando, Am 18.10.2018 um 14:25 hat Fernando Casas Schössow geschrieben:
I hope this email finds you well and I apologize in advance for resurrecting 
this thread. I'm currently running on qemu 2.12.1 and I'm still having this 
issue every few days but now I managed to get a core dump generated (without 
including the guest memory). Would you take a look at the dump? Please let me 
know how you prefer me to share it. The file is around 290MB as is but I can 
try to compress it.
would it be possible for you to test whether the bug still exists in QEMU 3.0? 
There were a few fixes related to request draining in that release, so maybe 
it's already solved now. Kevin




Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2018-10-18 Thread Fernando Casas Schössow
Hi Stefan,

I hope this email finds you well and I apologize in advance for resurrecting 
this thread.
I'm currently running on qemu 2.12.1 and I'm still having this issue every few 
days but now I managed to get a core dump generated (without including the 
guest memory).
Would you take a look at the dump? Please let me know how you prefer me to 
share it. The file is around 290MB as is but I can try to compress it.

Looking forward your reply.
Thanks
Kind regards,

Fernando

From: Stefan Hajnoczi 
Sent: Monday, January 29, 2018 5:01 PM
To: Fernando Casas Schössow
Cc: qemu-devel; qemu-block@nongnu.org; Jeff Cody
Subject: Re: [Qemu-devel] qemu process crash: Assertion failed: 
QLIST_EMPTY(&bs->tracked_requests)

On Wed, Dec 13, 2017 at 03:33:01PM +0000, Fernando Casas Schössow wrote:
> Maybe I’m missing something here but, if I recall correctly, the qemu process 
> for the guest is terminated when this problem happens. So how would a debugger
> be attached to a process that is gone?

Sorry, this got lost in my inbox.

assert(false) sends SIGABRT to the process.  The default behavior is to
dump a core file that can be analyzed later with GDB.

Your system must have core dumps enabled (i.e. ulimit -c unlimited).
Also, various pieces of software like systemd's coredumpctl can
influence where and how core dumps are stored.

But in short, an assertion failure produces a core dump.
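
As a rough recipe (whether dumps end up as plain core files or go through systemd-coredump depends on the host setup, so the coredumpctl step and the paths here are assumptions):

  # in the session or service that starts QEMU
  ulimit -c unlimited

  # after the assertion fires, on a host using systemd-coredump:
  coredumpctl list qemu-system-x86_64
  coredumpctl gdb qemu-system-x86_64

  # or, with a plain core file:
  gdb /usr/bin/qemu-system-x86_64 /path/to/core
  (gdb) bt full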

Stefan


Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2017-12-13 Thread Fernando Casas Schössow
Thanks for the explanation Stefan.

Maybe I’m missing something here but, if I recall correctly, the qemu process 
for the guest is terminated when this problem happens. So how would a debugger 
be attached to a process that is gone?

Also I don’t think the Alpine packages for qemu include the debug info. :(
But I’m not 100% sure at the moment.

Sent from my iPhone

On 13 Dec 2017, at 12:23, Stefan Hajnoczi <stefa...@gmail.com> wrote:

On Mon, Dec 11, 2017 at 01:42:38PM +0000, Fernando Casas Schössow wrote:
Hello Stefan,

Thanks for your reply.
Fortunately I didn’t have the problem again and it’s not clear how it can be 
consistently reproduced. Daily backups are running as usual at the moment.

If there is anything I can do from my side or if you have any ideas to try to 
reproduce it let me know.

Okay, thanks.

In case it happens again, here is some information about the assertion
failure.  It will probably be necessary to use GDB to inspect the failed
process (core dump).  The starting point for debugging is mirror_run()
in block/mirror.c:

 if (cnt == 0 && should_complete) {
 /* The dirty bitmap is not updated while operations are pending.
  * If we're about to exit, wait for pending operations before
  * calling bdrv_get_dirty_count(bs), or we may exit while the
  * source has dirty data to copy!
  *
  * Note that I/O can be submitted by the guest while
  * mirror_populate runs, so pause it now.  Before deciding
  * whether to switch to target check one last time if I/O has
  * come in the meanwhile, and if not flush the data to disk.
  */
 trace_mirror_before_drain(s, cnt);

 bdrv_drained_begin(bs);
 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 if (cnt > 0 || mirror_flush(s) < 0) {
 bdrv_drained_end(bs);
 continue;
 }

 /* The two disks are in sync.  Exit and report successful
  * completion.
  */
  assert(QLIST_EMPTY(&bs->tracked_requests));

The reason why this assertion makes sense is because
bdrv_drained_begin(bs) starts a region where I/O is quiesced (all
requests should have been completed).

Either there is a bug with blockjobs vs bdrv_drained_begin(), or maybe
mirror_flush() submitted new I/O requests.

It would be interesting to print the bs->tracked_requests linked list
in the GDB debugger.  Those requests might give a clue about the root
cause.
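
For example, the list could be walked by hand like this (the lh_first/le_next/list field names are taken from QEMU's queue.h and BdrvTrackedRequest and should be double-checked against the version being debugged):

  (gdb) print bs->tracked_requests
  (gdb) print *bs->tracked_requests.lh_first
  (gdb) print bs->tracked_requests.lh_first->offset
  (gdb) print bs->tracked_requests.lh_first->bytes
  (gdb) print *bs->tracked_requests.lh_first->list.le_next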

Stefan


Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2017-12-11 Thread Fernando Casas Schössow
Hello Stefan,

Thanks for your reply.
Fortunately I didn’t have the problem again and it’s not clear how it can be 
consistently reproduced. Daily backups are running as usual at the moment.

If there is anything I can do from my side or if you have any ideas to try to 
reproduce it let me know.

Thanks.

Fer

Sent from my iPhone

On 11 Dec 2017, at 12:56, Stefan Hajnoczi <stefa...@gmail.com> wrote:

On Thu, Dec 07, 2017 at 10:18:52AM +0000, Fernando Casas Schössow wrote:
Hi there,


Last night, while doing a backup of a guest using the live snapshot mechanism, 
the qemu process for the guest seems to have crashed.

The snapshot succeeded, then the backup of the VM disk took place and also 
succeeded, but the commit back to the original disk after the backup seems to 
have failed.

The command I use in the script to take the snapshot is:


virsh snapshot-create-as --domain $VM backup-job.qcow2 --disk-only --atomic 
--quiesce --no-metadata


And then to commit back is:


virsh blockcommit $VM $TARGETDISK --base $DISKFILE --top $SNAPFILE --active 
--pivot
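
Put together, the backup flow is roughly the following; the cp step and the $BACKUPDIR variable are assumptions about what the script does between the two commands shown above:

  # 1. quiesce the guest and redirect new writes to a temporary overlay
  virsh snapshot-create-as --domain "$VM" backup-job.qcow2 --disk-only --atomic --quiesce --no-metadata

  # 2. the base image is no longer being written to, so copy it away
  cp "$DISKFILE" "$BACKUPDIR/"

  # 3. merge the overlay back into the base image and pivot the guest back to it
  virsh blockcommit "$VM" "$TARGETDISK" --base "$DISKFILE" --top "$SNAPFILE" --active --pivot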


In the qemu log for the guest I found the following while the commit back was 
taking place:


Assertion failed: QLIST_EMPTY(&bs->tracked_requests) 
(/home/buildozer/aports/main/qemu/src/qemu-2.10.1/block/mirror.c: mirror_run: 
884)

I'm running qemu 2.10.1 with libvirt 3.9.0 and kernel 4.9.65 on Alpine Linux 
3.7.

This is the complete guest info from the logs:


LC_ALL=C 
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
 HOME=/root USER=root QEMU_AUDIO_DRV=spice /usr/bin/qemu-system-x86_64 -name 
guest=DOCKER01,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-DOCKER01/master-key.aes
 -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on 
-drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-DOCKER01/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
-device ahci,id=sata0,bus=pci.0,addr=0x9 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-sata0-0-0,cache=none,aio=threads
 -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-DOCKER01/org.qemu.guest_agent.0,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on



I was running on qemu 2.8.1 for months and didn't have any problems with the 
backups but yesterday I updated to qemu 2.10.1 and I hit this problem last 
night.


Is this a bug? Any ideas will be appreciated.

Thanks for reporting this bug.  Can you reproduce it reliably?

Stefan