On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce)?
> >
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
>
> Please can you post the command line args of your VM? I use something like
>
> ./x86_64-softmmu/qemu-system-x86_64 -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid -m
> 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> -device
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> -machine "type=pc,accel=kvm"
>
> Do you also run "stress-ng -d 5" inside the VM?

I'm not using the exact same test case, but something that I thought
would be similar enough. Specifically, I run the script below, which
boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
of=/dev/sda' while the script keeps starting and cancelling backup jobs
in the background.

Anyway, I finally managed to bisect my problem now (did it wrong the
first time) and got this result:

00e30f05de1d19586345ec373970ef4c192c6270 is the first bad commit
commit 00e30f05de1d19586345ec373970ef4c192c6270
Author: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
Date:   Tue Oct 1 16:14:09 2019 +0300

    block/backup: use backup-top instead of write notifiers

    Drop write notifiers and use filter node instead.

    = Changes =

    1. Add filter-node-name argument for backup qmp api. We have to do it
       in this commit, as 257 needs to be fixed.

    2. There are no more write notifiers here, so is_write_notifier
       parameter is dropped from block-copy paths.

    3. To sync with in-flight requests at job finish we now have drained
       removing of the filter, we don't need rw-lock.

    4. Block-copy is now using BdrvChildren instead of BlockBackends

    5. As backup-top owns these children, we also move block-copy state
       into backup-top's ownership.

    [...]

That's a pretty big change, and I'm not sure how it's related to
completed requests hanging in the thread pool instead of reentering the
file-posix coroutine. But I also tested it enough that I'm confident
it's really the first bad commit.

Maybe you want to try if your problem starts at the same commit?
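
One way to check this without a full bisect is to build and test only
two revisions: the parent of the commit and the commit itself. The
following is a minimal sketch, not part of the original report. It
assumes the current directory is an already configured QEMU git tree,
and REPRODUCER is a hypothetical variable naming your own test script,
which is expected to exit non-zero when the hang shows up.

```shell
# Commit reported as the first bad one by the bisect above.
BAD=00e30f05de1d19586345ec373970ef4c192c6270

# Print the two revisions to compare: the parent first, then the commit.
revs_to_test() {
    printf '%s^\n%s\n' "$1" "$1"
}

# Check out each revision, rebuild, and run the reproducer.
# REPRODUCER is a hypothetical path to your own test script (assumption).
test_revs() {
    for rev in $(revs_to_test "$1"); do
        git checkout --quiet "$rev" || return 1
        make -j"$(nproc)" >/dev/null || return 1
        if "$REPRODUCER"; then
            echo "$rev: ok"
        else
            echo "$rev: hang"
        fi
    done
}
```

After setting REPRODUCER, calling test_revs "$BAD" prints one result
line per revision. If the parent reports "ok" and the commit reports
"hang", that would suggest the same first bad commit.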

Kevin


#!/bin/bash

qmp() {
    cat <<EOF
{'execute':'qmp_capabilities'}
EOF

    while true; do
        cat <<EOF
{ "execute": "drive-backup",
  "arguments": {
    "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }
EOF
        sleep 1
        cat <<EOF
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
EOF
        sleep 2
    done
}

./qemu-img create -f qcow2 /tmp/test.qcow2 4G
for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io /tmp/test.qcow2

qmp | x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -machine pc \
    -m 1G \
    -object 'iothread,id=iothread-virtioscsi0' \
    -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
    -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
    -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
    -device scsi-hd,drive=drive_image1,id=image1 \
    -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
    -boot d \
    -qmp stdio -monitor vc
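
The qmp() function in the script above loops forever, which is awkward
when testing several builds in a row. A bounded variant could look like
the sketch below. The function name qmp_bounded and its arguments are
made up for illustration. It sends the same two QMP commands as the
script for a fixed number of rounds and then sends "quit".

```shell
# Emit a finite QMP command stream: capabilities negotiation, a fixed
# number of backup/cancel rounds, then "quit".
# Arguments (all optional): rounds, seconds to wait before cancelling,
# seconds to wait after cancelling. Defaults: 10, 1, 2.
qmp_bounded() {
    rounds=${1:-10}
    before=${2:-1}
    after=${3:-2}

    echo '{ "execute": "qmp_capabilities" }'

    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo '{ "execute": "drive-backup", "arguments": { "job-id": "drive_image1", "device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }'
        sleep "$before"
        echo '{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1" } }'
        sleep "$after"
        i=$((i + 1))
    done

    echo '{ "execute": "quit" }'
}
```

Piping qmp_bounded into the same qemu-system-x86_64 command line, run
under coreutils timeout, would give a pass/fail signal: timeout exits
with status 124 when it had to kill the process. Whether a hung QEMU
fails to act on "quit" in every case is an assumption here, so a long
timeout only hints at the hang rather than proving it.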