> I don't think the assert you are talking about in the subject is added
> by 9972354856. That assertion was added by 86698a12f and has been
> present since QEMU 2.6. I don't see the relation immediately to
> AioContext patches.
You are right, of course. Sorry for misleading you about this. What I
meant to write was that git bisect pinpoints commit 9972354856 as the
likely culprit ("likely" because of the makeshift testing methodology
used).

> Is this only during boot/shutdown? If not, it looks like there might be
> some other errors occurring that aggravate the device state and cause a
> reset by the guest.

In fact this has never happened to me upon boot or shutdown. I believe
the operating system installed on the storage volume I am testing this
with has some kind of disk-intensive activity scheduled to run about
twenty minutes after booting. That is why I have to wait that long after
booting the VM to determine whether the issue appears.

> Anyway, what should happen is something like this:
>
> - Guest issues a reset request (ide_exec_cmd -> cmd_device_reset)
> - The device should now be "busy" and cannot accept any more requests
>   (see the conditional early in ide_exec_cmd)
> - cmd_device_reset drains any existing requests.
> - We assert that there are no handles to BH routines that have yet to
>   return.
>
> Normally I'd say this is enough, because:
>
> Although blk_drain does not prohibit future DMA transfers, it is being
> called after an explicit reset request from the guest, and so the
> device should be unable to service any further requests. After existing
> DMA commands are drained we should be unable to add any further
> requests.
>
> It generally shouldn't be possible to see new requests show up here,
> unless:
>
> (A) We are not guarding ide_exec_cmd properly and a new command is
>     sneaking in while we are trying to reset the device, or
> (B) blk_drain is not in fact doing what we expect it to (draining all
>     pending DMA from an outstanding IDE command we are servicing).

ide_cancel_dma_sync() is also invoked from bmdma_cmd_writeb(), and that
is in fact the code path taken when the assertion fails.

> Since you mentioned that you need to enable TRIM support in order to
> see the behavior, perhaps this is a function of a TRIM command being
> improperly implemented and causing the guest to panic, and we are
> indeed not draining TRIM requests properly.

I am not sure what the relation of TRIM to BMDMA is, but I still cannot
reproduce the issue without TRIM being enabled.

> That's my best wild guess, anyway. If you can't reproduce this
> elsewhere, can you run some debug version of this to see under which
> codepath we are invoking reset, and what the running command that we
> are failing to terminate is?

I recompiled QEMU with --enable-debug --extra-cflags="-ggdb -O0" and
attached the output of "bt full". If this is not enough, please let me
know.
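For reference, a small, self-contained toy model of the invariant that
the failing assertion checks, and of hypothesis (B) above (a completion
callback submitting a new request before the drain returns). This is
not QEMU source; the struct, field, and function names are made up for
illustration:

    /*
     * Toy model (NOT QEMU source) of the check in ide_cancel_dma_sync():
     * after draining, no DMA request may remain in flight.
     */
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    struct dma {
        void *aiocb;              /* non-NULL while a request is in flight */
        int   resubmits_on_drain; /* stand-in for hypothesis (B) above */
    };

    /* Models blk_drain(): completes the current request; if the completion
     * callback submits another one, the drain returns with work pending. */
    static void drain(struct dma *d)
    {
        if (d->aiocb && d->resubmits_on_drain) {
            d->aiocb = "next request";  /* a new request sneaks in */
        } else {
            d->aiocb = NULL;
        }
    }

    /* Models ide_cancel_dma_sync() as discussed in the thread; in the
     * failing backtrace it is reached from bmdma_cmd_writeb(). */
    static void cancel_dma_sync(struct dma *d)
    {
        if (d->aiocb) {
            drain(d);
        }
        assert(d->aiocb == NULL);       /* hw/ide/core.c:685 in the report */
    }

    int main(void)
    {
        struct dma ok  = { "request", 0 };
        struct dma bad = { "request", 1 };

        cancel_dma_sync(&ok);           /* passes: drain leaves nothing behind */
        printf("clean drain: assertion holds\n");
        cancel_dma_sync(&bad);          /* aborts, like the reported failure */
        return 0;
    }

The model only shows why the assertion can fire at all; whether QEMU's
blk_drain actually behaves like the "bad" case for TRIM requests is
exactly the open question in the thread.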
** Attachment added: "Output of "bt full" when the assertion fails"
   https://bugs.launchpad.net/qemu/+bug/1681439/+attachment/4860013/+files/bt-full.log

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1681439

Title:
  qemu-system-x86_64: hw/ide/core.c:685: ide_cancel_dma_sync: Assertion
  `s->bus->dma->aiocb == NULL' failed.

Status in QEMU:
  New

Bug description:
  Since upgrading to QEMU 2.8.0, my Windows 7 64-bit virtual machines
  started crashing due to the assertion quoted in the summary failing.
  The assertion in question was added by commit 9972354856 ("block: add
  BDS field to count in-flight requests"). My tests show that setting
  discard=unmap is needed to reproduce the issue.

  Speaking of reproduction, it is a bit flaky: I have been unable to
  come up with specific instructions that would allow the issue to be
  triggered outside of my environment, but I do have a semi-reliable
  way of testing that appears to depend on a specific initial state of
  the data on the underlying storage volume, on actions taken within
  the VM, and on waiting for about 20 minutes.

  Here is the shortest QEMU command line that I managed to reproduce
  the bug with:

  qemu-system-x86_64 \
    -machine pc-i440fx-2.7,accel=kvm \
    -m 3072 \
    -drive file=/dev/lvm/qemu,format=raw,if=ide,discard=unmap \
    -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
    -device virtio-net-pci,netdev=hostnet0 \
    -vnc :0

  The underlying storage (/dev/lvm/qemu) is a thin LVM snapshot.

  QEMU was compiled using:

  ./configure --python=/usr/bin/python2.7 --target-list=x86_64-softmmu
  make -j3

  My virtualization environment is not really a critical one and
  reproduction is not that much of a hassle, so if you need me to
  gather further diagnostic information or test patches, I will be
  happy to help.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1681439/+subscriptions