OK, I can probably fix the alignment and use the new undo function to at least considerably improve the characteristics of this failure. Thanks for the pointer to the new iov function.
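To spell out what I mean by "undo": when the accumulated vector is trimmed back to a sector boundary, the mapping(s) backing the trimmed bytes have to be unmapped too, or the handles leak. Here is a standalone toy sketch of that bookkeeping (my own mock, not the actual dma-helpers code; map_region/unmap_region stand in for the dma_memory_map/dma_memory_unmap pairing):

    /* Toy model: each map must be undone exactly once. */
    #include <assert.h>
    #include <stdio.h>

    #define SECTOR_SIZE 512u

    static int maps_outstanding;    /* stands in for live map handles */

    static size_t map_region(size_t len)  { maps_outstanding++; return len; }
    static void unmap_region(size_t len)  { (void)len; maps_outstanding--; }

    int main(void)
    {
        /* Two mappings whose combined size is misaligned (4097 bytes). */
        size_t lens[] = { 4096, 1 };
        size_t total = 0;

        for (int i = 0; i < 2; i++) {
            total += map_region(lens[i]);
        }

        size_t tail = total % SECTOR_SIZE;
        if (tail) {
            /* Buggy behavior: trim 'tail' bytes from the vector but keep
             * the mapping, so maps_outstanding never returns to zero (the
             * leak), and when the total rounds down to zero, no progress
             * can ever be made (the wedge).
             * The fix: undo the mapping backing the trimmed bytes. */
            unmap_region(lens[1]);
            total -= tail;
        }

        printf("usable = %zu bytes\n", total);

        unmap_region(lens[0]);
        assert(maps_outstanding == 0);  /* everything returned */
        return 0;
    }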
I see two potential problems still:

(1) The IDE device will allow partial transfers to succeed, by doing partial sector writes. AFAIUI, the IDE state machine is supposed to do full sectors or nothing at all, but maybe I am wrong about that being a requirement. It is at least not a regression, exactly; I can file a separate bug for this.

(2) The invalid device inputs are completely unknown to me right now, and I don't have any hardware in my own house to test this, so I have to deprioritize that until I can get back into the office and regain access to more testing equipment. HELP WANTED: if anyone has a PCI BMDMA device that they can orchestrate in a virtual environment, please prod it for how it handles certain errant inputs.

https://bugs.launchpad.net/bugs/1681439

Title:
  dma_blk_cb leaks memory map handles on misaligned IO

Status in QEMU:
  Confirmed

Bug description:
  Maintainer Edit:

  The functions in dma-helpers mismanage misaligned IO badly enough to
  cause an infinite loop in which no progress can be made. This allows
  the IDE state machine to get wedged such that cancelling DMA can fail,
  because the DMA helpers have bodged the state of the DMA transfer. See
  comment #15 for the in-depth analysis. I've updated the name of this
  bug to reflect the current status as I understand it. --js

  Original report:

  Since upgrading to QEMU 2.8.0, my Windows 7 64-bit virtual machines
  started crashing due to the assertion quoted in the summary failing.
  The assertion in question was added by commit 9972354856 ("block: add
  BDS field to count in-flight requests"). My tests show that setting
  discard=unmap is needed to reproduce the issue.

  Reproduction is a bit flaky: I have been unable to come up with
  specific instructions that would trigger the issue outside of my
  environment, but I do have a semi-reliable method that appears to
  depend on a specific initial state of the data on the underlying
  storage volume, on actions taken within the VM, and on waiting for
  about 20 minutes.

  Here is the shortest QEMU command line with which I managed to
  reproduce the bug:

    qemu-system-x86_64 \
      -machine pc-i440fx-2.7,accel=kvm \
      -m 3072 \
      -drive file=/dev/lvm/qemu,format=raw,if=ide,discard=unmap \
      -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
      -device virtio-net-pci,netdev=hostnet0 \
      -vnc :0

  The underlying storage (/dev/lvm/qemu) is a thin LVM snapshot.

  QEMU was compiled using:

    ./configure --python=/usr/bin/python2.7 --target-list=x86_64-softmmu
    make -j3

  My virtualization environment is not a critical one and reproduction
  is not much of a hassle, so if you need me to gather further
  diagnostic information or test patches, I will be happy to help.
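(Addendum for anyone skimming the quoted report: the assertion the reporter hit comes from the in-flight request counter added by the commit mentioned above. The toy program below is my own illustration, not the code from commit 9972354856; it just shows the general mechanism, namely that any request path that returns without decrementing the counter makes the drain-time check fail.)

    #include <assert.h>
    #include <stdbool.h>

    static int in_flight;   /* stand-in for the per-BDS request counter */

    static void inc_in_flight(void) { in_flight++; }
    static void dec_in_flight(void) { assert(in_flight > 0); in_flight--; }

    /* A request path where one error branch forgets the decrement. */
    static void do_request(bool fail, bool leaky)
    {
        inc_in_flight();
        if (fail && leaky) {
            return;          /* bails out without dec_in_flight() */
        }
        dec_in_flight();
    }

    static void drain(void)
    {
        /* Drain must wait until all requests complete; a leaked count
         * means this can never hold, so the check aborts. */
        assert(in_flight == 0);
    }

    int main(void)
    {
        do_request(true, false);
        drain();                 /* balanced: passes */

        do_request(true, true);  /* leaks one count */
        drain();                 /* assertion failure */
        return 0;
    }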