We are seeing an issue when using the VFIO-over-socket libmuser PoC (https://www.mail-archive.com/qemu-devel@nongnu.org/msg692251.html) instead of the VFIO kernel module: DMA regions used by the emulated device can be abruptly removed while the device is still using them.

The PCI device we've implemented is an NVMe controller using SPDK, so it polls the submission queues for new requests. We use the latest SeaBIOS, which tries to boot from the NVMe controller. Several DMA regions are registered (VFIO_IOMMU_MAP_DMA) and then the admin queue and a submission queue are created. From this point on, SPDK polls both queues. Then the DMA region where the submission queue lies is removed (VFIO_IOMMU_UNMAP_DMA) and re-added at the same IOVA but at a different offset. SPDK crashes soon after, as it accesses invalid memory. No other event (e.g. a PCI config space or NVMe register write) happens between unmapping and mapping the DMA region.

My guess is that this behavior is legitimate, and that the VFIO kernel module solves it by releasing the DMA region only after all references to it have been dropped, which is handled by vfio_pin_pages/vfio_unpin_pages. Is that correct? If so, I suppose we need to implement the same logic in libmuser, but I want to make sure I'm not missing anything, as this is a substantial change.
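
For concreteness, the deferred-release logic I have in mind for libmuser could be sketched roughly as below. This is only an illustration under the assumption that reference counting is the right approach: struct dma_region and the region_* helpers are hypothetical names, not existing libmuser or kernel API, and a real version would need locking and an actual munmap() of the backing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one DMA region (not libmuser's real types). */
struct dma_region {
    uint64_t iova;          /* guest I/O virtual address of the region */
    size_t   len;           /* length in bytes */
    int      refcnt;        /* references currently held by the device */
    bool     unmap_pending; /* an unmap arrived while references were held */
    bool     mapped;        /* backing memory is still valid */
};

/* Called when the client registers a region (VFIO_IOMMU_MAP_DMA). */
static void region_map(struct dma_region *r, uint64_t iova, size_t len)
{
    r->iova = iova;
    r->len = len;
    r->refcnt = 0;
    r->unmap_pending = false;
    r->mapped = true;
}

/* Actually drop the mapping; a real implementation would munmap() here. */
static void region_release(struct dma_region *r)
{
    r->mapped = false;
    r->unmap_pending = false;
}

/* The device takes a reference before touching the region.
 * Fails if the region is gone or is waiting to be torn down. */
static bool region_pin(struct dma_region *r)
{
    if (!r->mapped || r->unmap_pending)
        return false;
    r->refcnt++;
    return true;
}

/* The device drops its reference; the last unpin completes a deferred unmap. */
static void region_unpin(struct dma_region *r)
{
    assert(r->refcnt > 0);
    if (--r->refcnt == 0 && r->unmap_pending)
        region_release(r);
}

/* Called on VFIO_IOMMU_UNMAP_DMA. Returns true if the region was released
 * immediately, false if the release was deferred until the last unpin. */
static bool region_unmap(struct dma_region *r)
{
    if (r->refcnt > 0) {
        r->unmap_pending = true;
        return false;
    }
    region_release(r);
    return true;
}
```

With something like this, the device (SPDK in our case) would pin the region holding the submission queue when the queue is created and unpin it when the queue is deleted, so an unmap arriving in between would be deferred rather than invalidating memory that is still being polled. It does not by itself say how the device should learn that it has to let go of the region, which is part of what I'm unsure about.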