Steve, + John
On 5/12/25 17:32, Steve Sistare wrote:
Support vfio and iommufd devices with the cpr-transfer live migration mode. Devices that do not support live migration can still support cpr-transfer, allowing live update to a new version of QEMU on the same host, with no loss of guest connectivity.

No user-visible interfaces are added.

For legacy containers:

Pass vfio device descriptors to new QEMU. In new QEMU, during vfio_realize, skip the ioctls that configure the device, because it is already configured.

Use VFIO_DMA_UNMAP_FLAG_VADDR to abandon the old VA's for DMA mapped regions, and use VFIO_DMA_MAP_FLAG_VADDR to register the new VA in new QEMU and update the locked memory accounting. The physical pages remain pinned, because the descriptor of the device that locked them remains open, so DMA to those pages continues without interruption. Mediated devices are not supported, however, because they require the VA to always be valid, and there is a brief window where no VA is registered.

Save the MSI message area as part of vfio-pci vmstate, and pass the interrupt and notifier eventfd's to new QEMU. New QEMU loads the MSI data, then the vfio-pci post_load handler finds the eventfds in CPR state, rebuilds vector data structures, and attaches the interrupts to the new KVM instance. This logic also applies to iommufd containers.

For iommufd containers:

Use IOMMU_IOAS_MAP_FILE to register memory regions for DMA when they are backed by a file (including a memfd), so DMA mappings do not depend on VA, which can differ after live update. This allows mediated devices to be supported.

Pass the iommufd and vfio device descriptors from old to new QEMU. In new QEMU, during vfio_realize, skip the ioctls that configure the device, because it is already configured.

In new QEMU, call ioctl(IOMMU_IOAS_CHANGE_PROCESS) to update mm ownership and locked memory accounting.

Patches 4 to 12 are specific to legacy containers.
Patches 25 to 41 are specific to iommufd containers.
For v4, could you please send a first "part I" with patches [1-20]? I think these are reviewed, or nearly, and could be merged quickly.

Even if the "Live update: vfio and iommufd" series is not fully reviewed yet, there are good signs that it will be before the end of the QEMU 10.1 cycle. The same applies to vfio-user.

We need to bring together the proposals changing memory_get_xlat_addr(). It's important, as it is blocking both the vfio-user series and yours. This can be done in parallel.

Then we can address the iommufd part.

Thanks,

C.