Summary

This RFC series resumes the discussion about how to introduce live
migration capability for vfio mdev devices.

A new region subtype, VFIO_REGION_SUBTYPE_DEVICE_STATE, is introduced for
migrating vfio device state. During initialization, qemu checks whether the
vfio device supports this region; if it does not, the device remains
non-migratable.

The new region is intended for saving and restoring mdev device state
during migration. Accesses to the region are trapped and forwarded to the
mdev device driver. The first byte of the region controls the running state
of the mdev device, so during migration, once the mdev driver has been
stopped, qemu can read the device-specific state from this region and
transfer it to the target VM side, where the mdev device is restored.

In addition, during the pre-copy period qemu can fetch the dirty bitmap of
the vfio device iteratively through the ioctl VFIO_DEVICE_GET_DIRTY_BITMAP,
which shortens the system downtime during the stop-and-copy phase.

Below is the vfio mdev device migration sequence.

Source VM side:
                        start migration
                                |
                                V
        in pre-copy stage, fetch the device dirty bitmap and
        add it to the qemu dirty list, iteratively
                                |
                                V
        get the cpu state change callback, write to the
        subregion's first byte to stop the mdev device
                                |
                                V
        query the dirty page bitmap from the iommu container and
        add it to the qemu dirty list for the last synchronization
                                |
                                V
        save the device state, read from the vfio device
        subregion, into the Qemufile

Target VM side:
        restore the mdev device after getting the saved
        state context from the Qemufile
                                |
                                V
        get the cpu state change callback, write to the
        subregion's first byte to start the mdev device
        and put it in running state
                                |
                                V
                        finish migration

V3->V4:
1. add a migration_blocker if the device state region is not supported.
2. instead of using vmsd, register SaveVMHandlers for the VFIO device to
   leverage the pre-copy facility, and add a new ioctl for the VFIO device
   to fetch the dirty bitmap during pre-copy.
3. remove the intel vendor ID dependency for the device state subregion.

V2->V3:
1. rebase the patch to the Qemu stable 2.10 branch.
2. use a common name for the subregion instead of one specific to intel
   IGD.

V1->V2:
Per Alex's suggestion:
1. use a device subtype region instead of a VFIO PCI fixed region.
2. remove the unnecessary ioctl, use the first byte of the subregion to
   control the running state of the mdev device.
3. for dirty page synchronization, implement the interface with
   VFIOContainer instead of the vfio pci device.

Yulei Zhang (4):
  vfio: introduce a new VFIO subregion for mdev device migration support
  vfio: Add vm status change callback to stop/restart the mdev device
  vfio: Add SaveVMHanlders for VFIO device to support live migration
  vifo: introduce new VFIO ioctl VFIO_IOMMU_GET_DIRTY_BITMAP

 hw/vfio/common.c              |  34 ++++++
 hw/vfio/pci.c                 | 240 ++++++++++++++++++++++++++++++++++++++++--
 hw/vfio/pci.h                 |   2 +
 include/hw/vfio/vfio-common.h |   1 +
 linux-headers/linux/vfio.h    |  43 +++++++-
 roms/seabios                  |   2 +-
 6 files changed, 312 insertions(+), 10 deletions(-)

-- 
2.7.4
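
For readers unfamiliar with the control-byte scheme described in the summary, here is a minimal stand-alone sketch of the idea in C. It is not code from the patches: the region is modelled as a plain in-memory buffer rather than a trapped vfio region, and the names DevStateRegion, region_set_running, region_save and region_restore, the state values 0/1, the region size, and the assumption that the device payload starts right after the control byte are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: the cover letter only says that the first byte
 * controls the running state; the real values are defined by the patches. */
enum { DEV_STATE_STOP = 0, DEV_STATE_RUNNING = 1 };

#define REGION_SIZE 64 /* hypothetical region size */

/* In-memory stand-in for the device state region. In qemu the accesses
 * would go through the vfio device fd and be trapped by the mdev driver. */
typedef struct {
    uint8_t bytes[REGION_SIZE];
} DevStateRegion;

/* Write the control byte (first byte of the region). */
static void region_set_running(DevStateRegion *r, int running)
{
    r->bytes[0] = running ? DEV_STATE_RUNNING : DEV_STATE_STOP;
}

static int region_is_running(const DevStateRegion *r)
{
    return r->bytes[0] == DEV_STATE_RUNNING;
}

/* Source side: stop the device first, then copy out the state payload.
 * Assumes (for this sketch only) the payload follows the control byte. */
static size_t region_save(DevStateRegion *r, uint8_t *buf, size_t len)
{
    size_t n = REGION_SIZE - 1;

    assert(len >= n);
    region_set_running(r, 0);
    memcpy(buf, r->bytes + 1, n);
    return n;
}

/* Target side: load the saved payload while the device is stopped.
 * Starting the device is left to the caller, mirroring the separate
 * cpu state change callback in the sequence above. */
static void region_restore(DevStateRegion *r, const uint8_t *buf, size_t n)
{
    assert(n <= REGION_SIZE - 1);
    region_set_running(r, 0);
    memcpy(r->bytes + 1, buf, n);
}
```

A caller on the source side would invoke region_save() once the VM is stopped, send the buffer over the migration stream, and on the target side call region_restore() followed by region_set_running(&region, 1) when the VM resumes.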