Hey,

Presented herewith is a series based on the basic VFIO migration
protocol v2 implementation [1].

It is split from its parent series [5] to focus solely on device dirty
page tracking. Device dirty page tracking allows the VFIO device to
record its DMAs and report them back when needed. This is part of VFIO
migration and is used during the pre-copy phase of migration to track
the RAM pages that the device has written to and mark those pages
dirty, so they can later be re-sent to the target.

Device dirty page tracking uses the DMA logging uAPI to discover device
capabilities, to start and stop tracking, and to get the dirty page
bitmap report. Extra details and the uAPI definition can be found here
[3].

Device dirty page tracking operates in VFIOContainer scope. I.e., when
dirty tracking is started or stopped, or a dirty page report is
queried, all devices within a VFIOContainer are iterated and the
operation is applied to each of them.

Device dirty page tracking is used only if all devices within a
VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
used, and if that is not supported either, memory is perpetually marked
dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
support, the last two usually have the same effect of perpetually
marking all pages dirty.

Normally, when asked to start dirty tracking, all the currently DMA
mapped ranges are tracked by device dirty page tracking. If using a
vIOMMU we block live migration. This is temporary and a separate series
is going to add support for it. Thus this series focuses on getting the
groundwork in first.

The series is organized as follows:

- Patches 1-7: Fix bugs and do some preparatory work required prior to
  adding device dirty page tracking.
- Patches 8-11: Implement device dirty page tracking.
- Patch 12: Block live migration with vIOMMU.
- Patches 13-14: Detect device dirty page tracking and document it.

Comments and improvements are appreciated, as usual.

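For readers who want the fallback order above in code form, here is a
rough sketch. It is not the code from the patches: every type and
function name below (SketchDevice, SketchContainer, sketch_pick_mode)
is made up for illustration, and the real series works on
VFIOContainer and the DMA logging uAPI instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the QEMU types. */
typedef struct {
    bool dirty_pages_supported;  /* device offers dirty page tracking */
} SketchDevice;

typedef struct {
    SketchDevice *devices;
    size_t nr_devices;
    bool iommu_dirty_tracking;   /* VFIO IOMMU dirty tracking available */
} SketchContainer;

typedef enum {
    SKETCH_TRACK_DEVICE,     /* every device in the container tracks */
    SKETCH_TRACK_IOMMU,      /* fall back to VFIO IOMMU tracking */
    SKETCH_TRACK_ALL_DIRTY,  /* last resort: mark all memory dirty */
} SketchTrackingMode;

/* Pick the tracking mode for one container, in the order described. */
static SketchTrackingMode sketch_pick_mode(const SketchContainer *c)
{
    assert(c != NULL);

    /* Device tracking needs *all* devices in the container to support it. */
    bool all_devices = c->nr_devices > 0;
    for (size_t i = 0; i < c->nr_devices; i++) {
        if (!c->devices[i].dirty_pages_supported) {
            all_devices = false;
            break;
        }
    }

    if (all_devices) {
        return SKETCH_TRACK_DEVICE;
    }
    if (c->iommu_dirty_tracking) {
        return SKETCH_TRACK_IOMMU;
    }
    return SKETCH_TRACK_ALL_DIRTY;
}
```

A single device without support is enough to push the whole container
to the IOMMU or all-dirty fallback, which is the per-container scope
described above.
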
Thanks,
Joao

Changes from v3 [6]:
- Added Rbs in patches 4, 5, 6, 13, 14. (Did not add the others because
  they changed a lot.)
- Fix the unblocker of live migration by moving
  vfio_unblock_giommu_migration into vfio_instance_finalize()
- Refactor/simplify the test for vIOMMU enabled (patch 12)
- Change the style of how we set features::flags (patch 9, 11)
- Return -ENOMEM in vfio_bitmap_alloc(), and change callsites to return
  ret instead of errno (patch 4)
- Remove iova-tree includes
- Initialize range min{32,64} to UINT{32,64}_MAX to better calculate
  the minimum range without assumptions.
- Add commentary on why we unregister the memory listener
- Add commentary about the dual-split of ranges
- Remove the mutex because the memory listener is fully serialized
- Move vfio_section_get_iova_range() out into its own patch and make
  vfio_listener_region_add() use it too.
- Add a VFIODirtyRanges struct which is allocated on the stack as
  opposed to being stored in the container, and register the listener
  with it.
- Remove stale paragraph from commit message (patch 8)
- Unroll vfio_device_dma_logging_set() into separate code in start(),
  which fails early and returns, and stop(), which is void and never
  returns early.

Changes from v2 [5]:
- Split initial dirty page tracking support from the parent series to
  break it into smaller parts.
- Replace the IOVATree with a simple two-range setup: one range for the
  32-bit and another for the 64-bit address space. After discussions it
  was settled this way because the IOVATree added unnecessary
  complexity, while two ranges are also more efficient and put less
  stress on the uAPI limits. (patch 7 and 8)
- For now exclude vIOMMU, and so add a live migration blocker if a
  vIOMMU is passed in. This will be followed up with vIOMMU support in
  a separate series. (patch 10)
- Add new patches to reuse most helpers used across memory listeners.
  This is useful for reuse when recording DMA ranges. (patch 5 and 6)
- Adjust documentation to avoid mentioning the vIOMMU and instead state
  that vIOMMU with device dirty page tracking is blocked. Cedric gave
  an Rb, but I've dropped it, considering the split and the lack of
  vIOMMU support. (patch 13)
- Improve VFIOBitmap to avoid allocating a 16-byte structure and place
  it on the stack instead. Remove the free helper function. (patch 4)
- Fix the compilation issues (patch 8 and 10). Possibly not 100%
  addressed as I am still working out the env to repro it.

Changes from v1 [4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
     .state_pending_{estimate,exact}() handlers.
  2. Added a new patch that refactors vfio_save_block().
  3. Changed the pre-copy patch to cache and report pending pre-copy
     size in the .state_pending_estimate() handler.
- Removed unnecessary P2P code. This should be added later on when P2P
  support is added. (Alex)
- Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
  (patch #11). (Alex)
- Stored vfio_devices_all_device_dirty_tracking()'s value in a local
  variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
- Refactored the viommu device dirty tracking ranges creation code to
  make it clearer (patch #15).
- Changed overflow check in vfio_iommu_range_is_device_tracked() to
  emphasize that we specifically check for 2^64 wrap around
  (patch #15).
- Added R-bs / Acks.

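The two-range setup and the UINT{32,64}_MAX initialization mentioned in
the changelog can be illustrated roughly as follows. This is only a
sketch under assumptions, not the code from the patches: the names
SketchDirtyRanges, sketch_ranges_init() and sketch_ranges_update() are
hypothetical, and the choice to classify a section by whether its
inclusive end fits in 32 bits is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct: one IOVA range below 4G, one for the rest. */
typedef struct {
    uint64_t min32, max32;
    uint64_t min64, max64;
} SketchDirtyRanges;

/*
 * Start each minimum at the top of its address space so that the first
 * recorded section always lowers it, without assuming a starting IOVA.
 */
static void sketch_ranges_init(SketchDirtyRanges *r)
{
    r->min32 = UINT32_MAX;
    r->max32 = 0;
    r->min64 = UINT64_MAX;
    r->max64 = 0;
}

/* Record one DMA mapped section [iova, end], with end inclusive. */
static void sketch_ranges_update(SketchDirtyRanges *r,
                                 uint64_t iova, uint64_t end)
{
    assert(iova <= end);

    /* Assumption: a section ending at or below 4G goes to the 32-bit range. */
    uint64_t *min = (end <= UINT32_MAX) ? &r->min32 : &r->min64;
    uint64_t *max = (end <= UINT32_MAX) ? &r->max32 : &r->max64;

    if (iova < *min) {
        *min = iova;
    }
    if (end > *max) {
        *max = end;
    }
}
```

Each range only ever grows to cover the sections seen so far, so at
most two ranges need to be handed to the device no matter how many
sections are mapped, which is the point of dropping the IOVATree.
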
[1] https://lore.kernel.org/qemu-devel/167658846945.932837.1420176491103357684.stgit@omen/
[2] https://lore.kernel.org/kvm/20221206083438.37807-3-yish...@nvidia.com/
[3] https://lore.kernel.org/netdev/20220908183448.195262-4-yish...@nvidia.com/
[4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avih...@nvidia.com/
[5] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avih...@nvidia.com/
[6] https://lore.kernel.org/qemu-devel/20230304014343.33646-1-joao.m.mart...@oracle.com/

Avihai Horon (6):
  vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  vfio/common: Fix wrong %m usages
  vfio/common: Abort migration if dirty log start/stop/sync fails
  vfio/common: Add VFIOBitmap and alloc function
  vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  docs/devel: Document VFIO device dirty page tracking

Joao Martins (8):
  vfio/common: Add helper to validate iova/end against hostwin
  vfio/common: Consolidate skip/invalid section into helper
  vfio/common: Add helper to consolidate iova/end calculation
  vfio/common: Record DMA mapped IOVA ranges
  vfio/common: Add device dirty page tracking start/stop
  vfio/common: Add device dirty page bitmap sync
  vfio/migration: Block migration with vIOMMU
  vfio/migration: Query device dirty page tracking support

 docs/devel/vfio-migration.rst |  46 ++-
 hw/vfio/common.c              | 685 ++++++++++++++++++++++++++++------
 hw/vfio/migration.c           |  20 +
 hw/vfio/pci.c                 |   1 +
 hw/vfio/trace-events          |   2 +
 include/hw/vfio/vfio-common.h |  17 +
 6 files changed, 634 insertions(+), 137 deletions(-)

-- 
2.17.2