>-----Original Message-----
>From: Steven Sistare <steven.sist...@oracle.com>
>Subject: Re: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>
>On 5/16/2025 6:22 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steve Sistare <steven.sist...@oracle.com>
>>> Subject: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>>>
>>> Reconstruct userland device state after CPR. During vfio_realize, skip
>>> all ioctls that configure the device, as it was already configured in old
>>> QEMU.
>>>
>>> Save the ioas_id in vmstate, and skip its allocation in vfio_realize.
>>> Because we skip ioctls, it is not needed at realize time. However, we do
>>> need the range info, so defer the call to iommufd_cdev_get_info_iova_range
>>> to a post_load handler, at which time the ioas_id is known.
>>>
>>> This reconstruction is not complete. hwpt_id and devid need special
>>> treatment, handled in subsequent patches.
>>>
>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>> ---
>>>  hw/vfio/cpr-iommufd.c |  8 ++++++++
>>>  hw/vfio/iommufd.c     | 17 +++++++++++++++++
>>>  2 files changed, 25 insertions(+)
>>>
>>> diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
>>> index b760bd3..3d430f0 100644
>>> --- a/hw/vfio/cpr-iommufd.c
>>> +++ b/hw/vfio/cpr-iommufd.c
>>> @@ -31,6 +31,13 @@ static int vfio_container_post_load(void *opaque, int version_id)
>>>      VFIOIOMMUFDContainer *container = opaque;
>>>      VFIOContainerBase *bcontainer = &container->bcontainer;
>>>      VFIODevice *vbasedev;
>>> +    Error *err = NULL;
>>> +    uint32_t ioas_id = container->ioas_id;
>>> +
>>> +    if (!iommufd_cdev_get_info_iova_range(container, ioas_id, &err)) {
>>> +        error_report_err(err);
>>> +        return -1;
>>> +    }
>>>
>>>      QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>>>          vbasedev->cpr.reused = false;
>>> @@ -47,6 +54,7 @@ static const VMStateDescription vfio_container_vmstate = {
>>>      .post_load = vfio_container_post_load,
>>>      .needed = cpr_needed_for_reuse,
>>>      .fields = (VMStateField[]) {
>>> +        VMSTATE_UINT32(ioas_id, VFIOIOMMUFDContainer),
>>>          VMSTATE_END_OF_LIST()
>>>      }
>>>  };
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 046f601..c49a7e7 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -122,6 +122,10 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>>          goto err_kvm_device_add;
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        goto skip_bind;
>>> +    }
>>> +
>>>      /* Bind device to iommufd */
>>>      bind.iommufd = iommufd->fd;
>>>      if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
>>> @@ -133,6 +137,8 @@ static bool iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>>      vbasedev->devid = bind.out_devid;
>>>      trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
>>>                                          vbasedev->fd, vbasedev->devid);
>>> +
>>> +skip_bind:
>>>      return true;
>>>  err_bind:
>>>      iommufd_cdev_kvm_device_del(vbasedev);
>>> @@ -580,6 +586,11 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>>          }
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        ioas_id = -1; /* ioas_id will be received from vmstate */
>>> +        goto skip_ioas_alloc;
>>> +    }
>>> +
>>>      /* Need to allocate a new dedicated container */
>>>      if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp)) {
>>>          goto err_alloc_ioas;
>>> @@ -587,6 +598,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>>
>>>      trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
>>>
>>> +skip_ioas_alloc:
>>>      container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>>      container->be = vbasedev->iommufd;
>>>      container->ioas_id = ioas_id;
>>> @@ -605,6 +617,10 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>>          goto err_discard_disable;
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        goto skip_info;
>>
>> I suspect this will break virtio-iommu, see virtio_iommu_set_iommu_device().
>> When virtio-iommu tries to get host_iova_ranges, they are not ready until
>> post_load.
>
>Thanks, I'll look into it.
>Can you give me a clue or a pointer on command line options to set this up?
-device virtio-iommu-pci \
-device vfio-pci,host=0000:01:00.0 \
-trace virtio_iommu_host_resv_regions

The VFIO device needs to have a reserved region. Diffing the trace output between old and new QEMU will then show whether the reserved region is lost in new QEMU.
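The concern raised in this thread is an ordering problem. Per the review, virtio-iommu consumes the host IOVA ranges during realize (via virtio_iommu_set_iommu_device()). With the patch, however, a CPR-reused device only fetches those ranges in post_load. The C sketch below is a hypothetical, simplified model of that ordering and is not QEMU code. Container, get_iova_ranges(), consumer_snapshot(), realize() and post_load() are illustrative stand-ins, and the range values are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the realize/post_load ordering issue.
 * Not QEMU code: every type and function here is an illustrative stand-in.
 */
#define MAX_RANGES 4

typedef struct {
    uint64_t start;
    uint64_t end;
} IovaRange;

typedef struct {
    uint32_t ioas_id;
    int nr_ranges;                  /* number of valid entries in ranges[] */
    IovaRange ranges[MAX_RANGES];
} Container;

/* Stand-in for iommufd_cdev_get_info_iova_range(); values are made up. */
static void get_iova_ranges(Container *c)
{
    c->ranges[0] = (IovaRange){ 0x0, 0xfedfffffULL };
    c->ranges[1] = (IovaRange){ 0xfef00000ULL, UINT64_MAX };
    c->nr_ranges = 2;
}

/*
 * Stand-in for the virtio-iommu side: it looks at the host IOVA ranges
 * once, during realize, and returns how many it saw.
 */
static int consumer_snapshot(const Container *c)
{
    return c->nr_ranges;
}

/* Realize: when cpr_reused is set, the range query is skipped (cf. skip_info). */
static int realize(Container *c, bool cpr_reused)
{
    if (!cpr_reused) {
        get_iova_ranges(c);
    }
    return consumer_snapshot(c);    /* what the consumer sees at realize */
}

/* post_load: under CPR, the ranges are only fetched here, after realize. */
static void post_load(Container *c)
{
    get_iova_ranges(c);
}
```

In this model, realize() on a normal container reports 2 ranges to the consumer. A CPR-reused container reports 0, even though post_load() later fills in the same 2 ranges. That gap is the kind of difference the suggested virtio_iommu_host_resv_regions trace comparison between old and new QEMU is meant to reveal.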