>-----Original Message-----
>From: Steven Sistare <steven.sist...@oracle.com>
>Subject: Re: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>
>On 5/16/2025 6:22 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steve Sistare <steven.sist...@oracle.com>
>>> Subject: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>>>
>>> Reconstruct userland device state after CPR.  During vfio_realize, skip
>>> all ioctls that configure the device, as it was already configured in old
>>> QEMU.
>>>
>>> Save the ioas_id in vmstate, and skip its allocation in vfio_realize.
>>> Because we skip ioctl's, it is not needed at realize time.  However, we do
>>> need the range info, so defer the call to iommufd_cdev_get_info_iova_range
>>> to a post_load handler, at which time the ioas_id is known.
>>>
>>> This reconstruction is not complete.  hwpt_id and devid need special
>>> treatment, handled in subsequent patches.
>>>
>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>> ---
>>> hw/vfio/cpr-iommufd.c |  8 ++++++++
>>> hw/vfio/iommufd.c     | 17 +++++++++++++++++
>>> 2 files changed, 25 insertions(+)
>>>
>>> diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
>>> index b760bd3..3d430f0 100644
>>> --- a/hw/vfio/cpr-iommufd.c
>>> +++ b/hw/vfio/cpr-iommufd.c
>>> @@ -31,6 +31,13 @@ static int vfio_container_post_load(void *opaque, int
>>> version_id)
>>>      VFIOIOMMUFDContainer *container = opaque;
>>>      VFIOContainerBase *bcontainer = &container->bcontainer;
>>>      VFIODevice *vbasedev;
>>> +    Error *err = NULL;
>>> +    uint32_t ioas_id = container->ioas_id;
>>> +
>>> +    if (!iommufd_cdev_get_info_iova_range(container, ioas_id, &err)) {
>>> +        error_report_err(err);
>>> +        return -1;
>>> +    }
>>>
>>>      QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>>>          vbasedev->cpr.reused = false;
>>> @@ -47,6 +54,7 @@ static const VMStateDescription vfio_container_vmstate
>= {
>>>      .post_load = vfio_container_post_load,
>>>      .needed = cpr_needed_for_reuse,
>>>      .fields = (VMStateField[]) {
>>> +        VMSTATE_UINT32(ioas_id, VFIOIOMMUFDContainer),
>>>          VMSTATE_END_OF_LIST()
>>>      }
>>> };
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 046f601..c49a7e7 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -122,6 +122,10 @@ static bool
>>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>>          goto err_kvm_device_add;
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        goto skip_bind;
>>> +    }
>>> +
>>>      /* Bind device to iommufd */
>>>      bind.iommufd = iommufd->fd;
>>>      if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
>>> @@ -133,6 +137,8 @@ static bool
>>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>>      vbasedev->devid = bind.out_devid;
>>>      trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
>>>                                          vbasedev->fd, vbasedev->devid);
>>> +
>>> +skip_bind:
>>>      return true;
>>> err_bind:
>>>      iommufd_cdev_kvm_device_del(vbasedev);
>>> @@ -580,6 +586,11 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>>          }
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        ioas_id = -1;           /* ioas_id will be received from vmstate */
>>> +        goto skip_ioas_alloc;
>>> +    }
>>> +
>>>      /* Need to allocate a new dedicated container */
>>>      if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp)) {
>>>          goto err_alloc_ioas;
>>> @@ -587,6 +598,7 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>>
>>>      trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
>>>
>>> +skip_ioas_alloc:
>>>      container =
>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>>      container->be = vbasedev->iommufd;
>>>      container->ioas_id = ioas_id;
>>> @@ -605,6 +617,10 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>>          goto err_discard_disable;
>>>      }
>>>
>>> +    if (vbasedev->cpr.reused) {
>>> +        goto skip_info;
>>
>> I suspect this will break virtio-iommu, see virtio_iommu_set_iommu_device().
>> When virtio-iommu try to get host_iova_ranges, it's not ready until post 
>> load.
>
>Thanks, I'll look into it.
>Can you give me a clue or a pointer on command line options to set this up?

    -device virtio-iommu-pci \
    -device vfio-pci,host=0000:01:00.0 \
    -trace virtio_iommu_host_resv_regions

The vfio device needs to have reserved region, then diff the trace between old 
and new qemu can show us if reserved region is lost in new qemu.

Reply via email to