On 5/20/2025 5:16 AM, Duan, Zhenzhong wrote:
-----Original Message-----
From: Steven Sistare <steven.sist...@oracle.com>
Subject: Re: [PATCH V3 39/42] vfio/iommufd: reconstruct hwpt

On 5/18/2025 11:25 PM, Duan, Zhenzhong wrote:
-----Original Message-----
From: Steve Sistare <steven.sist...@oracle.com>
Subject: [PATCH V3 39/42] vfio/iommufd: reconstruct hwpt

Save the hwpt_id in vmstate.  In realize, skip its allocation from
iommufd_cdev_attach -> iommufd_cdev_attach_container ->
iommufd_cdev_autodomains_get.

Rebuild userland structures to hold hwpt_id by calling
iommufd_cdev_rebuild_hwpt at post load time.  This depends on hw_caps,
which
was restored by the post_load call to vfio_device_hiod_create_and_realize.

Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
---
hw/vfio/cpr-iommufd.c      |  7 +++++++
hw/vfio/iommufd.c          | 24 ++++++++++++++++++++++--
hw/vfio/trace-events       |  1 +
hw/vfio/vfio-iommufd.h     |  3 +++
include/hw/vfio/vfio-cpr.h |  1 +
5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
index 24cdf10..6d3f4e0 100644
--- a/hw/vfio/cpr-iommufd.c
+++ b/hw/vfio/cpr-iommufd.c
@@ -110,6 +110,12 @@ static int vfio_device_post_load(void *opaque, int
version_id)
          error_report_err(err);
          return false;
      }
+    if (!vbasedev->mdev) {
+        VFIOIOMMUFDContainer *container = container_of(vbasedev-
bcontainer,
+                                                       VFIOIOMMUFDContainer,
+                                                       bcontainer);
+        iommufd_cdev_rebuild_hwpt(vbasedev, container);
+    }
      return true;
}

@@ -121,6 +127,7 @@ static const VMStateDescription vfio_device_vmstate
= {
      .needed = cpr_needed_for_reuse,
      .fields = (VMStateField[]) {
          VMSTATE_INT32(devid, VFIODevice),
+        VMSTATE_UINT32(cpr.hwpt_id, VFIODevice),
          VMSTATE_END_OF_LIST()
      }
};
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index d980684..ec79c83 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -318,6 +318,7 @@ static bool
iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
static void iommufd_cdev_use_hwpt(VFIODevice *vbasedev, VFIOIOASHwpt
*hwpt)
{
      vbasedev->hwpt = hwpt;
+    vbasedev->cpr.hwpt_id = hwpt->hwpt_id;
      vbasedev->iommu_dirty_tracking = iommufd_hwpt_dirty_tracking(hwpt);
      QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
}
@@ -373,6 +374,23 @@ static bool iommufd_cdev_make_hwpt(VFIODevice
*vbasedev,
      return true;
}

+void iommufd_cdev_rebuild_hwpt(VFIODevice *vbasedev,
+                               VFIOIOMMUFDContainer *container)
+{
+    VFIOIOASHwpt *hwpt;
+    int hwpt_id = vbasedev->cpr.hwpt_id;
+
+    trace_iommufd_cdev_rebuild_hwpt(container->be->fd, hwpt_id);
+
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        if (hwpt->hwpt_id == hwpt_id) {
+            iommufd_cdev_use_hwpt(vbasedev, hwpt);
+            return;
+        }
+    }
+    iommufd_cdev_make_hwpt(vbasedev, container, hwpt_id, false, NULL);
+}
+
static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
                                           VFIOIOMMUFDContainer *container,
                                           Error **errp)
@@ -567,7 +585,8 @@ static bool iommufd_cdev_attach(const char *name,
VFIODevice *vbasedev,
              vbasedev->iommufd != container->be) {
              continue;
          }
-        if (!iommufd_cdev_attach_container(vbasedev, container, &err)) {
+        if (!vbasedev->cpr.reused &&
+            !iommufd_cdev_attach_container(vbasedev, container, &err)) {
              const char *msg = error_get_pretty(err);

              trace_iommufd_cdev_fail_attach_existing_container(msg);
@@ -605,7 +624,8 @@ skip_ioas_alloc:
      bcontainer = &container->bcontainer;
      vfio_address_space_insert(space, bcontainer);

-    if (!iommufd_cdev_attach_container(vbasedev, container, errp)) {
+    if (!vbasedev->cpr.reused &&
+        !iommufd_cdev_attach_container(vbasedev, container, errp)) {

All container attaching is bypassed in new qemu. I have a concern that new
qemu doesn't generate same containers as old qemu if there are more than one
container in old qemu.
Then there can be devices attached to wrong container or attaching fail in post
load.

Yes, this relates to our discussion in patch 35.  Please explain, how can a 
single
iommufd backend have multiple containers?

Similar as legacy container, there can be multiple containers in one address 
space.
If existing mapping in one container conflicts with new device's reserved 
region,
Attaching to that container will fail and a new container need to be created to 
accept new device's reserved region.

Maybe you need to do same thing just like you do for legacy container, e.g., 
saving  ioas_id just like you saving container->fd, then checking existing 
ioas_id and restore iommufd container based on that.

Thanks, now I understand.
iommufd_cdev_attach calls
  iommufd_cdev_attach_container -> 
iommufd_cdev_attach_ioas_hwpt(container->ioas_id)
until it finds a container that works, or creates a new container with a new 
ioas_id.

To fix, I need to record each device's ioas_id in cpr-state, so it is available 
when
vfio_realize -> iommufd_cdev_attach is called.  Saving it in vmstate as I do 
now and
recovering it in a post_load handler is too late.

- Steve


Reply via email to