On 4/25/24 10:46, Duan, Zhenzhong wrote:
Hi Cédric,

-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

Hello Zhenzhong,

On 4/18/24 10:42, Duan, Zhenzhong wrote:
Hi Cédric,

-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

Hello Zhenzhong

On 4/17/24 11:24, Duan, Zhenzhong wrote:


-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

On 4/17/24 06:21, Duan, Zhenzhong wrote:


-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

Hello,

On 4/16/24 09:09, Duan, Zhenzhong wrote:
Hi Cédric,

-----Original Message-----
From: Cédric Le Goater <c...@redhat.com>
Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
compatibility check with host IOMMU cap/ecap

On 4/8/24 10:44, Zhenzhong Duan wrote:
From: Yi Liu <yi.l....@intel.com>

If check fails, the host side device(either vfio or vdpa device)
should
not
be passed to guest.

Implementation details for different backends will be in
following
patches.

Signed-off-by: Yi Liu <yi.l....@intel.com>
Signed-off-by: Yi Sun <yi.y....@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
---
       hw/i386/intel_iommu.c | 35
+++++++++++++++++++++++++++++++++++
       1 file changed, 35 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4f84e2e801..a49b587c73 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
       #include "sysemu/kvm.h"
       #include "sysemu/dma.h"
       #include "sysemu/sysemu.h"
+#include "sysemu/iommufd.h"
       #include "hw/i386/apic_internal.h"
       #include "kvm/kvm_i386.h"
       #include "migration/vmstate.h"
@@ -3819,6 +3820,32 @@ VTDAddressSpace
*vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
           return vtd_dev_as;
       }

+static int vtd_check_legacy_hdev(IntelIOMMUState *s,
+                                 HostIOMMUDevice *hiod,
+                                 Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
+                                  HostIOMMUDevice *hiod,
+                                  Error **errp)
+{
+    return 0;
+}
+
+static int vtd_check_hdev(IntelIOMMUState *s,
VTDHostIOMMUDevice
*vtd_hdev,
+                          Error **errp)
+{
+    HostIOMMUDevice *hiod = vtd_hdev->dev;
+
+    if (object_dynamic_cast(OBJECT(hiod),
TYPE_HIOD_IOMMUFD))
{
+        return vtd_check_iommufd_hdev(s, hiod, errp);
+    }
+
+    return vtd_check_legacy_hdev(s, hiod, errp);
+}


I think we should be using the .get_host_iommu_info() class
handler
instead. Can we refactor the code slightly to avoid this check on
the type ?

There is some difficulty ini avoiding this check, the behavior of
vtd_check_legacy_hdev
and vtd_check_iommufd_hdev are different especially after
nesting
support introduced.
vtd_check_iommufd_hdev() has much wider check over cap/ecap
bits
besides aw_bits.

I think it is important to fully separate the vIOMMU model from the
host IOMMU backing device.

This comment is true for the structures also.

Could we introduce a new HostIOMMUDeviceClass
handler .check_hdev() handler, which would
call .get_host_iommu_info() ?

This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should
be
a common structure 'HostIOMMUDeviceInfo' holding all attributes
for the different backends. Each .get_host_iommu_info() implementation
would translate the specific host iommu device data presentation
into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.

I see, it's just not easy to define the unified elements in
HostIOMMUDeviceInfo
so that they maps to bits or fields in host return IOMMU info.

The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
new
API needs to be completely defined for it. The IOMMU backend
implementation
could be anything, legacy, iommufd, iommufd v2, some other framework
and
the vIOMMU shouldn't be aware of its implementation.

Exposing the kernel structures as done below should be avoided because
they are part of the QEMU <-> kernel IOMMUFD interface.


Different platform returned host IOMMU info is platform specific.
For vtd and siommu:

struct iommu_hw_info_vtd {
          __u32 flags;
          __u32 __reserved;
          __aligned_u64 cap_reg;
          __aligned_u64 ecap_reg;
};

struct iommu_hw_info_arm_smmuv3 {
         __u32 flags;
         __u32 __reserved;
         __u32 idr[6];
         __u32 iidr;
         __u32 aidr;
};

I can think of two kinds of declaration of HostIOMMUDeviceInfo:

struct HostIOMMUDeviceInfo {
      uint8_t aw_bits;
      enum iommu_hw_info_type type;
      union {
          struct iommu_hw_info_vtd vtd;
          struct iommu_hw_info_arm_smmuv3;
          ......
      } data;
}

or

struct HostIOMMUDeviceInfo {
      uint8_t aw_bits;
      enum iommu_hw_info_type type;
      __u32 flags;
      __aligned_u64 cap_reg;
      __aligned_u64 ecap_reg;
      __u32 idr[6];
      __u32 iidr;
      __u32 aidr;
     ......
}

Not clear if any is your expected format.

'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
type attribute and host iommu device type definitions, or as you
suggested with a QOM interface. This is more complex however. In
this case, I would suggest to implement a .compatible() handler to
compare the host iommu device type with the vIOMMU type.

The resulting check_hdev routine would look something like :

static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
*vtd_hdev,
                            Error **errp)
{
      HostIOMMUDevice *hiod = vtd_hdev->dev;
      HostIOMMUDeviceClass *hiodc =
HOST_IOMMU_DEVICE_GET_CLASS(hiod);
      HostIOMMUDevice info;
      int host_aw_bits, ret;

      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
      if (ret) {
          return ret;
      }

      ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
      if (ret) {
          return ret;
      }

      if (s->aw_bits > info.aw_bits) {
          error_setg(errp, "aw-bits %d > host aw-bits %d",
                     s->aw_bits, info.aw_bits);
          return -EINVAL;
      }
}

and the HostIOMMUDeviceClass::is_compatible() handler would call a
vIOMMUInterface::compatible() handler simply returning
IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?

Not quite get what HostIOMMUDeviceClass::is_compatible() does.

HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
to determine which IOMMU types are exposed by the host, then calls the
vIOMMUInterface::compatible() handler to do the compare. API is to be
defined.

As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
interface capabilities, or features, to check more precisely the level
of compatibility between the vIOMMU and the host IOMMU device. This is
similar to what is done between QEMU and KVM.

If you think this is too complex, include type in HostIOMMUDeviceInfo.

Currently legacy and IOMMUFD host device has different check logic, how
it can help
in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into
a single vtd_check_hdev()?

IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation,
but
if you think the Intel vIOMMU should access directly the iommufd backend
when available, then we should drop this proposal and revisit the design
to take a different approach.

I implemented a draft following your suggestions so we could explore further.
See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3_tmp

In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
just like KVM CAPs.
A common HostIOMMUDeviceCaps structure is introduced to be used by
both legacy and iommufd backend.

It indeed is cleaner. Only problem is I failed to implement .compatible()
as all the check could go ahead by just calling check_cap().
Could you help a quick check to see if I misunderstood any of your suggestion?

Thanks for the changes. It looks cleaner and simpler ! Some comments,

* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't
  remember if you told me already you had plans for future changes.
  Sorry about that if this is the case. I forgot.

* I would use the 'host_iommu_device_' prefix for external routines
  which are part of the HostIOMMUDevice API and use 'hiod_' for
  internal routines where it makes sense, to limit the name length for
  instance.

* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to
  HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a
  theoretical example of a different IOMMU interface. I don't think we
  need to anticipate yet :)

* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from
  'linux/iommufd.h', that's not my preferred choice but I won't
  object. The result looks good.

* HostIOMMUDevice now has a realize() routine to query the host IOMMU
  capability for later usage. This is a good idea. However, you could
  change the return value to bool and avoid the ERRP_GUARD() prologue.

* Beware of :
struct Range {
        /*
         * Do not access members directly, use the functions!
         * A non-empty range has @lob <= @upb.
         * An empty range has @lob == @upb + 1.
         */
        uint64_t lob;        /* inclusive lower bound */
        uint64_t upb;        /* inclusive upper bound */
    };
* I think you could introduce a new VFIOIOMMUClass attribute. Let's
  call it VFIOIOMMUClass::hiod_typename. The creation of HostIOMMUDevice
  would become generic and you could move :

    hiod= HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_VFIO));
    HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
    if (*errp) {
        object_unref(hiod);
        return -EINVAL;
    }
    vbasedev->hiod = hiod;

  at the end of vfio_attach_device().


Thanks,

C.



Reply via email to