Hi Yi, On 8/27/25 2:30 PM, Yi Liu wrote: > Hi Eric, > > On 2025/8/27 19:22, Eric Auger wrote: >> Hi >> >> On 8/27/25 1:13 PM, Yi Liu wrote: >>> On 2025/8/22 14:40, Zhenzhong Duan wrote: >>>> Introduce a new PCIIOMMUOps optional callback, get_viommu_cap() which >>>> allows to retrieve capabilities exposed by a vIOMMU. The first planned >>>> vIOMMU device capability is VIOMMU_CAP_HW_NESTED that advertises the >>>> support of HW nested stage translation scheme. >>>> pci_device_get_viommu_cap >>>> is a wrapper that can be called on a PCI device potentially >>>> protected by >>>> a vIOMMU. >>>> >>>> get_viommu_cap() is designed to return 64bit bitmap of purely emulated >>>> capabilities which are only determined by user's configuration, no >>>> host >>>> capabilities involved. Reasons are: >>>> >>>> 1. host may has heterogeneous IOMMUs, each with different capabilities >>>> 2. this is migration friendly, return value is consistent between >>>> source >>>> and target. >>>> 3. host IOMMU capabilities are passed to vIOMMU through >>>> set_iommu_device() >>>> interface which have to be after attach_device(), when >>>> get_viommu_cap() >>>> is called in attach_device(), there is no way for vIOMMU to >>>> get host >>>> IOMMU capabilities yet, so only emulated capabilities can be >>>> returned. >>>> See below sequence: >>>> >>>> vfio_device_attach(): >>>> iommufd_cdev_attach(): >>>> pci_device_get_viommu_cap() for HW nesting cap >>>> create a nesting parent hwpt >>>> attach device to the hwpt >>>> vfio_device_hiod_create_and_realize() creating hiod >>>> ... >>>> pci_device_set_iommu_device(hiod) >>> >>>> Suggested-by: Yi Liu <yi.l....@intel.com> >>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>> --- >>>> MAINTAINERS | 1 + >>>> include/hw/iommu.h | 19 +++++++++++++++++++ >>>> include/hw/pci/pci.h | 25 +++++++++++++++++++++++++ >>>> hw/pci/pci.c | 11 +++++++++++ >>>> 4 files changed, 56 insertions(+) >>>> create mode 100644 include/hw/iommu.h >>>> >>>> diff --git a/MAINTAINERS b/MAINTAINERS >>>> index a07086ed76..54fb878128 100644 >>>> --- a/MAINTAINERS >>>> +++ b/MAINTAINERS >>>> @@ -2305,6 +2305,7 @@ F: include/system/iommufd.h >>>> F: backends/host_iommu_device.c >>>> F: include/system/host_iommu_device.h >>>> F: include/qemu/chardev_open.h >>>> +F: include/hw/iommu.h >>>> F: util/chardev_open.c >>>> F: docs/devel/vfio-iommufd.rst >>>> diff --git a/include/hw/iommu.h b/include/hw/iommu.h >>>> new file mode 100644 >>>> index 0000000000..7dd0c11b16 >>>> --- /dev/null >>>> +++ b/include/hw/iommu.h >>>> @@ -0,0 +1,19 @@ >>>> +/* >>>> + * General vIOMMU capabilities, flags, etc >>>> + * >>>> + * Copyright (C) 2025 Intel Corporation. >>>> + * >>>> + * SPDX-License-Identifier: GPL-2.0-or-later >>>> + */ >>>> + >>>> +#ifndef HW_IOMMU_H >>>> +#define HW_IOMMU_H >>>> + >>>> +#include "qemu/bitops.h" >>>> + >>>> +enum { >>>> + /* hardware nested stage-1 page table support */ >>>> + VIOMMU_CAP_HW_NESTED = BIT_ULL(0), >>> >>> This naming is a bit confusing. get_viommu_cap indicates it will return >>> the viommu's capability while this naming is HW_NESTED. It's conflict >>> with the commit message which claims only emulated capability will be >>> returned. >> >> it actually means the viommu has the code to handle HW nested case, >> independently on the actual HW support. >> maybe remove the "emulation" wording. > > yeah, I know the meaning and the purpose here. Just not quite satisfied > with the naming. > >> >> Otherwise we may also use the virtio has_feature naming? > > has_feature seems better. Looks to ask if vIOMMU has something and then > do something. > >> >>> >>> TBH. I'm hesitating to name it as get_viommu_cap. The scope is a little >>> larger than what we want so far. So I'm wondering if it can be done >>> in a >>> more straightforward way. e.g. just a bool op named >>> iommu_nested_wanted(). Just an example, maybe better naming. We can >>> extend the op to be returning a u64 value in the future when we see >>> another request on VFIO from vIOMMU. >> personnally I am fine with the bitmask which looks more future proof. > > not quite sure if there is another info that needs to be checked in > this "VFIO asks vIOMMU" manner. Have you seen one beside this > nested hwpt requirement by vIOMMU?
I don't remember any at this point. But I guess with ARM CCA device passthrough we might have other needs Eric > > Regards, > Yi Liu >