Hi Shameer, On 11/20/25 2:22 PM, Shameer Kolothum wrote: > From: Yi Liu <[email protected]> > > If user wants to expose PASID capability in vIOMMU, then VFIO would also > need to report the PASID cap for this device if the underlying hardware > supports it as well. > > As a start, this chooses to put the vPASID cap in the last 8 bytes of the > vconfig space. This is a choice in the good hope of no conflict with any > existing cap or hidden registers. For the devices that has hidden registers, > user should figure out a proper offset for the vPASID cap. This may require > an option for user to config it. Here we leave it as a future extension. > There are more discussions on the mechanism of finding the proper offset. > > https://lore.kernel.org/kvm/bn9pr11mb5276318969a212ad0649c7be8c...@bn9pr11mb5276.namprd11.prod.outlook.com/ > > Since we add a check to ensure the vIOMMU supports PASID, only devices > under those vIOMMUs can synthesize the vPASID capability. This gives > users control over which devices expose vPASID. > > Signed-off-by: Yi Liu <[email protected]> > Tested-by: Zhangfei Gao <[email protected]> > Reviewed-by: Jonathan Cameron <[email protected]> > Signed-off-by: Shameer Kolothum <[email protected]> > --- > hw/vfio/pci.c | 38 ++++++++++++++++++++++++++++++++++++++ > include/hw/iommu.h | 1 + > 2 files changed, 39 insertions(+) > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index 8b8bc5a421..e11e39d667 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -24,6 +24,7 @@ > #include <sys/ioctl.h> > > #include "hw/hw.h" > +#include "hw/iommu.h" > #include "hw/pci/msi.h" > #include "hw/pci/msix.h" > #include "hw/pci/pci_bridge.h" > @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, > uint16_t pos) > > static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > { > + HostIOMMUDevice *hiod = vdev->vbasedev.hiod; > + HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod); > PCIDevice *pdev = PCI_DEVICE(vdev); > + uint64_t max_pasid_log2 = 0; > + bool pasid_cap_added = false; > + uint64_t hw_caps; > uint32_t header; > uint16_t cap_id, next, size; > uint8_t cap_ver; > @@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > pcie_add_capability(pdev, cap_id, cap_ver, next, size); > } > break; > + /* > + * VFIO kernel does not expose the PASID CAP today. We may synthesize > + * one later through IOMMUFD APIs. If VFIO ever starts exposing it, > + * record its presence here so we do not create a duplicate CAP. > + */ > + case PCI_EXT_CAP_ID_PASID: > + pasid_cap_added = true; > + /* fallthrough */ > default: > pcie_add_capability(pdev, cap_id, cap_ver, next, size); > } > > } > > +#ifdef CONFIG_IOMMUFD > + /* Try to retrieve PASID CAP through IOMMUFD APIs */ > + if (!pasid_cap_added && hiodc && hiodc->get_cap) { > + hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, > NULL); > + hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2, > + &max_pasid_log2, NULL); > + } > + > + /* > + * If supported, adds the PASID capability in the end of the PCIe config > + * space. TODO: Add option for enabling pasid at a safe offset. > + */ > + if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) & > + VIOMMU_FLAG_PASID_SUPPORTED)) { > + bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC); > + bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV); > + > + pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - > PCI_EXT_CAP_PASID_SIZEOF, > + max_pasid_log2, exec_perm, priv_mod); > + /* PASID capability is fully emulated by QEMU */ > + memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8); > + } > +#endif > + > /* Cleanup chain head ID if necessary */ > if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) { > pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0); > diff --git a/include/hw/iommu.h b/include/hw/iommu.h > index 9b8bb94fc2..9635770bee 100644 > --- a/include/hw/iommu.h > +++ b/include/hw/iommu.h > @@ -20,6 +20,7 @@ > enum viommu_flags { > /* vIOMMU needs nesting parent HWPT to create nested HWPT */ > VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0), > + VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1), > }; > > #endif /* HW_IOMMU_H */ Besides the fact the offset is arbitrarily chosen so that this is the last cap of the vconfig space, the code looks good to me. So Reviewed-by: Eric Auger <[email protected]>
Just wondering whether we couldn't add some generic pcie code that parses the extended cap linked list to check the offset range is not used by another cap before allowing the insertion at a given offset? This wouldn't prevent a subsequent addition from failing but at least we would know if there is some collision.this could be added later on though. Thanks Eric
