Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hello Shameer,

On 1/2/26 16:35, Shameer Kolothum wrote:
> Hi Cédric,
>
>> -----Original Message-----
>> From: Cédric Le Goater
>> Sent: 15 December 2025 10:55
>> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>>
>> On 11/20/25 14:22, Shameer Kolothum wrote:
>>> From: Yi Liu
>>>
>>> If user wants to expose PASID capability in vIOMMU, then VFIO would also
>>> need to report the PASID cap for this device if the underlying hardware
>>> supports it as well.
>>>
>>> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
>>> vconfig space. This is a choice in the good hope of no conflict with any
>>> existing cap or hidden registers. For the devices that have hidden registers,
>>> user should figure out a proper offset for the vPASID cap. This may require
>>> an option for user to configure it. Here we leave it as a future extension.
>>> There are more discussions on the mechanism of finding the proper offset.
>>>
>>> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
>>>
>>> Since we add a check to ensure the vIOMMU supports PASID, only devices
>>> under those vIOMMUs can synthesize the vPASID capability. This gives
>>> users control over which devices expose vPASID.
>>>
>>> Signed-off-by: Yi Liu
>>> Tested-by: Zhangfei Gao
>>> Reviewed-by: Jonathan Cameron
>>> Signed-off-by: Shameer Kolothum
>>> ---
>>>  hw/vfio/pci.c      | 38 ++
>>>  include/hw/iommu.h |  1 +
>>>  2 files changed, 39 insertions(+)
>>
>> I just noticed another problem with this change. It relies on the
>> availability of the HostIOMMUDevice which doesn't exist with VFIO
>> mdev devices, such as vGPU. QEMU simply coredumps :/
>>
>> We will have to check/protect QEMU in some ways. I need to take
>> a closer look because mdev handling seems to be spread across
>> the code and may need to be improved first.
>
> I did attempt a rework on this patch and the previous one (patch #31) to
> address the above issue and to avoid the #ifdef CONFIG_IOMMUFD in vfio.
> Please find below:
>
> Patch #1: This adds get_pasid_info to HostIOMMUDeviceClass.
> One thing I am not sure about is whether to use #ifdef CONFIG_LINUX below.
>
> Please take a look and let me know if this is the right direction or not.

I don't think CONFIG_LINUX is needed there because the declarations
are not specific to Linux.

A simple way to try a Windows build is with:

  --cross-prefix=x86_64-w64-mingw32-

You might need to add:

  --disable-sdl

and the targets should be
aarch64-softmmu,ppc64-softmmu,x86_64-softmmu,s390x-softmmu

You should resend a v7, in whole or in parts, as you wish.

Thanks,

C.

> From e1305b0d44b2002778059decc3d6b220414b0589 Mon Sep 17 00:00:00 2001
> From: Shameer Kolothum
> Date: Fri, 2 Jan 2026 14:50:58 +
> Subject: [PATCH 1/2] backends/iommufd: Add get_pasid_info
>
> TODO:
>
> Signed-off-by: Shameer Kolothum
> ---
>  backends/iommufd.c                 | 17 +
>  include/system/host_iommu_device.h | 19 +++
>  2 files changed, 36 insertions(+)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2c9ce1a03a..7beff372ba 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -634,11 +634,28 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
>      }
>  }
>
> +static bool hiod_iommufd_get_pasid_info(HostIOMMUDevice *hiod,
> +                                        HostIOMMUDevicePasidInfo *pasid_info)
> +{
> +    HostIOMMUDeviceCaps *caps = &hiod->caps;
> +
> +    if (!caps->max_pasid_log2) {
> +        return false;
> +    }
> +
> +    g_assert(pasid_info);
> +    pasid_info->exec_perm = (caps->hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
> +    pasid_info->priv_mod = (caps->hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV);
> +    pasid_info->max_pasid_log2 = caps->max_pasid_log2;
> +    return true;
> +}
> +
>  static void hiod_iommufd_class_init(ObjectClass *oc, const void *data)
>  {
>      HostIOMMUDeviceClass *hioc = HOST_IOMMU_DEVICE_CLASS(oc);
>
>      hioc->get_cap = hiod_iommufd_get_cap;
> +    hioc->get_pasid_info = hiod_iommufd_get_pasid_info;
>  };
>
>  static const TypeInfo types[] = {
> diff --git a/include/system/host_iommu_device.h b/include/system/host_iommu_device.h
> index bfb2b60478..6e62f643fe 100644
> --- a/include/system/host_iommu_device.h
> +++ b/include/system/host_iommu_device.h
> @@ -22,6 +22,13 @@ typedef union VendorCaps {
>      struct iommu_hw_info_arm_smmuv3 smmuv3;
>  } VendorCaps;
>
> +
> +typedef struct HostIOMMUDevicePasidInfo {
> +    bool exec_perm;
> +    bool priv_mod;
> +    uint64_t max_pasid_log2;
> +} HostIOMMUDevicePasidInfo;
> +
>  /**
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
On Tue, 6 Jan 2026 14:22:57 +0100 Eric Auger wrote:
> On 1/6/26 12:38 PM, Shameer Kolothum wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger
> >> Sent: 06 January 2026 10:55
> >> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
> >>
> >> Hi Shameer,
> > [...]
> >
> >>>>> Besides the fact the offset is arbitrarily chosen so that this is the
> >>>>> last cap of the vconfig space, the code looks good to me.
> >>>>> So
> >>>>> Reviewed-by: Eric Auger
> >>>>>
> >>>>> Just wondering whether we couldn't add some generic pcie code that
> >>>>> parses the extended cap linked list to check the offset range is not
> >>>>> used by another cap before allowing the insertion at a given offset?
> >>>>> This wouldn't prevent a subsequent addition from failing but at least we
> >>>>> would know if there is some collision. This could be added later on
> >>>>> though.
> >>>>>
> >>>> You're absolutely right. My approach of using the last 8 bytes was a
> >>>> shortcut to avoid implementing proper capability parsing logic
> >>>> (importing pci_regs.h and maintaining a cap_id-to-cap_size mapping
> >>>> table), and it simplified PASID capability detection by only examining
> >>>> the last 8 bytes by a simple dump :(. However, this approach is not
> >>>> good as we cannot guarantee that the last 8 bytes are unused by any
> >>>> device.
> >>>>
> >>>> Let's just implement the logic to walk the linked list of ext_caps to
> >>>> find an appropriate offset for our use case.
> >>> I had a go at this. Based on my understanding, even if we walk the PCIe
> >>> extended capability linked list, we still can't easily determine the size
> >>> occupied by the last capability as the extended capability header does not
> >>> encode a length, it only provides the "next" pointer, and for the last
> >>> entry next == 0.
> >> If my understanding is correct when walking the linked list, you can
> >> enumerate the start index and the PCIe extended Capability variable size
> >> (which is made of a fixed header size + a register block variable size
> >> which depends on the capability ID). After that we shall be able to
> >> allocate a slot within holes or at least check that adding the new prop
> >> at the end of the 4kB is safe, no? What do I miss?
> > I think the main issue is that we can't know whether the apparent "holes"
> > between extended capabilities are actually free. Depending on the vendor
> > implementation, those regions may be reserved or used for vendor specific
> > purposes, and I am not sure (please correct me) that the PCIe spec
> > guarantees that such gaps are available for reuse. Hence I thought of
> > relying on the “next” pointer as a safe bet.
> >
> > Even if we look at the last CAP ID and derive a size based on the
> > spec defined register layout, we still can't know whether there is
> > any additional vendor specific data beyond that "size". It is still
> > a best guess and I don't think we gain much in adding this additional
> > check.
>
> Ah OK I see what you mean (you may have discussed that earlier in other
> threads sorry). So you may have vendor specific private data in the
> holes. In that case I guess we cannot do much :-/

Also, we can only know the size of capabilities that are currently
defined, and we don't do a great job of keeping up with the latest ECNs.
Unless we have device specific knowledge, the best we can do is hope
that a gap between capabilities is unused. It might be a helpful
indicator to verify the config space we intend to overlap is zero,
though we can get false positives with such a method if we overlap a
capability that k
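
The zero check mentioned above could look roughly like the following. This is only a sketch over a plain byte buffer standing in for the 4 KiB config space; `tail_pasid_offset()` and `cfg_range_is_zero()` are hypothetical helper names, not existing QEMU functions, and as noted a zero range is only a hint, not proof that the range is unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE  0x1000u /* PCIe config space size (4 KiB) */
#define PASID_CAP_SIZE  8u      /* PASID extended capability size in bytes */

/* Offset used when the capability occupies the last 8 bytes: 0xff8. */
static uint16_t tail_pasid_offset(void)
{
    return (uint16_t)(CFG_SPACE_SIZE - PASID_CAP_SIZE);
}

/* True if [off, off + len) lies inside config space and reads as all zeros. */
static bool cfg_range_is_zero(const uint8_t *cfg, uint16_t off, uint16_t len)
{
    if ((uint32_t)off + len > CFG_SPACE_SIZE) {
        return false;
    }
    for (uint16_t i = 0; i < len; i++) {
        if (cfg[off + i] != 0) {
            return false;
        }
    }
    return true;
}
```

A caller would run `cfg_range_is_zero(cfg, tail_pasid_offset(), PASID_CAP_SIZE)` on the device's config space before placing the capability, and warn or refuse when it returns false.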
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
On 1/6/26 12:38 PM, Shameer Kolothum wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger
>> Sent: 06 January 2026 10:55
>> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>>
>> Hi Shameer,
> [...]
>
>>>>> Besides the fact the offset is arbitrarily chosen so that this is the
>>>>> last cap of the vconfig space, the code looks good to me.
>>>>> So
>>>>> Reviewed-by: Eric Auger
>>>>>
>>>>> Just wondering whether we couldn't add some generic pcie code that
>>>>> parses the extended cap linked list to check the offset range is not
>>>>> used by another cap before allowing the insertion at a given offset?
>>>>> This wouldn't prevent a subsequent addition from failing but at least we
>>>>> would know if there is some collision. This could be added later on though.
>>>>>
>>>> You're absolutely right. My approach of using the last 8 bytes was a
>>>> shortcut to avoid implementing proper capability parsing logic
>>>> (importing pci_regs.h and maintaining a cap_id-to-cap_size mapping
>>>> table), and it simplified PASID capability detection by only examining
>>>> the last 8 bytes by a simple dump :(. However, this approach is not
>>>> good as we cannot guarantee that the last 8 bytes are unused by any
>>>> device.
>>>>
>>>> Let's just implement the logic to walk the linked list of ext_caps to
>>>> find an appropriate offset for our use case.
>>> I had a go at this. Based on my understanding, even if we walk the PCIe
>>> extended capability linked list, we still can't easily determine the size
>>> occupied by the last capability as the extended capability header does not
>>> encode a length, it only provides the "next" pointer, and for the last entry
>>> next == 0.
>> If my understanding is correct when walking the linked list, you can
>> enumerate the start index and the PCIe extended Capability variable size
>> (which is made of a fixed header size + a register block variable size
>> which depends on the capability ID). After that we shall be able to
>> allocate a slot within holes or at least check that adding the new prop
>> at the end of the 4kB is safe, no? What do I miss?
> I think the main issue is that we can't know whether the apparent "holes"
> between extended capabilities are actually free. Depending on the vendor
> implementation, those regions may be reserved or used for vendor specific
> purposes, and I am not sure (please correct me) that the PCIe spec
> guarantees that such gaps are available for reuse. Hence I thought of
> relying on the “next” pointer as a safe bet.
>
> Even if we look at the last CAP ID and derive a size based on the
> spec defined register layout, we still can't know whether there is
> any additional vendor specific data beyond that "size". It is still
> a best guess and I don't think we gain much in adding this additional
> check.

Ah OK I see what you mean (you may have discussed that earlier in other
threads sorry). So you may have vendor specific private data in the
holes. In that case I guess we cannot do much :-/

>
> Perhaps, I think we could inform the user that we are placing
> the PASID at the last offset and the onus is on user to make sure
> it is safe to do so.

Or another solution is to let the user opt in to this hazardous
placement using an explicit x- prefixed option? Dunno

Thanks

Eric

>
> Thoughts?
>
> Thanks,
> Shameer
>
RE: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hi Eric,

> -----Original Message-----
> From: Eric Auger
> Sent: 06 January 2026 10:55
> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>
> Hi Shameer,

[...]

> >>> Besides the fact the offset is arbitrarily chosen so that this is the
> >>> last cap of the vconfig space, the code looks good to me.
> >>> So
> >>> Reviewed-by: Eric Auger
> >>>
> >>> Just wondering whether we couldn't add some generic pcie code that
> >>> parses the extended cap linked list to check the offset range is not
> >>> used by another cap before allowing the insertion at a given offset?
> >>> This wouldn't prevent a subsequent addition from failing but at least we
> >>> would know if there is some collision. This could be added later on though.
> >>>
> >> You're absolutely right. My approach of using the last 8 bytes was a
> >> shortcut to avoid implementing proper capability parsing logic
> >> (importing pci_regs.h and maintaining a cap_id-to-cap_size mapping
> >> table), and it simplified PASID capability detection by only examining
> >> the last 8 bytes by a simple dump :(. However, this approach is not
> >> good as we cannot guarantee that the last 8 bytes are unused by any
> >> device.
> >>
> >> Let's just implement the logic to walk the linked list of ext_caps to
> >> find an appropriate offset for our use case.
> > I had a go at this. Based on my understanding, even if we walk the PCIe
> > extended capability linked list, we still can't easily determine the size
> > occupied by the last capability as the extended capability header does not
> > encode a length, it only provides the "next" pointer, and for the last entry
> > next == 0.
> If my understanding is correct when walking the linked list, you can
> enumerate the start index and the PCIe extended Capability variable size
> (which is made of a fixed header size + a register block variable size
> which depends on the capability ID). After that we shall be able to
> allocate a slot within holes or at least check that adding the new prop
> at the end of the 4kB is safe, no? What do I miss?

I think the main issue is that we can't know whether the apparent "holes"
between extended capabilities are actually free. Depending on the vendor
implementation, those regions may be reserved or used for vendor specific
purposes, and I am not sure (please correct me) that the PCIe spec
guarantees that such gaps are available for reuse. Hence I thought of
relying on the “next” pointer as a safe bet.

Even if we look at the last CAP ID and derive a size based on the
spec defined register layout, we still can't know whether there is
any additional vendor specific data beyond that "size". It is still
a best guess and I don't think we gain much in adding this additional
check.

Perhaps, I think we could inform the user that we are placing
the PASID at the last offset and the onus is on user to make sure
it is safe to do so.

Thoughts?

Thanks,
Shameer
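
To make the "next pointer but no length" point concrete, here is a minimal sketch of walking the extended capability chain over a plain byte buffer. The header layout (ID in bits 15:0, version in bits 19:16, next offset in bits 31:20) follows the PCIe specification; the helper names `cfg_read32()`, `cfg_write32()` and `last_ext_cap_offset()` are made up for this illustration and are not QEMU or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SPACE_SIZE  0x1000u /* PCIe config space size (4 KiB) */
#define EXT_CAP_START   0x100u  /* first extended capability header */

/* Extended capability header fields. */
#define EXT_CAP_ID(h)   ((uint16_t)((h) & 0xffffu))
#define EXT_CAP_VER(h)  ((uint8_t)(((h) >> 16) & 0xfu))
#define EXT_CAP_NEXT(h) ((uint16_t)(((h) >> 20) & 0xffcu))

/* Little-endian 32-bit accessors on a config space image. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
    cfg[off] = (uint8_t)val;
    cfg[off + 1] = (uint8_t)(val >> 8);
    cfg[off + 2] = (uint8_t)(val >> 16);
    cfg[off + 3] = (uint8_t)(val >> 24);
}

/*
 * Return the offset of the last header in the chain, or 0 if the chain is
 * empty. The walk tells us where the last capability starts; nothing in the
 * header says where it ends.
 */
static uint16_t last_ext_cap_offset(const uint8_t *cfg)
{
    uint16_t off = EXT_CAP_START;
    uint16_t last = 0;
    int budget = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* guards against loops */

    while (off >= EXT_CAP_START && off <= CFG_SPACE_SIZE - 4 && budget-- > 0) {
        uint32_t hdr = cfg_read32(cfg, off);

        if (hdr == 0 || hdr == 0xffffffffu) {
            break; /* empty chain or unreadable config space */
        }
        last = off;
        off = EXT_CAP_NEXT(hdr);
    }
    return last;
}
```

The walk yields start offsets only. Anything between the last start offset and 0xfff has to be sized from per-ID knowledge, which is exactly the part that cannot account for vendor specific data.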
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hi Shameer,

On 1/5/26 5:33 PM, Shameer Kolothum wrote:
> Hi Eric/ Yi,
>
> [Cc: Alex]
>
>> -----Original Message-----
>> From: Yi Liu
>> Sent: 09 December 2025 11:17
>> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>>
>> On 2025/12/9 17:51, Eric Auger wrote:
>>> Hi Shameer,
>>> On 11/20/25 2:22 PM, Shameer Kolothum wrote:
>>>> From: Yi Liu
>>>>
>>>> If user wants to expose PASID capability in vIOMMU, then VFIO would also
>>>> need to report the PASID cap for this device if the underlying hardware
>>>> supports it as well.
>>>>
>>>> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
>>>> vconfig space. This is a choice in the good hope of no conflict with any
>>>> existing cap or hidden registers. For the devices that have hidden
>>>> registers, user should figure out a proper offset for the vPASID cap.
>>>> This may require an option for user to configure it. Here we leave it as
>>>> a future extension. There are more discussions on the mechanism of
>>>> finding the proper offset.
>>>>
>>>> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8C[email protected]/
>>>>
>>>> Since we add a check to ensure the vIOMMU supports PASID, only devices
>>>> under those vIOMMUs can synthesize the vPASID capability. This gives
>>>> users control over which devices expose vPASID.
>>>>
>>>> Signed-off-by: Yi Liu
>>>> Tested-by: Zhangfei Gao
>>>> Reviewed-by: Jonathan Cameron
>>>> Signed-off-by: Shameer Kolothum
>>>> ---
>>>>  hw/vfio/pci.c      | 38 ++
>>>>  include/hw/iommu.h |  1 +
>>>>  2 files changed, 39 insertions(+)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index 8b8bc5a421..e11e39d667 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -24,6 +24,7 @@
>>>>  #include
>>>>
>>>>  #include "hw/hw.h"
>>>> +#include "hw/iommu.h"
>>>>  #include "hw/pci/msi.h"
>>>>  #include "hw/pci/msix.h"
>>>>  #include "hw/pci/pci_bridge.h"
>>>> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
>>>>
>>>>  static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>>>>  {
>>>> +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
>>>> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>>      PCIDevice *pdev = PCI_DEVICE(vdev);
>>>> +    uint64_t max_pasid_log2 = 0;
>>>> +    bool pasid_cap_added = false;
>>>> +    uint64_t hw_caps;
>>>>      uint32_t header;
>>>>      uint16_t cap_id, next, size;
>>>>      uint8_t cap_ver;
>>>> @@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>>>>                  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>>>>              }
>>>>              break;
>>>> +        /*
>>>> +         * VFIO kernel does not expose the PASID CAP today. We may synthesize
>>>> +         * one later through IOMMUFD APIs. If VFIO ever starts exposing it,
>>>> +         * record its presence here so we do not create a duplicate CAP.
>>>> +         */
>>>> +        case PCI_EXT_CAP_ID_PASID:
>>>> +            pasid_cap_added = true;
>>>> +            /* fallthrough */
>>>>          default:
>>>>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>>>>          }
>>>>
>>>>      }
>>>>
>>>> +#ifdef CONFIG_IOMMUFD
>>>> +    /* Try to retrieve PASID CAP through IOMMU
RE: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hi Eric/ Yi,

[Cc: Alex]

> -----Original Message-----
> From: Yi Liu
> Sent: 09 December 2025 11:17
> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>
> On 2025/12/9 17:51, Eric Auger wrote:
> > Hi Shameer,
> > On 11/20/25 2:22 PM, Shameer Kolothum wrote:
> >> From: Yi Liu
> >>
> >> If user wants to expose PASID capability in vIOMMU, then VFIO would also
> >> need to report the PASID cap for this device if the underlying hardware
> >> supports it as well.
> >>
> >> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> >> vconfig space. This is a choice in the good hope of no conflict with any
> >> existing cap or hidden registers. For the devices that have hidden
> >> registers, user should figure out a proper offset for the vPASID cap.
> >> This may require an option for user to configure it. Here we leave it as
> >> a future extension. There are more discussions on the mechanism of
> >> finding the proper offset.
> >>
> >> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8C[email protected]/
> >>
> >> Since we add a check to ensure the vIOMMU supports PASID, only devices
> >> under those vIOMMUs can synthesize the vPASID capability. This gives
> >> users control over which devices expose vPASID.
> >>
> >> Signed-off-by: Yi Liu
> >> Tested-by: Zhangfei Gao
> >> Reviewed-by: Jonathan Cameron
> >> Signed-off-by: Shameer Kolothum
> >> ---
> >>  hw/vfio/pci.c      | 38 ++
> >>  include/hw/iommu.h |  1 +
> >>  2 files changed, 39 insertions(+)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 8b8bc5a421..e11e39d667 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -24,6 +24,7 @@
> >>  #include
> >>
> >>  #include "hw/hw.h"
> >> +#include "hw/iommu.h"
> >>  #include "hw/pci/msi.h"
> >>  #include "hw/pci/msix.h"
> >>  #include "hw/pci/pci_bridge.h"
> >> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
> >>
> >>  static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >>  {
> >> +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> >> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> >>      PCIDevice *pdev = PCI_DEVICE(vdev);
> >> +    uint64_t max_pasid_log2 = 0;
> >> +    bool pasid_cap_added = false;
> >> +    uint64_t hw_caps;
> >>      uint32_t header;
> >>      uint16_t cap_id, next, size;
> >>      uint8_t cap_ver;
> >> @@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >>                  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >>              }
> >>              break;
> >> +        /*
> >> +         * VFIO kernel does not expose the PASID CAP today. We may synthesize
> >> +         * one later through IOMMUFD APIs. If VFIO ever starts exposing it,
> >> +         * record its presence here so we do not create a duplicate CAP.
> >> +         */
> >> +        case PCI_EXT_CAP_ID_PASID:
> >> +            pasid_cap_added = true;
> >> +            /* fallthrough */
> >>          default:
> >>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >>          }
> >>
> >>      }
> >>
> >> +#ifdef CONFIG_IOMMUFD
> >> +    /* Try to retrieve PASID CAP through IOMMUFD APIs */
> >> +    if (!pasid_cap_added && hiodc && hiodc->get_cap) {
> >> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
> >> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
> >> +                       &max_pasid_log2, NULL);
> >> +
RE: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hi Cédric,

> -----Original Message-----
> From: Cédric Le Goater
> Sent: 15 December 2025 10:55
> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
>
> On 11/20/25 14:22, Shameer Kolothum wrote:
> > From: Yi Liu
> >
> > If user wants to expose PASID capability in vIOMMU, then VFIO would also
> > need to report the PASID cap for this device if the underlying hardware
> > supports it as well.
> >
> > As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> > vconfig space. This is a choice in the good hope of no conflict with any
> > existing cap or hidden registers. For the devices that have hidden registers,
> > user should figure out a proper offset for the vPASID cap. This may require
> > an option for user to configure it. Here we leave it as a future extension.
> > There are more discussions on the mechanism of finding the proper offset.
> >
> > https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
> >
> > Since we add a check to ensure the vIOMMU supports PASID, only devices
> > under those vIOMMUs can synthesize the vPASID capability. This gives
> > users control over which devices expose vPASID.
> >
> > Signed-off-by: Yi Liu
> > Tested-by: Zhangfei Gao
> > Reviewed-by: Jonathan Cameron
> > Signed-off-by: Shameer Kolothum
> > ---
> >  hw/vfio/pci.c      | 38 ++
> >  include/hw/iommu.h |  1 +
> >  2 files changed, 39 insertions(+)
>
> I just noticed another problem with this change. It relies on the
> availability of the HostIOMMUDevice which doesn't exist with VFIO
> mdev devices, such as vGPU. QEMU simply coredumps :/
>
> We will have to check/protect QEMU in some ways. I need to take
> a closer look because mdev handling seems to be spread across
> the code and may need to be improved first.

I did attempt a rework on this patch and the previous one (patch #31) to
address the above issue and to avoid the #ifdef CONFIG_IOMMUFD in vfio.
Please find below:

Patch #1: This adds get_pasid_info to HostIOMMUDeviceClass.
One thing I am not sure about is whether to use #ifdef CONFIG_LINUX below.

Please take a look and let me know if this is the right direction or not.

From e1305b0d44b2002778059decc3d6b220414b0589 Mon Sep 17 00:00:00 2001
From: Shameer Kolothum
Date: Fri, 2 Jan 2026 14:50:58 +
Subject: [PATCH 1/2] backends/iommufd: Add get_pasid_info

TODO:

Signed-off-by: Shameer Kolothum
---
 backends/iommufd.c                 | 17 +
 include/system/host_iommu_device.h | 19 +++
 2 files changed, 36 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2c9ce1a03a..7beff372ba 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -634,11 +634,28 @@ static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
     }
 }

+static bool hiod_iommufd_get_pasid_info(HostIOMMUDevice *hiod,
+                                        HostIOMMUDevicePasidInfo *pasid_info)
+{
+    HostIOMMUDeviceCaps *caps = &hiod->caps;
+
+    if (!caps->max_pasid_log2) {
+        return false;
+    }
+
+    g_assert(pasid_info);
+    pasid_info->exec_perm = (caps->hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
+    pasid_info->priv_mod = (caps->hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV);
+    pasid_info->max_pasid_log2 = caps->max_pasid_log2;
+    return true;
+}
+
 static void hiod_iommufd_class_init(ObjectClass *oc, const void *data)
 {
     HostIOMMUDeviceClass *hioc = HOST_IOMMU_DEVICE_CLASS(oc);

     hioc->get_cap = hiod_iommufd_get_cap;
+    hioc->get_pasid_info = hiod_iommufd_get_pasid_info;
 };

 static const TypeInfo types[] = {
diff --git a/include/system/host_iommu_device.h b/include/system/host_iommu_device.h
index bfb2b60478..6e62f643fe 100644
--- a/include/system/host_iommu_device.h
+++ b/include/system/host_iommu_device.h
@@ -22,6 +22,13 @@ typedef union VendorCaps {
     struct iommu_hw_info_arm_smmuv3 smmuv3;
 } VendorCaps;

+
+typedef struct HostIOMMUDevicePasidInfo {
+    bool exec_perm;
+    bool priv_mod;
+    uint64_t max_pasid_log2;
+} HostIOMMUDevicePasidInfo;
+
 /**
  * struct HostIOMMUDeviceCaps - Define host IOMMU d
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
On 11/20/25 14:22, Shameer Kolothum wrote:
From: Yi Liu
If user wants to expose PASID capability in vIOMMU, then VFIO would also
need to report the PASID cap for this device if the underlying hardware
supports it as well.
As a start, this chooses to put the vPASID cap in the last 8 bytes of the
vconfig space. This is a choice in the good hope of no conflict with any
existing cap or hidden registers. For the devices that have hidden registers,
user should figure out a proper offset for the vPASID cap. This may require
an option for user to configure it. Here we leave it as a future extension.
There are more discussions on the mechanism of finding the proper offset.
https://lore.kernel.org/kvm/bn9pr11mb5276318969a212ad0649c7be8c...@bn9pr11mb5276.namprd11.prod.outlook.com/
Since we add a check to ensure the vIOMMU supports PASID, only devices
under those vIOMMUs can synthesize the vPASID capability. This gives
users control over which devices expose vPASID.
Signed-off-by: Yi Liu
Tested-by: Zhangfei Gao
Reviewed-by: Jonathan Cameron
Signed-off-by: Shameer Kolothum
---
hw/vfio/pci.c | 38 ++
include/hw/iommu.h | 1 +
2 files changed, 39 insertions(+)
I just noticed another problem with this change. It relies on the
availability of the HostIOMMUDevice which doesn't exist with VFIO
mdev devices, such as vGPU. QEMU simply coredumps :/
We will have to check/protect QEMU in some ways. I need to take
a closer look because mdev handling seems to be spread across
the code and may need to be improved first.
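
As an illustration of the kind of protection meant here, the following is a minimal, self-contained sketch. The types and names (HostDev, HostDevOps, PasidInfo, query_pasid_info) are stand-ins invented for this example and are not the QEMU definitions; the only point is that the host IOMMU device pointer is checked before anything is dereferenced, so a device without one (as with mdev) just ends up with no synthesized PASID capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; not the QEMU definitions. */
typedef struct PasidInfo {
    bool exec_perm;
    bool priv_mod;
    uint64_t max_pasid_log2;
} PasidInfo;

typedef struct HostDev HostDev;

typedef struct HostDevOps {
    bool (*get_pasid_info)(HostDev *dev, PasidInfo *out);
} HostDevOps;

struct HostDev {
    const HostDevOps *ops;
    uint64_t max_pasid_log2;
};

/* Mock backend callback: reports PASID support only for a non-zero width. */
static bool mock_get_pasid_info(HostDev *dev, PasidInfo *out)
{
    if (!dev->max_pasid_log2) {
        return false;
    }
    out->exec_perm = false;
    out->priv_mod = false;
    out->max_pasid_log2 = dev->max_pasid_log2;
    return true;
}

static const HostDevOps mock_ops = {
    .get_pasid_info = mock_get_pasid_info,
};

/*
 * Guarded query: a NULL device (the mdev case), missing ops or a missing
 * callback all mean "no PASID info" instead of a crash.
 */
static bool query_pasid_info(HostDev *dev, PasidInfo *out)
{
    if (!dev || !dev->ops || !dev->ops->get_pasid_info) {
        return false;
    }
    return dev->ops->get_pasid_info(dev, out);
}
```

With this shape, the caller only synthesizes the capability when the query returns true, and the NULL case needs no special handling at the call site.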
Thanks,
C.
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8b8bc5a421..e11e39d667 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -24,6 +24,7 @@
 #include

 #include "hw/hw.h"
+#include "hw/iommu.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci_bridge.h"
@@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)

 static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 {
+    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
+    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
     PCIDevice *pdev = PCI_DEVICE(vdev);
+    uint64_t max_pasid_log2 = 0;
+    bool pasid_cap_added = false;
+    uint64_t hw_caps;
     uint32_t header;
     uint16_t cap_id, next, size;
     uint8_t cap_ver;
@@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
                 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
             }
             break;
+        /*
+         * VFIO kernel does not expose the PASID CAP today. We may synthesize
+         * one later through IOMMUFD APIs. If VFIO ever starts exposing it,
+         * record its presence here so we do not create a duplicate CAP.
+         */
+        case PCI_EXT_CAP_ID_PASID:
+            pasid_cap_added = true;
+            /* fallthrough */
         default:
             pcie_add_capability(pdev, cap_id, cap_ver, next, size);
         }

     }

+#ifdef CONFIG_IOMMUFD
+    /* Try to retrieve PASID CAP through IOMMUFD APIs */
+    if (!pasid_cap_added && hiodc && hiodc->get_cap) {
+        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
+        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
+                       &max_pasid_log2, NULL);
+    }
+
+    /*
+     * If supported, adds the PASID capability in the end of the PCIe config
+     * space. TODO: Add option for enabling pasid at a safe offset.
+     */
+    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
+                           VIOMMU_FLAG_PASID_SUPPORTED)) {
+        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
+        bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV);
+
+        pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - PCI_EXT_CAP_PASID_SIZEOF,
+                        max_pasid_log2, exec_perm, priv_mod);
+        /* PASID capability is fully emulated by QEMU */
+        memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
+    }
+#endif
+
     /* Cleanup chain head ID if necessary */
     if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
         pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
index 9b8bb94fc2..9635770bee 100644
--- a/include/hw/iommu.h
+++ b/include/hw/iommu.h
@@ -20,6 +20,7 @@
 enum viommu_flags {
     /* vIOMMU needs nesting parent HWPT to create nested HWPT */
     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
+    VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
 };

 #endif /* HW_IOMMU_H */
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
On 2025/12/9 17:51, Eric Auger wrote:
> Hi Shameer,
> Besides the fact the offset is arbitrarily chosen so that this is the
> last cap of the vconfig space, the code looks good to me.
> So
> Reviewed-by: Eric Auger
> Just wondering whether we couldn't add some generic pcie code that
> parses the extended cap linked list to check the offset range is not
> used by another cap before allowing the insertion at a given offset?
> This wouldn't prevent a subsequent addition from failing but at least we
> would know if there is some collision. This could be added later on though.
You're absolutely right. My approach of using the last 8 bytes was a
shortcut to avoid implementing proper capability parsing logic
(importing pci_regs.h
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
Hi Shameer,
On 11/20/25 2:22 PM, Shameer Kolothum wrote:
> From: Yi Liu
>
> If user wants to expose PASID capability in vIOMMU, then VFIO would also
> need to report the PASID cap for this device if the underlying hardware
> supports it as well.
>
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This is a choice in the good hope of no conflict with any
> existing cap or hidden registers. For the devices that has hidden registers,
> user should figure out a proper offset for the vPASID cap. This may require
> an option for user to config it. Here we leave it as a future extension.
> There are more discussions on the mechanism of finding the proper offset.
>
> https://lore.kernel.org/kvm/bn9pr11mb5276318969a212ad0649c7be8c...@bn9pr11mb5276.namprd11.prod.outlook.com/
>
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
>
> Signed-off-by: Yi Liu
> Tested-by: Zhangfei Gao
> Reviewed-by: Jonathan Cameron
> Signed-off-by: Shameer Kolothum
> ---
> hw/vfio/pci.c | 38 ++
> include/hw/iommu.h | 1 +
> 2 files changed, 39 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8b8bc5a421..e11e39d667 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -24,6 +24,7 @@
> #include <sys/ioctl.h>
>
> #include "hw/hw.h"
> +#include "hw/iommu.h"
> #include "hw/pci/msi.h"
> #include "hw/pci/msix.h"
> #include "hw/pci/pci_bridge.h"
> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev,
> uint16_t pos)
>
> static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> {
> +HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> +HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> PCIDevice *pdev = PCI_DEVICE(vdev);
> +uint64_t max_pasid_log2 = 0;
> +bool pasid_cap_added = false;
> +uint64_t hw_caps;
> uint32_t header;
> uint16_t cap_id, next, size;
> uint8_t cap_ver;
> @@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> }
> break;
> +/*
> + * VFIO kernel does not expose the PASID CAP today. We may synthesize
> + * one later through IOMMUFD APIs. If VFIO ever starts exposing it,
> + * record its presence here so we do not create a duplicate CAP.
> + */
> +case PCI_EXT_CAP_ID_PASID:
> + pasid_cap_added = true;
> + /* fallthrough */
> default:
> pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> }
>
> }
>
> +#ifdef CONFIG_IOMMUFD
> +/* Try to retrieve PASID CAP through IOMMUFD APIs */
> +if (!pasid_cap_added && hiodc && hiodc->get_cap) {
> +hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps,
> NULL);
> +hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
> + &max_pasid_log2, NULL);
> +}
> +
> +/*
> + * If supported, adds the PASID capability in the end of the PCIe config
> + * space. TODO: Add option for enabling pasid at a safe offset.
> + */
> +if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
> + VIOMMU_FLAG_PASID_SUPPORTED)) {
> +bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
> +bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV);
> +
> +pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE -
> PCI_EXT_CAP_PASID_SIZEOF,
> +max_pasid_log2, exec_perm, priv_mod);
> +/* PASID capability is fully emulated by QEMU */
> +memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
> +}
> +#endif
> +
> /* Cleanup chain head ID if necessary */
> if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
> diff --git a/include/hw/iommu.h b/include/hw/iommu.h
> index 9b8bb94fc2..9635770bee 100644
> --- a/include/hw/iommu.h
> +++ b/include/hw/iommu.h
> @@ -20,6 +20,7 @@
> enum viommu_flags {
> /* vIOMMU needs nesting parent HWPT to create nested HWPT */
> VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
> +VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
> };
>
> #endif /* HW_IOMMU_H */
Besides the fact the offset is arbitrarily chosen so that this is the
last cap of the vconfig space, the code looks good to me.
So
Reviewed-by: Eric Auger
Just wondering whether we couldn't add some generic pcie code that
parses the extended cap linked list to check the offset range is not
used by another cap before allowing the insertion at a given offset?
This wouldn't prevent a subsequent addition from failing but at least we
would know if there is some collision. This could be added later on though.
Thanks
Eric
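
The collision check suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not existing QEMU or kernel code: every function name here is hypothetical, the config space is modelled as a plain 4 KiB byte array, and because an extended capability header does not encode the capability's length, the caller supplies a size lookup (ext_cap_size_fn). The header layout (ID in bits 15:0, version in bits 19:16, next pointer in bits 31:20) follows the PCIe specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      0x100u   /* extended capabilities start here */
#define EXT_CFG_SPACE_SIZE  0x1000u  /* end of PCIe extended config space */

/* Hypothetical callback: size in bytes of the capability with this ID. */
typedef uint16_t (*ext_cap_size_fn)(uint16_t cap_id);

/* Build an extended capability header from ID, version and next offset. */
uint32_t ext_cap_header(uint16_t id, uint8_t ver, uint16_t next)
{
    return (uint32_t)id | ((uint32_t)(ver & 0xf) << 16) |
           ((uint32_t)(next & 0xffc) << 20);
}

/* Config space is little-endian. */
uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
    cfg[off] = (uint8_t)val;
    cfg[off + 1] = (uint8_t)(val >> 8);
    cfg[off + 2] = (uint8_t)(val >> 16);
    cfg[off + 3] = (uint8_t)(val >> 24);
}

/* Illustrative size lookup that treats every capability as 8 bytes long. */
uint16_t fixed_8_byte_size(uint16_t cap_id)
{
    (void)cap_id;
    return 8;
}

/*
 * Walk the extended capability list and report whether [start, start + len)
 * is untouched by any capability currently on the list.
 */
bool ext_cap_range_is_free(const uint8_t *cfg, uint16_t start, uint16_t len,
                           ext_cap_size_fn size_of)
{
    uint32_t end = (uint32_t)start + len;
    uint32_t off = CFG_SPACE_SIZE;
    /* Bound the walk so a malformed, looping list still terminates. */
    unsigned hops = (EXT_CFG_SPACE_SIZE - CFG_SPACE_SIZE) / 4;

    if (len == 0 || (start & 3) || start < CFG_SPACE_SIZE ||
        end > EXT_CFG_SPACE_SIZE) {
        return false;
    }

    while (off >= CFG_SPACE_SIZE && off <= EXT_CFG_SPACE_SIZE - 4 &&
           hops-- > 0) {
        uint32_t hdr = cfg_read32(cfg, (uint16_t)off);
        uint32_t cap_size, cap_end;

        if (hdr == 0 || hdr == 0xffffffffu) {
            break;                      /* empty list or absent device */
        }
        cap_size = size_of((uint16_t)(hdr & 0xffff));
        if (cap_size < 4) {
            cap_size = 4;               /* at least the header itself */
        }
        cap_end = off + cap_size;
        if (start < cap_end && off < end) {
            return false;               /* ranges overlap */
        }
        off = (hdr >> 20) & 0xffc;      /* next pointer, 0 ends the list */
    }
    return true;
}
```

As noted in the review, a check like this only detects collisions with capabilities already on the list. It cannot see hidden registers or protect against a capability added later at the same offset.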
Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
On Thu, Nov 20, 2025 at 01:22:12PM +0000, Shameer Kolothum wrote:
> From: Yi Liu
>
> If user wants to expose PASID capability in vIOMMU, then VFIO would also
> need to report the PASID cap for this device if the underlying hardware
> supports it as well.
>
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This is a choice in the good hope of no conflict with any
> existing cap or hidden registers. For the devices that has hidden registers,
> user should figure out a proper offset for the vPASID cap. This may require
> an option for user to config it. Here we leave it as a future extension.
> There are more discussions on the mechanism of finding the proper offset.
>
> https://lore.kernel.org/kvm/bn9pr11mb5276318969a212ad0649c7be8c...@bn9pr11mb5276.namprd11.prod.outlook.com/
>
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
>
> Signed-off-by: Yi Liu
> Tested-by: Zhangfei Gao
> Reviewed-by: Jonathan Cameron
> Signed-off-by: Shameer Kolothum

Reviewed-by: Nicolin Chen
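
The patch marks the eight PASID capability bytes as "fully emulated" by setting the corresponding emulated_config_bits to 0xff. As a rough illustration of what such a mask means for a guest config read, the sketch below merges an emulated value and a physical value bit by bit. It is an assumption about the general masking idea, not a copy of the vfio-pci read path, and merge_config_read() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Merge one config-space read: bits set in emu_bits are taken from the
 * emulated copy, all other bits from the physical device.
 */
uint32_t merge_config_read(uint32_t emu_val, uint32_t phys_val,
                           uint32_t emu_bits)
{
    return (emu_val & emu_bits) | (phys_val & ~emu_bits);
}
```

With an all-ones mask, as the patch sets for the PASID capability, the guest sees only the emulated contents, so whatever the physical device holds at that offset does not leak through.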
