Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:14:29PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote: > > I think the same PCI driver with a small flag to support the PF or > > VF is not the same as two completely different drivers in different > > subsystems > > There are counter-examples: ixgbe vs. ixgbevf. > > Note that also a single driver can support both, an SVA device and an > mdev device, sharing code for accessing parts of the device like queues > and handling interrupts. Needing an mdev device at all is the larger issue; mdev means the kernel must carry a lot of emulation code depending on how the SVA device is designed. E.g. creating queues may require an emulated BAR. Shifting that code to userspace and having a single clean 'SVA' interface from the kernel for the device makes a lot more sense, especially from a security perspective. Forcing all vIOMMU stuff to only use VFIO permanently closes this as an option. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote: > I think the same PCI driver with a small flag to support the PF or > VF is not the same as two completely different drivers in different > subsystems There are counter-examples: ixgbe vs. ixgbevf. Note that also a single driver can support both, an SVA device and an mdev device, sharing code for accessing parts of the device like queues and handling interrupts. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 05:55:40PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote: > > This whole thread was brought up by IDXD which has a SVA driver and > > now wants to add a vfio-mdev driver too. SVA devices that want to be > > plugged into VMs are going to be common - this architecture that a SVA > > driver cannot cover the kvm case seems problematic. > > Isn't that the same pattern as having separate drivers for VFs and the > parent device in SR-IOV? I think the same PCI driver with a small flag to support the PF or VF is not the same as two completely different drivers in different subsystems Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote: > This whole thread was brought up by IDXD which has a SVA driver and > now wants to add a vfio-mdev driver too. SVA devices that want to be > plugged into VMs are going to be common - this architecture that a SVA > driver cannot cover the kvm case seems problematic. Isn't that the same pattern as having separate drivers for VFs and the parent device in SR-IOV? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:35:32PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote: > > The point is that other places beyond VFIO need this > > Which and why? > > > Sure, but sometimes it is necessary, and in those cases the answer > > can't be "rewrite a SVA driver to use vfio" > > This is getting to abstract. Can you come up with an example where > handling this in VFIO or an endpoint device kernel driver does not work? This whole thread was brought up by IDXD which has a SVA driver and now wants to add a vfio-mdev driver too. SVA devices that want to be plugged into VMs are going to be common - this architecture that a SVA driver cannot cover the kvm case seems problematic. Yes, everything can have a SVA driver and a vfio-mdev, it works just fine, but it is not very clean or simple. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote: > The point is that other places beyond VFIO need this Which and why? > Sure, but sometimes it is necessary, and in those cases the answer > can't be "rewrite a SVA driver to use vfio" This is getting too abstract. Can you come up with an example where handling this in VFIO or an endpoint device kernel driver does not work? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:03:18PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote: > > Userspace needs fine grained control over the composition of the page > > table behind the PASID, 1:1 with the mm_struct is only one use case. > > VFIO already offers an interface for that. It shouldn't be too > complicated to expand that for PASID-bound page-tables. > > > Userspace needs to be able to handle IOMMU faults, apparently > > Could be implemented by a fault-fd handed out by VFIO. The point is that other places beyond VFIO need this > I really don't think that user-space should have to deal with details > like PASIDs or other IOMMU internals, unless absolutly necessary. This > is an OS we work on, and the idea behind an OS is to abstract the > hardware away. Sure, but sometimes it is necessary, and in those cases the answer can't be "rewrite a SVA driver to use vfio" Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote: > Userspace needs fine grained control over the composition of the page > table behind the PASID, 1:1 with the mm_struct is only one use case. VFIO already offers an interface for that. It shouldn't be too complicated to expand that for PASID-bound page-tables. > Userspace needs to be able to handle IOMMU faults, apparently Could be implemented by a fault-fd handed out by VFIO. > The Intel guys had a bunch of other stuff too, looking through the new > API they are proposing for vfio gives some flavour what they think is > needed.. I really don't think that user-space should have to deal with details like PASIDs or other IOMMU internals, unless absolutely necessary. This is an OS we work on, and the idea behind an OS is to abstract the hardware away. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
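[Illustration] For a rough idea of what consuming such a fault-fd could look like from userspace, here is a minimal sketch. The VFIO_DEVICE_GET_FAULT_FD ioctl and the one-record-per-read() protocol are assumptions made up for this sketch (no such uAPI is defined in this thread); the event layout borrows from the uapi struct iommu_fault that existed in <linux/iommu.h> of that era.

/* Hypothetical sketch: relay IOMMU page faults from a fault-fd to a vIOMMU.
 * VFIO_DEVICE_GET_FAULT_FD and the read() protocol are assumptions, not a
 * real uAPI; struct iommu_fault is the uapi definition of that era. */
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>

#define VFIO_DEVICE_GET_FAULT_FD _IO(';', 200)	/* made-up ioctl number */

static int handle_faults(int device_fd)
{
	struct iommu_fault evt;
	int fault_fd = ioctl(device_fd, VFIO_DEVICE_GET_FAULT_FD);

	if (fault_fd < 0)
		return -1;

	/* A real interface would likely be poll()/eventfd driven and batched. */
	while (read(fault_fd, &evt, sizeof(evt)) == (ssize_t)sizeof(evt)) {
		if (evt.type == IOMMU_FAULT_PAGE_REQ)
			printf("page request: pasid=%u addr=0x%llx\n",
			       evt.prm.pasid, (unsigned long long)evt.prm.addr);
		/* relay to the guest vIOMMU here, then complete the fault via
		 * a page-response ioctl (also hypothetical in this sketch) */
	}
	close(fault_fd);
	return 0;
}

Whether such an fd would be handed out by VFIO, by a /dev/sva-style device, or per PASID is exactly the layering question being debated in this thread.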
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 02:18:52PM +0100, j...@8bytes.org wrote: > On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote: > > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > > > So having said this, what is the benefit of exposing those SVA internals > > > to user-space? > > > > Only the device use of the PASID is device specific, the actual PASID > > and everything on the IOMMU side is generic. > > > > There is enough API there it doesn't make sense to duplicate it into > > every single SVA driver. > > What generic things have to be done by the drivers besides > allocating/deallocating PASIDs and binding an address space to it? > > Is there anything which isn't better handled in a kernel-internal > library which drivers just use? Userspace needs fine-grained control over the composition of the page table behind the PASID; 1:1 with the mm_struct is only one use case. Userspace needs to be able to handle IOMMU faults, apparently. The Intel guys had a bunch of other stuff too; looking through the new API they are proposing for vfio gives some flavour of what they think is needed. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote: > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > > So having said this, what is the benefit of exposing those SVA internals > > to user-space? > > Only the device use of the PASID is device specific, the actual PASID > and everything on the IOMMU side is generic. > > There is enough API there it doesn't make sense to duplicate it into > every single SVA driver. What generic things have to be done by the drivers besides allocating/deallocating PASIDs and binding an address space to it? Is there anything which isn't better handled in a kernel-internal library which drivers just use? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote: > So having said this, what is the benefit of exposing those SVA internals > to user-space? Only the device use of the PASID is device specific, the actual PASID and everything on the IOMMU side is generic. There is enough API there it doesn't make sense to duplicate it into every single SVA driver. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > From: Jason Wang > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). Honestly, I fail to see the benefit of offloading these IOMMU specific setup tasks to user-space. The ways PASID, and the device partitioning it allows, are used are very device specific. A GPU will be partitioned completely differently than a network card. So the device drivers should use the (v)SVA APIs to set up the partitioning in a way which makes sense for the device. And VFIO is of course a user by itself, as it allows assigning device partitions to guests. Or even allow assigning complete devices and allow the guests to partition them themselves. So having said this, what is the benefit of exposing those SVA internals to user-space? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 上午11:54, Liu, Yi L wrote: Hi Jason, From: Jason Wang Sent: Thursday, October 22, 2020 10:56 AM [...] If you(Intel) don't have plan to do vDPA, you should not prevent other vendors from implementing PASID capable hardware through non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it? yes, that's true. So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g it's not hard to have a PASID capable virtio device through qemu, and we can start from there. actually, I'm already doing a poc to move the PASID allocation/free interface out of VFIO. So that other users could use it as well. I think this is also what you replied previously. :-) I'll send out when it's ready and seek for your help on mature it. does it sound good to you? Yes, fine with me. Thanks Regards, Yi Liu Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, > From: Jason Wang > Sent: Thursday, October 22, 2020 10:56 AM > [...] > If you(Intel) don't have plan to do vDPA, you should not prevent other vendors > from implementing PASID capable hardware through non-VFIO subsystem/uAPI > on top of your SIOV architecture. Isn't it? Yes, that's true. > So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g > it's not > hard to have a PASID capable virtio device through qemu, and we can start from > there. Actually, I'm already doing a PoC to move the PASID allocation/free interface out of VFIO, so that other users can use it as well. I think this is also what you suggested in your earlier reply. :-) I'll send it out when it's ready and seek your help to mature it. Does that sound good to you? Regards, Yi Liu > > Thanks > > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 上午1:51, Raj, Ashok wrote: On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: I think we agreed (or agree to disagree and commit) for device types that we have for SIOV, VFIO based approach works well without having to re-invent another way to do the same things. Not looking for a shortcut by any means, but we need to plan around existing hardware though. Looks like vDPA took some shortcuts then to not abstract iommu uAPI instead :-)? When all necessary hardware was available.. This would be a solved puzzle. I think it is the opposite, vIOMMU and related has outgrown VFIO as the "home" and needs to stand alone. Apparently the HW that will need PASID for vDPA is Intel HW, so if So just to make this clear, I did check internally if there are any plans for vDPA + SVM. There are none at the moment. Not SVM, SIOV. ... And that included.. I should have said vDPA + PASID, No current plans. I have no idea who set expectations with you. Can you please put me in touch with that person, privately is fine. It was the team that aruged VDPA had to be done through VFIO - SIOV and PASID was one of their reasons it had to be VFIO, check the list archives Humm... I could search the arhives, but the point is I'm confirming that there is no forward looking plan! And who ever did was it was based on probably strawman hypothetical argument that wasn't grounded in reality. If they didn't plan to use it, bit of a strawman argument, right? This doesn't need to continue like the debates :-) Pun intended :-) I don't think it makes any sense to have an abstract strawman argument design discussion. Yi is looking into for pasid management alone. Rest of the IOMMU related topics should wait until we have another *real* use requiring consolidation. Contrary to your argument, vDPA went with a half blown device only iommu user without considering existing abstractions like containers and such in VFIO is part of the reason the gap is big at the moment. And you might not agree, but that's beside the point. Can you explain why it must care VFIO abstractions? vDPA is trying to hide device details which is fundamentally different with what VFIO wants to do. vDPA allows the parent to deal with IOMMU stuffs, and if necessary, the parent can talk with IOMMU drivers directly via IOMMU APIs. Rather than pivot ourselves around hypothetical, strawman, non-intersecting, suggesting architecture without having done a proof of concept to validate the proposal should stop. We have to ground ourselves in reality. The reality is VFIO should not be the only user for (v)SVA/SIOV/PASID. The kernel hard already had users like GPU or uacce. The use cases we have so far for SIOV, VFIO and mdev seem to be the right candidates and addresses them well. Now you might disagree, but as noted we all agreed to move past this. The mdev is not perfect for sure, but it's another topic. If you(Intel) don't have plan to do vDPA, you should not prevent other vendors from implementing PASID capable hardware through non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it? So if Intel has the willing to collaborate on the POC, I'd happy to help. 
E.g it's not hard to have a PASID capable virtio device through qemu, and we can start from there. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:32:18PM -0300, Jason Gunthorpe wrote: > On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote: > > > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native > > SVM is orthogonal to how we achieve mdev passthrough to guest and > > vSVM. > > Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed > on the VDPA side as well, I think that is why JasonW brought this up > in the first place. True, and to that effect we are working on moving PASID allocation outside of VFIO, so that both agents, VFIO and vDPA, can share one way to allocate and manage PASIDs from user space once PASID support becomes available there. Since the IOASID allocator is almost standalone, this is possible. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
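[Illustration] To make "the IOASID allocator is almost standalone" a little more concrete, here is a rough kernel-side sketch of a tiny character device handing out PASIDs through the in-kernel ioasid allocator. The /dev/sva name is only the placeholder used in this thread; the ioctl numbers, the hard-coded range, and the missing per-opener ioasid_set/quota handling are all simplifications for illustration, not the actual proposal.

/* Sketch only: a minimal PASID allocation chardev on top of the in-kernel
 * ioasid allocator. Names, ioctl numbers and policy are illustrative. */
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/ioasid.h>
#include <linux/uaccess.h>
#include <linux/fs.h>

#define SVA_ALLOC_PASID	_IOR('s', 0, __u32)	/* made-up ioctl numbers */
#define SVA_FREE_PASID	_IOW('s', 1, __u32)

static long sva_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	void __user *uarg = (void __user *)arg;
	u32 pasid;

	switch (cmd) {
	case SVA_ALLOC_PASID:
		/* Range is arbitrary here; a real interface would enforce
		 * per-VM quotas and track ownership via an ioasid_set. */
		pasid = ioasid_alloc(NULL, 1, (1U << 20) - 1, NULL);
		if (pasid == INVALID_IOASID)
			return -ENOSPC;
		if (copy_to_user(uarg, &pasid, sizeof(pasid))) {
			ioasid_free(pasid);
			return -EFAULT;
		}
		return 0;
	case SVA_FREE_PASID:
		if (copy_from_user(&pasid, uarg, sizeof(pasid)))
			return -EFAULT;
		ioasid_free(pasid);
		return 0;
	}
	return -ENOTTY;
}

static const struct file_operations sva_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= sva_ioctl,
};

static struct miscdevice sva_misc = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "sva",	/* placeholder name from the thread */
	.fops	= &sva_fops,
};
module_misc_device(sva_misc);
MODULE_LICENSE("GPL");

The interesting design questions (who owns the allocation set, and how a PASID later gets associated with devices and page tables) are exactly what the rest of the thread argues about.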
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote: > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native > SVM is orthogonal to how we achieve mdev passthrough to guest and > vSVM. Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed on the VDPA side as well, I think that is why JasonW brought this up in the first place. We may not see vSVA for VDPA, but that seems like some special sub mode of all the other vIOMMU and PASID stuff, and not a completely distinct thing. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 03:24:42PM -0300, Jason Gunthorpe wrote: > > > Contrary to your argument, vDPA went with a half blown device only > > iommu user without considering existing abstractions like containers > > VDPA IOMMU was done *for Intel*, as the kind of half-architected thing > you are advocating should be allowed for IDXD here. Not sure why you > think bashing that work is going to help your case here. I'm not bashing that work, sorry if it came out that way, but it just feels like double standards. I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native SVM is orthogonal to how we achieve mdev passthrough to guest and vSVM. We visited that exact thing multiple times. Doing SVM is quite simple and doesn't carry the weight of the long list of other things (Kevin explained this in detail not too long ago) we need to accomplish for mdev passthrough. For SVM, all you need is access to the hw, mmio, and bind_mm to get a PASID bound with the IOMMU. For IDXD, creating passthrough devices for guest access and vSVM goes through the VFIO path. For guest SVM, we expose mdevs to the guest OS, and idxd in the guest provides vSVM services. vSVM is *not* built around native SVM interfaces. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
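[Illustration] The native SVM path described above (bind the mm, get a PASID, point the device's MMIO/work queue at it) maps onto the existing in-kernel API roughly as sketched below. Error handling is trimmed, and the my_* helpers are hypothetical driver code, not real functions.

/* Sketch of native SVM in a host driver: iommu_sva_* is the existing kernel
 * API (the drvdata argument existed on kernels of this era); my_wq_set_pasid()
 * stands in for device-specific MMIO setup and is not a real function. */
#include <linux/iommu.h>
#include <linux/sched/mm.h>
#include <linux/err.h>

void my_wq_set_pasid(struct device *dev, u32 pasid);	/* hypothetical */

struct my_ctx {
	struct iommu_sva *handle;
	u32 pasid;
};

static int my_bind_current_mm(struct device *dev, struct my_ctx *ctx)
{
	ctx->handle = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(ctx->handle))
		return PTR_ERR(ctx->handle);

	ctx->pasid = iommu_sva_get_pasid(ctx->handle);
	/* Tell the hardware which PASID this queue's DMA should carry. */
	my_wq_set_pasid(dev, ctx->pasid);
	return 0;
}

static void my_unbind(struct my_ctx *ctx)
{
	iommu_sva_unbind_device(ctx->handle);
}

The vSVM/mdev path being argued about is a different beast: there the first-level page table comes from the guest rather than a host mm_struct, which is what drags in the bind-guest-page-table and fault-reporting uAPIs discussed elsewhere in the thread.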
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: > > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > > > I think we agreed (or agree to disagree and commit) for device > > > > > > types that > > > > > > we have for SIOV, VFIO based approach works well without having to > > > > > > re-invent > > > > > > another way to do the same things. Not looking for a shortcut by > > > > > > any means, > > > > > > but we need to plan around existing hardware though. Looks like > > > > > > vDPA took > > > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > > > the "home" and needs to stand alone. > > > > > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > > > > > So just to make this clear, I did check internally if there are any > > > > plans > > > > for vDPA + SVM. There are none at the moment. > > > > > > Not SVM, SIOV. > > > > ... And that included.. I should have said vDPA + PASID, No current plans. > > I have no idea who set expectations with you. Can you please put me in > > touch > > with that person, privately is fine. > > It was the team that aruged VDPA had to be done through VFIO - SIOV > and PASID was one of their reasons it had to be VFIO, check the list > archives Humm... I could search the arhives, but the point is I'm confirming that there is no forward looking plan! And who ever did was it was based on probably strawman hypothetical argument that wasn't grounded in reality. > > If they didn't plan to use it, bit of a strawman argument, right? This doesn't need to continue like the debates :-) Pun intended :-) I don't think it makes any sense to have an abstract strawman argument design discussion. Yi is looking into for pasid management alone. Rest of the IOMMU related topics should wait until we have another *real* use requiring consolidation. Contrary to your argument, vDPA went with a half blown device only iommu user without considering existing abstractions like containers and such in VFIO is part of the reason the gap is big at the moment. And you might not agree, but that's beside the point. Rather than pivot ourselves around hypothetical, strawman, non-intersecting, suggesting architecture without having done a proof of concept to validate the proposal should stop. We have to ground ourselves in reality. The use cases we have so far for SIOV, VFIO and mdev seem to be the right candidates and addresses them well. Now you might disagree, but as noted we all agreed to move past this. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 10:51:46AM -0700, Raj, Ashok wrote: > > If they didn't plan to use it, bit of a strawman argument, right? > > This doesn't need to continue like the debates :-) Pun intended :-) > > I don't think it makes any sense to have an abstract strawman argument > design discussion. Yi is looking into for pasid management alone. Rest > of the IOMMU related topics should wait until we have another > *real* use requiring consolidation. Actually I'm really annoyed right now that the other Intel team wasted quite a lot of the rest of our time arguing about vDPA and vfio with no actual interest in this technology. So you'll excuse me if I'm not particularly enamored with this discussion right now. > Contrary to your argument, vDPA went with a half blown device only > iommu user without considering existing abstractions like containers VDPA IOMMU was done *for Intel*, as the kind of half-architected thing you are advocating should be allowed for IDXD here. Not sure why you think bashing that work is going to help your case here. I'm saying Intel needs to get its architecture together and stop creating this mess across the kernel to support Intel devices. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote: > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > > I think we agreed (or agree to disagree and commit) for device types > > > > > that > > > > > we have for SIOV, VFIO based approach works well without having to > > > > > re-invent > > > > > another way to do the same things. Not looking for a shortcut by any > > > > > means, > > > > > but we need to plan around existing hardware though. Looks like vDPA > > > > > took > > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > > the "home" and needs to stand alone. > > > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > > > So just to make this clear, I did check internally if there are any plans > > > for vDPA + SVM. There are none at the moment. > > > > Not SVM, SIOV. > > ... And that included.. I should have said vDPA + PASID, No current plans. > I have no idea who set expectations with you. Can you please put me in touch > with that person, privately is fine. It was the team that aruged VDPA had to be done through VFIO - SIOV and PASID was one of their reasons it had to be VFIO, check the list archives If they didn't plan to use it, bit of a strawman argument, right? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/20 下午10:19, Liu, Yi L wrote: From: Jason Gunthorpe Sent: Tuesday, October 20, 2020 10:02 PM [...] Whoever provides the vIOMMU emulation and relays the page fault to the guest has to translate the RID - that's the point. But the device info (especially the sub-device info) is within the passthru framework (e.g. VFIO). So page fault reporting needs to go through passthru framework. what does that have to do with VFIO? How will VPDA provide the vIOMMU emulation? a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor specification, right? you may correct me if I'm missing anything. I'm asking how will VDPA translate the RID when VDPA triggers a page fault that has to be relayed to the guest. VDPA also has virtual PCI devices it creates. I've got a question. Does vDPA work with vIOMMU so far? e.g. Intel vIOMMU or other type vIOMMU. The kernel code is ready. Note that vhost support for vIOMMU came even earlier than VFIO's. The API is designed to be generic and is not limited to any specific type of vIOMMU. For qemu, it just needs a patch to implement the map/unmap notifier as VFIO did. Thanks Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > > I think we agreed (or agree to disagree and commit) for device types > > > > that > > > > we have for SIOV, VFIO based approach works well without having to > > > > re-invent > > > > another way to do the same things. Not looking for a shortcut by any > > > > means, > > > > but we need to plan around existing hardware though. Looks like vDPA > > > > took > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > > necessary hardware was available.. This would be a solved puzzle. > > > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > > the "home" and needs to stand alone. > > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > > > So just to make this clear, I did check internally if there are any plans > > for vDPA + SVM. There are none at the moment. > > Not SVM, SIOV. ... And that included.. I should have said vDPA + PASID, No current plans. I have no idea who set expectations with you. Can you please put me in touch with that person, privately is fine. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote: > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > > I think we agreed (or agree to disagree and commit) for device types that > > > we have for SIOV, VFIO based approach works well without having to > > > re-invent > > > another way to do the same things. Not looking for a shortcut by any > > > means, > > > but we need to plan around existing hardware though. Looks like vDPA took > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > > necessary hardware was available.. This would be a solved puzzle. > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > > the "home" and needs to stand alone. > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if > > So just to make this clear, I did check internally if there are any plans > for vDPA + SVM. There are none at the moment. Not SVM, SIOV. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > > I think we agreed (or agree to disagree and commit) for device types that > > we have for SIOV, VFIO based approach works well without having to > > re-invent > > another way to do the same things. Not looking for a shortcut by any means, > > but we need to plan around existing hardware though. Looks like vDPA took > > some shortcuts then to not abstract iommu uAPI instead :-)? When all > > necessary hardware was available.. This would be a solved puzzle. > > I think it is the opposite, vIOMMU and related has outgrown VFIO as > the "home" and needs to stand alone. > > Apparently the HW that will need PASID for vDPA is Intel HW, so if So just to make this clear, I did check internally if there are any plans for vDPA + SVM. There are none at the moment. It seems like you have better insight into our plans ;-). Please do let me know who confirmed vDPA roadmap with you and I would love to talk to them to clear the air. Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote: > I think we agreed (or agree to disagree and commit) for device types that > we have for SIOV, VFIO based approach works well without having to re-invent > another way to do the same things. Not looking for a shortcut by any means, > but we need to plan around existing hardware though. Looks like vDPA took > some shortcuts then to not abstract iommu uAPI instead :-)? When all > necessary hardware was available.. This would be a solved puzzle. I think it is the opposite, vIOMMU and related has outgrown VFIO as the "home" and needs to stand alone. Apparently the HW that will need PASID for vDPA is Intel HW, so if more is needed to do a good design you are probably the only one that can get it/do it. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:03:36PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > > Hi Jason, > > > > > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > > > I'm sure there will be some > > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > > but > > > > > that > > > > > should not be a blocker. > > > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > > overlap, it may be a re-definition of VFIO container. As I mentioned, > > > > VFIO > > > > container is IOMMU context from the day it was defined. It could be the > > > > blocker. :-( > > > > > > So maybe you have to broaden the VFIO container to be usable by other > > > subsystems. The discussion here is about what the uAPI should look > > > like in a fairly abstract way. When we say 'dev/sva' it just some > > > placeholder for a shared cdev that provides the necessary > > > dis-aggregated functionality > > > > > > It could be an existing cdev with broader functionaltiy, it could > > > really be /dev/iommu, etc. This is up to the folks building it to > > > decide. > > > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > > veterans have a similar idea for a place to cover IOMMU handling, > > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > > sure about Alex's opinion on such idea. But the sure thing is this > > > > idea may introduce weird overlap even re-definition of existing > > > > thing as I replied above. We need to evaluate the impact and mature > > > > the idea step by step. > > > > > > This has happened before, uAPIs do get obsoleted and replaced with > > > more general/better versions. It is often too hard to create a uAPI > > > that lasts for decades when the HW landscape is constantly changing > > > and sometime a reset is needed. > > > > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > > > So we have been disussing this for months now, with some high level vision > > trying to get the uAPI's solidified with a vDPA hardware that might > > potentially have SIOV/SVM like extensions in hardware which actualy doesn't > > exist today. Understood people have plans. > > > Given that vDPA today has diverged already with duplicating use of IOMMU > > api's without making an effort to gravitate to /dev/iommu as how you are > > proposing. > > I see it more like, given that we already know we have multiple users > of IOMMU, adding new IOMMU focused features has to gravitate toward > some kind of convergance. > > Currently things are not so bad, VDPA is just getting started and the > current IOMMU feature set is not so big. > > PASID/vIOMMU/etc/et are all stressing this more, I think the > responsibility falls to the people proposing these features to do the > architecture work. > > > The question is should we hold hostage the current vSVM/vIOMMU efforts > > without even having made an effort for current vDPA/VFIO convergence. > > I don't think it is "held hostage" it is a "no shortcuts" approach, > there was always a recognition that future VDPA drivers will need some > work to integrated with vIOMMU realted stuff. I think we agreed (or agree to disagree and commit) for device types that we have for SIOV, VFIO based approach works well without having to re-invent another way to do the same things. 
Not looking for a shortcut by any means, but we need to plan around existing hardware though. Looks like vDPA took some shortcuts then to not abstract iommu uAPI instead :-)? When all necessary hardware was available.. This would be a solved puzzle. > > This is no different than the IMS discussion. The first proposed patch > was really simple, but a layering violation. > > The correct solution was some wild 20 patch series modernizing how x86 That was more like 48 patches, not 20 :-). But we had a real device with IMS to model and create these new abstractions and test them against. For vDPA+SVM we have non-intersecting conversations at the moment with no real hardware to model our discussion around. > interrupts works because it had outgrown itself. This general approach > to use the shared MSI infrastructure was pointed out at the very > beginning of IMS, BTW. Agreed, and thankfully Thomas worked hard and made it a lot easier :-). Today IMS only deals with on device store. Although IMS could mean just simply having system memory to hold the interrupt attributes. This is how some of the graphics devices would be with context holding interrupt attributes. But certainly not rushing this since we need a REAL user to be there before we support DEV_MSI that uses msg_addr/msg_data held in system memory. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/ma
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > Hi Jason, > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > I'm sure there will be some > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > but > > > > that > > > > should not be a blocker. > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > > container is IOMMU context from the day it was defined. It could be the > > > blocker. :-( > > > > So maybe you have to broaden the VFIO container to be usable by other > > subsystems. The discussion here is about what the uAPI should look > > like in a fairly abstract way. When we say 'dev/sva' it just some > > placeholder for a shared cdev that provides the necessary > > dis-aggregated functionality > > > > It could be an existing cdev with broader functionaltiy, it could > > really be /dev/iommu, etc. This is up to the folks building it to > > decide. > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > veterans have a similar idea for a place to cover IOMMU handling, > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > sure about Alex's opinion on such idea. But the sure thing is this > > > idea may introduce weird overlap even re-definition of existing > > > thing as I replied above. We need to evaluate the impact and mature > > > the idea step by step. > > > > This has happened before, uAPIs do get obsoleted and replaced with > > more general/better versions. It is often too hard to create a uAPI > > that lasts for decades when the HW landscape is constantly changing > > and sometime a reset is needed. > > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > So we have been disussing this for months now, with some high level vision > trying to get the uAPI's solidified with a vDPA hardware that might > potentially have SIOV/SVM like extensions in hardware which actualy doesn't > exist today. Understood people have plans. > Given that vDPA today has diverged already with duplicating use of IOMMU > api's without making an effort to gravitate to /dev/iommu as how you are > proposing. I see it more like: given that we already know we have multiple users of IOMMU, adding new IOMMU focused features has to gravitate toward some kind of convergence. Currently things are not so bad, VDPA is just getting started and the current IOMMU feature set is not so big. PASID/vIOMMU/etc. are all stressing this more; I think the responsibility falls to the people proposing these features to do the architecture work. > The question is should we hold hostage the current vSVM/vIOMMU efforts > without even having made an effort for current vDPA/VFIO convergence. I don't think it is "held hostage", it is a "no shortcuts" approach; there was always a recognition that future VDPA drivers will need some work to integrate with vIOMMU related stuff. This is no different than the IMS discussion. The first proposed patch was really simple, but a layering violation. The correct solution was some wild 20 patch series modernizing how x86 interrupts work because it had outgrown itself. This general approach to use the shared MSI infrastructure was pointed out at the very beginning of IMS, BTW. 
Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > I'm sure there will be some > > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > > that > > > should not be a blocker. > > > > but the weird thing is what we should consider. And it perhaps not just > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > container is IOMMU context from the day it was defined. It could be the > > blocker. :-( > > So maybe you have to broaden the VFIO container to be usable by other > subsystems. The discussion here is about what the uAPI should look > like in a fairly abstract way. When we say 'dev/sva' it just some > placeholder for a shared cdev that provides the necessary > dis-aggregated functionality > > It could be an existing cdev with broader functionaltiy, it could > really be /dev/iommu, etc. This is up to the folks building it to > decide. > > > I'm not expert on vDPA for now, but I saw you three open source > > veterans have a similar idea for a place to cover IOMMU handling, > > I think it may be a valuable thing to do. I said "may be" as I'm not > > sure about Alex's opinion on such idea. But the sure thing is this > > idea may introduce weird overlap even re-definition of existing > > thing as I replied above. We need to evaluate the impact and mature > > the idea step by step. > > This has happened before, uAPIs do get obsoleted and replaced with > more general/better versions. It is often too hard to create a uAPI > that lasts for decades when the HW landscape is constantly changing > and sometime a reset is needed. I'm throwing this out with a lot of hesitation, but I'm going to :-) So we have been disussing this for months now, with some high level vision trying to get the uAPI's solidified with a vDPA hardware that might potentially have SIOV/SVM like extensions in hardware which actualy doesn't exist today. Understood people have plans. Given that vDPA today has diverged already with duplicating use of IOMMU api's without making an effort to gravitate to /dev/iommu as how you are proposing. I think we all understand creating a permanent uAPI is hard, and they can evolve in future. Maybe we should start work on how to converge on generalizing the IOMMU story first with what we have today (vDPA + VFIO) convergence and let it evolve with real hardware and new features like SVM/SIOV in mind. This is going to take time and we can start with what we have today for pulling vDPA and VFIO pieces first. The question is should we hold hostage the current vSVM/vIOMMU efforts without even having made an effort for current vDPA/VFIO convergence. > > The jump to shared PASID based IOMMU feels like one of those moments here. As we have all noted, even without PASID we have divergence today? > > > > Whoever provides the vIOMMU emulation and relays the page fault to the > > > guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > > specification, right? you may correct me if I'm missing anything. 
> > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. > > We can't rely on VFIO to be the place that the vIOMMU lives because it > excludes/complicates everything that is not VFIO from using that > stuff. > > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:02 PM [...] > > > Whoever provides the vIOMMU emulation and relays the page fault to the guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > specification, right? you may correct me if I'm missing anything. > > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. I've got a question. Does vDPA work with a vIOMMU so far? E.g. the Intel vIOMMU or other types of vIOMMU. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:05 PM > To: Liu, Yi L > > On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > > From: Jason Gunthorpe > > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > > > See previous discussion with Kevin. If I understand correctly, > > > > > you expect a > > > shared > > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no > > > > need to assume L2 tables are shared. Especially for VFIO/vDPA > > > > devices. Even within a passthru framework, like VFIO, if the > > > > attributes of backend IOMMU are not the same, the L2 page table are not > shared, but the mapping is the same. > > > > > > I think not being able to share the PASID shows exactly why this > > > VFIO centric approach is bad. > > > > no, I didn't say PASID is not sharable. My point is sharing L2 page > > table is not mandatory. > > IMHO a PASID should be 1:1 with a page table, what does it even mean to share > a PASID but have different page tables? A PASID is actually 1:1 with an address space; it doesn't really need to be 1:1 with a page table. :-) Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > See previous discussion with Kevin. If I understand correctly, you > > > > expect a > > shared > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > > a passthru framework, like VFIO, if the attributes of backend IOMMU are > > > not > > > the same, the L2 page table are not shared, but the mapping is the same. > > > > I think not being able to share the PASID shows exactly why this VFIO > > centric approach is bad. > > no, I didn't say PASID is not sharable. My point is sharing L2 page table is > not mandatory. IMHO a PASID should be 1:1 with a page table, what does it even mean to share a PASID but have different page tables? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > I'm sure there will be some > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > that > > should not be a blocker. > > but the weird thing is what we should consider. And it perhaps not just > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > container is IOMMU context from the day it was defined. It could be the > blocker. :-( So maybe you have to broaden the VFIO container to be usable by other subsystems. The discussion here is about what the uAPI should look like in a fairly abstract way. When we say 'dev/sva' it just some placeholder for a shared cdev that provides the necessary dis-aggregated functionality It could be an existing cdev with broader functionaltiy, it could really be /dev/iommu, etc. This is up to the folks building it to decide. > I'm not expert on vDPA for now, but I saw you three open source > veterans have a similar idea for a place to cover IOMMU handling, > I think it may be a valuable thing to do. I said "may be" as I'm not > sure about Alex's opinion on such idea. But the sure thing is this > idea may introduce weird overlap even re-definition of existing > thing as I replied above. We need to evaluate the impact and mature > the idea step by step. This has happened before, uAPIs do get obsoleted and replaced with more general/better versions. It is often too hard to create a uAPI that lasts for decades when the HW landscape is constantly changing and sometime a reset is needed. The jump to shared PASID based IOMMU feels like one of those moments here. > > Whoever provides the vIOMMU emulation and relays the page fault to the guest > > has to translate the RID - > > that's the point. But the device info (especially the sub-device info) is > within the passthru framework (e.g. VFIO). So page fault reporting needs > to go through passthru framework. > > > what does that have to do with VFIO? > > > > How will VPDA provide the vIOMMU emulation? > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > specification, right? you may correct me if I'm missing anything. I'm asking how will VDPA translate the RID when VDPA triggers a page fault that has to be relayed to the guest. VDPA also has virtual PCI devices it creates. We can't rely on VFIO to be the place that the vIOMMU lives because it excludes/complicates everything that is not VFIO from using that stuff. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 9:55 PM > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > See previous discussion with Kevin. If I understand correctly, you expect > > > a > shared > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > > the same, the L2 page table are not shared, but the mapping is the same. > > I think not being able to share the PASID shows exactly why this VFIO > centric approach is bad. no, I didn't say PASID is not sharable. My point is sharing L2 page table is not mandatory. Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > See previous discussion with Kevin. If I understand correctly, you expect a > > shared > > L2 table if vDPA and VFIO device are using the same PASID. > > L2 table sharing is not mandatory. The mapping is the same, but no need to > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > the same, the L2 page table are not shared, but the mapping is the same. I think not being able to share the PASID shows exactly why this VFIO centric approach is bad. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Monday, October 19, 2020 10:25 PM > > On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > > Hi Jason, > > > > Good to see your response. > > Ah, I was away got it. :-) > > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > > (alloc/free domain, attach/detach device, set/get domain > > > > > attribute, etc.), reporting/enabling the nesting capability is > > > > > an natural extension to the domain uAPI of existing passthrough > frameworks. > > > > > Actually, VFIO already includes a nesting enable interface even > > > > > before this series. So it doesn't make sense to generalize this > > > > > uAPI out. > > > > > > The subsystem that obtains an IOMMU domain for a device would have > > > to register it with an open FD of the '/dev/sva'. That is the > > > connection between the two subsystems. It would be some simple > > > kernel internal > > > stuff: > > > > > > sva = get_sva_from_file(fd); > > > > Is this fd provided by userspace? I suppose the /dev/sva has a set of > > uAPIs which will finally program page table to host iommu driver. As > > far as I know, it's weird for VFIO user. Why should VFIO user connect > > to a /dev/sva fd after it sets a proper iommu type to the opened > > container. VFIO container already stands for an iommu context with > > which userspace could program page mapping to host iommu. > > Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it > can > be shared between more subsystems that need it. I understand you here. :-) > I'm sure there will be some > weird overlaps because we can't delete any of the existing VFIO APIs, but > that > should not be a blocker. but the weird thing is what we should consider. And it perhaps not just overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO container is IOMMU context from the day it was defined. It could be the blocker. :-( > Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is > a possible path. This looks to be similar with the proposal from Jason Wang and Kevin Tian. It is an idea to add "/dev/iommu" and delegate the IOMMU domain alloc, device attach/detach which is no in passthru framework to an independent kernel driver. Just as Jason Wang said replace vfio iommu type1 driver. Jason Wang: "And all the proposal in this series is to reuse the container fd. It should be possible to replace e.g type1 IOMMU with a unified module." link: https://lore.kernel.org/kvm/20201019142526.gj6...@nvidia.com/T/#md49fe9ac9d9eff6ddf5b8c2ee2f27eb2766f66f3 Kevin Tian: "Based on above, I feel a more reasonable way is to first make a /dev/iommu uAPI supporting DMA map/unmap usages and then introduce vSVA to it. Doing this order is because DMA map/unmap is widely used thus can better help verify the core logic with many existing devices." link: https://lore.kernel.org/kvm/mwhpr11mb1645c702d148a2852b41fca08c...@mwhpr11mb1645.namprd11.prod.outlook.com/ > > If your plan is to just opencode everything into VFIO then I don't > see how VDPA will work well, and if proper in-kernel abstractions are built I > fail to see how > routing some of it through userspace is a fundamental problem. I'm not expert on vDPA for now, but I saw you three open source veterans have a similar idea for a place to cover IOMMU handling, I think it may be a valuable thing to do. I said "may be" as I'm not sure about Alex's opinion on such idea. 
But the sure thing is this idea may introduce weird overlap or even re-definition of existing things, as I replied above. We need to evaluate the impact and mature the idea step by step. That means it would take time, so perhaps we may do it in a staged way. First, have "/dev/iommu" ready to handle page MAP/UNMAP so it can be used by both VFIO and vDPA; meanwhile let VFIO grow (adding features) by itself and consider adopting the new /dev/iommu later, once /dev/iommu is competent. Of course this needs Alex's approval. And then add new features to /dev/iommu, like SVA. > > > > sva_register_device_to_pasid(sva, pasid, > > > iommu_domain); > > > > So this is supposed to be called by VFIO/VDPA to register the info to > > /dev/sva. > > right? And in dev/sva, it will also maintain the device/iommu_domain > > and pasid info? will it be duplicated with VFIO/VDPA? > > Each part needs to have the information it needs? yeah, but it's the duplication that I'm not very fond of. Perhaps the idea from Jason Wang and Kevin would avoid such duplication. > > > > > Moreover, mapping page fault to subdevice requires pre- > > > > > registering subdevice fault data to IOMMU layer when binding > > > > > guest page table, while such fault data can be only retrieved > > > > > from parent driver through VFIO/VDPA. > > > > > > Not sure what this means, page fault should be tied to the PASID, > > > any hookup needed for that
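A minimal sketch of the staged "/dev/iommu handles MAP/UNMAP first" idea from the mail above. Everything here is hypothetical -- the ioctl number, struct layout and flag names do not exist as a kernel uAPI; they only illustrate how a framework-agnostic map call could look from userspace:

#include <stdint.h>
#include <sys/ioctl.h>

/* Invented placeholders, for illustration only. */
struct iommu_ioas_map {
        uint64_t user_va;   /* process VA backing the range */
        uint64_t iova;      /* IOVA the device will use */
        uint64_t size;
        uint32_t flags;
};
#define IOMMU_IOAS_MAP   _IOW('I', 0x40, struct iommu_ioas_map)
#define IOMMU_MAP_READ   (1u << 0)
#define IOMMU_MAP_WRITE  (1u << 1)

/* Map one range on an open /dev/iommu fd; the same call would serve a
 * VFIO or a vDPA device once either framework has been attached to
 * this fd inside the kernel. */
static int iommu_map_range(int iommu_fd, void *buf, uint64_t iova, uint64_t size)
{
        struct iommu_ioas_map map = {
                .user_va = (uint64_t)(uintptr_t)buf,
                .iova    = iova,
                .size    = size,
                .flags   = IOMMU_MAP_READ | IOMMU_MAP_WRITE,
        };

        return ioctl(iommu_fd, IOMMU_IOAS_MAP, &map);
}

The point of the staging is that this map/unmap core could be adopted first, with nesting/vSVA operations added to the same fd later.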
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 20, 2020 5:20 PM > > Hi Yi: > > On 2020/10/20 ??4:19, Liu, Yi L wrote: > >> Yes, but since PASID is a global identifier now, I think kernel > >> should track the a device list per PASID? > > We have such track. It's done in iommu driver. You can refer to the > > struct intel_svm. PASID is a global identifier, but it doesn’t affect > > that the PASID table is per-device. > > > >> So for such binding, PASID should be > >> sufficient for uAPI. > > not quite get it. PASID may be bound to multiple devices, how do you > > figure out the target device if you don’t provide such info. > > > I may miss soemthing but is there any reason that userspace need to figure out > the target device? PASID is about address space not a specific device I think. If you have multiple devices assigned to a VM, you won't expect to bind all of them to a PASID in a single bind operation, right? you may want to bind only the devices you really mean. This manner should be more flexible and reasonable. :-) > > > > > The binding request is initiated by the virtual IOMMU, when > > capturing guest attempt of binding page table to a virtual PASID > > entry for a given device. > And for L2 page table programming, if PASID is use by both e.g VFIO > and vDPA, user need to choose one of uAPI to build l2 mappings? > >>> for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I > >>> guess it is tlb flush. so you are right. Keeping L1/L2 page table > >>> management in a single uAPI set is also a reason for my current > >>> series which extends VFIO for L1 management. > >> I'm afraid that would introduce confusing to userspace. E.g: > >> > >> 1) when having only vDPA device, it uses vDPA uAPI to do l2 > >> management > >> 2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the > >> l2 management? > > I think vDPA will still use its own l2 for the l2 mappings. not sure > > why you need vDPA use VFIO's l2 management. I don't think it is the case. > > > See previous discussion with Kevin. If I understand correctly, you expect a > shared > L2 table if vDPA and VFIO device are using the same PASID. L2 table sharing is not mandatory. The mapping is the same, but no need to assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within a passthru framework, like VFIO, if the attributes of backend IOMMU are not the same, the L2 page table are not shared, but the mapping is the same. > In this case, if l2 is still managed separately, there will be duplicated > request of > map and unmap. yes, but this is not a functional issue, right? If we want to solve it, we should have a single uAPI set which can handle both L1 and L2 management. That's also why you proposed to replace type1 driver. right? Regards, Yi Liu > > Thanks > > > > > > Regards, > > Yi Liu > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
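To illustrate the "bind only the devices you really mean" point above, here is one possible shape for a guest page-table bind request issued per device. The struct, ioctl number and field names are illustrative assumptions, not the uAPI proposed in this series:

#include <stdint.h>
#include <sys/ioctl.h>

/* Illustrative only: the bind carries the PASID, and the target device
 * is implied by the fd the ioctl is issued on, so a PASID shared by
 * several devices is bound to each of them explicitly. */
struct gpasid_bind {
        uint32_t pasid;       /* guest PASID being bound */
        uint64_t gpgd;        /* GPA of the guest first-level page table */
        uint32_t addr_width;  /* guest address width, e.g. 48 */
};
#define GPASID_BIND  _IOW('V', 0x60, struct gpasid_bind)

static int bind_one_device(int device_fd, uint32_t pasid, uint64_t gpgd)
{
        struct gpasid_bind bind = {
                .pasid      = pasid,
                .gpgd       = gpgd,
                .addr_width = 48,
        };

        /* A second assigned device using the same PASID gets its own
         * bind call, and only if the guest actually programmed it. */
        return ioctl(device_fd, GPASID_BIND, &bind);
}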
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Yi: On 2020/10/20 下午4:19, Liu, Yi L wrote: Yes, but since PASID is a global identifier now, I think kernel should track the a device list per PASID? We have such track. It's done in iommu driver. You can refer to the struct intel_svm. PASID is a global identifier, but it doesn’t affect that the PASID table is per-device. So for such binding, PASID should be sufficient for uAPI. not quite get it. PASID may be bound to multiple devices, how do you figure out the target device if you don’t provide such info. I may miss soemthing but is there any reason that userspace need to figure out the target device? PASID is about address space not a specific device I think. The binding request is initiated by the virtual IOMMU, when capturing guest attempt of binding page table to a virtual PASID entry for a given device. And for L2 page table programming, if PASID is use by both e.g VFIO and vDPA, user need to choose one of uAPI to build l2 mappings? for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I guess it is tlb flush. so you are right. Keeping L1/L2 page table management in a single uAPI set is also a reason for my current series which extends VFIO for L1 management. I'm afraid that would introduce confusing to userspace. E.g: 1) when having only vDPA device, it uses vDPA uAPI to do l2 management 2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the l2 management? I think vDPA will still use its own l2 for the l2 mappings. not sure why you need vDPA use VFIO's l2 management. I don't think it is the case. See previous discussion with Kevin. If I understand correctly, you expect a shared L2 table if vDPA and VFIO device are using the same PASID. In this case, if l2 is still managed separately, there will be duplicated request of map and unmap. Thanks Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
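For context on the duplicated L2 map/unmap concern above: with separate L2 management, the same guest memory range ends up being mapped once through VFIO type1 and once again through the vDPA/vhost IOTLB path. The VFIO side below uses the existing VFIO_IOMMU_MAP_DMA uAPI; the vDPA side is only noted in the comment since its exact message flow depends on the vhost-vdpa backend:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* The same GPA range mapped through VFIO type1.  A vDPA device using
 * the same address space would, with separate L2 management, need an
 * equivalent request through its own uAPI (e.g. the vhost IOTLB path)
 * -- the same mapping, issued twice. */
static int vfio_map(int container_fd, void *hva, uint64_t gpa, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)hva,
                .iova  = gpa,       /* GPA used as IOVA for a VM */
                .size  = size,
        };

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}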
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hey Jason, > From: Jason Wang > Sent: Tuesday, October 20, 2020 2:18 PM > > On 2020/10/15 ??6:14, Liu, Yi L wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 4:41 PM > >> > >> > >> On 2020/10/15 ??3:58, Tian, Kevin wrote: > From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 ??11:08, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 ??4:38, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>> > If it's possible, I would suggest a generic uAPI instead of > >>> a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of > other subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o > >>> losing necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through below > >>> new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations > >>> can be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good > >>> thing both in concept and regarding to complexity. > >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> A question here, is IOASID expected to be the single management > >> interface for PASID? > > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > If there's no exception it should be fixed. > > > >>> From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> I think we need a definition of "global" here. It looks to me for > >> vt-d the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed in > > system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > Any reason for such requirement? (I'm not familiar with ENQCMD, but > my understanding is that vSVA, SIOV or SR-IOV doesn't have the > requirement for system-wide PASID). > >>> ENQCMD is a new instruction to allow multiple processes submitting > >>> workload to one shared workqueue. 
Each process has an unique PASID > >>> saved in a MSR, which is included in the ENQCMD payload to indicate > >>> the address space when the CPU sends to the device. As one process > >>> might issue ENQCMD to multiple devices, OS-wide PASID allocation is > >>> required both in host and guest side. > >>> > >>> When executing ENQCMD in the guest to a SIOV device, the guest > >>> programmed value in the PASID_MSR must be translated to a host PASID > >>> value for proper function/isolation as PASID represents the address > >>> space. The translation is done through a new VMCS PASID translation > >>> structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > >>> must be allocated 'globally' cross all assigned devices otherwise it > >>> may lead to 1:N mapping when a guest process issues ENQCMD to multiple > >>> assigned devices/subdevices. > >>> > >>> There will be a KVM forum session for this topic btw. > >> > >> Thanks for the background. Now I see the restrict comes from ENQCMD. > >> > >> >
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 下午6:14, Liu, Yi L wrote: From: Jason Wang Sent: Thursday, October 15, 2020 4:41 PM On 2020/10/15 ??3:58, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 ??11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 ??4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. 
When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. Now I see the restrict comes from ENQCMD. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we
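A purely conceptual model of the guest-to-host PASID translation described above. In reality this is a per-VM hardware structure referenced from the VMCS and walked by the CPU when the guest executes ENQCMD, not software like this; the sketch only shows why the mapping must be 1:1 and therefore why host PASIDs need one system-wide namespace:

#include <stdint.h>

#define MAX_GUEST_PASIDS 64   /* arbitrary size for the sketch */

/* Per-VM table mapping guest PASIDs to host PASIDs (conceptual). */
struct vm_pasid_xlate {
        uint32_t nr;
        struct {
                uint32_t guest_pasid;
                uint32_t host_pasid;
        } map[MAX_GUEST_PASIDS];
};

static int xlate_pasid(const struct vm_pasid_xlate *t, uint32_t guest_pasid,
                       uint32_t *host_pasid)
{
        for (uint32_t i = 0; i < t->nr; i++) {
                if (t->map[i].guest_pasid == guest_pasid) {
                        *host_pasid = t->map[i].host_pasid;
                        return 0;
                }
        }
        return -1;   /* untranslated guest PASID: the ENQCMD must fault */
}

If the same guest PASID could map to different host PASIDs depending on the target device, this lookup would no longer be a function of the guest PASID alone, which is the 1:N problem the mail describes.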
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > Hi Jason, > > Good to see your response. Ah, I was away > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > > etc.), reporting/enabling the nesting capability is an natural > > > > extension to the domain uAPI of existing passthrough frameworks. > > > > Actually, VFIO already includes a nesting enable interface even > > > > before this series. So it doesn't make sense to generalize this uAPI > > > > out. > > > > The subsystem that obtains an IOMMU domain for a device would have to > > register it with an open FD of the '/dev/sva'. That is the connection > > between the two subsystems. It would be some simple kernel internal > > stuff: > > > > sva = get_sva_from_file(fd); > > Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs > which will finally program page table to host iommu driver. As far as I know, > it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after > it sets a proper iommu type to the opened container. VFIO container already > stands for an iommu context with which userspace could program page mapping > to host iommu. Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it can be shared between more subsystems that need it. I'm sure there will be some weird overlaps because we can't delete any of the existing VFIO APIs, but that should not be a blocker. Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is a possible path. If your plan is to just opencode everything into VFIO then I don't see how VDPA will work well, and if proper in-kernel abstractions are built I fail to see how routing some of it through userspace is a fundamental problem. > > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); > > So this is supposed to be called by VFIO/VDPA to register the info to > /dev/sva. > right? And in dev/sva, it will also maintain the device/iommu_domain and pasid > info? will it be duplicated with VFIO/VDPA? Each part needs to have the information it needs? > > > > Moreover, mapping page fault to subdevice requires pre- > > > > registering subdevice fault data to IOMMU layer when binding > > > > guest page table, while such fault data can be only retrieved from > > > > parent driver through VFIO/VDPA. > > > > Not sure what this means, page fault should be tied to the PASID, any > > hookup needed for that should be done in-kernel when the device is > > connected to the PASID. > > you may refer to chapter 7.4.1.1 of VT-d spec. Page request is reported to > software together with the requestor id of the device. For the page request > injects to guest, it should have the device info. Whoever provides the vIOMMU emulation and relays the page fault to the guest has to translate the RID - what does that have to do with VFIO? How will VPDA provide the vIOMMU emulation? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, Good to see your response. > From: Jason Gunthorpe > Sent: Friday, October 16, 2020 11:37 PM > > On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > > Hi, Alex and Jason (G), > > > > How about your opinion for this new proposal? For now looks both > > Jason (W) and Jean are OK with this direction and more discussions > > are possibly required for the new /dev/ioasid interface. Internally > > we're doing a quick prototype to see any unforeseen issue with this > > separation. > > Assuming VDPA and VFIO will be the only two users so duplicating > everything only twice sounds pretty restricting to me. > > > > Second, IOMMU nested translation is a per IOMMU domain > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > etc.), reporting/enabling the nesting capability is an natural > > > extension to the domain uAPI of existing passthrough frameworks. > > > Actually, VFIO already includes a nesting enable interface even > > > before this series. So it doesn't make sense to generalize this uAPI > > > out. > > The subsystem that obtains an IOMMU domain for a device would have to > register it with an open FD of the '/dev/sva'. That is the connection > between the two subsystems. It would be some simple kernel internal > stuff: > > sva = get_sva_from_file(fd); Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs which will finally program page table to host iommu driver. As far as I know, it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after it sets a proper iommu type to the opened container. VFIO container already stands for an iommu context with which userspace could program page mapping to host iommu. > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); So this is supposed to be called by VFIO/VDPA to register the info to /dev/sva. right? And in dev/sva, it will also maintain the device/iommu_domain and pasid info? will it be duplicated with VFIO/VDPA? > Not sure why this is a roadblock? > > How would this be any different from having some kernel libsva that > VDPA and VFIO would both rely on? > > You don't plan to just open code all this stuff in VFIO, do you? > > > > Then the tricky part comes with the remaining operations (3/4/5), > > > which are all backed by iommu_ops thus effective only within an > > > IOMMU domain. To generalize them, the first thing is to find a way > > > to associate the sva_FD (opened through generic /dev/sva) with an > > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > > to replicate {domain<->device/subdevice} association in /dev/sva > > > path because some operations (e.g. page fault) is triggered/handled > > > per device/subdevice. Therefore, /dev/sva must provide both per- > > > domain and per-device uAPIs similar to what VFIO/VDPA already > > > does. > > Yes, the point here was to move the general APIs out of VFIO and into > a sharable location. So, of course one would expect some duplication > during the transition period. > > > > Moreover, mapping page fault to subdevice requires pre- > > > registering subdevice fault data to IOMMU layer when binding > > > guest page table, while such fault data can be only retrieved from > > > parent driver through VFIO/VDPA. > > Not sure what this means, page fault should be tied to the PASID, any > hookup needed for that should be done in-kernel when the device is > connected to the PASID. you may refer to chapter 7.4.1.1 of VT-d spec. 
Page request is reported to software together with the requestor id of the device. For the page request injects to guest, it should have the device info. Regards, Yi Liu > > > > space but they may be organized in multiple IOMMU domains based > > > on their bus type. How (should we let) the userspace know the > > > domain information and open an sva_FD for each domain is the main > > > problem here. > > Why is one sva_FD per iommu domain required? The HW can attach the > same PASID to multiple iommu domains, right? > > > > In the end we just realized that doing such generalization doesn't > > > really lead to a clear design and instead requires tight coordination > > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > > (especially about synchronization when the domain/device > > > association is changed or when the device/subdevice is being reset/ > > > drained). Finally it may become a usability burden to the userspace > > > on proper use of the two interfaces on the assigned device. > > If you have a list of things that needs to be done to attach a PCI > device to a PASID then of course they should be tidy kernel APIs > already, and not just hard wired into VFIO. > > The worst outcome would be to have VDPA and VFIO have to different > ways to do all of this with a different set of bugs. Bug fixes/new > features in VFIO won't flow over to VDPA. > > Jason ___
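For readers without the VT-d spec at hand, the information a recoverable page request delivers to software looks roughly like the struct below. This is a simplified paraphrase of the page request descriptor in chapter 7.4.1.1, not its exact bit layout:

#include <stdint.h>

/* Simplified view of a VT-d page request as seen by software. */
struct page_req {
        uint16_t requester_id;   /* bus/devfn of the faulting device */
        uint32_t pasid;          /* address space the fault occurred in */
        uint64_t address;        /* faulting page address */
        uint16_t group_index;    /* PRG index used for the response */
        uint8_t  perm_read:1;
        uint8_t  perm_write:1;
        uint8_t  last_in_group:1;
};

/* Whoever emulates the vIOMMU has to turn requester_id (a host RID,
 * possibly of a parent device when a subdevice/mdev faulted) into the
 * RID the guest knows before injecting the request into the VM. */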
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > Hi, Alex and Jason (G), > > How about your opinion for this new proposal? For now looks both > Jason (W) and Jean are OK with this direction and more discussions > are possibly required for the new /dev/ioasid interface. Internally > we're doing a quick prototype to see any unforeseen issue with this > separation. Assuming VDPA and VFIO will be the only two users so duplicating everything only twice sounds pretty restricting to me. > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is an natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. The subsystem that obtains an IOMMU domain for a device would have to register it with an open FD of the '/dev/sva'. That is the connection between the two subsystems. It would be some simple kernel internal stuff: sva = get_sva_from_file(fd); sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); Not sure why this is a roadblock? How would this be any different from having some kernel libsva that VDPA and VFIO would both rely on? You don't plan to just open code all this stuff in VFIO, do you? > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) is triggered/handled > > per device/subdevice. Therefore, /dev/sva must provide both per- > > domain and per-device uAPIs similar to what VFIO/VDPA already > > does. Yes, the point here was to move the general APIs out of VFIO and into a sharable location. So, of course one would expect some duplication during the transition period. > > Moreover, mapping page fault to subdevice requires pre- > > registering subdevice fault data to IOMMU layer when binding > > guest page table, while such fault data can be only retrieved from > > parent driver through VFIO/VDPA. Not sure what this means, page fault should be tied to the PASID, any hookup needed for that should be done in-kernel when the device is connected to the PASID. > > space but they may be organized in multiple IOMMU domains based > > on their bus type. How (should we let) the userspace know the > > domain information and open an sva_FD for each domain is the main > > problem here. Why is one sva_FD per iommu domain required? The HW can attach the same PASID to multiple iommu domains, right? > > In the end we just realized that doing such generalization doesn't > > really lead to a clear design and instead requires tight coordination > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > (especially about synchronization when the domain/device > > association is changed or when the device/subdevice is being reset/ > > drained). Finally it may become a usability burden to the userspace > > on proper use of the two interfaces on the assigned device. 
If you have a list of things that need to be done to attach a PCI device to a PASID then of course they should be tidy kernel APIs already, and not just hard wired into VFIO. The worst outcome would be to have VDPA and VFIO have two different ways to do all of this with a different set of bugs. Bug fixes/new features in VFIO won't flow over to VDPA. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
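Expanding the two-line sketch in the mail above into the rough shape it might take inside a passthrough framework. Only get_sva_from_file() and sva_register_device_to_pasid() come from the sketch itself; the wrapper, the sva_context type and sva_put() are assumptions about how a caller could use them:

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/pci.h>

/* Hypothetical kernel-internal glue in VFIO (or vDPA): userspace hands
 * over an open /dev/sva fd, and the passthrough framework registers
 * its device and iommu_domain with it. */
static int vfio_connect_sva(struct pci_dev *pdev, int sva_fd, u32 pasid,
                            struct iommu_domain *domain)
{
        struct sva_context *sva;   /* assumed type behind /dev/sva */
        int ret;

        sva = get_sva_from_file(sva_fd);
        if (IS_ERR(sva))
                return PTR_ERR(sva);

        ret = sva_register_device_to_pasid(sva, pasid, pdev, domain);
        if (ret)
                sva_put(sva);      /* assumed refcount helper */
        return ret;
}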
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 4:41 PM > > > On 2020/10/15 ??3:58, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 2:52 PM > >> > >> > >> On 2020/10/14 ??11:08, Tian, Kevin wrote: > From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 ??4:38, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of > > a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of > >> other subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o > > losing necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through below > > new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations > > can be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good > > thing both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > A question here, is IOASID expected to be the single management > interface for PASID? > >>> yes > >>> > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) > >>> That comes before IOASID core was introduced. I think it should be > >>> changed to use the new generic interface. Jacob/Jean can better > >>> comment if other reason exists for this exception. > >> > >> If there's no exception it should be fixed. > >> > >> > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > I think we need a definition of "global" here. It looks to me for > vt-d the PASID table is per device. > >>> PASID table is per device, thus VT-d could support per-device PASIDs > >>> in concept. > >> > >> I think that's the requirement of PCIE spec which said PASID + RID > >> identifies the process address space ID. > >> > >> > >>>However on Intel platform we require PASIDs to be managed in > >>> system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > >>> and ENQCMD together. > >> > >> Any reason for such requirement? (I'm not familiar with ENQCMD, but > >> my understanding is that vSVA, SIOV or SR-IOV doesn't have the > >> requirement for system-wide PASID). > > ENQCMD is a new instruction to allow multiple processes submitting > > workload to one shared workqueue. 
Each process has an unique PASID > > saved in a MSR, which is included in the ENQCMD payload to indicate > > the address space when the CPU sends to the device. As one process > > might issue ENQCMD to multiple devices, OS-wide PASID allocation is > > required both in host and guest side. > > > > When executing ENQCMD in the guest to a SIOV device, the guest > > programmed value in the PASID_MSR must be translated to a host PASID > > value for proper function/isolation as PASID represents the address > > space. The translation is done through a new VMCS PASID translation > > structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > > must be allocated 'globally' cross all assigned devices otherwise it > > may lead to 1:N mapping when a guest process issues ENQCMD to multiple > > assigned devices/subdevices. > > > > There will be a KVM forum session for this topic btw. > > > Thanks for the background. Now I see the restrict comes from ENQCMD. > > > > > >> > >>> Thus the host creates only one 'global' PASID namespace but do use > >>> per-device PASID table to assure isolation between devices on Intel > >>> platforms. But ARM does it differently as Jean explained. > >>> They have a global namespace for host processes on all host-owned > >>> devices (s
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 下午3:58, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 上午11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. 
The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. Now I see the restrict comes from ENQCMD. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, w
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 上午11:08, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 下午4:38, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>>> If it's possible, I would suggest a generic uAPI instead of a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of other > subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o losing > >>> necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through > >>> below new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations can > >>> be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good thing > >>> both in concept and regarding to complexity. > >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> > >> A question here, is IOASID expected to be the single management > >> interface for PASID? > > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > > > If there's no exception it should be fixed. > > > > > >> > >>>From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> > >> I think we need a definition of "global" here. It looks to me for vt-d > >> the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > > > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed > > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > > > Any reason for such requirement? (I'm not familiar with ENQCMD, but my > understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement > for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. 
As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. > > > > Thus the host creates only one 'global' PASID > > namespace but do use per-device PASID table to assure isolation between > > devices on Intel platforms. But ARM does it differently as Jean explained. > > They have a global namespace for host processes on all host-owned > > devices (same as Intel), but then per-device namespace when a device > > (and its PASID table) is assigned to userspace. > > > >> Another question, is this possible to have two DMAR hardware unit(at > >> least I can see two even in my laptop). In this case, is PASID still a > >> global resource? > > yes > > > >> > >>>while having separate VFIO/ > >>> VDPA allocation interfaces may easily cause confusion in userspace, > >>> e.g. which interface to be u
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 上午7:10, Alex Williamson wrote: On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? 
If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. I'm not sure how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style... Second, IOMMU nested translation is a per IOMMU domain capability. Since IO
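The 'set' concept referred to above is the one in the kernel's IOASID allocator, whose allocation call takes an ioasid_set plus a range. The quota check below is the hypothetical per-set addition being discussed, with invented helper names; only ioasid_alloc() and INVALID_IOASID reflect the existing interface:

#include <linux/ioasid.h>

/* PASIDs come out of one global namespace but are accounted against
 * the caller's ioasid_set; a per-set quota (hypothetical helpers
 * below) is where "how many PASIDs per process" control could live. */
static ioasid_t alloc_guest_pasid(struct ioasid_set *vm_set, void *priv)
{
        if (ioasid_set_count(vm_set) >= ioasid_set_quota(vm_set))  /* assumed */
                return INVALID_IOASID;

        /* 20-bit PASID space, PASID 0 left reserved */
        return ioasid_alloc(vm_set, 1, (1U << 20) - 1, priv);
}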
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/14 上午11:08, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. 
Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be
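Reading the CAP_XXX suggestion above concretely: the open() handler of such a /dev/sva or /dev/ioasid node could simply refuse callers without the capability. CAP_SYS_RAWIO is used only as a stand-in for whatever new capability would be defined, and sva_context_alloc() is an assumed helper:

#include <linux/capability.h>
#include <linux/fs.h>

static int sva_open(struct inode *inode, struct file *filp)
{
        /* stand-in for the new CAP_XXX the mail suggests */
        if (!capable(CAP_SYS_RAWIO))
                return -EPERM;

        filp->private_data = sva_context_alloc();   /* assumed helper */
        return filp->private_data ? 0 : -ENOMEM;
}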
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: > > From: Jason Wang > > Sent: Tuesday, October 13, 2020 2:22 PM > > > > > > On 2020/10/12 下午4:38, Tian, Kevin wrote: > > >> From: Jason Wang > > >> Sent: Monday, September 14, 2020 12:20 PM > > >> > > > [...] > > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > >> specific one. > > >> > > >> Jason suggest something like /dev/sva. There will be a lot of other > > >> subsystems that could benefit from this (e.g vDPA). > > >> > > >> Have you ever considered this approach? > > >> > > > Hi, Jason, > > > > > > We did some study on this approach and below is the output. It's a > > > long writing but I didn't find a way to further abstract w/o losing > > > necessary context. Sorry about that. > > > > > > Overall the real purpose of this series is to enable IOMMU nested > > > translation capability with vSVA as one major usage, through > > > below new uAPIs: > > > 1) Report/enable IOMMU nested translation capability; > > > 2) Allocate/free PASID; > > > 3) Bind/unbind guest page table; > > > 4) Invalidate IOMMU cache; > > > 5) Handle IOMMU page request/response (not in this series); > > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > > the other two optional. For example, the guest may enable vSVA on > > > a device without using PASID. Or, it may bind its gIOVA page table > > > which doesn't require page fault support. Finally, all operations can > > > be applied to either physical device or subdevice. > > > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > > both in concept and regarding to complexity. > > > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > > allocation/free is through the IOASID sub-system. > > > > > > A question here, is IOASID expected to be the single management > > interface for PASID? > > yes > > > > > (I'm asking since there're already vendor specific IDA based PASID > > allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > changed to use the new generic interface. Jacob/Jean can better > comment if other reason exists for this exception. > > > > > > > > From this angle > > > we feel generalizing PASID management does make some sense. > > > First, PASID is just a number and not related to any device before > > > it's bound to a page table and IOMMU domain. Second, PASID is a > > > global resource (at least on Intel VT-d), > > > > > > I think we need a definition of "global" here. It looks to me for vt-d > > the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > in concept. However on Intel platform we require PASIDs to be managed > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > and ENQCMD together. Thus the host creates only one 'global' PASID > namespace but do use per-device PASID table to assure isolation between > devices on Intel platforms. But ARM does it differently as Jean explained. > They have a global namespace for host processes on all host-owned > devices (same as Intel), but then per-device namespace when a device > (and its PASID table) is assigned to userspace. > > > > > Another question, is this possible to have two DMAR hardware unit(at > > least I can see two even in my laptop). In this case, is PASID still a > > global resource? > > yes > > > > > > > > while having separate VFIO/ > > > VDPA allocation interfaces may easily cause confusion in userspace, > > > e.g. 
which interface to be used if both VFIO/VDPA devices exist. > > > Moreover, an unified interface allows centralized control over how > > > many PASIDs are allowed per process. > > > > > > Yes. > > > > > > > > > > One unclear part with this generalization is about the permission. > > > Do we open this interface to any process or only to those which > > > have assigned devices? If the latter, what would be the mechanism > > > to coordinate between this new interface and specific passthrough > > > frameworks? > > > > > > I'm not sure, but if you just want a permission, you probably can > > introduce new capability (CAP_XXX) for this. > > > > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > > please correct me) plans to do per-device PASID namespace which > > > is built on a bind_pasid_table iommu callback to allow guest fully > > > manage its PASIDs on a given passthrough device. > > > > > > I see, so I think the answer is to prepare for the namespace support > > from the start. (btw, I don't see how namespace is handled in current > > IOASID module?) > > The PASID table is based on GPA when nested translation is enabled > on ARM SMMU. This design implies that the guest manages PASID > table thus PASIDs instead of going through host-side API on assigned > device. From this angle we don't need explicit namespace in the
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi, Alex and Jason (G), How about your opinion for this new proposal? For now looks both Jason (W) and Jean are OK with this direction and more discussions are possibly required for the new /dev/ioasid interface. Internally we're doing a quick prototype to see any unforeseen issue with this separation. Please let us know your thoughts. Thanks Kevin > From: Tian, Kevin > Sent: Monday, October 12, 2020 4:39 PM > > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. > First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. > Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... > > Second, IOMMU nested translation is a per IOMMU domain > capability. Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is an natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. 
So it doesn't make sense to generalize this uAPI > out. > > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) is triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can be only retrieved from > parent driver through VFIO/VDPA. > > However, we failed to find a good way even at the 1st step about > domain association. The iommu domains are not exposed to the > userspace, and there is no 1:1 mapping between domain and device. > In VFIO, all devices within the same VFIO container share the address > space but they may be organized in multiple IOMMU domains based > on their bus type. How (should we let) the userspace know the > domain information and open an sva_FD for each domain is the main > problem here. > > In the end we just realized that doing such generalization
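For reference, the pre-existing "nesting enable interface" pointed out above is the VFIO_TYPE1_NESTING_IOMMU backend selected on the container fd, which is also how the capability is probed; this is why reporting/enabling reads as a natural extension of the existing domain uAPI. A trimmed userspace sketch (the group number is arbitrary, error handling omitted):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int enable_nesting_example(void)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);

        if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
                return -1;

        /* "reporting": ask whether this IOMMU supports nested translation */
        if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU))
                return -1;

        /* a group must be attached before an IOMMU backend can be chosen */
        if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container))
                return -1;

        /* "enabling": pick the nesting-capable type1 backend for the domain */
        if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU))
                return -1;

        return container;
}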
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 下午4:38, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of other > >> subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > > > A question here, is IOASID expected to be the single management > interface for PASID? yes > > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. > > > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > > > I think we need a definition of "global" here. It looks to me for vt-d > the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. > > Another question, is this possible to have two DMAR hardware unit(at > least I can see two even in my laptop). In this case, is PASID still a > global resource? yes > > > > while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > Yes. 
> > > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? > > > I'm not sure, but if you just want a permission, you probably can > introduce a new capability (CAP_XXX) for this. > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > > I see, so I think the answer is to prepare for the namespace support > from the start. (btw, I don't see how namespace is handled in current > IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. > > > > I'm not sure > > how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style...
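The 'set'-level quota idea mentioned above could look roughly like the following. This is purely illustrative: the quota field and the example_* helpers are hypothetical and not part of the IOASID code under discussion, and a permission gate such as the CAP_XXX check suggested earlier would sit in front of this path.

/*
 * Sketch of per-set PASID quota: each process gets an ioasid_set, and
 * allocation fails once the per-set quota is exhausted.
 */
#include <linux/atomic.h>
#include <linux/ioasid.h>

struct example_pasid_quota {
        struct ioasid_set set;  /* per-process allocation context */
        atomic_t used;
        int quota;              /* e.g. an administrator-controlled limit */
};

static ioasid_t example_quota_alloc(struct example_pasid_quota *q,
                                    ioasid_t min, ioasid_t max, void *priv)
{
        ioasid_t id;

        if (atomic_inc_return(&q->used) > q->quota) {
                atomic_dec(&q->used);
                return INVALID_IOASID;
        }

        id = ioasid_alloc(&q->set, min, max, priv);
        if (id == INVALID_IOASID)
                atomic_dec(&q->used);
        return id;
}

static void example_quota_free(struct example_pasid_quota *q, ioasid_t id)
{
        ioasid_free(id);
        atomic_dec(&q->used);
}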
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jean-Philippe Brucker > Sent: Tuesday, October 13, 2020 6:28 PM > > On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > > From: Jason Wang > > > Sent: Monday, September 14, 2020 12:20 PM > > > > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > > specific one. > > > > > > Jason suggest something like /dev/sva. There will be a lot of other > > > subsystems that could benefit from this (e.g vDPA). > > > > > > Have you ever considered this approach? > > > > > > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > Yes we need a bind_pasid_table. The guest needs to allocate the PASID > tables because they are accessed via guest-physical addresses by the HW > SMMU. > > With bind_pasid_table, the invalidation message also requires a scope to > invalidate a whole PASID context, in addition to invalidating a mappings > ranges. > > > I'm not sure > > how such requirement can be unified w/o involving passthrough > > frameworks, or whether ARM could also switch to global PASID > > style... > > Not planned at the moment, sorry. It requires a PV IOMMU to do PASID > allocation, which is possible with virtio-iommu but not with a vSMMU > emulation. The VM will manage its own PASID space. 
The upside is that we > don't need userspace access to IOASID, so I won't pester you with comments > on that part of the API :) It makes sense. Possibly in the future, when you plan to support a SIOV-like capability, you may have to convert the PASID table to use host physical addresses, and then the same API could be reused. :) Thanks Kevin > > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is a natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. > > Agree for enabling, but for reporting we did consider adding a sysfs > interface in /sys/class/iommu/ describing an IOMMU's properties. Then > opted for VFIO capabilities to keep the API nice and contained, but if > we're breaking up the API, sysfs might be more convenient to use and > extend. > > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) are triggered/handled > > per device/subdevice.
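As a rough illustration of the sysfs alternative Jean mentions, an IOMMU driver can already publish attributes under /sys/class/iommu/<name>/ via iommu_device_sysfs_add(); the "nesting" attribute below is hypothetical and only shows where such a capability report could live outside of VFIO.

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/sysfs.h>

static ssize_t nesting_show(struct device *dev,
                            struct device_attribute *attr, char *buf)
{
        /* a real driver would consult its hardware capability registers */
        return sprintf(buf, "%d\n", 1);
}
static DEVICE_ATTR_RO(nesting);

static struct attribute *example_iommu_attrs[] = {
        &dev_attr_nesting.attr,
        NULL,
};
ATTRIBUTE_GROUPS(example_iommu);

/*
 * At probe time something like:
 *      iommu_device_sysfs_add(&iommu->iommu_dev, dev, example_iommu_groups,
 *                             "example-iommu.%d", id);
 * would create /sys/class/iommu/example-iommu.0/nesting for userspace
 * to read before it decides how to configure the passthrough framework.
 */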
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. > First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. > Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. Yes we need a bind_pasid_table. The guest needs to allocate the PASID tables because they are accessed via guest-physical addresses by the HW SMMU. With bind_pasid_table, the invalidation message also requires a scope to invalidate a whole PASID context, in addition to invalidating a mappings ranges. > I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... Not planned at the moment, sorry. It requires a PV IOMMU to do PASID allocation, which is possible with virtio-iommu but not with a vSMMU emulation. The VM will manage its own PASID space. The upside is that we don't need userspace access to IOASID, so I won't pester you with comments on that part of the API :) > Second, IOMMU nested translation is a per IOMMU domain > capability. 
Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is a natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. So it doesn't make sense to generalize this uAPI > out. Agree for enabling, but for reporting we did consider adding a sysfs interface in /sys/class/iommu/ describing an IOMMU's properties. Then opted for VFIO capabilities to keep the API nice and contained, but if we're breaking up the API, sysfs might be more convenient to use and extend. > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) are triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can only be retrieved from the > parent driver through VFIO/VDPA.
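The invalidation scopes Jean describes (a whole PASID context versus a mapping range) might be conveyed by a request along the lines of the sketch below. The struct is illustrative only and is not the uAPI proposed in the series.

#include <stdint.h>

enum example_inval_granularity {
        EXAMPLE_INV_DOMAIN,     /* flush everything behind the domain */
        EXAMPLE_INV_PASID,      /* flush one whole PASID context */
        EXAMPLE_INV_ADDR,       /* flush a mapping range within one PASID */
};

struct example_inval_request {
        uint32_t granularity;   /* enum example_inval_granularity */
        uint32_t pasid;         /* valid for PASID and ADDR scopes */
        uint64_t addr;          /* valid for ADDR scope */
        uint64_t size;          /* valid for ADDR scope */
};

The PASID-wide scope is what becomes necessary once the guest owns the PASID table: when the guest tears down a context, the host only learns "PASID x of this device is gone" and has to flush the whole context rather than individual ranges.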
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/12 下午4:38, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) I'm not sure how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style... Second, IOMMU nested translation is a per IOMMU domain capability. 
Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain, attach/detach device, set/get domain attribute, etc.), reporting/enabling the nesting capability is a natural extension to the domain uAPI of existing passthrough frameworks. Actually, VFIO already includes a nesting enable interface even before this series. So it doesn't make sense to generalize this uAPI out. So my understanding is that VFIO already: 1) uses multiple fds 2) separates IOMMU ops to a dedicated container fd (type1 iommu) 3) provides an API to associate devices/groups with a container And the whole proposal in this series is to reuse the container fd. It should be possible to replace e.g type1 IOMMU with a unified module. Then the tricky part comes with the remaining operations (3/4/5), which are all backed by iommu_ops thus effective only within an IOMMU domain. To generalize them, the first thing is to find a way to associate the sva_FD (opened through generic /dev/sva) with an IOMMU domain that is created by VFIO/VDPA. The second thing is to replicate {domain<->device/subdevice} association in /dev/sva path because some operations (e.g. page fault) are triggered/handled per device/subdevice. Is there any reason that the #PF cannot be handled via the SVA fd? Therefore, /dev/sva must provide both per- domain and per-device uAPIs similar to what VFIO/VDPA already does. Moreover, mapping page fault to subdevice requires pre- registering subdevice fault data to IOMMU layer when binding guest page table, while such fault data can only be retrieved from the parent driver through VFIO/VDPA.
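For reference, the fd topology enumerated above (a container fd carrying the IOMMU ops, group fds associated with it, device fds handed out by the group) is what userspace already does with type1 today. The sketch below uses the standard VFIO ioctls; the group number and device name are placeholders and error handling is omitted.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int vfio_fd_topology_example(void)
{
        int container, group, device;

        container = open("/dev/vfio/vfio", O_RDWR);     /* 2) IOMMU ops live here */
        group = open("/dev/vfio/26", O_RDWR);           /* 1) one fd per group */

        /* 3) associate the group with the container */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

        /* choose the IOMMU backend; the idea above is that this backend
         * (type1) could be replaced by a unified module behind the same fd */
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

        /* all map/unmap (and the proposed vSVA ops) are issued against the
         * container fd, i.e. against the IOMMU backend, not the device */

        /* device fds are handed out by the group */
        device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
        return device;
}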