Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
On Thu, Mar 04, 2021 at 07:20:22AM +, Liu, Yi L wrote: > > > However, IOMMU is a system device which has little value to be exposed > > to > > > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO > > > nicely abstracts IOMMU from the userspace, why do we want to reverse > > that? > > > > The other patch was talking about a /dev/ioasid - why can't this stuff > > be run over that? > > The stuff in this patch are actually iommu domain operations, which are > finally supported by iommu domain ops. While /dev/ioasid in another patch > is created for IOASID allocation/free to fit the PASID allocation requirement > from both vSVA and vDPA. It has no idea about iommu domain and neither the > device information. Without such info, /dev/ioasid is unable to run this > stuff. Why can't it know? My point was that VFIO should interact with /dev/ioasid to exchange the information both sides need and once exchanged the control over IOMMU should be done through /dev/ioasid, not inside VFIO. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
Hi Jason, > From: Jason Gunthorpe > Sent: Thursday, March 4, 2021 3:45 AM > > On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote: > > Hi Jason, > > > > On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe > wrote: > > > > > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote: > > > > Hi Jason, > > > > > > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe > > > > wrote: > > > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > > > > > > > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > > > > > > +{ > > > > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > > > > + unsigned long arg = *(unsigned long *)dc->data; > > > > > > + > > > > > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > > > > > > + (void __user *)arg); > > > > > > > > > > This arg buisness is really tortured. The type should be set at the > > > > > ioctl, not constantly passed down as unsigned long or worse void *. > > > > > > > > > > And why is this passing a __user pointer deep into an iommu_* API?? > > > > > > > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or > other > > > > user driver frameworks. The design is documented here: > > > > Documentation/userspace-api/iommu.rst > > > > IOMMU UAPI handles the type and sanitation of user provided data. > > > > > > Why? If it is uapi it has defined types and those types should be > > > completely clear from the C code, not obfuscated. > > > > > From the user's p.o.v., it is plain c code nothing obfuscated. As for > > kernel handling of the data types, it has to be answered by the bigger > > question of how we deal with sharing IOMMU among multiple > subsystems with > > UAPIs. > > As I said, don't obfuscate types like this in the kernel. It is not > good style. > > > However, IOMMU is a system device which has little value to be exposed > to > > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO > > nicely abstracts IOMMU from the userspace, why do we want to reverse > that? > > The other patch was talking about a /dev/ioasid - why can't this stuff > be run over that? The stuff in this patch are actually iommu domain operations, which are finally supported by iommu domain ops. While /dev/ioasid in another patch is created for IOASID allocation/free to fit the PASID allocation requirement from both vSVA and vDPA. It has no idea about iommu domain and neither the device information. Without such info, /dev/ioasid is unable to run this stuff. Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote: > Hi Jason, > > On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe wrote: > > > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote: > > > Hi Jason, > > > > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe > > > wrote: > > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > > > > > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > > > > > +{ > > > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > > > + unsigned long arg = *(unsigned long *)dc->data; > > > > > + > > > > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > > > > > + (void __user *)arg); > > > > > > > > This arg buisness is really tortured. The type should be set at the > > > > ioctl, not constantly passed down as unsigned long or worse void *. > > > > > > > > And why is this passing a __user pointer deep into an iommu_* API?? > > > > > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other > > > user driver frameworks. The design is documented here: > > > Documentation/userspace-api/iommu.rst > > > IOMMU UAPI handles the type and sanitation of user provided data. > > > > Why? If it is uapi it has defined types and those types should be > > completely clear from the C code, not obfuscated. > > > From the user's p.o.v., it is plain c code nothing obfuscated. As for > kernel handling of the data types, it has to be answered by the bigger > question of how we deal with sharing IOMMU among multiple subsystems with > UAPIs. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1614800733; bh=ESjraPQ1U+x3dvWd7l3HBKlTBu3ySX2nWO4QM44ApEU=; h=ARC-Seal:ARC-Message-Signature:ARC-Authentication-Results:Date: From:To:CC:Subject:Message-ID:References:Content-Type: Content-Disposition:In-Reply-To:X-ClientProxiedBy:MIME-Version: X-MS-Exchange-MessageSentRepresentingType:X-Header; b=Jyxs64SLsrGfmoPaAHc58Bpre6/+/wDGs/dAMDviWTwRc+2Miw0+jOBGQXNLz/lE8 KbS/K02BZmgn/jJwl994Po6nS3kGgTg0of6AFSd2MqtaZPMx+aMJ3prec9hwpHaXmN SiosC+FviWoFQHmEvWSQoVeEMS093zQ+sjcsymUkMXYHGYQRqebW101Mii6/0hT3iv Su4YvAyaKmOpWT8sayI4K0ICIhRxWAOT5P78FrXofij9o3X9T9F/9Bo+S5BWSCMzMr +dM2KsTd4Ecac9hNemigs87T/tiCh50XaZoc8WMRFWGYMpha6KFdCV5wWL0Yzauyt7 H7uCWkmg/QTCw== As I said, don't obfuscate types like this in the kernel. It is not good style. > However, IOMMU is a system device which has little value to be exposed to > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO > nicely abstracts IOMMU from the userspace, why do we want to reverse that? The other patch was talking about a /dev/ioasid - why can't this stuff be run over that? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
Hi Jason, On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe wrote: > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote: > > Hi Jason, > > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe > > wrote: > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > > > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > > > > +{ > > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > > + unsigned long arg = *(unsigned long *)dc->data; > > > > + > > > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > > > > + (void __user *)arg); > > > > > > This arg buisness is really tortured. The type should be set at the > > > ioctl, not constantly passed down as unsigned long or worse void *. > > > > > > And why is this passing a __user pointer deep into an iommu_* API?? > > > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other > > user driver frameworks. The design is documented here: > > Documentation/userspace-api/iommu.rst > > IOMMU UAPI handles the type and sanitation of user provided data. > > Why? If it is uapi it has defined types and those types should be > completely clear from the C code, not obfuscated. > >From the user's p.o.v., it is plain c code nothing obfuscated. As for kernel handling of the data types, it has to be answered by the bigger question of how we deal with sharing IOMMU among multiple subsystems with UAPIs. > I haven't looked at the design doc yet, but this is a just a big red > flag, you shouldn't be tunneling one subsytems uAPI through another > subsystem. > > If you need to hook two subsystems together it should be more > directly, like VFIO takes in the IOMMU FD and 'registers' itself in > some way with the IOMMU then you can do the IOMMU actions through the > IOMMU FD and it can call back to VFIO as needed. > > At least in this way we can swap VFIO for other things in the API. > > Having every subsystem that wants to implement IOMMU also implement > tunneled ops seems very backwards. > Let me try to understand your proposal, you are saying: 1. Create a new user interface such that FDs can be obtained as a reference to an IOMMU. 2. Handle over the IOMMU FD to VFIO or other subsystem which wish to register for IOMMU service. However, IOMMU is a system device which has little value to be exposed to the userspace. Not to mention the device-IOMMU affinity/topology. VFIO nicely abstracts IOMMU from the userspace, why do we want to reverse that? With this UAPI layering approach, we converge common code at the IOMMU layer. Without introducing new user interfaces, we can support multiple subsystems that want to use IOMMU service. e.g. VDPA and VFIO. > > Could you be more specific about your concerns? > > Avoid using unsigned long, void * and flex arrays to describe > concretely typed things. > > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote: > Hi Jason, > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe wrote: > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > > > +{ > > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > > + unsigned long arg = *(unsigned long *)dc->data; > > > + > > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > > > + (void __user *)arg); > > > > This arg buisness is really tortured. The type should be set at the > > ioctl, not constantly passed down as unsigned long or worse void *. > > > > And why is this passing a __user pointer deep into an iommu_* API?? > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other user > driver frameworks. The design is documented here: > Documentation/userspace-api/iommu.rst > IOMMU UAPI handles the type and sanitation of user provided data. Why? If it is uapi it has defined types and those types should be completely clear from the C code, not obfuscated. I haven't looked at the design doc yet, but this is a just a big red flag, you shouldn't be tunneling one subsytems uAPI through another subsystem. If you need to hook two subsystems together it should be more directly, like VFIO takes in the IOMMU FD and 'registers' itself in some way with the IOMMU then you can do the IOMMU actions through the IOMMU FD and it can call back to VFIO as needed. At least in this way we can swap VFIO for other things in the API. Having every subsystem that wants to implement IOMMU also implement tunneled ops seems very backwards. > Could you be more specific about your concerns? Avoid using unsigned long, void * and flex arrays to describe concretely typed things. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
Hi Jason, On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe wrote: > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > > +{ > > + struct domain_capsule *dc = (struct domain_capsule *)data; > > + unsigned long arg = *(unsigned long *)dc->data; > > + > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > > + (void __user *)arg); > > This arg buisness is really tortured. The type should be set at the > ioctl, not constantly passed down as unsigned long or worse void *. > > And why is this passing a __user pointer deep into an iommu_* API?? > The idea was that IOMMU UAPI (not API) is independent of VFIO or other user driver frameworks. The design is documented here: Documentation/userspace-api/iommu.rst IOMMU UAPI handles the type and sanitation of user provided data. Could you be more specific about your concerns? > > +/** > > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18, > > + * struct vfio_iommu_type1_nesting_op) > > + * > > + * This interface allows userspace to utilize the nesting IOMMU > > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING > > + * cap through VFIO_IOMMU_GET_INFO. For platforms which require > > + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID > > + * _REQUEST. > > + * > > + * @data[] types defined for each op: > > + * +=+===+ > > + * | NESTING OP | @data[] | > > + * +=+===+ > > + * | BIND_PGTBL | struct iommu_gpasid_bind_data| > > + * +-+---+ > > + * | UNBIND_PGTBL| struct iommu_gpasid_bind_data| > > + * > > +-+---+ > > If the type is known why does the struct have a flex array? > This will be extended to other types in the next patches. > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote: > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > +{ > + struct domain_capsule *dc = (struct domain_capsule *)data; > + unsigned long arg = *(unsigned long *)dc->data; > + > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, > + (void __user *)arg); This arg buisness is really tortured. The type should be set at the ioctl, not constantly passed down as unsigned long or worse void *. And why is this passing a __user pointer deep into an iommu_* API?? > +/** > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18, > + * struct vfio_iommu_type1_nesting_op) > + * > + * This interface allows userspace to utilize the nesting IOMMU > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING > + * cap through VFIO_IOMMU_GET_INFO. For platforms which require > + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID > + * _REQUEST. > + * > + * @data[] types defined for each op: > + * +=+===+ > + * | NESTING OP | @data[] | > + * +=+===+ > + * | BIND_PGTBL | struct iommu_gpasid_bind_data| > + * +-+---+ > + * | UNBIND_PGTBL| struct iommu_gpasid_bind_data| > + * > +-+---+ If the type is known why does the struct have a flex array? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
Nesting translation allows two-levels/stages page tables, with 1st level for guest translations (e.g. GVA->GPA), 2nd level for host translations (e.g. GPA->HPA). This patch adds interface for binding guest page tables to a PASID. This PASID must have been allocated by the userspace before the binding request. e.g. allocated from /dev/ioasid. As the bind data is parsed by iommu abstract layer, so this patch doesn't have the ownership check against the PASID from userspace. It would be done in the iommu sub- system. Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Cc: Joerg Roedel Cc: Lu Baolu Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan --- v7 -> v8: *) adapt to /dev/ioasid *) address comments from Alex on v7. *) adapt to latest iommu_sva_unbind_gpasid() implementation. *) remove the OP check against VFIO_IOMMU_NESTING_OP_NUM as it's redundant to the default switch case in vfio_iommu_handle_pgtbl_op(). v6 -> v7: *) introduced @user in struct domain_capsule to simplify the code per Eric's suggestion. *) introduced VFIO_IOMMU_NESTING_OP_NUM for sanitizing op from userspace. *) corrected the @argsz value of unbind_data in vfio_group_unbind_gpasid_fn(). v5 -> v6: *) dropped vfio_find_nesting_group() and add vfio_get_nesting_domain_capsule(). per comment from Eric. *) use iommu_uapi_sva_bind/unbind_gpasid() and iommu_sva_unbind_gpasid() in linux/iommu.h for userspace operation and in-kernel operation. v3 -> v4: *) address comments from Alex on v3 v2 -> v3: *) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun@linux.intel.com/ v1 -> v2: *) rename subject from "vfio/type1: Bind guest page tables to host" *) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/ unbind guet page table *) replaced vfio_iommu_for_each_dev() with a group level loop since this series enforces one group per container w/ nesting type as start. *) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn() *) vfio_dev_unbind_gpasid() always successful *) use vfio_mm->pasid_lock to avoid race between PASID free and page table bind/unbind --- drivers/vfio/vfio_iommu_type1.c | 156 include/uapi/linux/vfio.h | 35 +++ 2 files changed, 191 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 3a5c84d4f19b..0044931b80dc 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -164,6 +164,34 @@ struct vfio_regions { #define WAITED 1 +struct domain_capsule { + struct vfio_group *group; + struct iommu_domain *domain; + void*data; + /* set if @data contains a user pointer*/ + booluser; +}; + +/* iommu->lock must be held */ +static int vfio_prepare_nesting_domain_capsule(struct vfio_iommu *iommu, + struct domain_capsule *dc) +{ + struct vfio_domain *domain; + struct vfio_group *group; + + if (!iommu->nesting_info) + return -EINVAL; + + domain = list_first_entry(>domain_list, + struct vfio_domain, next); + group = list_first_entry(>group_list, +struct vfio_group, next); + dc->group = group; + dc->domain = domain->domain; + dc->user = true; + return 0; +} + static int put_pfn(unsigned long pfn, int prot); static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu, @@ -2607,6 +2635,51 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu, return ret; } +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + unsigned long arg = *(unsigned long *)dc->data; + + return iommu_uapi_sva_bind_gpasid(dc->domain, dev, + (void __user *)arg); +} + +static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + + /* +* dc->user is a toggle for the unbind operation. When user +* set, the dc->data passes in a __user pointer and requires +* to use iommu_uapi_sva_unbind_gpasid(), in which it will +* copy the unbind data from the user buffer. When user is +* clear, the dc->data passes in a pasid which is going to +* be unbind no need to copy data from userspace. +*/ + if (dc->user) { + unsigned long arg = *(unsigned long *)dc->data; + + iommu_uapi_sva_unbind_gpasid(dc->domain, +dev, (void __user *)arg); + } else { + ioasid_t pasid =