Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-04 Thread Jason Gunthorpe
On Thu, Mar 04, 2021 at 07:20:22AM +, Liu, Yi L wrote:
> > > However, IOMMU is a system device which has little value to be exposed
> > to
> > > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > > nicely abstracts IOMMU from the userspace, why do we want to reverse
> > that?
> > 
> > The other patch was talking about a /dev/ioasid - why can't this stuff
> > be run over that?
> 
> The stuff in this patch are actually iommu domain operations, which are
> finally supported by iommu domain ops. While /dev/ioasid in another patch
> is created for IOASID allocation/free to fit the PASID allocation requirement
> from both vSVA and vDPA. It has no idea about iommu domain and neither the
> device information. Without such info, /dev/ioasid is unable to run this
> stuff.

Why can't it know? My point was that VFIO should interact with
/dev/ioasid to exchange the information both sides need and once
exchanged the control over IOMMU should be done through /dev/ioasid,
not inside VFIO.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-03 Thread Liu, Yi L
Hi Jason,

> From: Jason Gunthorpe 
> Sent: Thursday, March 4, 2021 3:45 AM
> 
> On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe 
> wrote:
> >
> > > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > > Hi Jason,
> > > >
> > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe 
> > > > wrote:
> > > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > > > > >
> > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > > +{
> > > > > > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > > +   unsigned long arg = *(unsigned long *)dc->data;
> > > > > > +
> > > > > > +   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > > + (void __user *)arg);
> > > > >
> > > > > This arg buisness is really tortured. The type should be set at the
> > > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > >
> > > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > > >
> > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or
> other
> > > > user driver frameworks. The design is documented here:
> > > > Documentation/userspace-api/iommu.rst
> > > > IOMMU UAPI handles the type and sanitation of user provided data.
> > >
> > > Why? If it is uapi it has defined types and those types should be
> > > completely clear from the C code, not obfuscated.
> > >
> > From the user's p.o.v., it is plain c code nothing obfuscated. As for
> > kernel handling of the data types, it has to be answered by the bigger
> > question of how we deal with sharing IOMMU among multiple
> subsystems with
> > UAPIs.
> 
> As I said, don't obfuscate types like this in the kernel. It is not
> good style.
> 
> > However, IOMMU is a system device which has little value to be exposed
> to
> > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > nicely abstracts IOMMU from the userspace, why do we want to reverse
> that?
> 
> The other patch was talking about a /dev/ioasid - why can't this stuff
> be run over that?

The stuff in this patch are actually iommu domain operations, which are
finally supported by iommu domain ops. While /dev/ioasid in another patch
is created for IOASID allocation/free to fit the PASID allocation requirement
from both vSVA and vDPA. It has no idea about iommu domain and neither the
device information. Without such info, /dev/ioasid is unable to run this
stuff.

Thanks,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-03 Thread Jason Gunthorpe
On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe  wrote:
> 
> > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > Hi Jason,
> > > 
> > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe 
> > > wrote: 
> > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > > >  
> > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > +{
> > > > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > + unsigned long arg = *(unsigned long *)dc->data;
> > > > > +
> > > > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > +   (void __user *)arg);
> > > > 
> > > > This arg buisness is really tortured. The type should be set at the
> > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > 
> > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > >   
> > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > > user driver frameworks. The design is documented here:
> > > Documentation/userspace-api/iommu.rst
> > > IOMMU UAPI handles the type and sanitation of user provided data.  
> > 
> > Why? If it is uapi it has defined types and those types should be
> > completely clear from the C code, not obfuscated.
> > 
> From the user's p.o.v., it is plain c code nothing obfuscated. As for
> kernel handling of the data types, it has to be answered by the bigger
> question of how we deal with sharing IOMMU among multiple subsystems with
> UAPIs.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1;
t=1614800733; bh=ESjraPQ1U+x3dvWd7l3HBKlTBu3ySX2nWO4QM44ApEU=;
h=ARC-Seal:ARC-Message-Signature:ARC-Authentication-Results:Date:
 From:To:CC:Subject:Message-ID:References:Content-Type:
 Content-Disposition:In-Reply-To:X-ClientProxiedBy:MIME-Version:
 X-MS-Exchange-MessageSentRepresentingType:X-Header;
b=Jyxs64SLsrGfmoPaAHc58Bpre6/+/wDGs/dAMDviWTwRc+2Miw0+jOBGQXNLz/lE8
 KbS/K02BZmgn/jJwl994Po6nS3kGgTg0of6AFSd2MqtaZPMx+aMJ3prec9hwpHaXmN
 SiosC+FviWoFQHmEvWSQoVeEMS093zQ+sjcsymUkMXYHGYQRqebW101Mii6/0hT3iv
 Su4YvAyaKmOpWT8sayI4K0ICIhRxWAOT5P78FrXofij9o3X9T9F/9Bo+S5BWSCMzMr
 +dM2KsTd4Ecac9hNemigs87T/tiCh50XaZoc8WMRFWGYMpha6KFdCV5wWL0Yzauyt7
 H7uCWkmg/QTCw==

As I said, don't obfuscate types like this in the kernel. It is not
good style.

> However, IOMMU is a system device which has little value to be exposed to
> the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> nicely abstracts IOMMU from the userspace, why do we want to reverse that?

The other patch was talking about a /dev/ioasid - why can't this stuff
be run over that?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-03 Thread Jacob Pan
Hi Jason,

On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe  wrote:

> On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe 
> > wrote: 
> > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > >  
> > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > +{
> > > > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > +   unsigned long arg = *(unsigned long *)dc->data;
> > > > +
> > > > +   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > + (void __user *)arg);
> > > 
> > > This arg buisness is really tortured. The type should be set at the
> > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > 
> > > And why is this passing a __user pointer deep into an iommu_* API??
> > >   
> > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > user driver frameworks. The design is documented here:
> > Documentation/userspace-api/iommu.rst
> > IOMMU UAPI handles the type and sanitation of user provided data.  
> 
> Why? If it is uapi it has defined types and those types should be
> completely clear from the C code, not obfuscated.
> 
>From the user's p.o.v., it is plain c code nothing obfuscated. As for
kernel handling of the data types, it has to be answered by the bigger
question of how we deal with sharing IOMMU among multiple subsystems with
UAPIs.

> I haven't looked at the design doc yet, but this is a just a big red
> flag, you shouldn't be tunneling one subsytems uAPI through another
> subsystem.
>
> If you need to hook two subsystems together it should be more
> directly, like VFIO takes in the IOMMU FD and 'registers' itself in
> some way with the IOMMU then you can do the IOMMU actions through the
> IOMMU FD and it can call back to VFIO as needed.
> 
> At least in this way we can swap VFIO for other things in the API.
> 
> Having every subsystem that wants to implement IOMMU also implement
> tunneled ops seems very backwards.
>
Let me try to understand your proposal, you are saying:
1. Create a new user interface such that FDs can be obtained as a reference
to an IOMMU.
2. Handle over the IOMMU FD to VFIO or other subsystem which wish to
register for IOMMU service.

However, IOMMU is a system device which has little value to be exposed to
the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
nicely abstracts IOMMU from the userspace, why do we want to reverse that?

With this UAPI layering approach, we converge common code at the IOMMU
layer. Without introducing new user interfaces, we can support multiple
subsystems that want to use IOMMU service. e.g. VDPA and VFIO.

> > Could you be more specific about your concerns?  
> 
> Avoid using unsigned long, void * and flex arrays to describe
> concretely typed things.
> 
> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jason Gunthorpe
On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe  wrote:
> 
> > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > >  
> > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + unsigned long arg = *(unsigned long *)dc->data;
> > > +
> > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > +   (void __user *)arg);  
> > 
> > This arg buisness is really tortured. The type should be set at the
> > ioctl, not constantly passed down as unsigned long or worse void *.
> > 
> > And why is this passing a __user pointer deep into an iommu_* API??
> > 
> The idea was that IOMMU UAPI (not API) is independent of VFIO or other user
> driver frameworks. The design is documented here:
> Documentation/userspace-api/iommu.rst
> IOMMU UAPI handles the type and sanitation of user provided data.

Why? If it is uapi it has defined types and those types should be
completely clear from the C code, not obfuscated.

I haven't looked at the design doc yet, but this is a just a big red
flag, you shouldn't be tunneling one subsytems uAPI through another
subsystem.

If you need to hook two subsystems together it should be more
directly, like VFIO takes in the IOMMU FD and 'registers' itself in
some way with the IOMMU then you can do the IOMMU actions through the
IOMMU FD and it can call back to VFIO as needed.

At least in this way we can swap VFIO for other things in the API.

Having every subsystem that wants to implement IOMMU also implement
tunneled ops seems very backwards.

> Could you be more specific about your concerns?

Avoid using unsigned long, void * and flex arrays to describe
concretely typed things.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jacob Pan
Hi Jason,

On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe  wrote:

> On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> >  
> > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > + (void __user *)arg);  
> 
> This arg buisness is really tortured. The type should be set at the
> ioctl, not constantly passed down as unsigned long or worse void *.
> 
> And why is this passing a __user pointer deep into an iommu_* API??
> 
The idea was that IOMMU UAPI (not API) is independent of VFIO or other user
driver frameworks. The design is documented here:
Documentation/userspace-api/iommu.rst
IOMMU UAPI handles the type and sanitation of user provided data.

Could you be more specific about your concerns?

> > +/**
> > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> > + * struct vfio_iommu_type1_nesting_op)
> > + *
> > + * This interface allows userspace to utilize the nesting IOMMU
> > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> > + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> > + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> > + * _REQUEST.
> > + *
> > + * @data[] types defined for each op:
> > + * +=+===+
> > + * | NESTING OP  |  @data[]  |
> > + * +=+===+
> > + * | BIND_PGTBL  |  struct iommu_gpasid_bind_data|
> > + * +-+---+
> > + * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
> > + *
> > +-+---+  
> 
> If the type is known why does the struct have a flex array?
> 
This will be extended to other types in the next patches.

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jason Gunthorpe
On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
>  
> +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + unsigned long arg = *(unsigned long *)dc->data;
> +
> + return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> +   (void __user *)arg);

This arg buisness is really tortured. The type should be set at the
ioctl, not constantly passed down as unsigned long or worse void *.

And why is this passing a __user pointer deep into an iommu_* API??

> +/**
> + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> + *   struct vfio_iommu_type1_nesting_op)
> + *
> + * This interface allows userspace to utilize the nesting IOMMU
> + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> + * _REQUEST.
> + *
> + * @data[] types defined for each op:
> + * +=+===+
> + * | NESTING OP  |  @data[]  |
> + * +=+===+
> + * | BIND_PGTBL  |  struct iommu_gpasid_bind_data|
> + * +-+---+
> + * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
> + *
> +-+---+

If the type is known why does the struct have a flex array?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Liu Yi L
Nesting translation allows two-levels/stages page tables, with 1st level
for guest translations (e.g. GVA->GPA), 2nd level for host translations
(e.g. GPA->HPA). This patch adds interface for binding guest page tables
to a PASID. This PASID must have been allocated by the userspace before
the binding request. e.g. allocated from /dev/ioasid. As the bind data
is parsed by iommu abstract layer, so this patch doesn't have the ownership
check against the PASID from userspace. It would be done in the iommu sub-
system.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v7 -> v8:
*) adapt to /dev/ioasid
*) address comments from Alex on v7.
*) adapt to latest iommu_sva_unbind_gpasid() implementation.
*) remove the OP check against VFIO_IOMMU_NESTING_OP_NUM as it's redundant
   to the default switch case in vfio_iommu_handle_pgtbl_op().

v6 -> v7:
*) introduced @user in struct domain_capsule to simplify the code per Eric's
   suggestion.
*) introduced VFIO_IOMMU_NESTING_OP_NUM for sanitizing op from userspace.
*) corrected the @argsz value of unbind_data in vfio_group_unbind_gpasid_fn().

v5 -> v6:
*) dropped vfio_find_nesting_group() and add vfio_get_nesting_domain_capsule().
   per comment from Eric.
*) use iommu_uapi_sva_bind/unbind_gpasid() and iommu_sva_unbind_gpasid() in
   linux/iommu.h for userspace operation and in-kernel operation.

v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   
https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guet page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 156 
 include/uapi/linux/vfio.h   |  35 +++
 2 files changed, 191 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3a5c84d4f19b..0044931b80dc 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -164,6 +164,34 @@ struct vfio_regions {
 
 #define WAITED 1
 
+struct domain_capsule {
+   struct vfio_group   *group;
+   struct iommu_domain *domain;
+   void*data;
+   /* set if @data contains a user pointer*/
+   booluser;
+};
+
+/* iommu->lock must be held */
+static int vfio_prepare_nesting_domain_capsule(struct vfio_iommu *iommu,
+  struct domain_capsule *dc)
+{
+   struct vfio_domain *domain;
+   struct vfio_group *group;
+
+   if (!iommu->nesting_info)
+   return -EINVAL;
+
+   domain = list_first_entry(>domain_list,
+ struct vfio_domain, next);
+   group = list_first_entry(>group_list,
+struct vfio_group, next);
+   dc->group = group;
+   dc->domain = domain->domain;
+   dc->user = true;
+   return 0;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2607,6 +2635,51 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+ (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+
+   /*
+* dc->user is a toggle for the unbind operation. When user
+* set, the dc->data passes in a __user pointer and requires
+* to use iommu_uapi_sva_unbind_gpasid(), in which it will
+* copy the unbind data from the user buffer. When user is
+* clear, the dc->data passes in a pasid which is going to
+* be unbind no need to copy data from userspace.
+*/
+   if (dc->user) {
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   iommu_uapi_sva_unbind_gpasid(dc->domain,
+dev, (void __user *)arg);
+   } else {
+   ioasid_t pasid =