RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
> From: Raj, Ashok 
> Sent: Friday, May 15, 2020 11:20 PM
> 
> On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote:
> > Hi, Alex,
> >
> > When working on an updated version Yi and I found an design open
> > which needs your guidance.
> >
> > In concept nested translation can be incarnated as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. It means one container
> > is sufficient to include all SVA-capable devices assigned to the same guest,
> > as there is just one iova space (GPA->HPA) to be managed by vfio iommu
> > map/unmap api. GVA->GPA page tables are bound through the new api
> > introduced in this patch. It is different from legacy shadow translation
> > which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device
> requires
> > its own iova space and must be in a separate container.
> >
> > However supporting multiple SVA-capable devices in one container
> > imposes one challenge. Now the bind_guest_pgtbl is implemented as
> > iommu type1 api. From the definition of iommu type 1 any operation
> > should be applied to all devices within the same container, just like
> > dma map/unmap. However this philosophy is incorrect regarding to
> > page table binding. We must follow the guest binding requirements
> > between its page tables and assigned devices, otherwise every bound
> > address space is suddenly accessible by all assigned devices thus creating
> > security holes.
> 
> The above 2 paragraphs are bit confusing :-) but what really matters
> is the below:
> >
> > Do you think whether it's possible to extend iommu api to accept
> > device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds
> > also problematic, as PASID and page tables are IOMMU things which
> > are not touched by vfio device drivers today.
> 
> All you are referring to is when admin groups multiple devices in a
> single container, you are saying you can't give isolation to them
> for SVA purposes. This is logically equivalent to a single group with
> multiple devices.
> 
>   - Such as devices behind p2p bridge
>   - MFD's or devices behind switches or hieararchies without ACS
> support for instance.
> 
> By limitation you mean, disable PASID on those devices in a single
> container?

The limitation means disabling the nesting capability in such a container,
and yes, it also implies not exposing the PASID capability to userspace.

> 
> what about ATS?

ATS is possibly fine. VFIO exposes it to userspace even today.

Thanks
Kevin

> 
> >
> > Alternatively can we impose the limitation that nesting APIs can be
> > only enabled for singleton containers which contains only one device?
> > This basically falls back to the state of legacy shadow translation
> > vIOMMU. and our current Qemu vIOMMU implementation actually
> > does this way regardless of whether nesting is used. Do you think
> > whether such tradeoff is acceptable as a starting point?
> >
> > Thanks
> > Kevin
> >
> > > -Original Message-
> > > From: Liu, Yi L 
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > > To: alex.william...@redhat.com; eric.au...@redhat.com
> > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > > ; Tian, Jun J ; Sun, Yi Y
> > > ; jean-phili...@linaro.org; pet...@redhat.com;
> > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Wu, Hao 
> > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > >
> > > From: Liu Yi L 
> > >
> > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > > hardware
> > > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > > translation). For such hardware IOMMUs, there are two stages/levels of
> > > address translation, and software may let userspace/VM to own the first-
> > > level/stage-1 translation structures. Example of such usage is vSVA (
> > > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > > translation structures and bind the structures to host, then hardware
> > > IOMMU would utilize nesting translation when doing DMA translation fo
> > > the devices behind such hardware IOMMU.
> > >
> > > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not
> only
> > > bind
> > > guest page table is needed, it also requires to expose interface to guest
> > > for iommu cache invalidation when guest modified the first-level/stage-1
> > > translation structures since hardware needs to be notified to flush stale
> > > iotlbs. This would be introduced in next patch.
> > >
> > > In this patch, guest page table bind and unbind are done by using flags
> > > VFIO_IOMMU_BIND_GUEST_PGTBL and
> > > VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> > > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > > struct iommu_gpasid_bind_data. Before binding guest page table to host,
> > > VM should have got a PASID allocated by host via
> > > 

RE: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Saturday, May 16, 2020 1:59 AM
> 
> On Fri, 15 May 2020 07:39:14 +
> "Tian, Kevin"  wrote:
> 
> > Hi, Alex,
> >
> > When working on an updated version Yi and I found an design open
> > which needs your guidance.
> >
> > In concept nested translation can be incarnated as one GPA->HPA page
> > table and multiple GVA->GPA page tables per VM. It means one container
> > is sufficient to include all SVA-capable devices assigned to the same guest,
> > as there is just one iova space (GPA->HPA) to be managed by vfio iommu
> > map/unmap api. GVA->GPA page tables are bound through the new api
> > introduced in this patch. It is different from legacy shadow translation
> > which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device
> requires
> > its own iova space and must be in a separate container.
> 
> I think you've redefined the container as specifically IOVA context,
> but I think a container is actually more of a shared IOMMU context
> between groups and devices within those groups.  Userspace is under no
> obligation to share a container between groups and the kernel is under
> no obligation to let groups share a container.

Oh, sorry, I didn't intend to redefine anything here; I was just describing
how containers are used when a vIOMMU is present. Before vSVA there
is just one address space per IOMMU context, which is what the current
vfio iommu API is designed for. When extending it to support
vSVA there are two layers of address spaces per IOMMU
context, and the sharing possibilities of those address spaces
differ. So this open is about whether we want to allow partial
sharing of an IOMMU context within a container, i.e. sharing the 2nd
level while allowing binds to different 1st levels.

> 
> > However supporting multiple SVA-capable devices in one container
> > imposes one challenge. Now the bind_guest_pgtbl is implemented as
> > iommu type1 api. From the definition of iommu type 1 any operation
> > should be applied to all devices within the same container, just like
> > dma map/unmap. However this philosophy is incorrect regarding to
> > page table binding. We must follow the guest binding requirements
> > between its page tables and assigned devices, otherwise every bound
> > address space is suddenly accessible by all assigned devices thus creating
> > security holes.
> 
> Is that a security hole, or is that simply the nature of a container?
> Containers are meant to allow a user to share an IOMMU context between
> groups of devices.  Sharing that context necessarily implies that the
> user is granting the devices access to those address spaces.

It's the nature of the container. I was just trying to state that the current
API is insufficient if we want to allow partial sharing within a container.

> 
> > Do you think whether it's possible to extend iommu api to accept
> > device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds
> > also problematic, as PASID and page tables are IOMMU things which
> > are not touched by vfio device drivers today.
> 
> Is this really that different from a group holding multiple devices,
> each of which has a unique requester ID as seen by the IOMMU?  The
> IOMMU can support separate contexts per device, but the topology
> restricts us that those contexts are not fully isolated.  So we've
> imposed the group restriction on ourselves to reflect that.  If a user
> wants to share an IOMMU context between devices, maybe that precludes
> the user from making use of nested translation.

Agree.

> 
> > Alternatively can we impose the limitation that nesting APIs can be
> > only enabled for singleton containers which contains only one device?
> > This basically falls back to the state of legacy shadow translation
> > vIOMMU. and our current Qemu vIOMMU implementation actually
> > does this way regardless of whether nesting is used. Do you think
> > whether such tradeoff is acceptable as a starting point?
> 
> I think it's an acceptable starting point.  It seems what you're
> describing is separating a monolithic notion of IOMMU context into
> multiple layers, so we can share them at different points, ex. share a
> GPA->HPA context among multiple different GVA->GPA contexts.  That
> seems to imply something like the sub/super-container idea that's been
> tossed around before, but never been fully defined or explored.  Thanks,

Yes, that'd be a future extension; we'll start with the singleton
limitation for now.
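
For illustration only, a minimal sketch of how such a singleton check
could be written against the type1 structures used in this series (this
helper is hypothetical and not part of the posted patch; it only relies
on the vfio_iommu/vfio_domain/vfio_group lists the patch already walks):

/*
 * Hypothetical helper: true only when the container holds exactly one
 * group with exactly one device, i.e. the "singleton container" case.
 */
static int vfio_count_dev_fn(struct device *dev, void *data)
{
        (*(int *)data)++;       /* count each device behind the group */
        return 0;
}

static bool vfio_iommu_is_singleton(struct vfio_iommu *iommu)
{
        struct vfio_domain *d;
        struct vfio_group *g;
        int groups = 0, devs = 0;

        list_for_each_entry(d, &iommu->domain_list, next) {
                list_for_each_entry(g, &d->group_list, next) {
                        groups++;
                        iommu_group_for_each_dev(g->iommu_group, &devs,
                                                 vfio_count_dev_fn);
                }
        }
        return groups == 1 && devs == 1;
}

The bind path could then refuse the nesting/bind ioctls (and avoid
exposing the PASID capability) whenever this returns false.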

Thanks
Kevin

> 
> > > -Original Message-
> > > From: Liu, Yi L 
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > > To: alex.william...@redhat.com; eric.au...@redhat.com
> > > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > > ; Tian, Jun J ; Sun, Yi Y
> > > ; jean-phili...@linaro.org; pet...@redhat.com;
> > > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Wu, Hao 
> > > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables 

Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Alex Williamson
On Fri, 15 May 2020 07:39:14 +
"Tian, Kevin"  wrote:

> Hi, Alex,
> 
> When working on an updated version Yi and I found an design open
> which needs your guidance.
> 
> In concept nested translation can be incarnated as one GPA->HPA page
> table and multiple GVA->GPA page tables per VM. It means one container
> is sufficient to include all SVA-capable devices assigned to the same guest,
> as there is just one iova space (GPA->HPA) to be managed by vfio iommu
> map/unmap api. GVA->GPA page tables are bound through the new api
> introduced in this patch. It is different from legacy shadow translation 
> which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device requires 
> its own iova space and must be in a separate container.

I think you've redefined the container as specifically IOVA context,
but I think a container is actually more of a shared IOMMU context
between groups and devices within those groups.  Userspace is under no
obligation to share a container between groups and the kernel is under
no obligation to let groups share a container.

> However supporting multiple SVA-capable devices in one container 
> imposes one challenge. Now the bind_guest_pgtbl is implemented as
> iommu type1 api. From the definition of iommu type 1 any operation
> should be applied to all devices within the same container, just like
> dma map/unmap. However this philosophy is incorrect regarding to
> page table binding. We must follow the guest binding requirements 
> between its page tables and assigned devices, otherwise every bound
> address space is suddenly accessible by all assigned devices thus creating
> security holes.

Is that a security hole, or is that simply the nature of a container?
Containers are meant to allow a user to share an IOMMU context between
groups of devices.  Sharing that context necessarily implies that the
user is granting the devices access to those address spaces.

> Do you think whether it's possible to extend iommu api to accept
> device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds
> also problematic, as PASID and page tables are IOMMU things which
> are not touched by vfio device drivers today.

Is this really that different from a group holding multiple devices,
each of which has a unique requester ID as seen by the IOMMU?  The
IOMMU can support separate contexts per device, but the topology
restricts us that those contexts are not fully isolated.  So we've
imposed the group restriction on ourselves to reflect that.  If a user
wants to share an IOMMU context between devices, maybe that precludes
the user from making use of nested translation.
 
> Alternatively can we impose the limitation that nesting APIs can be
> only enabled for singleton containers which contains only one device?
> This basically falls back to the state of legacy shadow translation 
> vIOMMU. and our current Qemu vIOMMU implementation actually 
> does this way regardless of whether nesting is used. Do you think 
> whether such tradeoff is acceptable as a starting point?

I think it's an acceptable starting point.  It seems what you're
describing is separating a monolithic notion of IOMMU context into
multiple layers, so we can share them at different points, ex. share a
GPA->HPA context among multiple different GVA->GPA contexts.  That
seems to imply something like the sub/super-container idea that's been
tossed around before, but never been fully defined or explored.  Thanks,

Alex

> > -Original Message-
> > From: Liu, Yi L 
> > Sent: Sunday, March 22, 2020 8:32 PM
> > To: alex.william...@redhat.com; eric.au...@redhat.com
> > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > ; Tian, Jun J ; Sun, Yi Y
> > ; jean-phili...@linaro.org; pet...@redhat.com;
> > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Wu, Hao 
> > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > 
> > From: Liu Yi L 
> > 
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > hardware
> > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > translation). For such hardware IOMMUs, there are two stages/levels of
> > address translation, and software may let userspace/VM to own the first-
> > level/stage-1 translation structures. Example of such usage is vSVA (
> > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > translation structures and bind the structures to host, then hardware
> > IOMMU would utilize nesting translation when doing DMA translation fo
> > the devices behind such hardware IOMMU.
> > 
> > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> > bind
> > guest page table is needed, it also requires to expose interface to guest
> > for iommu cache invalidation when guest modified the first-level/stage-1
> > translation structures since 

Re: (a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Raj, Ashok
On Fri, May 15, 2020 at 12:39:14AM -0700, Tian, Kevin wrote:
> Hi, Alex,
> 
> When working on an updated version Yi and I found an design open
> which needs your guidance.
> 
> In concept nested translation can be incarnated as one GPA->HPA page
> table and multiple GVA->GPA page tables per VM. It means one container
> is sufficient to include all SVA-capable devices assigned to the same guest,
> as there is just one iova space (GPA->HPA) to be managed by vfio iommu
> map/unmap api. GVA->GPA page tables are bound through the new api
> introduced in this patch. It is different from legacy shadow translation
> which merges GIOVA->GPA->HPA into GIOVA->HPA thus each device requires
> its own iova space and must be in a separate container.
> 
> However supporting multiple SVA-capable devices in one container
> imposes one challenge. Now the bind_guest_pgtbl is implemented as
> iommu type1 api. From the definition of iommu type 1 any operation
> should be applied to all devices within the same container, just like
> dma map/unmap. However this philosophy is incorrect regarding to
> page table binding. We must follow the guest binding requirements
> between its page tables and assigned devices, otherwise every bound
> address space is suddenly accessible by all assigned devices thus creating
> security holes.

The above 2 paragraphs are a bit confusing :-) but what really matters
is the below:
> 
> Do you think whether it's possible to extend iommu api to accept
> device specific cmd? If not, moving it to vfio-pci or vfio-mdev sounds
> also problematic, as PASID and page tables are IOMMU things which
> are not touched by vfio device drivers today.

All you are referring to is that when the admin groups multiple devices in a
single container, you are saying you can't give them isolation
for SVA purposes. This is logically equivalent to a single group with
multiple devices.

- Such as devices behind a p2p bridge
- MFDs or devices behind switches or hierarchies without ACS
  support, for instance.

By limitation, do you mean disabling PASID on those devices in a single
container?

what about ATS? 

> 
> Alternatively can we impose the limitation that nesting APIs can be
> only enabled for singleton containers which contains only one device?
> This basically falls back to the state of legacy shadow translation
> vIOMMU. and our current Qemu vIOMMU implementation actually
> does this way regardless of whether nesting is used. Do you think
> whether such tradeoff is acceptable as a starting point?
> 
> Thanks
> Kevin
> 
> > -Original Message-
> > From: Liu, Yi L 
> > Sent: Sunday, March 22, 2020 8:32 PM
> > To: alex.william...@redhat.com; eric.au...@redhat.com
> > Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> > j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> > ; Tian, Jun J ; Sun, Yi Y
> > ; jean-phili...@linaro.org; pet...@redhat.com;
> > iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Wu, Hao 
> > Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> >
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > hardware
> > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > translation). For such hardware IOMMUs, there are two stages/levels of
> > address translation, and software may let userspace/VM to own the first-
> > level/stage-1 translation structures. Example of such usage is vSVA (
> > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > translation structures and bind the structures to host, then hardware
> > IOMMU would utilize nesting translation when doing DMA translation fo
> > the devices behind such hardware IOMMU.
> >
> > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> > bind
> > guest page table is needed, it also requires to expose interface to guest
> > for iommu cache invalidation when guest modified the first-level/stage-1
> > translation structures since hardware needs to be notified to flush stale
> > iotlbs. This would be introduced in next patch.
> >
> > In this patch, guest page table bind and unbind are done by using flags
> > VFIO_IOMMU_BIND_GUEST_PGTBL and
> > VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > struct iommu_gpasid_bind_data. Before binding guest page table to host,
> > VM should have got a PASID allocated by host via
> > VFIO_IOMMU_PASID_REQUEST.
> >
> > Bind guest translation structures (here is guest page table) to host
> > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 158
> > 

(a design open) RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-05-15 Thread Tian, Kevin
Hi, Alex,

While working on an updated version, Yi and I found a design open
which needs your guidance.

Conceptually, nested translation can be realized as one GPA->HPA page
table and multiple GVA->GPA page tables per VM. This means one container
is sufficient to include all SVA-capable devices assigned to the same guest,
as there is just one iova space (GPA->HPA) to be managed by the vfio iommu
map/unmap API. GVA->GPA page tables are bound through the new API
introduced in this patch. This differs from legacy shadow translation,
which merges GIOVA->GPA->HPA into GIOVA->HPA, so each device requires
its own iova space and must be in a separate container.

However, supporting multiple SVA-capable devices in one container
imposes one challenge. Currently bind_guest_pgtbl is implemented as an
iommu type1 API. By the definition of iommu type1, any operation
should be applied to all devices within the same container, just like
dma map/unmap. However, this philosophy is incorrect for page table
binding. We must follow the guest's binding requirements between its
page tables and assigned devices; otherwise every bound address space
suddenly becomes accessible by all assigned devices, creating
security holes.

Do you think it's possible to extend the iommu API to accept
device-specific commands? If not, moving it to vfio-pci or vfio-mdev also
sounds problematic, as PASIDs and page tables are IOMMU things which
are not touched by vfio device drivers today.

Alternatively, can we impose the limitation that the nesting APIs can
only be enabled for singleton containers which contain only one device?
This basically falls back to the state of a legacy shadow-translation
vIOMMU, and our current QEMU vIOMMU implementation actually works this
way regardless of whether nesting is used. Do you think such a tradeoff
is acceptable as a starting point?
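
For concreteness, a rough userspace sketch of the flow under discussion
(step 1 uses the existing map/unmap uapi; steps 2 and 3 are only outlined
in comments because their payloads are defined by this series, so the
snippet assumes headers with these patches applied):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

static int setup_nested(int container_fd, void *guest_ram, __u64 ram_size)
{
        /* 1. GPA->HPA: one iova space per container, managed via MAP_DMA. */
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (__u64)(unsigned long)guest_ram;
        map.iova  = 0;
        map.size  = ram_size;
        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
                return -1;

        /*
         * 2. Ask the host to allocate a PASID for the guest
         *    (VFIO_IOMMU_PASID_REQUEST, proposed earlier in this series).
         * 3. GVA->GPA: bind the guest stage-1 page table for that PASID
         *    (VFIO_IOMMU_BIND + VFIO_IOMMU_BIND_GUEST_PGTBL in this patch,
         *    payload is struct iommu_gpasid_bind_data).
         */
        return 0;
}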

Thanks
Kevin

> -Original Message-
> From: Liu, Yi L 
> Sent: Sunday, March 22, 2020 8:32 PM
> To: alex.william...@redhat.com; eric.au...@redhat.com
> Cc: Tian, Kevin ; jacob.jun@linux.intel.com;
> j...@8bytes.org; Raj, Ashok ; Liu, Yi L
> ; Tian, Jun J ; Sun, Yi Y
> ; jean-phili...@linaro.org; pet...@redhat.com;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Wu, Hao 
> Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> From: Liu Yi L 
> 
> VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> hardware
> IOMMUs that have nesting DMA translation (a.k.a dual stage address
> translation). For such hardware IOMMUs, there are two stages/levels of
> address translation, and software may let userspace/VM to own the first-
> level/stage-1 translation structures. Example of such usage is vSVA (
> virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> translation structures and bind the structures to host, then hardware
> IOMMU would utilize nesting translation when doing DMA translation fo
> the devices behind such hardware IOMMU.
> 
> This patch adds vfio support for binding guest translation (a.k.a stage 1)
> structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> bind
> guest page table is needed, it also requires to expose interface to guest
> for iommu cache invalidation when guest modified the first-level/stage-1
> translation structures since hardware needs to be notified to flush stale
> iotlbs. This would be introduced in next patch.
> 
> In this patch, guest page table bind and unbind are done by using flags
> VFIO_IOMMU_BIND_GUEST_PGTBL and
> VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> struct iommu_gpasid_bind_data. Before binding guest page table to host,
> VM should have got a PASID allocated by host via
> VFIO_IOMMU_PASID_REQUEST.
> 
> Bind guest translation structures (here is guest page table) to host
> are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 158
> 
>  include/uapi/linux/vfio.h   |  46 
>  2 files changed, 204 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 82a9e0b..a877747 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -130,6 +130,33 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(>domain_list))
> 
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)
> 

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-10 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 3:57 AM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> On Sun, 22 Mar 2020 05:32:03 -0700
> "Liu, Yi L"  wrote:
> 
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
[...]
> > +/**
> > + * Unbind specific gpasid, caller of this function requires hold
> > + * vfio_iommu->lock
> > + */
> > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   return vfio_iommu_for_each_dev(iommu,
> > +   vfio_unbind_gpasid_fn, gbind_data);
> > +}
> > +
> > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   int ret = 0;
> > +
> > +   mutex_lock(>lock);
> > +   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   ret = vfio_iommu_for_each_dev(iommu,
> > +   vfio_bind_gpasid_fn, gbind_data);
> > +   /*
> > +* If bind failed, it may not be a total failure. Some devices
> > +* within the iommu group may have bind successfully. Although
> > +* we don't enable pasid capability for non-singletion iommu
> > +* groups, a unbind operation would be helpful to ensure no
> > +* partial binding for an iommu group.
> 
> Where was the non-singleton group restriction done, I missed that.
> 
> > +*/
> > +   if (ret)
> > +   /*
> > +* Undo all binds that already succeeded, no need to
> > +* check the return value here since some device within
> > +* the group has no successful bind when coming to this
> > +* place switch.
> > +*/
> > +   vfio_iommu_type1_do_guest_unbind(iommu, gbind_data);
> 
> However, the for_each_dev function stops when the callback function
> returns error, are we just assuming we stop at the same device as we
> faulted on the first time and that we traverse the same set of devices
> the second time?  It seems strange to me that unbind should be able to
> fail.

I think the code needs enhancement. Although one group per container
and one device per group is the most typical and desired case, the
code here loops over domains, groups and devices. It should be able to
unwind the prior binds when the loop fails for a device. So I plan to
make the change below for the bind path:

list_for_each_entry(domain, &iommu->domain_list, next) {
    list_for_each_entry(group, &domain->group_list, next) {
        /*
         * If bind fails on a certain device, the prior successful binds
         * should be undone. iommu_group_for_each_dev() would be modified
         * to take two callbacks: one for the forward loop and one for the
         * reverse (unwind) loop on failure. "return_upon_failure"
         * indicates whether to return upon a failure during the forward
         * loop; if yes, iommu_group_for_each_dev() should unwind the
         * prior binds in this iommu group before returning.
         */
        ret = iommu_group_for_each_dev(group->iommu_group, bind_gpasid_fn,
                                       unbind_gpasid_fn, data,
                                       return_upon_failure);
        if (ret)
            break;
    }
    if (ret) {
        /* unwind bindings with prior groups in this domain */
        list_for_each_entry_continue_reverse(group, &domain->group_list,
                                             next) {
            iommu_group_for_each_dev(group->iommu_group, unbind_gpasid_fn,
                                     NULL, data,
                                     ignore_intermediate_failure);
        }
        break;
    }
}

if (ret) {
    /* unwind bindings with prior domains */
    list_for_each_entry_continue_reverse(domain, &iommu->domain_list,
                                         next) {
        list_for_each_entry(group, &domain->group_list, next)
            iommu_group_for_each_dev(group->iommu_group, unbind_gpasid_fn,
                                     NULL, data,
                                     ignore_intermediate_failure);
    }
}

return ret;

Regards,
Yi Liu


Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 09, 2020 at 09:15:29AM +, Liu, Yi L wrote:
> > From: Jean-Philippe Brucker 
> > Sent: Thursday, April 9, 2020 4:29 PM
> > To: Liu, Yi L 
> > 
> > On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> > > Hi Jean,
> > >
> > > > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > > > Sent: Friday, April 3, 2020 4:35 PM
> > > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > > >
> > > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > > > >   default:
> > > > > > > > >   return -EINVAL;
> > > > > > > > >   }
> > > > > > > > > +
> > > > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > > > >
> > > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > > > >
> > > > > > > Emm, it's up to the flags to indicate bind what. It was proposed 
> > > > > > > to
> > > > > > > cover the three cases below:
> > > > > > > a) BIND/UNBIND_GPASID
> > > > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > > > c) BIND/UNBIND_PROCESS
> > > > > > > 
> > > > > > > So it's called VFIO_IOMMU_BIND.
> > > > > >
> > > > > > but aren't they all about PASID related binding?
> > > > >
> > > > > yeah, I can rename it. :-)
> > > >
> > > > I don't know if anyone intends to implement it, but SMMUv2 supports
> > > > nesting translation without any PASID support. For that case the name
> > > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more
> > sense.
> > > > Ideally we'd also use a neutral name for the IOMMU API instead of
> > > > bind_gpasid(), but that's easier to change later.
> > >
> > > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it 
> > > may
> > > cause confusion when thinking about VFIO_SET_IOMMU. How about using
> > > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> > > VFIO_BIND_PROCESS in future for the SVA bind case.
> > 
> > I think minimizing the number of ioctls is more important than finding the
> > ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
> > rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
> > non-PASID things (they should be rare enough).
> maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is
> also a discussion on reusing the same ioctl and vfio structure for
> pasid_alloc/free, bind/unbind_gpasid. and cache_inv. how about your
> opinion?

Merging bind with unbind and alloc with free makes sense. I'd leave at
least invalidate as a separate ioctl, because alloc/bind/unbind/free are
setup functions while invalidate is a runtime thing and performance
sensitive. But I can't see a good reason not to merge them all together,
so either way is fine by me.
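
As a purely illustrative sketch of the "merge via flags" idea (the struct
and flag names below are assumptions for discussion, not the uapi actually
proposed in this series), with cache invalidation kept as its own ioctl
since it sits on the runtime path:

#include <linux/types.h>

struct vfio_iommu_pasid_op {
        __u32   argsz;
        __u32   flags;
#define VFIO_PASID_OP_ALLOC     (1 << 0)  /* allocate a host PASID     */
#define VFIO_PASID_OP_FREE      (1 << 1)  /* release a host PASID      */
#define VFIO_PASID_OP_BIND      (1 << 2)  /* bind guest stage-1 tables */
#define VFIO_PASID_OP_UNBIND    (1 << 3)  /* undo a previous bind      */
        __u8    data[];  /* op-specific payload, e.g. iommu_gpasid_bind_data */
};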

Thanks,
Jean


RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Liu, Yi L
> From: Jean-Philippe Brucker 
> Sent: Thursday, April 9, 2020 4:29 PM
> To: Liu, Yi L 
> 
> On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> > Hi Jean,
> >
> > > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > > Sent: Friday, April 3, 2020 4:35 PM
> > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > >
> > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > > > default:
> > > > > > > > return -EINVAL;
> > > > > > > > }
> > > > > > > > +
> > > > > > > > +   } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > > >
> > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > > >
> > > > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > > > cover the three cases below:
> > > > > > a) BIND/UNBIND_GPASID
> > > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > > c) BIND/UNBIND_PROCESS
> > > > > > 
> > > > > > So it's called VFIO_IOMMU_BIND.
> > > > >
> > > > > but aren't they all about PASID related binding?
> > > >
> > > > yeah, I can rename it. :-)
> > >
> > > I don't know if anyone intends to implement it, but SMMUv2 supports
> > > nesting translation without any PASID support. For that case the name
> > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more
> sense.
> > > Ideally we'd also use a neutral name for the IOMMU API instead of
> > > bind_gpasid(), but that's easier to change later.
> >
> > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may
> > cause confusion when thinking about VFIO_SET_IOMMU. How about using
> > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> > VFIO_BIND_PROCESS in future for the SVA bind case.
> 
> I think minimizing the number of ioctls is more important than finding the
> ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
> rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
> non-PASID things (they should be rare enough).
Maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is
also a discussion on reusing the same ioctl and vfio structure for
pasid_alloc/free, bind/unbind_gpasid, and cache_inv. What is your
opinion?

https://lkml.org/lkml/2020/4/3/833

Regards,
Yi Liu





Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Jean-Philippe Brucker
On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> Hi Jean,
> 
> > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > Sent: Friday, April 3, 2020 4:35 PM
> > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > 
> > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > >   default:
> > > > > > >   return -EINVAL;
> > > > > > >   }
> > > > > > > +
> > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > >
> > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > >
> > > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > > cover the three cases below:
> > > > > a) BIND/UNBIND_GPASID
> > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > c) BIND/UNBIND_PROCESS
> > > > > 
> > > > > So it's called VFIO_IOMMU_BIND.
> > > >
> > > > but aren't they all about PASID related binding?
> > >
> > > yeah, I can rename it. :-)
> > 
> > I don't know if anyone intends to implement it, but SMMUv2 supports
> > nesting translation without any PASID support. For that case the name
> > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense.
> > Ideally we'd also use a neutral name for the IOMMU API instead of
> > bind_gpasid(), but that's easier to change later.
> 
> I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may
> cause confusion when thinking about VFIO_SET_IOMMU. How about using
> VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> VFIO_BIND_PROCESS in future for the SVA bind case.

I think minimizing the number of ioctls is more important than finding the
ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
non-PASID things (they should be rare enough).

Thanks,
Jean


RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-07 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> Sent: Friday, April 3, 2020 4:35 PM
> Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > default:
> > > > > > return -EINVAL;
> > > > > > }
> > > > > > +
> > > > > > +   } else if (cmd == VFIO_IOMMU_BIND) {
> > > > >
> > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > >
> > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > cover the three cases below:
> > > > a) BIND/UNBIND_GPASID
> > > > b) BIND/UNBIND_GPASID_TABLE
> > > > c) BIND/UNBIND_PROCESS
> > > > 
> > > > So it's called VFIO_IOMMU_BIND.
> > >
> > > but aren't they all about PASID related binding?
> >
> > yeah, I can rename it. :-)
> 
> I don't know if anyone intends to implement it, but SMMUv2 supports
> nesting translation without any PASID support. For that case the name
> VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense.
> Ideally we'd also use a neutral name for the IOMMU API instead of
> bind_gpasid(), but that's easier to change later.

I agree VFIO_IOMMU_BIND is somewhat not straightforward. In particular, it may
cause confusion when thinking about VFIO_SET_IOMMU. How about using
VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b), and adding another
VFIO_BIND_PROCESS in the future for the SVA bind case?

Regards,
Yi Liu



RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-04 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Saturday, April 4, 2020 2:11 AM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> On Fri, 3 Apr 2020 13:30:49 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson 
> > > Sent: Friday, April 3, 2020 3:57 AM
> > > To: Liu, Yi L 
> > >
> > > On Sun, 22 Mar 2020 05:32:03 -0700
> > > "Liu, Yi L"  wrote:
> > >
> > > > From: Liu Yi L 
> > > >
> > > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > > > hardware IOMMUs that have nesting DMA translation (a.k.a dual
> > > > stage address translation). For such hardware IOMMUs, there are
> > > > two stages/levels of address translation, and software may let
> > > > userspace/VM to own the first-
> > > > level/stage-1 translation structures. Example of such usage is
> > > > vSVA ( virtual Shared Virtual Addressing). VM owns the
> > > > first-level/stage-1 translation structures and bind the structures
> > > > to host, then hardware IOMMU would utilize nesting translation
> > > > when doing DMA translation fo the devices behind such hardware IOMMU.
> > > >
> > > > This patch adds vfio support for binding guest translation (a.k.a
> > > > stage 1) structure to host iommu. And for
> > > > VFIO_TYPE1_NESTING_IOMMU, not only bind guest page table is
> > > > needed, it also requires to expose interface to guest for iommu
> > > > cache invalidation when guest modified the first-level/stage-1
> > > > translation structures since hardware needs to be notified to flush 
> > > > stale iotlbs.
> This would be introduced in next patch.
> > > >
> > > > In this patch, guest page table bind and unbind are done by using
> > > > flags VFIO_IOMMU_BIND_GUEST_PGTBL and
> > > > VFIO_IOMMU_UNBIND_GUEST_PGTBL
> > > under IOCTL
> > > > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by struct
> > > > iommu_gpasid_bind_data. Before binding guest page table to host,
> > > > VM should have got a PASID allocated by host via
> VFIO_IOMMU_PASID_REQUEST.
> > > >
> > > > Bind guest translation structures (here is guest page table) to
> > > > host are the first step to setup vSVA (Virtual Shared Virtual 
> > > > Addressing).
> > > >
> > > > Cc: Kevin Tian 
> > > > CC: Jacob Pan 
> > > > Cc: Alex Williamson 
> > > > Cc: Eric Auger 
> > > > Cc: Jean-Philippe Brucker 
> > > > Signed-off-by: Jean-Philippe Brucker 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 158
> > > 
> > > >  include/uapi/linux/vfio.h   |  46 
> > > >  2 files changed, 204 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -130,6 +130,33 @@ struct vfio_regions {
> > > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > > > 
> > > > (!list_empty(&iommu->domain_list))
> > > >
> > > > +struct domain_capsule {
> > > > +   struct iommu_domain *domain;
> > > > +   void *data;
> > > > +};
> > > > +
> > > > +/* iommu->lock must be held */
> > > > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > > > + int (*fn)(struct device *dev, void *data),
> > > > + void *data)
> > > > +{
> > > > +   struct domain_capsule dc = {.data = data};
> > > > +   struct vfio_domain *d;
> > > > +   struct vfio_group *g;
> > > > +   int ret = 0;
> > > > +
> > > > +   list_for_each_entry(d, &iommu->domain_list, next) {
> > > > +   dc.domain = d->domain;
> > > > +   list_for_each_entry(g, &d->group_list, next) {
> > > > +   ret = iommu_group_for_each_dev(g->iommu_group,
> > > > + 

Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-03 Thread Alex Williamson
On Fri, 3 Apr 2020 13:30:49 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> > From: Alex Williamson 
> > Sent: Friday, April 3, 2020 3:57 AM
> > To: Liu, Yi L 
> > 
> > On Sun, 22 Mar 2020 05:32:03 -0700
> > "Liu, Yi L"  wrote:
> >   
> > > From: Liu Yi L 
> > >
> > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
> > > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > > translation). For such hardware IOMMUs, there are two stages/levels of
> > > address translation, and software may let userspace/VM to own the first-
> > > level/stage-1 translation structures. Example of such usage is vSVA (
> > > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > > translation structures and bind the structures to host, then hardware
> > > IOMMU would utilize nesting translation when doing DMA translation fo
> > > the devices behind such hardware IOMMU.
> > >
> > > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only bind
> > > guest page table is needed, it also requires to expose interface to guest
> > > for iommu cache invalidation when guest modified the first-level/stage-1
> > > translation structures since hardware needs to be notified to flush stale
> > > iotlbs. This would be introduced in next patch.
> > >
> > > In this patch, guest page table bind and unbind are done by using flags
> > > VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL  
> > under IOCTL  
> > > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > > struct iommu_gpasid_bind_data. Before binding guest page table to host,
> > > VM should have got a PASID allocated by host via VFIO_IOMMU_PASID_REQUEST.
> > >
> > > Bind guest translation structures (here is guest page table) to host
> > > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> > >
> > > Cc: Kevin Tian 
> > > CC: Jacob Pan 
> > > Cc: Alex Williamson 
> > > Cc: Eric Auger 
> > > Cc: Jean-Philippe Brucker 
> > > Signed-off-by: Jean-Philippe Brucker 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 158  
> >   
> > >  include/uapi/linux/vfio.h   |  46 
> > >  2 files changed, 204 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > > b/drivers/vfio/vfio_iommu_type1.c
> > > index 82a9e0b..a877747 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -130,6 +130,33 @@ struct vfio_regions {
> > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> > >   (!list_empty(&iommu->domain_list))
> > >
> > > +struct domain_capsule {
> > > + struct iommu_domain *domain;
> > > + void *data;
> > > +};
> > > +
> > > +/* iommu->lock must be held */
> > > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > > +   int (*fn)(struct device *dev, void *data),
> > > +   void *data)
> > > +{
> > > + struct domain_capsule dc = {.data = data};
> > > + struct vfio_domain *d;
> > > + struct vfio_group *g;
> > > + int ret = 0;
> > > +
> > > + list_for_each_entry(d, &iommu->domain_list, next) {
> > > + dc.domain = d->domain;
> > > + list_for_each_entry(g, &d->group_list, next) {
> > > + ret = iommu_group_for_each_dev(g->iommu_group,
> > > +        &dc, fn);
> > > + if (ret)
> > > + break;
> > > + }
> > > + }
> > > + return ret;
> > > +}
> > > +
> > >  static int put_pfn(unsigned long pfn, int prot);
> > >
> > >  /*
> > > @@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct  
> > vfio_iommu *iommu,  
> > >   return 0;
> > >  }
> > >
> > > +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + struct iommu_gpasid_bind_data *gbind_data =
> > > + (struct iommu_gpasid_bind_data *) dc->data;
> > > +
> > > + return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data);
> > > +}
> > > +
> > > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + struct iommu_gpasid_bind_data *gbind_data =
> > > + (struct iommu_gpasid_bind_data *) dc->data;
> > > +
> > > + return iommu_sva_unbind_gpasid(dc->domain, dev,
> > > + gbind_data->hpasid);
> > > +}
> > > +
> > > +/**
> > > + * Unbind specific gpasid, caller of this function requires hold
> > > + * vfio_iommu->lock
> > > + */
> > > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > > + struct iommu_gpasid_bind_data *gbind_data)
> > > +{
> > > + return vfio_iommu_for_each_dev(iommu,
> > > + vfio_unbind_gpasid_fn, 

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-03 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 3:57 AM
> To: Liu, Yi L 
> 
> On Sun, 22 Mar 2020 05:32:03 -0700
> "Liu, Yi L"  wrote:
> 
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
> > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > translation). For such hardware IOMMUs, there are two stages/levels of
> > address translation, and software may let userspace/VM to own the first-
> > level/stage-1 translation structures. Example of such usage is vSVA (
> > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > translation structures and bind the structures to host, then hardware
> > IOMMU would utilize nesting translation when doing DMA translation fo
> > the devices behind such hardware IOMMU.
> >
> > This patch adds vfio support for binding guest translation (a.k.a stage 1)
> > structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only bind
> > guest page table is needed, it also requires to expose interface to guest
> > for iommu cache invalidation when guest modified the first-level/stage-1
> > translation structures since hardware needs to be notified to flush stale
> > iotlbs. This would be introduced in next patch.
> >
> > In this patch, guest page table bind and unbind are done by using flags
> > VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL
> under IOCTL
> > VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > struct iommu_gpasid_bind_data. Before binding guest page table to host,
> > VM should have got a PASID allocated by host via VFIO_IOMMU_PASID_REQUEST.
> >
> > Bind guest translation structures (here is guest page table) to host
> > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 158
> 
> >  include/uapi/linux/vfio.h   |  46 
> >  2 files changed, 204 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 82a9e0b..a877747 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -130,6 +130,33 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > (!list_empty(>domain_list))
> >
> > +struct domain_capsule {
> > +   struct iommu_domain *domain;
> > +   void *data;
> > +};
> > +
> > +/* iommu->lock must be held */
> > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > + int (*fn)(struct device *dev, void *data),
> > + void *data)
> > +{
> > +   struct domain_capsule dc = {.data = data};
> > +   struct vfio_domain *d;
> > +   struct vfio_group *g;
> > +   int ret = 0;
> > +
> > +   list_for_each_entry(d, >domain_list, next) {
> > +   dc.domain = d->domain;
> > +   list_for_each_entry(g, >group_list, next) {
> > +   ret = iommu_group_for_each_dev(g->iommu_group,
> > +  , fn);
> > +   if (ret)
> > +   break;
> > +   }
> > +   }
> > +   return ret;
> > +}
> > +
> >  static int put_pfn(unsigned long pfn, int prot);
> >
> >  /*
> > @@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct
> vfio_iommu *iommu,
> > return 0;
> >  }
> >
> > +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   struct iommu_gpasid_bind_data *gbind_data =
> > +   (struct iommu_gpasid_bind_data *) dc->data;
> > +
> > +   return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data);
> > +}
> > +
> > +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   struct iommu_gpasid_bind_data *gbind_data =
> > +   (struct iommu_gpasid_bind_data *) dc->data;
> > +
> > +   return iommu_sva_unbind_gpasid(dc->domain, dev,
> > +   gbind_data->hpasid);
> > +}
> > +
> > +/**
> > + * Unbind specific gpasid, caller of this function requires hold
> > + * vfio_iommu->lock
> > + */
> > +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   return vfio_iommu_for_each_dev(iommu,
> > +   vfio_unbind_gpasid_fn, gbind_data);
> > +}
> > +
> > +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> > +   struct iommu_gpasid_bind_data *gbind_data)
> > +{
> > +   int ret = 0;
> > +
> > +   mutex_lock(>lock);
> > +   if 

Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-03 Thread Jean-Philippe Brucker
On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > >   default:
> > > > >   return -EINVAL;
> > > > >   }
> > > > > +
> > > > > + } else if (cmd == VFIO_IOMMU_BIND) {
> > > >
> > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > >
> > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > cover the three cases below:
> > > a) BIND/UNBIND_GPASID
> > > b) BIND/UNBIND_GPASID_TABLE
> > > c) BIND/UNBIND_PROCESS
> > > 
> > > So it's called VFIO_IOMMU_BIND.
> > 
> > but aren't they all about PASID related binding?
> 
> yeah, I can rename it. :-)

I don't know if anyone intends to implement it, but SMMUv2 supports
nesting translation without any PASID support. For that case the name
VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense.
Ideally we'd also use a neutral name for the IOMMU API instead of
bind_gpasid(), but that's easier to change later.

Thanks,
Jean



Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-02 Thread Alex Williamson
On Sun, 22 Mar 2020 05:32:03 -0700
"Liu, Yi L"  wrote:

> From: Liu Yi L 
> 
> VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
> IOMMUs that have nesting DMA translation (a.k.a dual stage address
> translation). For such hardware IOMMUs, there are two stages/levels of
> address translation, and software may let userspace/VM to own the first-
> level/stage-1 translation structures. Example of such usage is vSVA (
> virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> translation structures and bind the structures to host, then hardware
> IOMMU would utilize nesting translation when doing DMA translation fo
> the devices behind such hardware IOMMU.
> 
> This patch adds vfio support for binding guest translation (a.k.a stage 1)
> structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only bind
> guest page table is needed, it also requires to expose interface to guest
> for iommu cache invalidation when guest modified the first-level/stage-1
> translation structures since hardware needs to be notified to flush stale
> iotlbs. This would be introduced in next patch.
> 
> In this patch, guest page table bind and unbind are done by using flags
> VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> struct iommu_gpasid_bind_data. Before binding guest page table to host,
> VM should have got a PASID allocated by host via VFIO_IOMMU_PASID_REQUEST.
> 
> Bind guest translation structures (here is guest page table) to host
> are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 158 
> 
>  include/uapi/linux/vfio.h   |  46 
>  2 files changed, 204 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 82a9e0b..a877747 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -130,6 +130,33 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)
> +{
> + struct domain_capsule dc = {.data = data};
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + dc.domain = d->domain;
> + list_for_each_entry(g, &d->group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group,
> +        &dc, fn);
> + if (ret)
> + break;
> + }
> + }
> + return ret;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
>  
>  /*
> @@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct 
> vfio_iommu *iommu,
>   return 0;
>  }
>  
> +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + struct iommu_gpasid_bind_data *gbind_data =
> + (struct iommu_gpasid_bind_data *) dc->data;
> +
> + return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data);
> +}
> +
> +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + struct iommu_gpasid_bind_data *gbind_data =
> + (struct iommu_gpasid_bind_data *) dc->data;
> +
> + return iommu_sva_unbind_gpasid(dc->domain, dev,
> + gbind_data->hpasid);
> +}
> +
> +/**
> + * Unbind specific gpasid, caller of this function requires hold
> + * vfio_iommu->lock
> + */
> +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
> + struct iommu_gpasid_bind_data *gbind_data)
> +{
> + return vfio_iommu_for_each_dev(iommu,
> + vfio_unbind_gpasid_fn, gbind_data);
> +}
> +
> +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
> + struct iommu_gpasid_bind_data *gbind_data)
> +{
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + ret = vfio_iommu_for_each_dev(iommu,
> + vfio_bind_gpasid_fn, gbind_data);
> + /*
> +  * If bind failed, it 

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-02 Thread Liu, Yi L
> From: Tian, Kevin 
> Sent: Thursday, April 2, 2020 10:12 AM
> To: Liu, Yi L ; alex.william...@redhat.com;
> Subject: RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, April 1, 2020 5:13 PM
> >
> > > From: Tian, Kevin 
> > > Sent: Monday, March 30, 2020 8:46 PM
> > > Subject: RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to
> > > host
> > >
> > > > From: Liu, Yi L 
> > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > >
> > > > From: Liu Yi L 
> > > >
> > > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> > hardware
> > > > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > > > translation). For such hardware IOMMUs, there are two
> > > > stages/levels of address translation, and software may let
> > > > userspace/VM to own the
> > > > first-
> > > > level/stage-1 translation structures. Example of such usage is
> > > > vSVA ( virtual Shared Virtual Addressing). VM owns the
> > > > first-level/stage-1 translation structures and bind the structures
> > > > to host, then hardware IOMMU would utilize nesting translation
> > > > when doing DMA translation fo the devices behind such hardware IOMMU.
> > > >
> > > > This patch adds vfio support for binding guest translation (a.k.a
> > > > stage 1) structure to host iommu. And for
> > > > VFIO_TYPE1_NESTING_IOMMU, not only bind guest page table is
> > > > needed, it also requires to expose interface to guest for iommu
> > > > cache invalidation when guest modified the first-level/stage-1
> > > > translation structures since hardware needs to be notified to
> > > > flush stale iotlbs. This would be introduced in next patch.
> > > >
> > > > In this patch, guest page table bind and unbind are done by using
> > > > flags VFIO_IOMMU_BIND_GUEST_PGTBL and
> > > VFIO_IOMMU_UNBIND_GUEST_PGTBL
> > > > under IOCTL VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > > > struct iommu_gpasid_bind_data. Before binding guest page table to
> > > > host, VM should have got a PASID allocated by host via
> > > > VFIO_IOMMU_PASID_REQUEST.
> > > >
> > > > Bind guest translation structures (here is guest page table) to
> > > > host
> > >
> > > Bind -> Binding
> > got it.
> > > > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> > >
> > > are -> is. and you already explained vSVA earlier.
> > oh yes, it is.
> > > >
> > > > Cc: Kevin Tian 
> > > > CC: Jacob Pan 
> > > > Cc: Alex Williamson 
> > > > Cc: Eric Auger 
> > > > Cc: Jean-Philippe Brucker 
> > > > Signed-off-by: Jean-Philippe Brucker 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 158
> > > > 
> > > >  include/uapi/linux/vfio.h   |  46 
> > > >  2 files changed, 204 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -130,6 +130,33 @@ struct vfio_regions {
> > > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
> > > > 	(!list_empty(&iommu->domain_list))
> > > >
> > > > +struct domain_capsule {
> > > > +   struct iommu_domain *domain;
> > > > +   void *data;
> > > > +};
> > > > +
> > > > +/* iommu->lock must be held */
> > > > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > > > + int (*fn)(struct device *dev, void *data),
> > > > + void *data)
> > > > +{
> > > > +   struct domain_capsule dc = {.data = data};
> > > > +   struct vfio_domain *d;
> > > > +   struct vfio_group *g;
> > > > +   int ret = 0;
> > > > +
> > > > +   list_for_each_entry(d, &iommu->domain_list, next) {
> > > > +   d

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-01 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Wednesday, April 1, 2020 5:13 PM
> 
> > From: Tian, Kevin 
> > Sent: Monday, March 30, 2020 8:46 PM
> > Subject: RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> >
> > > From: Liu, Yi L 
> > > Sent: Sunday, March 22, 2020 8:32 PM
> > >
> > > From: Liu Yi L 
> > >
> > > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> hardware
> > > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > > translation). For such hardware IOMMUs, there are two stages/levels of
> > > address translation, and software may let userspace/VM to own the
> > > first-
> > > level/stage-1 translation structures. Example of such usage is vSVA (
> > > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > > translation structures and bind the structures to host, then hardware
> > > IOMMU would utilize nesting translation when doing DMA translation fo
> > > the devices behind such hardware IOMMU.
> > >
> > > This patch adds vfio support for binding guest translation (a.k.a
> > > stage 1) structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU,
> > > not only bind guest page table is needed, it also requires to expose
> > > interface to guest for iommu cache invalidation when guest modified
> > > the first-level/stage-1 translation structures since hardware needs to
> > > be notified to flush stale iotlbs. This would be introduced in next
> > > patch.
> > >
> > > In this patch, guest page table bind and unbind are done by using
> > > flags VFIO_IOMMU_BIND_GUEST_PGTBL and
> > VFIO_IOMMU_UNBIND_GUEST_PGTBL
> > > under IOCTL VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > > struct iommu_gpasid_bind_data. Before binding guest page table to
> > > host, VM should have got a PASID allocated by host via
> > > VFIO_IOMMU_PASID_REQUEST.
> > >
> > > Bind guest translation structures (here is guest page table) to host
> >
> > Bind -> Binding
> got it.
> > > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> >
> > are -> is. and you already explained vSVA earlier.
> oh yes, it is.
> > >
> > > Cc: Kevin Tian 
> > > CC: Jacob Pan 
> > > Cc: Alex Williamson 
> > > Cc: Eric Auger 
> > > Cc: Jean-Philippe Brucker 
> > > Signed-off-by: Jean-Philippe Brucker 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 158
> > > 
> > >  include/uapi/linux/vfio.h   |  46 
> > >  2 files changed, 204 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -130,6 +130,33 @@ struct vfio_regions {
> > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> > >   (!list_empty(&iommu->domain_list))
> > >
> > > +struct domain_capsule {
> > > + struct iommu_domain *domain;
> > > + void *data;
> > > +};
> > > +
> > > +/* iommu->lock must be held */
> > > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > > +   int (*fn)(struct device *dev, void *data),
> > > +   void *data)
> > > +{
> > > + struct domain_capsule dc = {.data = data};
> > > + struct vfio_domain *d;
> > > + struct vfio_group *g;
> > > + int ret = 0;
> > > +
> > > + list_for_each_entry(d, &iommu->domain_list, next) {
> > > + dc.domain = d->domain;
> > > + list_for_each_entry(g, >group_list, next) {
> > > + ret = iommu_group_for_each_dev(g->iommu_group,
> > > +   &dc, fn);
> > > + if (ret)
> > > + break;
> > > + }
> > > + }
> > > + return ret;
> > > +}
> > > +
> > >  static int put_pfn(unsigned long pfn, int prot);
> > >
> > >  /*
> > > @@ -2314,6 +2341,88 @@ static int
> > > vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > >   return 0;
> > >  }
> > >
> > > +static int vfio_bind_gpasid_fn(struct 

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-01 Thread Liu, Yi L
> From: Tian, Kevin 
> Sent: Monday, March 30, 2020 8:46 PM
> Subject: RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> > From: Liu, Yi L 
> > Sent: Sunday, March 22, 2020 8:32 PM
> >
> > From: Liu Yi L 
> >
> > VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware
> > IOMMUs that have nesting DMA translation (a.k.a dual stage address
> > translation). For such hardware IOMMUs, there are two stages/levels of
> > address translation, and software may let userspace/VM to own the
> > first-
> > level/stage-1 translation structures. Example of such usage is vSVA (
> > virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> > translation structures and bind the structures to host, then hardware
> > IOMMU would utilize nesting translation when doing DMA translation fo
> > the devices behind such hardware IOMMU.
> >
> > This patch adds vfio support for binding guest translation (a.k.a
> > stage 1) structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU,
> > not only bind guest page table is needed, it also requires to expose
> > interface to guest for iommu cache invalidation when guest modified
> > the first-level/stage-1 translation structures since hardware needs to
> > be notified to flush stale iotlbs. This would be introduced in next
> > patch.
> >
> > In this patch, guest page table bind and unbind are done by using
> > flags VFIO_IOMMU_BIND_GUEST_PGTBL and
> VFIO_IOMMU_UNBIND_GUEST_PGTBL
> > under IOCTL VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> > struct iommu_gpasid_bind_data. Before binding guest page table to
> > host, VM should have got a PASID allocated by host via
> > VFIO_IOMMU_PASID_REQUEST.
> >
> > Bind guest translation structures (here is guest page table) to host
> 
> Bind -> Binding
got it.
> > are the first step to setup vSVA (Virtual Shared Virtual Addressing).
> 
> are -> is. and you already explained vSVA earlier.
oh yes, it is.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 158
> > 
> >  include/uapi/linux/vfio.h   |  46 
> >  2 files changed, 204 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -130,6 +130,33 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > 	(!list_empty(&iommu->domain_list))
> >
> > +struct domain_capsule {
> > +   struct iommu_domain *domain;
> > +   void *data;
> > +};
> > +
> > +/* iommu->lock must be held */
> > +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> > + int (*fn)(struct device *dev, void *data),
> > + void *data)
> > +{
> > +   struct domain_capsule dc = {.data = data};
> > +   struct vfio_domain *d;
> > +   struct vfio_group *g;
> > +   int ret = 0;
> > +
> > +   list_for_each_entry(d, &iommu->domain_list, next) {
> > +   dc.domain = d->domain;
> > +   list_for_each_entry(g, >group_list, next) {
> > +   ret = iommu_group_for_each_dev(g->iommu_group,
> > +  &dc, fn);
> > +   if (ret)
> > +   break;
> > +   }
> > +   }
> > +   return ret;
> > +}
> > +
> >  static int put_pfn(unsigned long pfn, int prot);
> >
> >  /*
> > @@ -2314,6 +2341,88 @@ static int
> > vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > return 0;
> >  }
> >
> > +static int vfio_bind_gpasid_fn(struct device *dev, void *data) {
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   struct iommu_gpasid_bind_data *gbind_data =
> > +   (struct iommu_gpasid_bind_data *) dc->data;
> > +
> 
> In Jacob's vSVA iommu series, [PATCH 06/11]:
> 
> +	/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +	 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +	 * ownership.
> +	 */
>

RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-03-30 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Sunday, March 22, 2020 8:32 PM
> 
> From: Liu Yi L 
> 
> VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by
> hardware
> IOMMUs that have nesting DMA translation (a.k.a dual stage address
> translation). For such hardware IOMMUs, there are two stages/levels of
> address translation, and software may let userspace/VM to own the first-
> level/stage-1 translation structures. Example of such usage is vSVA (
> virtual Shared Virtual Addressing). VM owns the first-level/stage-1
> translation structures and bind the structures to host, then hardware
> IOMMU would utilize nesting translation when doing DMA translation fo
> the devices behind such hardware IOMMU.
> 
> This patch adds vfio support for binding guest translation (a.k.a stage 1)
> structure to host iommu. And for VFIO_TYPE1_NESTING_IOMMU, not only
> bind
> guest page table is needed, it also requires to expose interface to guest
> for iommu cache invalidation when guest modified the first-level/stage-1
> translation structures since hardware needs to be notified to flush stale
> iotlbs. This would be introduced in next patch.
> 
> In this patch, guest page table bind and unbind are done by using flags
> VFIO_IOMMU_BIND_GUEST_PGTBL and
> VFIO_IOMMU_UNBIND_GUEST_PGTBL under IOCTL
> VFIO_IOMMU_BIND, the bind/unbind data are conveyed by
> struct iommu_gpasid_bind_data. Before binding guest page table to host,
> VM should have got a PASID allocated by host via
> VFIO_IOMMU_PASID_REQUEST.
> 
> Bind guest translation structures (here is guest page table) to host

Bind -> Binding

> are the first step to setup vSVA (Virtual Shared Virtual Addressing).

are -> is. and you already explained vSVA earlier.

> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 158
> 
>  include/uapi/linux/vfio.h   |  46 
>  2 files changed, 204 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 82a9e0b..a877747 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -130,6 +130,33 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
> 
> +struct domain_capsule {
> + struct iommu_domain *domain;
> + void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
> +   int (*fn)(struct device *dev, void *data),
> +   void *data)
> +{
> + struct domain_capsule dc = {.data = data};
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + int ret = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + dc.domain = d->domain;
> + list_for_each_entry(g, >group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group,
> +   &dc, fn);
> + if (ret)
> + break;
> + }
> + }
> + return ret;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
> 
>  /*
> @@ -2314,6 +2341,88 @@ static int
> vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
>   return 0;
>  }
> 
> +static int vfio_bind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + struct iommu_gpasid_bind_data *gbind_data =
> + (struct iommu_gpasid_bind_data *) dc->data;
> +

In Jacob's vSVA iommu series, [PATCH 06/11]:

+	/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
+	 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+	 * ownership.
+	 */

I asked him who exactly should be responsible for tracking the pasid
ownership. Although there is no response yet, I expect vfio/iommu to settle
on a clear policy and to document it here as well, so that we give a
consistent message.
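As a purely illustrative example of the kind of check being asked for here,
VFIO could refuse to forward a bind whose hpasid was not allocated to this
container. The sketch below assumes the ioasid_set-per-mm idea from Jacob's
series; vfio_iommu_owns_pasid() and the iommu->ioasid_set field are invented
for the example and do not exist in this patch:

#include <linux/err.h>
#include <linux/ioasid.h>

/* Sketch only: check that @hpasid was allocated to this container's
 * ioasid_set before calling iommu_sva_bind_gpasid() on it. */
static bool vfio_iommu_owns_pasid(struct vfio_iommu *iommu, ioasid_t hpasid)
{
	/* ioasid_find() returns an ERR_PTR() when the PASID is not found
	 * in, or not owned by, the given set. */
	return !IS_ERR(ioasid_find(iommu->ioasid_set, hpasid, NULL));
}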

> + return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data);
> +}
> +
> +static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + struct iommu_gpasid_bind_data *gbind_data =
> + (struct iommu_gpasid_bind_data *) dc->data;
> +
> + return iommu_sva_unbind_gpasid(dc->domain, dev,
> + gbind_data->hpasid);

Curious why we have to share the same bind_data structure
between bind and unbind, especially when unbind requires
only one field. I don't see a clear reason, and this case is no
different from the earlier ALLOC/FREE pair, which don't share
a structure either. The current way simply wastes space for
the unbind operation...
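If a dedicated structure were preferred, it could be as small as the sketch
below; this struct is purely illustrative and does not exist in the series
as posted:

/* illustrative only; would sit next to iommu_gpasid_bind_data and needs
 * <linux/types.h> for the __u32/__u64 types */
struct iommu_gpasid_unbind_data {
	__u32	version;
	__u32	padding;
	__u64	hpasid;		/* the only field unbind actually needs */
};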

> +}
> +
> +/**
> + 

Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-03-22 Thread kbuild test robot
Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Liu-Yi-L/vfio-expose-virtual-Shared-Virtual-Addressing-to-VMs/20200322-213259
base:   https://github.com/awilliam/linux-vfio.git next
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_get_stage1_format':
   drivers/vfio/vfio_iommu_type1.c:2300:4: error: 'DOMAIN_ATTR_PASID_FORMAT' undeclared (first use in this function)
    2300 |    DOMAIN_ATTR_PASID_FORMAT, &format)) {
         |    ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/vfio/vfio_iommu_type1.c:2300:4: note: each undeclared identifier is reported only once for each function it appears in
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_type1_ioctl':
   drivers/vfio/vfio_iommu_type1.c:2464:11: error: implicit declaration of function 'iommu_get_uapi_version' [-Werror=implicit-function-declaration]
    2464 |    return iommu_get_uapi_version();
         |           ^~~~~~~~~~~~~~~~~~~~~~
>> drivers/vfio/vfio_iommu_type1.c:2626:15: error: implicit declaration of function 'iommu_uapi_get_data_size' [-Werror=implicit-function-declaration]
    2626 |    data_size = iommu_uapi_get_data_size(
         |                ^~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/vfio/vfio_iommu_type1.c:2627:5: error: 'IOMMU_UAPI_BIND_GPASID' undeclared (first use in this function)
    2627 |     IOMMU_UAPI_BIND_GPASID, version);
         |     ^~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/iommu_uapi_get_data_size +2626 drivers/vfio/vfio_iommu_type1.c

  2446  
  2447  static long vfio_iommu_type1_ioctl(void *iommu_data,
  2448 unsigned int cmd, unsigned long arg)
  2449  {
  2450  struct vfio_iommu *iommu = iommu_data;
  2451  unsigned long minsz;
  2452  
  2453  if (cmd == VFIO_CHECK_EXTENSION) {
  2454  switch (arg) {
  2455  case VFIO_TYPE1_IOMMU:
  2456  case VFIO_TYPE1v2_IOMMU:
  2457  case VFIO_TYPE1_NESTING_IOMMU:
  2458  return 1;
  2459  case VFIO_DMA_CC_IOMMU:
  2460  if (!iommu)
  2461  return 0;
  2462  return vfio_domains_have_iommu_cache(iommu);
  2463  case VFIO_NESTING_IOMMU_UAPI:
  2464  return iommu_get_uapi_version();
  2465  default:
  2466  return 0;
  2467  }
  2468  } else if (cmd == VFIO_IOMMU_GET_INFO) {
  2469  struct vfio_iommu_type1_info info;
  2470  struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
  2471  unsigned long capsz;
  2472  int ret;
  2473  
  2474  minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
  2475  
  2476  /* For backward compatibility, cannot require this */
  2477  capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
  2478  
  2479  if (copy_from_user(&info, (void __user *)arg, minsz))
  2480  return -EFAULT;
  2481  
  2482  if (info.argsz < minsz)
  2483  return -EINVAL;
  2484  
  2485  if (info.argsz >= capsz) {
  2486  minsz = capsz;
  2487  info.cap_offset = 0; /* output, no-recopy necessary */
  2488  }
  2489  
  2490  info.flags = VFIO_IOMMU_INFO_PGSIZES;
  2491  
  2492  info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
  2493  
  2494  ret = vfio_iommu_iova_build_caps(iommu, &caps);
  2495  if (ret)
  2496  return ret;
  2497  
  2498  ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
  2499  if (ret)
  2500  return ret;
  2501  
  2502  if (caps.size) {
  2503  info.flags |= VFIO_IOMMU_INFO_CAPS;
  2504  
  2505  if (info.argsz < sizeof(info) + caps.size) {