RE: [PATCH v7 03/16] vfio/type1: Report iommu nesting info to userspace

2020-09-12 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Saturday, September 12, 2020 4:17 AM
> 
> On Thu, 10 Sep 2020 03:45:20 -0700
> Liu Yi L  wrote:
> 
> > This patch exports iommu nesting capability info to user space through
> > VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> > PASID alloc/free, bind page table, and cache invalidation) and the
> > vendor specific format information for first level/stage page table
> > that will be bound to.
> >
> > The nesting info is available only after container set to be NESTED type.
> > Current implementation imposes one limitation - one nesting container
> > should include at most one iommu group. The philosophy of vfio
> > container is having all groups/devices within the container share the
> > same IOMMU context. When vSVA is enabled, one IOMMU context could
> > include one 2nd- level address space and multiple 1st-level address
> > spaces. While the 2nd-level address space is reasonably sharable by
> > multiple groups, blindly sharing 1st-level address spaces across all
> > groups within the container might instead break the guest expectation.
> > In the future sub/super container concept might be introduced to allow
> > partial address space sharing within an IOMMU context. But for now
> > let's go with this restriction by requiring singleton container for
> > using nesting iommu features. Below link has the related discussion about 
> > this
> decision.
> >
> > https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/
> >
> > This patch also changes the NESTING type container behaviour.
> > Something that would have succeeded before will now fail: Before this
> > series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have
> > succeeded even if the SMMU didn't support stage-2, as the driver would
> > have silently fallen back on stage-1 mappings (which work exactly the
> > same as stage-2 only since there was no nesting supported). After the
> > series, we do check for DOMAIN_ATTR_NESTING so if user asks for
> > VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the
> > ioctl fails. But it should be a good fix and completely harmless. Detail 
> > can be found
> in below link as well.
> >
> > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > ---
> > v6 -> v7:
> > *) using vfio_info_add_capability() for adding nesting cap per suggestion
> >from Eric.
> >
> > v5 -> v6:
> > *) address comments against v5 from Eric Auger.
> > *) don't report nesting cap to userspace if the nesting_info->format is
> >invalid.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) return struct iommu_nesting_info for
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> >cap is much "cheap", if needs extension in future, just define another 
> > cap.
> >https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/
> >
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 92 +++-
> -
> >  include/uapi/linux/vfio.h   | 19 +
> >  2 files changed, 99 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index c992973..3c0048b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  "Maximum number of user DMA mappings per container (65535).");
> >
> >  struct vfio_iommu {
> > -   struct list_headdomain_list;
> > -   struct list_headiova_list;
> > -   struct vfio_domain  *external_domain; /* domain for external user */
> > -   struct mutexlock;
> > -   struct rb_root  dma_list;
> > -   struct blocking_notifier_head notifier;
> > -   unsigned intdma_avail;
> > -   uint64_tpgsize_bitmap;
> > -   boolv2;
> > -   boolnesting;
> > -   booldirty_page_tracking;
> > -   boolpinned_page_dirty_scope;
> > +   struct list_headdomain_list;
> > +   struct list_headiova_list;
> > +   /* domain for external user */
> > +   struct vfio_domain  *external_domain;
> > +   struct mutexlock;
> > +   struct rb_root  dma_list;
> > +   struct blocking_notifier_head   notifier;
> > +   unsigned intdma_avail;
> > +   uint64_tpgsize_bitmap;
> > +   boolv2;
> > +   boolnesting;
> > +   booldirty_page_tracking;
> > +   boolpinned_page_dirty_scope;
> > +   struct iommu_nesting_info   

Re: [PATCH v7 03/16] vfio/type1: Report iommu nesting info to userspace

2020-09-11 Thread Alex Williamson
On Thu, 10 Sep 2020 03:45:20 -0700
Liu Yi L  wrote:

> This patch exports iommu nesting capability info to user space through
> VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
> 
> The nesting info is available only after container set to be NESTED type.
> Current implementation imposes one limitation - one nesting container
> should include at most one iommu group. The philosophy of vfio container
> is having all groups/devices within the container share the same IOMMU
> context. When vSVA is enabled, one IOMMU context could include one 2nd-
> level address space and multiple 1st-level address spaces. While the
> 2nd-level address space is reasonably sharable by multiple groups, blindly
> sharing 1st-level address spaces across all groups within the container
> might instead break the guest expectation. In the future sub/super container
> concept might be introduced to allow partial address space sharing within
> an IOMMU context. But for now let's go with this restriction by requiring
> singleton container for using nesting iommu features. Below link has the
> related discussion about this decision.
> 
> https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/
> 
> This patch also changes the NESTING type container behaviour. Something
> that would have succeeded before will now fail: Before this series, if
> user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
> if the SMMU didn't support stage-2, as the driver would have silently
> fallen back on stage-1 mappings (which work exactly the same as stage-2
> only since there was no nesting supported). After the series, we do check
> for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and
> the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
> fix and completely harmless. Detail can be found in below link as well.
> 
> https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> ---
> v6 -> v7:
> *) using vfio_info_add_capability() for adding nesting cap per suggestion
>from Eric.
> 
> v5 -> v6:
> *) address comments against v5 from Eric Auger.
> *) don't report nesting cap to userspace if the nesting_info->format is
>invalid.
> 
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
>cap is much "cheap", if needs extension in future, just define another cap.
>https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/
> 
> v3 -> v4:
> *) address comments against v3.
> 
> v1 -> v2:
> *) added in v2
> ---
>  drivers/vfio/vfio_iommu_type1.c | 92 
> +++--
>  include/uapi/linux/vfio.h   | 19 +
>  2 files changed, 99 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c992973..3c0048b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
>"Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> - struct list_headdomain_list;
> - struct list_headiova_list;
> - struct vfio_domain  *external_domain; /* domain for external user */
> - struct mutexlock;
> - struct rb_root  dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned intdma_avail;
> - uint64_tpgsize_bitmap;
> - boolv2;
> - boolnesting;
> - booldirty_page_tracking;
> - boolpinned_page_dirty_scope;
> + struct list_headdomain_list;
> + struct list_headiova_list;
> + /* domain for external user */
> + struct vfio_domain  *external_domain;
> + struct mutexlock;
> + struct rb_root  dma_list;
> + struct blocking_notifier_head   notifier;
> + unsigned intdma_avail;
> + uint64_tpgsize_bitmap;
> + boolv2;
> + boolnesting;
> + booldirty_page_tracking;
> + boolpinned_page_dirty_scope;
> + struct iommu_nesting_info   *nesting_info;

Nit, not as important as the previous alignment, but might as well move
this up with the uint64_t pgsize_bitmap with the bools at the end of
the structure to avoid adding new gaps.

>  };
>  
>  struct vfio_domain {
> @@ -130,6 +132,9 

[PATCH v7 03/16] vfio/type1: Report iommu nesting info to userspace

2020-09-10 Thread Liu Yi L
This patch exports iommu nesting capability info to user space through
VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and the vendor
specific format information for first level/stage page table that will be
bound to.

The nesting info is available only after container set to be NESTED type.
Current implementation imposes one limitation - one nesting container
should include at most one iommu group. The philosophy of vfio container
is having all groups/devices within the container share the same IOMMU
context. When vSVA is enabled, one IOMMU context could include one 2nd-
level address space and multiple 1st-level address spaces. While the
2nd-level address space is reasonably sharable by multiple groups, blindly
sharing 1st-level address spaces across all groups within the container
might instead break the guest expectation. In the future sub/super container
concept might be introduced to allow partial address space sharing within
an IOMMU context. But for now let's go with this restriction by requiring
singleton container for using nesting iommu features. Below link has the
related discussion about this decision.

https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/

This patch also changes the NESTING type container behaviour. Something
that would have succeeded before will now fail: Before this series, if
user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
if the SMMU didn't support stage-2, as the driver would have silently
fallen back on stage-1 mappings (which work exactly the same as stage-2
only since there was no nesting supported). After the series, we do check
for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and
the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
fix and completely harmless. Detail can be found in below link as well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v6 -> v7:
*) using vfio_info_add_capability() for adding nesting cap per suggestion
   from Eric.

v5 -> v6:
*) address comments against v5 from Eric Auger.
*) don't report nesting cap to userspace if the nesting_info->format is
   invalid.

v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   cap is much "cheap", if needs extension in future, just define another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 92 +++--
 include/uapi/linux/vfio.h   | 19 +
 2 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c992973..3c0048b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-   struct list_headdomain_list;
-   struct list_headiova_list;
-   struct vfio_domain  *external_domain; /* domain for external user */
-   struct mutexlock;
-   struct rb_root  dma_list;
-   struct blocking_notifier_head notifier;
-   unsigned intdma_avail;
-   uint64_tpgsize_bitmap;
-   boolv2;
-   boolnesting;
-   booldirty_page_tracking;
-   boolpinned_page_dirty_scope;
+   struct list_headdomain_list;
+   struct list_headiova_list;
+   /* domain for external user */
+   struct vfio_domain  *external_domain;
+   struct mutexlock;
+   struct rb_root  dma_list;
+   struct blocking_notifier_head   notifier;
+   unsigned intdma_avail;
+   uint64_tpgsize_bitmap;
+   boolv2;
+   boolnesting;
+   booldirty_page_tracking;
+   boolpinned_page_dirty_scope;
+   struct iommu_nesting_info   *nesting_info;
 };
 
 struct vfio_domain {
@@ -130,6 +132,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(>domain_list))
 
+#define CONTAINER_HAS_DOMAIN(iommu)(((iommu)->external_domain) || \
+(!list_empty(&(iommu)->domain_list)))
+
 #define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
 
 /*
@@