Re: [PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-08 Thread Qian Cai


> On Nov 8, 2019, at 10:40 PM, Lu Baolu  wrote:
> 
> This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> to make it easier for distributions to enable or disable the
> Intel IOMMU scalable mode by default during kernel build.
> 
> Signed-off-by: Lu Baolu 
> ---
> drivers/iommu/Kconfig   | 9 +
> drivers/iommu/intel-iommu.c | 7 ++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e3842eabcfdd..fbdf3fd291d9 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -242,6 +242,15 @@ config INTEL_IOMMU_FLOPPY_WA
> workaround will setup a 1:1 mapping for the first
> 16MiB to make floppy (an ISA device) work.
> 
> +config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> + bool "Enable Intel IOMMU scalable mode by default"
> + depends on INTEL_IOMMU
> + help
> +   Selecting this option will enable the scalable mode if
> +   hardware presents the capability. If this option is not
> +   selected, scalable mode support could also be enabled
> +   by passing intel_iommu=sm_on to the kernel.


Maybe a sentence or two to describe what the scalable mode is in layman's
terms could be useful, so developers don’t need to search around for the
Kconfig selection?

> +
> config IRQ_REMAP
>   bool "Support for Interrupt Remapping"
>   depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6db6d969e31c..6051fe790c61 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -355,9 +355,14 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
> int dmar_disabled = 0;
> #else
> int dmar_disabled = 1;
> -#endif /*CONFIG_INTEL_IOMMU_DEFAULT_ON*/
> +#endif /* CONFIG_INTEL_IOMMU_DEFAULT_ON */
> 
> +#ifdef CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
> +int intel_iommu_sm = 1;
> +#else
> int intel_iommu_sm;
> +#endif /* CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON */
> +
> int intel_iommu_enabled = 0;
> EXPORT_SYMBOL_GPL(intel_iommu_enabled);
> 
> -- 
> 2.17.1
> 

[PATCH v2 1/1] iommu/vt-d: Add Kconfig option to enable/disable scalable mode

2019-11-08 Thread Lu Baolu
This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
to make it easier for distributions to enable or disable the
Intel IOMMU scalable mode by default during kernel build.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/Kconfig   | 9 +
 drivers/iommu/intel-iommu.c | 7 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..fbdf3fd291d9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -242,6 +242,15 @@ config INTEL_IOMMU_FLOPPY_WA
  workaround will setup a 1:1 mapping for the first
  16MiB to make floppy (an ISA device) work.
 
+config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
+   bool "Enable Intel IOMMU scalable mode by default"
+   depends on INTEL_IOMMU
+   help
+ Selecting this option will enable the scalable mode if
+ hardware presents the capability. If this option is not
+ selected, scalable mode support could also be enabled
+ by passing intel_iommu=sm_on to the kernel.
+
 config IRQ_REMAP
bool "Support for Interrupt Remapping"
depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6db6d969e31c..6051fe790c61 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -355,9 +355,14 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
 int dmar_disabled = 0;
 #else
 int dmar_disabled = 1;
-#endif /*CONFIG_INTEL_IOMMU_DEFAULT_ON*/
+#endif /* CONFIG_INTEL_IOMMU_DEFAULT_ON */
 
+#ifdef CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
+int intel_iommu_sm = 1;
+#else
 int intel_iommu_sm;
+#endif /* CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON */
+
 int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);
 
-- 
2.17.1
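
For reference, the same build-time default can be expressed without the #ifdef/#else pair by using the kernel's IS_ENABLED() helper; a minimal alternative sketch, not part of the posted patch:

/* IS_ENABLED() expands to 1 when the Kconfig symbol is set and 0 otherwise. */
#include <linux/kconfig.h>

int intel_iommu_sm = IS_ENABLED(CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON);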



Re: [PATCH] iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros

2019-11-08 Thread Lu Baolu

Hi Eric,

On 11/8/19 11:58 PM, Eric Auger wrote:

For both PASID-based-Device-TLB Invalidate Descriptor and
Device-TLB Invalidate Descriptor, the Physical Function Source-ID
value is split according to this layout:

PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52.
Fix the part laid out at offset 52.

Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID")
Signed-off-by: Eric Auger 


Please cc this to stable as well.

Cc: sta...@vger.kernel.org # v4.19+

Good catch!

Acked-by: Lu Baolu 

Best regards,
baolu



---
  include/linux/intel-iommu.h | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..6d8bf4bdf240 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -336,7 +336,8 @@ enum {
  #define QI_DEV_IOTLB_SID(sid) ((u64)((sid) & 0x) << 32)
  #define QI_DEV_IOTLB_QDEP(qdep)   (((qdep) & 0x1f) << 16)
  #define QI_DEV_IOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
-#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
+#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
+  ((u64)((pfsid >> 4) & 0xfff) << 52))
  #define QI_DEV_IOTLB_SIZE 1
  #define QI_DEV_IOTLB_MAX_INVS 32
  
@@ -360,7 +361,8 @@ enum {

  #define QI_DEV_EIOTLB_PASID(p)(((u64)p) << 32)
  #define QI_DEV_EIOTLB_SID(sid)((u64)((sid) & 0x) << 16)
  #define QI_DEV_EIOTLB_QDEP(qd)((u64)((qd) & 0x1f) << 4)
-#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
+#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
+   ((u64)((pfsid >> 4) & 0xfff) << 52))
  #define QI_DEV_EIOTLB_MAX_INVS32
  
  /* Page group response descriptor QW0 */





Re: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread Lu Baolu

Hi,

On 11/9/19 6:28 AM, Deepa Dinamani wrote:

+ x86_platform.iommu_shutdown = intel_iommu_shutdown;


How about moving it to detect_intel_iommu() in drivers/iommu/dmar.c? And


Ok, makes sense to move it along with the init handler.


make sure that it's included with CONFIG_X86_64.


You mean CONFIG_X86 like the init that is already there?

#ifdef CONFIG_X86
 if (!ret)
 x86_init.iommu.iommu_init = intel_iommu_init;
#endif



Yes.

Also, change the title to "iommu/vt-d: Turn off ..."

Best regards,
baolu


Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

2019-11-08 Thread Jacob Pan
On Fri, 8 Nov 2019 17:18:10 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 10/24/19 9:55 PM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dmar.c| 46
> > +
> > drivers/iommu/intel-pasid.c |  3 ++- include/linux/intel-iommu.h |
> > 21 + 3 files changed, 65 insertions(+), 5
> > deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 49bb7d76e646..0ce2d32ff99e 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > +   unsigned int size_order, u64 granu, int ih)
> > +{
> > +   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > +   desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > +   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > +   desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > +   QI_EIOTLB_AM(size_order);
> > +
> > +   qi_submit_sync(&desc, iommu);
> > +}
> > +
> >  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> >  {
> > @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > +   u32 pasid,  u16 qdep, u64 addr, unsigned
> > size_order, u64 granu) +{
> > +   struct qi_desc desc;
> > +
> > +   desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > +   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > +   QI_DEV_IOTLB_PFSID(pfsid);
> > +   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > +   /* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > +* The least significant zero bit indicates the
> > invalidation address
> > +* range. VT-d spec 6.5.2.6.
> > +* e.g. address bit 12[0] indicates 8KB, 13[0] indicates
> > 16KB.
> > +*/
> > +   if (!size_order) {
> > +   desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;  
> this is desc.qw1
> 
Right, will fix.

Thanks!
> With that fixed and the qi_flush_dev_piotlb init issue spotted by Lu,
> feel free to add my
> 
> Reviewed-by: Eric Auger 
> 
> Thanks
> 
> Eric
> 
> > +   } else {
> > +   unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size_order);
> > +   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;
> > +   }
> > +   qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid) +{
> > +   struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> > +
> > +   desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
> > QI_PC_GRAN(granu) | QI_PC_TYPE;
> > +   qi_submit_sync(&desc, iommu);
> > +}
> >  /*
> >   * Disable Queued Invalidation interface.
> >   */
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index f846a907cfcf..6d7a701ef4d3
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct
> > intel_iommu *iommu, {
> > struct qi_desc desc;
> >  
> > -   desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> > QI_PC_PASID(pasid);
> > +   desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> > +   QI_PC_PASID(pasid) | QI_PC_TYPE;
> > desc.qw1 = 0;
> > desc.qw2 = 0;
> > desc.qw3 = 0;
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 6c74c71b1ebf..a25fb3a0ea5b
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -332,7 +332,7 @@ enum {
> >  #define QI_IOTLB_GRAN(gran)(((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)
> > (((u64)addr) & VTD_PAGE_MASK) #define
> > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define
> > QI_IOTLB_AM(am) (((u8)am)) +#define
> > QI_IOTLB_AM(am) (((u8)am) & 0x3f) 
> >  #define QI_CC_FM(fm)   (((u64)fm) << 48)
> >  #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -350,16 +350,21 @@ enum {
> >  #define QI_PC_DID(did) (((u64)did) << 16)
> >  #define QI_PC_GRAN(gran)   (((u64)gran) << 4)
> >  
> > -#define QI_PC_ALL_PASIDS   (QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation gr

Re: [PATCH] iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros

2019-11-08 Thread Jacob Pan
On Fri,  8 Nov 2019 16:58:03 +0100
Eric Auger  wrote:

> For both PASID-based-Device-TLB Invalidate Descriptor and
> Device-TLB Invalidate Descriptor, the Physical Function Source-ID
> value is split according to this layout:
> 
> PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52.
> Fix the part laid out at offset 52.
> 
> Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID")
> Signed-off-by: Eric Auger 
> ---
>  include/linux/intel-iommu.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ed11ef594378..6d8bf4bdf240 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -336,7 +336,8 @@ enum {
>  #define QI_DEV_IOTLB_SID(sid)((u64)((sid) & 0x) << 32)
>  #define QI_DEV_IOTLB_QDEP(qdep)  (((qdep) & 0x1f) << 16)
>  #define QI_DEV_IOTLB_ADDR(addr)  ((u64)(addr) & VTD_PAGE_MASK)
> -#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
> +#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
> +  ((u64)((pfsid >> 4) & 0xfff) << 52))
>  #define QI_DEV_IOTLB_SIZE  1
>  #define QI_DEV_IOTLB_MAX_INVS32
>  
> @@ -360,7 +361,8 @@ enum {
>  #define QI_DEV_EIOTLB_PASID(p)   (((u64)p) << 32)
>  #define QI_DEV_EIOTLB_SID(sid)   ((u64)((sid) & 0x) << 16)
>  #define QI_DEV_EIOTLB_QDEP(qd)   ((u64)((qd) & 0x1f) << 4)
> -#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
> +#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
> +   ((u64)((pfsid >> 4) & 0xfff) << 52))
>  #define QI_DEV_EIOTLB_MAX_INVS 32
>  
>  /* Page group response descriptor QW0 */

Good catch. Thanks!

Acked-by: Jacob Pan 


Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

2019-11-08 Thread Jacob Pan
On Fri, 8 Nov 2019 12:30:31 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 12 ++--
> >  drivers/iommu/intel-pasid.c | 36
> >  drivers/iommu/intel-svm.c   |
> > 39 +++ 3 files changed, 29
> > insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index ced1d89ef977..2ea09b988a23
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
> > dmar_domain *domain, domain->auxd_refcnt--;
> >  
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > -   intel_pasid_free_id(domain->default_pasid);
> > +   ioasid_free(domain->default_pasid);
> >  }
> >  
> >  static int aux_domain_add_dev(struct dmar_domain *domain,
> > @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain, if (domain->default_pasid <= 0) {
> > int pasid;
> >  
> > -   pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> > -
> > pci_max_pasids(to_pci_dev(dev)),
> > -GFP_KERNEL);
> > -   if (pasid <= 0) {
> > +   /* No private data needed for the default pasid */
> > +   pasid = ioasid_alloc(NULL, PASID_MIN,
> > pci_max_pasids(to_pci_dev(dev)) - 1,
> > +   NULL);
> > +   if (pasid == INVALID_IOASID) {
> > pr_err("Can't allocate default pasid\n");
> > return -ENODEV;
> > }
> > @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain, spin_unlock(&iommu->lock);
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > -   intel_pasid_free_id(domain->default_pasid);
> > +   ioasid_free(domain->default_pasid);
> >  
> > return ret;
> >  }
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index d81e857d2b25..e79d680fe300
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -26,42 +26,6 @@
> >   */
> >  static DEFINE_SPINLOCK(pasid_lock);
> >  u32 intel_pasid_max_id = PASID_MAX;
> > -static DEFINE_IDR(pasid_idr);
> > -
> > -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> > -{
> > -   int ret, min, max;
> > -
> > -   min = max_t(int, start, PASID_MIN);
> > -   max = min_t(int, end, intel_pasid_max_id);
> > -
> > -   WARN_ON(in_interrupt());
> > -   idr_preload(gfp);
> > -   spin_lock(&pasid_lock);
> > -   ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> > -   spin_unlock(&pasid_lock);
> > -   idr_preload_end();
> > -
> > -   return ret;
> > -}
> > -
> > -void intel_pasid_free_id(int pasid)
> > -{
> > -   spin_lock(&pasid_lock);
> > -   idr_remove(&pasid_idr, pasid);
> > -   spin_unlock(&pasid_lock);
> > -}
> > -
> > -void *intel_pasid_lookup_id(int pasid)
> > -{
> > -   void *p;
> > -
> > -   spin_lock(&pasid_lock);
> > -   p = idr_find(&pasid_idr, pasid);
> > -   spin_unlock(&pasid_lock);
> > -
> > -   return p;
> > -}
> >  
> >  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) {
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 9b159132405d..a9a7f85a09bc 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include "intel-pasid.h"
> > @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ if (pasid_max >
> > intel_pasid_max_id) pasid_max = intel_pasid_max_id;
> >  
> > -   /* Do not use PASID 0 in caching mode (virtualised
> > IOMMU) */
> > -   ret = intel_pasid_alloc_id(svm,
> > -  !!cap_caching_mode(iommu->cap),
> > -  pasid_max - 1,
> > GFP_KERNEL);
> > -   if (ret < 0) {
> > +   /* Do not use PASID 0, reserved for RID to PASID */
> > +   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> > +   pasid_max - 1, svm);  
> pasid_max -1 is inclusive. whereas max param in intel_pasid_alloc_id()
> is exclusive right? If you fixed an issue, you can mention it in the
> commit message.
Yes, I should mention that. intel_pasid_alloc_id() uses IDR which is
end exclusive. ioasid uses xarray, which is inclusive. 
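
For reference, a minimal sketch of the two range conventions, not code from the patch; it assumes the idr_alloc()/ioasid_alloc() prototypes of this kernel generation and PASID_MIN from intel-pasid.h:

#include <linux/idr.h>
#include <linux/ioasid.h>

/* Both helpers cover PASIDs in [PASID_MIN, max_pasid - 1]. */
static int alloc_with_idr(struct idr *idr, void *ptr, int max_pasid)
{
	/* idr_alloc(): the end argument is exclusive. */
	return idr_alloc(idr, ptr, PASID_MIN, max_pasid, GFP_KERNEL);
}

static ioasid_t alloc_with_ioasid(int max_pasid, void *priv)
{
	/* ioasid_alloc(): the max argument is inclusive, hence the -1. */
	return ioasid_alloc(NULL, PASID_MIN, max_pasid - 1, priv);
}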
> > +   if (svm->pasid == INVALID_IOASID) {  
> > kfree(svm);>
> > kfree(sdev);  
> > +   ret = ENOSPC;  
> -ENOSPC.
> Nit: in 2/11 vcmd_alloc_pasid returned -ENOM

Re: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread Deepa Dinamani
> > > For VMM live update case, we should be able to detect and bypass
> > > the shutdown that Deepa introduced here, so keep IOMMU still operating?
> >
> > Is that a 'yes' to Deepa's "if someone wants to make it conditional, we
> > can do that" ?
>
> Yes, I think so. Thanks!

Are these changes already part of the kernel like avoiding shutdown of
the passthrough devices? device_shutdown() doesn't seem to be doing
anything selectively as of now.

Thanks,
Deepa


Re: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread Deepa Dinamani
> > + x86_platform.iommu_shutdown = intel_iommu_shutdown;
>
> How about moving it to detect_intel_iommu() in drivers/iommu/dmar.c? And

Ok, makes sense to move it along with the init handler.

> make sure that it's included with CONFIG_X86_64.

You mean CONFIG_X86 like the init that is already there?

#ifdef CONFIG_X86
if (!ret)
x86_init.iommu.iommu_init = intel_iommu_init;
#endif
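
For reference, a rough sketch of what the suggested move might look like inside detect_intel_iommu(), assuming the intel_iommu_shutdown() helper added by this patch; untested, not part of the posted series:

#ifdef CONFIG_X86
	if (!ret) {
		x86_init.iommu.iommu_init = intel_iommu_init;
		x86_platform.iommu_shutdown = intel_iommu_shutdown;
	}
#endif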

Thanks,
-Deepa


Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

2019-11-08 Thread Jacob Pan
On Fri, 8 Nov 2019 11:40:23 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> > 
> > Signed-off-by: Lu Baolu 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/Kconfig   |  1 +
> >  drivers/iommu/intel-iommu.c | 67
> > +
> > include/linux/intel-iommu.h |  2 ++ 3 files changed, 70
> > insertions(+)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index fd50ddbf..961fe5795a90 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> > bool "Support for Shared Virtual Memory with Intel IOMMU"
> > depends on INTEL_IOMMU && X86
> > select PCI_PASID
> > +   select IOASID
> > select MMU_NOTIFIER
> > help
> >   Shared Virtual Memory (SVM) provides a facility for
> > devices diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 3f974919d3bd..ced1d89ef977
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct
> > intel_iommu *iommu) if (ecap_prs(iommu->ecap))
> > intel_svm_finish_prq(iommu);
> > }
> > +   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > +
> > ioasid_unregister_allocator(&iommu->pasid_allocator); +
> >  #endif
> >  }
> >  
> > @@ -4910,6 +4913,44 @@ static int __init
> > probe_acpi_namespace_devices(void) return 0;
> >  }
> >  
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > void *data) +{
> > +   struct intel_iommu *iommu = data;
> > +   ioasid_t ioasid;
> > +
> > +   /*
> > +* VT-d virtual command interface always uses the full 20
> > bit
> > +* PASID range. Host can partition guest PASID range based
> > on
> > +* policies but it is out of guest's control.
> > +*/
> > +   if (min < PASID_MIN || max > intel_pasid_max_id)>
> > +   return INVALID_IOASID;  
> 
> > +
> > +   if (vcmd_alloc_pasid(iommu, &ioasid))
> > +   return INVALID_IOASID;
> > +
> > +   return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > +   struct intel_iommu *iommu = data;
> > +
> > +   if (!iommu)
> > +   return;
> > +   /*
> > +* Sanity check the ioasid owner is done at upper layer,
> > e.g. VFIO
> > +* We can only free the PASID when all the devices are
> > unbond.
> > +*/
> > +   if (ioasid_find(NULL, ioasid, NULL)) {
> > +   pr_alert("Cannot free active IOASID %d\n", ioasid);
> > +   return;
> > +   }
> > +   vcmd_free_pasid(iommu, ioasid);
> > +}
> > +#endif
> > +
> >  int __init intel_iommu_init(void)
> >  {
> > int ret = -ENODEV;
> > @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> >"%s", iommu->name);
> > iommu_device_set_ops(&iommu->iommu,
> > &intel_iommu_ops); iommu_device_register(&iommu->iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +   if (ecap_vcs(iommu->ecap) &&
> > vccap_pasid(iommu->vccap)) {
> > +   pr_info("Register custom PASID
> > allocator\n");
> > +   /*
> > +* Register a custom ASID allocator if we
> > are running
> > +* in a guest, the purpose is to have a
> > system wide PASID
> > +* namespace among all PASID users.
> > +* There can be multiple vIOMMUs in each
> > guest but only
> > +* one allocator is active. All vIOMMU
> > allocators will
> > +* eventually be calling the same host
> > allocator.
> > +*/
> > +   iommu->pasid_allocator.alloc =
> > intel_ioasid_alloc;
> > +   iommu->pasid_allocator.free =
> > intel_ioasid_free;
> > +   iommu->pasid_allocator.pdata = (void
> > *)iommu;
> > +   ret =
> > ioasid_register_allocator(&iommu->pasid_allocator);
> > +   if (ret) {
> > +   pr_warn("Custom PASID allocator
> > registeration failed\n");  
> nit: registration
> > +   /*
> > +* Disable scalable mode on this
> > IOMMU if there
> > +* is no custom allocator. Mixing
> > SM capable vIOMMU
> > +* and non-SM vIOMMU are not
> > supported:   
> nit; is not supported. But I guess you will reshape it according to
> previous comments.

Yes, I moved this earlier to avoid the

Re: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

2019-11-08 Thread Jacob Pan
On Fri, 8 Nov 2019 11:33:22 +0100
Auger Eric  wrote:

> Hi Jacob,
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > From: Lu Baolu 
> > 
> > Enabling IOMMU in a guest requires communication with the host
> > driver for certain aspects. Use of PASID ID to enable Shared Virtual
> > Addressing (SVA) requires managing PASID's in the host. VT-d 3.0
> > spec provides a Virtual Command Register (VCMD) to facilitate this.
> > Writes to this register in the guest are trapped by QEMU which
> > proxies the call to the host driver.
> > 
> > This virtual command interface consists of a capability register,
> > a virtual command register, and a virtual response register. Refer
> > to section 10.4.42, 10.4.43, 10.4.44 for more information.
> > 
> > This patch adds the enlightened PASID allocation/free interfaces
> > via the virtual command interface.
> > 
> > Cc: Ashok Raj 
> > Cc: Jacob Pan 
> > Cc: Kevin Tian 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Lu Baolu 
> > Signed-off-by: Jacob Pan 
> > Reviewed-by: Eric Auger 
> > ---
> >  drivers/iommu/intel-pasid.c | 56
> > +
> > drivers/iommu/intel-pasid.h | 13 ++-
> > include/linux/intel-iommu.h |  2 ++ 3 files changed, 70
> > insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index 040a445be300..d81e857d2b25
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> > return p;
> >  }
> >  
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) +{
> > +   unsigned long flags;
> > +   u8 status_code;
> > +   int ret = 0;
> > +   u64 res;
> > +
> > +   raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > +   dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> > +   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > +   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > +   status_code = VCMD_VRSP_SC(res);
> > +   switch (status_code) {
> > +   case VCMD_VRSP_SC_SUCCESS:
> > +   *pasid = VCMD_VRSP_RESULT(res);
> > +   break;
> > +   case VCMD_VRSP_SC_NO_PASID_AVAIL:
> > +   pr_info("IOMMU: %s: No PASID available\n",
> > iommu->name);
> > +   ret = -ENOMEM;
> > +   break;
> > +   default:
> > +   ret = -ENODEV;
> > +   pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > +   iommu->name, status_code);
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> > +{
> > +   unsigned long flags;
> > +   u8 status_code;
> > +   u64 res;
> > +
> > +   raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > +   dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> > VCMD_CMD_FREE);
> > +   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > +   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > +   status_code = VCMD_VRSP_SC(res);
> > +   switch (status_code) {
> > +   case VCMD_VRSP_SC_SUCCESS:
> > +   break;
> > +   case VCMD_VRSP_SC_INVALID_PASID:
> > +   pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> > +   break;
> > +   default:
> > +   pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > +   iommu->name, status_code);
> > +   }
> > +}
> > +
> >  /*
> >   * Per device pasid table management:
> >   */
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index fc8cd8f17de1..e413e884e685
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -23,6 +23,16 @@
> >  #define is_pasid_enabled(entry)	(((entry)->lo >> 3) & 0x1)
> >  #define get_pasid_dir_size(entry)	(1 << ((((entry)->lo >> 9) & 0x7) + 7))
> > +/* Virtual command interface for enlightened pasid management. */
> > +#define VCMD_CMD_ALLOC 0x1
> > +#define VCMD_CMD_FREE  0x2
> > +#define VCMD_VRSP_IP   0x1
> > +#define VCMD_VRSP_SC(e)(((e) >> 1) & 0x3)
> > +#define VCMD_VRSP_SC_SUCCESS   0
> > +#define VCMD_VRSP_SC_NO_PASID_AVAIL1
> > +#define VCMD_VRSP_SC_INVALID_PASID 1
> > +#define VCMD_VRSP_RESULT(e)(((e) >> 8) & 0xf)  
> nit: pasid is 20b but result field is 56b large
> Just in case a new command were to be added later on.
Good point, will rename to VCMD_VRSP_RESULT_PASID and add new macros for
future new commands with different results.
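
As a standalone illustration of decoding a virtual response value with the macros quoted above (the sample value is made up, and the result mask simply follows the definitions shown here), a small sketch:

#include <stdint.h>
#include <stdio.h>

#define VCMD_VRSP_IP		0x1
#define VCMD_VRSP_SC(e)		(((e) >> 1) & 0x3)
#define VCMD_VRSP_SC_SUCCESS	0
#define VCMD_VRSP_RESULT(e)	(((e) >> 8) & 0xf)

int main(void)
{
	uint64_t res = 0x500;	/* hypothetical response: IP clear, SC success, PASID 5 */

	if (!(res & VCMD_VRSP_IP) && VCMD_VRSP_SC(res) == VCMD_VRSP_SC_SUCCESS)
		printf("allocated PASID %llu\n",
		       (unsigned long long)VCMD_VRSP_RESULT(res));
	return 0;
}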
> > +
> >  /*
> >   * Domain ID reserved for pasid entries programmed for first-level
> >   * only and pass-through transfer modes.
> > @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu, struct device *dev, int pasid);
> >  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> >   

Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread John Garry

On 08/11/2019 17:48, Will Deacon wrote:

On Fri, Nov 08, 2019 at 05:32:48PM +, Will Deacon wrote:

On Fri, Nov 08, 2019 at 05:25:09PM +, John Garry wrote:

On 08/11/2019 16:47, Will Deacon wrote:

On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:

BTW, it now looks like it was your v1 series I was testing there, on your
branch iommu/module. It would be helpful to update for ease of testing.


Yes, sorry about that. I'll update it now (although I'm not sure it will
help with this -- I was going to see what happens with other devices such
as the intel-iommu or storage controllers)


So I tried your v2 series for this - it has the same issue, as I
anticipated.


Right, I'm just not sure how resilient drivers are expected to be to force
unbinding like this. You can break lots of stuff with root...


It seems that some iommu drivers do call iommu_device_register(), so maybe a
decent reference. Or simply stop the driver being unbound.


I'm not sure what you mean about iommu_device_register() (we call that
already), but I guess we can keep the '.suppress_bind_attrs = true' if
necessary. I'll have a play on my laptop and see how well that works if
you start unbinding stuff.


So unbinding the nvme driver goes bang:

[90139.090158] nvme nvme0: failed to set APST feature (-19)
[90141.966780] Aborting journal on device dm-1-8.
[90141.967124] Buffer I/O error on dev dm-1, logical block 26247168, lost sync 
page write
[90141.967169] JBD2: Error -5 detected when updating journal superblock for 
dm-1-8.
[90141.967403] Buffer I/O error on dev dm-1, logical block 0, lost sync page 
write
[90141.967454] EXT4-fs (dm-1): I/O error while writing superblock
[90141.967467] EXT4-fs error (device dm-1): ext4_journal_check_start:61: 
Detected aborted journal
[90141.967473] EXT4-fs (dm-1): Remounting filesystem read-only
[90141.967569] Buffer I/O error on dev dm-1, logical block 0, lost sync page 
write
[90141.967682] EXT4-fs (dm-1): I/O error while writing superblock

and I've not managed to recover the thing yet (it's stuck trying to reboot.)



Not surprised. I guess the device backing your root directory disappeared.


What state was your system in after unbinding the SMMU?


Unusable again. For me the storage controller backing the root directory 
is compromised by disabling the SMMU unsafely.


Thanks,
John


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread John Garry

On 08/11/2019 17:32, Will Deacon wrote:

On Fri, Nov 08, 2019 at 05:25:09PM +, John Garry wrote:

On 08/11/2019 16:47, Will Deacon wrote:

On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:

BTW, it now looks like it was your v1 series I was testing there, on your
branch iommu/module. It would be helpful to update for ease of testing.


Yes, sorry about that. I'll update it now (although I'm not sure it will
help with this -- I was going to see what happens with other devices such
as the intel-iommu or storage controllers)


So I tried your v2 series for this - it has the same issue, as I
anticipated.


Right, I'm just not sure how resilient drivers are expected to be to force
unbinding like this. You can break lots of stuff with root...


For sure, but it is good practice to limit that.

I had to fix something like this recently, so know about it... another 
potential problem is use-after-frees, where your device managed memory 
is freed at removal but still registered somewhere.





It seems that some iommu drivers do call iommu_device_register(), so maybe a
decent reference. Or simply stop the driver being unbound.


I'm not sure what you mean about iommu_device_register() (we call that
already), 


Sorry, I meant to say iommu_device_unregister().

but I guess we can keep the '.suppress_bind_attrs = true' if
necessary. 


It may be good to add it to older stable kernels also, pre c07b6426df92.

I'll have a play on my laptop and see how well that works if

you start unbinding stuff.


Cheers,
John
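
Putting the two points above together, a hedged sketch of the compromise being discussed (driver built as a module, manual sysfs unbind still suppressed); this is only an illustration using the names from the quoted diff, not what the posted series does:

static struct platform_driver arm_smmu_driver = {
	.driver	= {
		.name			= "arm-smmu-v3",
		.of_match_table		= of_match_ptr(arm_smmu_of_match),
		.suppress_bind_attrs	= true,	/* keep manual unbind disabled */
	},
	.probe		= arm_smmu_device_probe,
	.remove		= arm_smmu_device_remove,
	.shutdown	= arm_smmu_device_shutdown,
};
module_platform_driver(arm_smmu_driver);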


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread Will Deacon
On Fri, Nov 08, 2019 at 05:32:48PM +, Will Deacon wrote:
> On Fri, Nov 08, 2019 at 05:25:09PM +, John Garry wrote:
> > On 08/11/2019 16:47, Will Deacon wrote:
> > > On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:
> > > > BTW, it now looks like it was your v1 series I was testing there, on 
> > > > your
> > > > branch iommu/module. It would be helpful to update for ease of testing.
> > > 
> > > Yes, sorry about that. I'll update it now (although I'm not sure it will
> > > help with this -- I was going to see what happens with other devices such
> > > as the intel-iommu or storage controllers)
> > 
> > So I tried your v2 series for this - it has the same issue, as I
> > anticipated.
> 
> Right, I'm just not sure how resilient drivers are expected to be to force
> unbinding like this. You can break lots of stuff with root...
> 
> > It seems that some iommu drivers do call iommu_device_register(), so maybe a
> > decent reference. Or simply stop the driver being unbound.
> 
> I'm not sure what you mean about iommu_device_register() (we call that
> already), but I guess we can keep the '.suppress_bind_attrs = true' if
> necessary. I'll have a play on my laptop and see how well that works if
> you start unbinding stuff.

So unbinding the nvme driver goes bang:

[90139.090158] nvme nvme0: failed to set APST feature (-19)
[90141.966780] Aborting journal on device dm-1-8.
[90141.967124] Buffer I/O error on dev dm-1, logical block 26247168, lost sync 
page write
[90141.967169] JBD2: Error -5 detected when updating journal superblock for 
dm-1-8.
[90141.967403] Buffer I/O error on dev dm-1, logical block 0, lost sync page 
write
[90141.967454] EXT4-fs (dm-1): I/O error while writing superblock
[90141.967467] EXT4-fs error (device dm-1): ext4_journal_check_start:61: 
Detected aborted journal
[90141.967473] EXT4-fs (dm-1): Remounting filesystem read-only
[90141.967569] Buffer I/O error on dev dm-1, logical block 0, lost sync page 
write
[90141.967682] EXT4-fs (dm-1): I/O error while writing superblock

and I've not managed to recover the thing yet (it's stuck trying to reboot.)

What state was your system in after unbinding the SMMU?

Will


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread Will Deacon
On Fri, Nov 08, 2019 at 05:25:09PM +, John Garry wrote:
> On 08/11/2019 16:47, Will Deacon wrote:
> > On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:
> > > BTW, it now looks like it was your v1 series I was testing there, on your
> > > branch iommu/module. It would be helpful to update for ease of testing.
> > 
> > Yes, sorry about that. I'll update it now (although I'm not sure it will
> > help with this -- I was going to see what happens with other devices such
> > as the intel-iommu or storage controllers)
> 
> So I tried your v2 series for this - it has the same issue, as I
> anticipated.

Right, I'm just not sure how resilient drivers are expected to be to force
unbinding like this. You can break lots of stuff with root...

> It seems that some iommu drivers do call iommu_device_register(), so maybe a
> decent reference. Or simply stop the driver being unbound.

I'm not sure what you mean about iommu_device_register() (we call that
already), but I guess we can keep the '.suppress_bind_attrs = true' if
necessary. I'll have a play on my laptop and see how well that works if
you start unbinding stuff.

Will


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread John Garry

On 08/11/2019 16:47, Will Deacon wrote:

On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:

On 08/11/2019 16:17, John Garry wrote:

On 08/11/2019 15:16, Will Deacon wrote:

+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);


Hi Will,


   static struct platform_driver arm_smmu_driver = {
   .driver    = {
   .name    = "arm-smmu-v3",
   .of_match_table    = of_match_ptr(arm_smmu_of_match),
-    .suppress_bind_attrs = true,


Does this mean that we can now manually unbind this driver from the SMMU
device?

Seems dangerous. Here's what happens for me:

root@ubuntu:/sys# cd ./bus/platform/drivers/arm-smmu-v3
ind @ubuntu:/sys/bus/platform/drivers/arm-smmu-v3# echo
arm-smmu-v3.0.auto > unbind
[   77.580351] hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
ho [   78.635473] platform arm-smmu-v3.0.auto: CMD_SYNC timeout at
0x0146 [hwprod 0x0146, hwcons 0x]


   },
   .probe    = arm_smmu_device_probe,
+    .remove    = arm_smmu_device_remove,
   .shutdown = arm_smmu_device_shutdown,
   };
-builtin_platform_driver(arm_smmu_driver);
+module_platform_driver(arm_smmu_driver);
+


BTW, it now looks like it was your v1 series I was testing there, on your
branch iommu/module. It would be helpful to update for ease of testing.




Hi Will,


Yes, sorry about that. I'll update it now (although I'm not sure it will
help with this -- I was going to see what happens with other devices such
as the intel-iommu or storage controllers)


So I tried your v2 series for this - it has the same issue, as I 
anticipated.


It seems that some iommu drivers do call iommu_device_register(), so 
maybe a decent reference. Or simply stop the driver being unbound.


Cheers,
John


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread Will Deacon
On Fri, Nov 08, 2019 at 04:44:25PM +, John Garry wrote:
> On 08/11/2019 16:17, John Garry wrote:
> > On 08/11/2019 15:16, Will Deacon wrote:
> > > +MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
> > 
> > Hi Will,
> > 
> > >   static struct platform_driver arm_smmu_driver = {
> > >   .driver    = {
> > >   .name    = "arm-smmu-v3",
> > >   .of_match_table    = of_match_ptr(arm_smmu_of_match),
> > > -    .suppress_bind_attrs = true,
> > 
> > Does this mean that we can now manually unbind this driver from the SMMU
> > device?
> > 
> > Seems dangerous. Here's what happens for me:
> > 
> > root@ubuntu:/sys# cd ./bus/platform/drivers/arm-smmu-v3
> > ind @ubuntu:/sys/bus/platform/drivers/arm-smmu-v3# echo
> > arm-smmu-v3.0.auto > unbind
> > [   77.580351] hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
> > ho [   78.635473] platform arm-smmu-v3.0.auto: CMD_SYNC timeout at
> > 0x0146 [hwprod 0x0146, hwcons 0x]
> > 
> > >   },
> > >   .probe    = arm_smmu_device_probe,
> > > +    .remove    = arm_smmu_device_remove,
> > >   .shutdown = arm_smmu_device_shutdown,
> > >   };
> > > -builtin_platform_driver(arm_smmu_driver);
> > > +module_platform_driver(arm_smmu_driver);
> > > +
> 
> BTW, it now looks like it was your v1 series I was testing there, on your
> branch iommu/module. It would be helpful to update for ease of testing.

Yes, sorry about that. I'll update it now (although I'm not sure it will
help with this -- I was going to see what happens with other devices such
as the intel-iommu or storage controllers)

Will


Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread John Garry

On 08/11/2019 16:17, John Garry wrote:

On 08/11/2019 15:16, Will Deacon wrote:

+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);


Hi Will,


  static struct platform_driver arm_smmu_driver = {
  .driver    = {
  .name    = "arm-smmu-v3",
  .of_match_table    = of_match_ptr(arm_smmu_of_match),
-    .suppress_bind_attrs = true,


Does this mean that we can now manually unbind this driver from the SMMU 
device?


Seems dangerous. Here's what happens for me:

root@ubuntu:/sys# cd ./bus/platform/drivers/arm-smmu-v3
ind @ubuntu:/sys/bus/platform/drivers/arm-smmu-v3# echo 
arm-smmu-v3.0.auto > unbind

[   77.580351] hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
ho [   78.635473] platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 
0x0146 [hwprod 0x0146, hwcons 0x]



  },
  .probe    = arm_smmu_device_probe,
+    .remove    = arm_smmu_device_remove,
  .shutdown = arm_smmu_device_shutdown,
  };
-builtin_platform_driver(arm_smmu_driver);
+module_platform_driver(arm_smmu_driver);
+


BTW, it now looks like it was your v1 series I was testing there, on 
your branch iommu/module. It would be helpful to update for ease of testing.


Thanks,
John


Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

2019-11-08 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.
> This patch adds invalidation functions that can be used for additional
> translation cache types.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/dmar.c| 46 
> +
>  drivers/iommu/intel-pasid.c |  3 ++-
>  include/linux/intel-iommu.h | 21 +
>  3 files changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 49bb7d76e646..0ce2d32ff99e 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>   qi_submit_sync(&desc, iommu);
>  }
>  
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> + unsigned int size_order, u64 granu, int ih)
> +{
> + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> + QI_EIOTLB_AM(size_order);
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
>  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>   u16 qdep, u64 addr, unsigned mask)
>  {
> @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>   qi_submit_sync(&desc, iommu);
>  }
>  
> +/* PASID-based device IOTLB Invalidate */
> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid,  u16 qdep, u64 addr, unsigned size_order, u64 granu)
> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> + QI_DEV_IOTLB_PFSID(pfsid);
> + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> +
> + /* If S bit is 0, we only flush a single page. If S bit is set,
> +  * The least significant zero bit indicates the invalidation address
> +  * range. VT-d spec 6.5.2.6.
> +  * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
> +  */
> + if (!size_order) {
> + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
this is desc.qw1

With that fixed and the qi_flush_dev_piotlb init issue spotted by Lu,
feel free to add my

Reviewed-by: Eric Auger 

Thanks

Eric

> + } else {
> + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
> + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
> + }
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
> +{
> + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
> + qi_submit_sync(&desc, iommu);
> +}
>  /*
>   * Disable Queued Invalidation interface.
>   */
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index f846a907cfcf..6d7a701ef4d3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
>  {
>   struct qi_desc desc;
>  
> - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> + QI_PC_PASID(pasid) | QI_PC_TYPE;
>   desc.qw1 = 0;
>   desc.qw2 = 0;
>   desc.qw3 = 0;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6c74c71b1ebf..a25fb3a0ea5b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -332,7 +332,7 @@ enum {
>  #define QI_IOTLB_GRAN(gran)  (((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
>  #define QI_IOTLB_ADDR(addr)  (((u64)addr) & VTD_PAGE_MASK)
>  #define QI_IOTLB_IH(ih)  (((u64)ih) << 6)
> -#define QI_IOTLB_AM(am)  (((u8)am))
> +#define QI_IOTLB_AM(am)  (((u8)am) & 0x3f)
>  
>  #define QI_CC_FM(fm) (((u64)fm) << 48)
>  #define QI_CC_SID(sid)   (((u64)sid) << 32)
> @@ -350,16 +350,21 @@ enum {
>  #define QI_PC_DID(did)   (((u64)did) << 16)
>  #define QI_PC_GRAN(gran) (((u64)gran) << 4)
>  
> -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> -#define QI_PC_PASID_SEL  (QI_PC_TYPE | QI_PC_GRAN(1))
> +/* PASID cache invalidation granu */
> +#define QI_PC_ALL_PASIDS 0
> +#define QI_PC_PASID_SEL  1
>  
>  #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
>  #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_EIOTLB_AM(am) (((u64)am))
> 
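
The intel-pasid.c hunk above only reshapes how the granularity is expressed; a standalone check that the old and new compositions produce the same qw0 (QI_PC_TYPE and QI_PC_PASID below are assumed stand-in values, the equality holds regardless):

#include <stdint.h>
#include <stdio.h>

#define QI_PC_TYPE		0x7ULL			/* assumed value */
#define QI_PC_DID(did)		(((uint64_t)(did)) << 16)
#define QI_PC_GRAN(gran)	(((uint64_t)(gran)) << 4)
#define QI_PC_PASID(p)		(((uint64_t)(p)) << 32)	/* assumed value */

#define OLD_QI_PC_PASID_SEL	(QI_PC_TYPE | QI_PC_GRAN(1))	/* old macro */
#define QI_PC_PASID_SEL		1				/* new granu value */

int main(void)
{
	uint64_t old_qw0 = QI_PC_DID(5) | OLD_QI_PC_PASID_SEL | QI_PC_PASID(42);
	uint64_t new_qw0 = QI_PC_DID(5) | QI_PC_GRAN(QI_PC_PASID_SEL) |
			   QI_PC_PASID(42) | QI_PC_TYPE;

	printf("%s\n", old_qw0 == new_qw0 ? "identical" : "different");
	return 0;
}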

Re: [PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread John Garry

On 08/11/2019 15:16, Will Deacon wrote:

+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);


Hi Will,

  
  static struct platform_driver arm_smmu_driver = {

.driver = {
.name   = "arm-smmu-v3",
.of_match_table = of_match_ptr(arm_smmu_of_match),
-   .suppress_bind_attrs = true,


Does this mean that we can now manually unbind this driver from the SMMU 
device?


Seems dangerous. Here's what happens for me:

root@ubuntu:/sys# cd ./bus/platform/drivers/arm-smmu-v3
ind @ubuntu:/sys/bus/platform/drivers/arm-smmu-v3# echo 
arm-smmu-v3.0.auto > unbind

[   77.580351] hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
ho [   78.635473] platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 
0x0146 [hwprod 0x0146, hwcons 0x]



},
.probe  = arm_smmu_device_probe,
+   .remove = arm_smmu_device_remove,
.shutdown = arm_smmu_device_shutdown,
  };
-builtin_platform_driver(arm_smmu_driver);
+module_platform_driver(arm_smmu_driver);
+



Thanks,
John


[PATCH] iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros

2019-11-08 Thread Eric Auger
For both PASID-based-Device-TLB Invalidate Descriptor and
Device-TLB Invalidate Descriptor, the Physical Function Source-ID
value is split according to this layout:

PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52.
Fix the part laid out at offset 52.

Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID")
Signed-off-by: Eric Auger 
---
 include/linux/intel-iommu.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..6d8bf4bdf240 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -336,7 +336,8 @@ enum {
 #define QI_DEV_IOTLB_SID(sid)  ((u64)((sid) & 0x) << 32)
 #define QI_DEV_IOTLB_QDEP(qdep)(((qdep) & 0x1f) << 16)
 #define QI_DEV_IOTLB_ADDR(addr)((u64)(addr) & VTD_PAGE_MASK)
-#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
+#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
+  ((u64)((pfsid >> 4) & 0xfff) << 52))
 #define QI_DEV_IOTLB_SIZE  1
 #define QI_DEV_IOTLB_MAX_INVS  32
 
@@ -360,7 +361,8 @@ enum {
 #define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
-#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52))
+#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
+   ((u64)((pfsid >> 4) & 0xfff) << 52))
 #define QI_DEV_EIOTLB_MAX_INVS 32
 
 /* Page group response descriptor QW0 */
-- 
2.20.1
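
A small standalone check, not part of the patch, makes the fix visible: for an example PFSID of 0xabcd, the old macro placed PFSID[11:0] at offset 52 (duplicating the low nibble and losing PFSID[15:12]), while the fixed macro places PFSID[15:4] there:

#include <stdint.h>
#include <stdio.h>

#define OLD_PFSID(pfsid)	(((uint64_t)(pfsid & 0xf) << 12) | \
				 ((uint64_t)(pfsid & 0xfff) << 52))
#define NEW_PFSID(pfsid)	(((uint64_t)(pfsid & 0xf) << 12) | \
				 ((uint64_t)((pfsid >> 4) & 0xfff) << 52))

int main(void)
{
	unsigned int pfsid = 0xabcd;

	/* old: bits 63:52 = 0xbcd (PFSID[11:0]), PFSID[15:12] dropped */
	printf("old: 0x%016llx\n", (unsigned long long)OLD_PFSID(pfsid));
	/* new: bits 63:52 = 0xabc (PFSID[15:4]) */
	printf("new: 0x%016llx\n", (unsigned long long)NEW_PFSID(pfsid));
	return 0;
}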



RE: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread Zeng, Jason


> -Original Message-
> From: David Woodhouse 
> Sent: Friday, November 8, 2019 3:54 PM
> To: Deepa Dinamani ; j...@8bytes.org; linux-
> ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; Zeng, Jason ;
> Tian, Kevin 
> Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown
> 
> On Thu, 2019-11-07 at 12:59 -0800, Deepa Dinamani wrote:
> > The intel-iommu driver assumes that the iommu state is
> > cleaned up at the start of the new kernel.
> > But, when we try to kexec boot something other than the
> > Linux kernel, the cleanup cannot be relied upon.
> > Hence, cleanup before we go down for reboot.
> >
> > Keeping the cleanup at initialization also, in case BIOS
> > leaves the IOMMU enabled.
> >
> > I considered turning off iommu only during kexec reboot,
> > but a clean shutdown seems always a good idea. But if
> > someone wants to make it conditional, we can do that.
> 
> This is going to break things for the VMM live update scheme that Jason
> presented at KVM Forum, isn't it?
> 
> In that case we rely on the IOMMU still operating during the
> transition.

For VMM live update case, we should be able to detect and bypass
the shutdown that Deepa introduced here, so keep IOMMU still operating?

Thanks,
Jason


RE: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread Zeng, Jason



> -Original Message-
> From: David Woodhouse 
> Sent: Friday, November 8, 2019 4:57 PM
> To: Zeng, Jason ; Deepa Dinamani
> ; j...@8bytes.org; linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; Tian, Kevin 
> Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown
> 
> On Fri, 2019-11-08 at 08:47 +, Zeng, Jason wrote:
> > > -Original Message-
> > > From: David Woodhouse 
> > > Sent: Friday, November 8, 2019 3:54 PM
> > > To: Deepa Dinamani ; j...@8bytes.org;
> linux-
> > > ker...@vger.kernel.org
> > > Cc: iommu@lists.linux-foundation.org; Zeng, Jason
> ;
> > > Tian, Kevin 
> > > Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown
> > >
> > > On Thu, 2019-11-07 at 12:59 -0800, Deepa Dinamani wrote:
> > > > The intel-iommu driver assumes that the iommu state is
> > > > cleaned up at the start of the new kernel.
> > > > But, when we try to kexec boot something other than the
> > > > Linux kernel, the cleanup cannot be relied upon.
> > > > Hence, cleanup before we go down for reboot.
> > > >
> > > > Keeping the cleanup at initialization also, in case BIOS
> > > > leaves the IOMMU enabled.
> > > >
> > > > I considered turning off iommu only during kexec reboot,
> > > > but a clean shutdown seems always a good idea. But if
> > > > someone wants to make it conditional, we can do that.
> > >
> > > This is going to break things for the VMM live update scheme that Jason
> > > presented at KVM Forum, isn't it?
> > >
> > > In that case we rely on the IOMMU still operating during the
> > > transition.
> >
> > For VMM live update case, we should be able to detect and bypass
> > the shutdown that Deepa introduced here, so keep IOMMU still operating?
> 
> Is that a 'yes' to Deepa's "if someone wants to make it conditional, we
> can do that" ?

Yes, I think so. Thanks!


[PATCH v2 7/8] iommu/arm-smmu-v3: Improve add_device() error handling

2019-11-08 Thread Jean-Philippe Brucker
Let add_device() clean up after itself. The iommu_bus_init() function
does call remove_device() on error, but other sites (e.g. of_iommu) do
not.

Don't free level-2 stream tables because we'd have to track if we
allocated each of them or if they are used by other endpoints. It's not
worth the hassle since they are managed resources.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82eac89ee187..88ec0bf33492 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2826,14 +2826,16 @@ static int arm_smmu_add_device(struct device *dev)
for (i = 0; i < master->num_sids; i++) {
u32 sid = master->sids[i];
 
-   if (!arm_smmu_sid_in_range(smmu, sid))
-   return -ERANGE;
+   if (!arm_smmu_sid_in_range(smmu, sid)) {
+   ret = -ERANGE;
+   goto err_free_master;
+   }
 
/* Ensure l2 strtab is initialised */
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
ret = arm_smmu_init_l2_strtab(smmu, sid);
if (ret)
-   return ret;
+   goto err_free_master;
}
}
 
@@ -2843,13 +2845,25 @@ static int arm_smmu_add_device(struct device *dev)
master->ssid_bits = min_t(u8, master->ssid_bits,
  CTXDESC_LINEAR_CDMAX);
 
+   ret = iommu_device_link(&smmu->iommu, dev);
+   if (ret)
+   goto err_free_master;
+
group = iommu_group_get_for_dev(dev);
-   if (!IS_ERR(group)) {
-   iommu_group_put(group);
-   iommu_device_link(&smmu->iommu, dev);
+   if (IS_ERR(group)) {
+   ret = PTR_ERR(group);
+   goto err_unlink;
}
 
-   return PTR_ERR_OR_ZERO(group);
+   iommu_group_put(group);
+   return 0;
+
+err_unlink:
+   iommu_device_unlink(&smmu->iommu, dev);
+err_free_master:
+   kfree(master);
+   fwspec->iommu_priv = NULL;
+   return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
-- 
2.23.0



[PATCH v2 8/8] iommu/arm-smmu-v3: Add support for PCI PASID

2019-11-08 Thread Jean-Philippe Brucker
Enable PASID for PCI devices that support it. Since the SSID tables are
allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
arm_smmu_dev_feature_enable() would be too late, since by that time the
main DMA domain has already been attached. Do it in add_device() instead.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 51 -
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 88ec0bf33492..3ee313c08325 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2633,6 +2633,49 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
atomic_dec(&smmu_domain->nr_ats_masters);
 }
 
+static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
+{
+   int ret;
+   int features;
+   int num_pasids;
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(master->dev))
+   return -ENOSYS;
+
+   pdev = to_pci_dev(master->dev);
+
+   features = pci_pasid_features(pdev);
+   if (features < 0)
+   return -ENOSYS;
+
+   num_pasids = pci_max_pasids(pdev);
+   if (num_pasids <= 0)
+   return -ENOSYS;
+
+   ret = pci_enable_pasid(pdev, features);
+   if (!ret)
+   master->ssid_bits = min_t(u8, ilog2(num_pasids),
+ master->smmu->ssid_bits);
+   return ret;
+}
+
+static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
+{
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(master->dev))
+   return;
+
+   pdev = to_pci_dev(master->dev);
+
+   if (!pdev->pasid_enabled)
+   return;
+
+   master->ssid_bits = 0;
+   pci_disable_pasid(pdev);
+}
+
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
unsigned long flags;
@@ -2841,13 +2884,16 @@ static int arm_smmu_add_device(struct device *dev)
 
master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
+   /* Note that PASID must be enabled before, and disabled after ATS */
+   arm_smmu_enable_pasid(master);
+
if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
master->ssid_bits = min_t(u8, master->ssid_bits,
  CTXDESC_LINEAR_CDMAX);
 
ret = iommu_device_link(&smmu->iommu, dev);
if (ret)
-   goto err_free_master;
+   goto err_disable_pasid;
 
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
@@ -2860,6 +2906,8 @@ static int arm_smmu_add_device(struct device *dev)
 
 err_unlink:
iommu_device_unlink(&smmu->iommu, dev);
+err_disable_pasid:
+   arm_smmu_disable_pasid(master);
 err_free_master:
kfree(master);
fwspec->iommu_priv = NULL;
@@ -2880,6 +2928,7 @@ static void arm_smmu_remove_device(struct device *dev)
arm_smmu_detach_dev(master);
iommu_group_remove_device(dev);
iommu_device_unlink(&smmu->iommu, dev);
+   arm_smmu_disable_pasid(master);
kfree(master);
iommu_fwspec_free(dev);
 }
-- 
2.23.0



[PATCH v2 6/8] iommu/arm-smmu-v3: Add second level of context descriptor table

2019-11-08 Thread Jean-Philippe Brucker
The SMMU can support up to 20 bits of SSID. Add a second level of context
descriptor tables to accommodate this. Devices that support more than 1024 SSIDs now
have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
descriptors (64kB), allocated on demand.
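
To make the sizes concrete: the L1 table holds 1024 descriptors of 8 bytes
each (8kB), and each leaf table holds 1024 context descriptors of 64 bytes
each (64kB), allocated only when an SSID in its range is first used. A small
sketch of how an SSID is split across the two levels, mirroring the lookup
added below:

#define CTXDESC_SPLIT		10
#define CTXDESC_L2_ENTRIES	(1 << CTXDESC_SPLIT)	/* 1024 CDs per leaf */

/* The top 10 bits of the SSID select the L1 entry, i.e. the leaf table. */
static unsigned int ssid_to_l1_idx(unsigned int ssid)
{
	return ssid >> CTXDESC_SPLIT;
}

/* The bottom 10 bits select the context descriptor within that leaf. */
static unsigned int ssid_to_leaf_idx(unsigned int ssid)
{
	return ssid & (CTXDESC_L2_ENTRIES - 1);
}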

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 137 +---
 1 file changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index df7d45503c65..82eac89ee187 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -224,6 +224,7 @@
 
 #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR  0
+#define STRTAB_STE_0_S1FMT_64K_L2  2
 #define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX   GENMASK_ULL(63, 59)
 
@@ -263,7 +264,20 @@
 
 #define STRTAB_STE_3_S2TTB_MASKGENMASK_ULL(51, 4)
 
-/* Context descriptor (stage-1 only) */
+/*
+ * Context descriptors.
+ *
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *   1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT  10
+#define CTXDESC_L2_ENTRIES (1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORDS 1
+#define CTXDESC_L1_DESC_VALID  1
+#define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
+
 #define CTXDESC_CD_DWORDS  8
 #define CTXDESC_CD_0_TCR_T0SZ  GENMASK_ULL(5, 0)
 #define ARM64_TCR_T0SZ GENMASK_ULL(5, 0)
@@ -577,7 +591,10 @@ struct arm_smmu_cd_table {
 struct arm_smmu_s1_cfg {
u8  s1fmt;
u8  s1cdmax;
-   struct arm_smmu_cd_tabletable;
+   struct arm_smmu_cd_table*tables;
+   size_t  num_tables;
+   __le64  *l1ptr;
+   dma_addr_t  l1ptr_dma;
struct arm_smmu_ctx_desccd;
 };
 
@@ -1521,12 +1538,51 @@ static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
 {
size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
 
+   if (!table->ptr)
+   return;
dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
 }
 
-static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+ struct arm_smmu_cd_table *table)
 {
-   return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
+   u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
+ CTXDESC_L1_DESC_VALID;
+
+   WRITE_ONCE(*dst, cpu_to_le64(val));
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
+  u32 ssid)
+{
+   __le64 *l1ptr;
+   unsigned int idx;
+   struct arm_smmu_cd_table *table;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+   if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
+   table = &cfg->tables[0];
+   idx = ssid;
+   } else {
+   idx = ssid >> CTXDESC_SPLIT;
+   if (idx >= cfg->num_tables)
+   return NULL;
+
+   table = &cfg->tables[idx];
+   if (!table->ptr) {
+   if (arm_smmu_alloc_cd_leaf_table(smmu, table,
+CTXDESC_L2_ENTRIES))
+   return NULL;
+
+   l1ptr = cfg->l1ptr + idx * CTXDESC_L1_DESC_DWORDS;
+   arm_smmu_write_cd_l1_desc(l1ptr, table);
+   /* An invalid L1CD can be cached */
+   arm_smmu_sync_cd(smmu_domain, ssid, false);
+   }
+   idx = ssid & (CTXDESC_L2_ENTRIES - 1);
+   }
+   return table->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -1552,7 +1608,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
u64 val;
bool cd_live;
struct arm_smmu_device *smmu = smmu_domain->smmu;
-   __le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
+   __le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
 
/*
 * This function handles the following cases:
@@ -1612,20 +1668,76 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+   int ret;
+   size_t size = 0;
+   size_t max_contexts;
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-   cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
-   return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table,
-  

[PATCH v2 4/8] iommu/arm-smmu-v3: Prepare for SSID support

2019-11-08 Thread Jean-Philippe Brucker
When a master supports substream ID, allocate a table with multiple
context descriptors for its stage-1 domain. For the moment S1CDMax is
still 0 in the STE, so the additional context descriptors are ignored.

Context descriptor tables are allocated once for the first master attached
to a domain. Therefore attaching multiple devices with different SSID
sizes is tricky, and we currently don't support it.

As a future improvement it would be nice to at least support attaching a
SSID-capable device to a domain that isn't using SSID, by reallocating the
SSID table. This would allow supporting a SSID-capable device that is in
the same IOMMU group as a bridge, for example. Varying SSID size is less
of a concern, since the PCIe specification "highly recommends" that
devices supporting PASID implement all 20 bits of it.
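
"We currently don't support it" boils down to a check at attach time: once
the context descriptor table has been sized for the first master, a master
with a different SSID width is refused. A sketch of that check (the field
names follow this series; the corresponding attach_dev() hunk is not shown
above):

/* Sketch: reject a master whose SSID width doesn't match the CD table
 * that was sized when the first master attached to this domain. */
static int arm_smmu_check_ssid_fit(struct arm_smmu_domain *smmu_domain,
				   struct arm_smmu_master *master)
{
	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
	    master->ssid_bits != smmu_domain->s1_cfg.s1cdmax)
		return -EINVAL;

	return 0;
}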

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 117 ++--
 1 file changed, 85 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 33488da8f742..122bed0168a3 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -553,16 +553,22 @@ struct arm_smmu_strtab_l1_desc {
dma_addr_t  l2ptr_dma;
 };
 
+struct arm_smmu_ctx_desc {
+   u16 asid;
+   u64 ttbr;
+   u64 tcr;
+   u64 mair;
+};
+
+struct arm_smmu_cd_table {
+   __le64  *ptr;
+   dma_addr_t  ptr_dma;
+};
+
 struct arm_smmu_s1_cfg {
-   __le64  *cdptr;
-   dma_addr_t  cdptr_dma;
-
-   struct arm_smmu_ctx_desc {
-   u16 asid;
-   u64 ttbr;
-   u64 tcr;
-   u64 mair;
-   }   cd;
+   u8  s1cdmax;
+   struct arm_smmu_cd_tabletable;
+   struct arm_smmu_ctx_desccd;
 };
 
 struct arm_smmu_s2_cfg {
@@ -1450,6 +1456,31 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 }
 
 /* Context descriptor manipulation functions */
+static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
+   struct arm_smmu_cd_table *table,
+   size_t num_entries)
+{
+   size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+   table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
+GFP_KERNEL | __GFP_ZERO);
+   if (!table->ptr) {
+   dev_warn(smmu->dev,
+"failed to allocate context descriptor table\n");
+   return -ENOMEM;
+   }
+   return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
+   struct arm_smmu_cd_table *table,
+   size_t num_entries)
+{
+   size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+   dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
u64 val = 0;
@@ -1471,6 +1502,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
struct arm_smmu_s1_cfg *cfg)
 {
u64 val;
+   __le64 *cdptr = cfg->table.ptr;
 
/*
 * We don't need to issue any invalidation here, as we'll invalidate
@@ -1488,12 +1520,29 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
val |= CTXDESC_CD_0_S;
 
-   cfg->cdptr[0] = cpu_to_le64(val);
+   cdptr[0] = cpu_to_le64(val);
 
val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
-   cfg->cdptr[1] = cpu_to_le64(val);
+   cdptr[1] = cpu_to_le64(val);
 
-   cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
+   cdptr[3] = cpu_to_le64(cfg->cd.mair);
+}
+
+static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+   return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table,
+   1 << cfg->s1cdmax);
+}
+
+static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+   arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
 }
 
 /* Stream table manipulation functions */
@@ -1624,7 +1673,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 

[PATCH v2 2/8] iommu/arm-smmu-v3: Support platform SSID

2019-11-08 Thread Jean-Philippe Brucker
For platform devices that support SubstreamID (SSID), firmware provides
the number of supported SSID bits. Restrict it to what the SMMU supports
and cache it into master->ssid_bits, which will also be used for PCI
PASID.
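
As a worked example of the clamping in the hunk below: one context
descriptor is CTXDESC_CD_DWORDS << 3 = 64 bytes, so the 64kB linear-table
limit gives CTXDESC_LINEAR_CDMAX = ilog2(65536 / 64) = ilog2(1024) = 10.
A platform device advertising 16 PASID bits behind an SMMU that supports
20 SSID bits therefore ends up with ssid_bits = min(16, 20) = 16, further
capped to 10 if the SMMU has no two-level context descriptor tables.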

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 13 +
 drivers/iommu/of_iommu.c|  6 +-
 include/linux/iommu.h   |  2 ++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8da93e730d6f..33488da8f742 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -292,6 +292,12 @@
 
 #define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(51, 4)
 
+/*
+ * When the SMMU only supports linear context descriptor tables, pick a
+ * reasonable size limit (64kB).
+ */
+#define CTXDESC_LINEAR_CDMAX   ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
+
 /* Convert between AArch64 (CPU) TCR format and SMMU CD format */
 #define ARM_SMMU_TCR2CD(tcr, fld)  FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
FIELD_GET(ARM64_TCR_##fld, tcr))
@@ -638,6 +644,7 @@ struct arm_smmu_master {
u32 *sids;
unsigned intnum_sids;
boolats_enabled;
+   unsigned intssid_bits;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -2572,6 +2579,12 @@ static int arm_smmu_add_device(struct device *dev)
}
}
 
+   master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+
+   if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
+   master->ssid_bits = min_t(u8, master->ssid_bits,
+ CTXDESC_LINEAR_CDMAX);
+
group = iommu_group_get_for_dev(dev);
if (!IS_ERR(group)) {
iommu_group_put(group);
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 614a93aa5305..aab63e9f283f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -194,8 +194,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
if (err)
break;
}
-   }
 
+   fwspec = dev_iommu_fwspec_get(dev);
+   if (!err && fwspec)
+   of_property_read_u32(master_np, "pasid-num-bits",
+&fwspec->num_pasid_bits);
+   }
 
/*
 * Two success conditions can be represented by non-negative err here:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f84fe76f0eea..0a3d9c3c368a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -576,6 +576,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
  * @ops: ops for this device's IOMMU
  * @iommu_fwnode: firmware handle for this device's IOMMU
  * @iommu_priv: IOMMU driver private data for this device
+ * @num_pasid_bits: number of PASID bits supported by this device
  * @num_ids: number of associated device IDs
  * @ids: IDs which this device may present to the IOMMU
  */
@@ -584,6 +585,7 @@ struct iommu_fwspec {
struct fwnode_handle*iommu_fwnode;
void*iommu_priv;
u32 flags;
+   u32 num_pasid_bits;
unsigned intnum_ids;
u32 ids[1];
 };
-- 
2.23.0



[PATCH v2 5/8] iommu/arm-smmu-v3: Add support for Substream IDs

2019-11-08 Thread Jean-Philippe Brucker
At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
page directory per device. However SMMUv3 allows more than one address
space for some devices, by providing multiple stage-1 page directories. In
addition to the Stream ID (SID), that identifies a device, we can now have
Substream IDs (SSID) identifying an address space. In PCIe, SID is called
Requester ID (RID) and SSID is called Process Address-Space ID (PASID).

Prepare the driver for SSID support, by adding context descriptor tables
in STEs (previously a single static context descriptor). A complete
stage-1 walk is now performed like this by the SMMU:

  Stream tables  Ctx. tables  Page tables
++   ,--->+---+   ,--->+---+
::   |:   :   |:   :
++   |+---+   |+---+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
+++---++---+
:::   ::   :
+++---++---+

Implement a single level of context descriptor table for now, but as with
stream and page tables, an SSID can be split to index multiple levels of
tables.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 132 ++--
 1 file changed, 111 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 122bed0168a3..df7d45503c65 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -227,6 +227,11 @@
 #define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX   GENMASK_ULL(63, 59)
 
+#define STRTAB_STE_1_S1DSS GENMASK_ULL(1, 0)
+#define STRTAB_STE_1_S1DSS_TERMINATE   0x0
+#define STRTAB_STE_1_S1DSS_BYPASS  0x1
+#define STRTAB_STE_1_S1DSS_SSID0   0x2
+
 #define STRTAB_STE_1_S1C_CACHE_NC  0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA1UL
 #define STRTAB_STE_1_S1C_CACHE_WT  2UL
@@ -329,6 +334,7 @@
 #define CMDQ_PREFETCH_1_SIZE   GENMASK_ULL(4, 0)
 #define CMDQ_PREFETCH_1_ADDR_MASK  GENMASK_ULL(63, 12)
 
+#define CMDQ_CFGI_0_SSID   GENMASK_ULL(31, 12)
 #define CMDQ_CFGI_0_SIDGENMASK_ULL(63, 32)
 #define CMDQ_CFGI_1_LEAF   (1UL << 0)
 #define CMDQ_CFGI_1_RANGE  GENMASK_ULL(4, 0)
@@ -446,8 +452,11 @@ struct arm_smmu_cmdq_ent {
 
#define CMDQ_OP_CFGI_STE0x3
#define CMDQ_OP_CFGI_ALL0x4
+   #define CMDQ_OP_CFGI_CD 0x5
+   #define CMDQ_OP_CFGI_CD_ALL 0x6
struct {
u32 sid;
+   u32 ssid;
union {
boolleaf;
u8  span;
@@ -566,6 +575,7 @@ struct arm_smmu_cd_table {
 };
 
 struct arm_smmu_s1_cfg {
+   u8  s1fmt;
u8  s1cdmax;
struct arm_smmu_cd_tabletable;
struct arm_smmu_ctx_desccd;
@@ -860,10 +870,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
break;
+   case CMDQ_OP_CFGI_CD:
+   cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
+   /* Fallthrough */
case CMDQ_OP_CFGI_STE:
cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
break;
+   case CMDQ_OP_CFGI_CD_ALL:
+   cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+   break;
case CMDQ_OP_CFGI_ALL:
/* Cover the entire SID range */
cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
@@ -1456,6 +1472,33 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 }
 
 /* Context descriptor manipulation functions */
+static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
+int ssid, bool leaf)
+{
+   size_t i;
+   unsigned long flags;
+   struct arm_smmu_master *master;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_cmdq_ent cmd = {
+   .opcode = CMDQ_OP_CFGI_CD,
+   .cfgi   = {
+   .ssid   = ssid,
+   .leaf   = leaf,
+   },
+   };
+
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+   for (i = 0; i < master->num_sids; i++) {
+   

[PATCH v2 1/8] dt-bindings: document PASID property for IOMMU masters

2019-11-08 Thread Jean-Philippe Brucker
On Arm systems, some platform devices behind an SMMU may support the PASID
feature, which offers multiple address spaces. Let the firmware tell us
when a device supports PASID.

Reviewed-by: Rob Herring 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 Documentation/devicetree/bindings/iommu/iommu.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 5a8b4624defc..3c36334e4f94 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -86,6 +86,12 @@ have a means to turn off translation. But it is invalid in such cases to
 disable the IOMMU's device tree node in the first place because it would
 prevent any driver from properly setting up the translations.
 
+Optional properties:
+
+- pasid-num-bits: Some masters support multiple address spaces for DMA, by
+  tagging DMA transactions with an address space identifier. By default,
+  this is 0, which means that the device only has one address space.
+
 
 Notes:
 ==
-- 
2.23.0



[PATCH v2 3/8] ACPI/IORT: Support PASID for platform devices

2019-11-08 Thread Jean-Philippe Brucker
Named component nodes in the IORT tables describe the number of
Substream ID bits (aka. PASID) supported by the device. Propagate this
value to the fwspec structure in order to enable PASID for platform
devices.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/iort.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 5a7551d060f2..9aebb180744f 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -11,6 +11,7 @@
 #define pr_fmt(fmt)"ACPI: IORT: " fmt
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -924,6 +925,20 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
return iort_iommu_xlate(info->dev, parent, streamid);
 }
 
+static void iort_named_component_init(struct device *dev,
+ struct acpi_iort_node *node)
+{
+   struct acpi_iort_named_component *nc;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+   if (!fwspec)
+   return;
+
+   nc = (struct acpi_iort_named_component *)node->node_data;
+   fwspec->num_pasid_bits = FIELD_GET(ACPI_IORT_NC_PASID_BITS,
+  nc->node_flags);
+}
+
 /**
  * iort_iommu_configure - Set-up IOMMU configuration for a device.
  *
@@ -978,6 +993,9 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
if (parent)
err = iort_iommu_xlate(dev, parent, streamid);
} while (parent && !err);
+
+   if (!err)
+   iort_named_component_init(dev, node);
}
 
/*
-- 
2.23.0



[PATCH v2 0/8] iommu: Add PASID support to Arm SMMUv3

2019-11-08 Thread Jean-Philippe Brucker
This is version 2 of the series I sent a while ago [1], adding PASID
support to the Arm SMMUv3 driver.

Changes since v1:
* Dropped the patch adding auxiliary domain support. It's an easy way to
  test PASID, by populating PASID contexts using iommu_map/unmap(), but
  I don't know if it will ever have real users. 

  Since v1 I changed my testing gear, and am using the zip accelerator
  [2] instead of a software model. It only uses SVA and testing
  auxiliary domains would require additional changes that would never go
  upstream. SVA requires another 20 patches (including I/O page faults)
  that I will send later, but at least I know that this will get used.

* ioasid patch has been carried by Jacob and should be merged for v5.5 [3]

* Split patch "Add support for Substream IDs" into patches 4 and 5.

* Added IORT support (patch 3) and addressed other comments.

[1] https://lore.kernel.org/linux-iommu/20190610184714.6786-1-jean-philippe.bruc...@arm.com/
[2] https://lore.kernel.org/linux-iommu/1572331216-9503-1-git-send-email-zhangfei@linaro.org/
[3] https://lore.kernel.org/linux-iommu/1570045363-24856-1-git-send-email-jacob.jun@linux.intel.com/
 

Jean-Philippe Brucker (8):
  dt-bindings: document PASID property for IOMMU masters
  iommu/arm-smmu-v3: Support platform SSID
  ACPI/IORT: Support PASID for platform devices
  iommu/arm-smmu-v3: Prepare for SSID support
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Improve add_device() error handling
  iommu/arm-smmu-v3: Add support for PCI PASID

 .../devicetree/bindings/iommu/iommu.txt   |   6 +
 drivers/acpi/arm64/iort.c |  18 +
 drivers/iommu/arm-smmu-v3.c   | 442 +++---
 drivers/iommu/of_iommu.c  |   6 +-
 include/linux/iommu.h |   2 +
 5 files changed, 419 insertions(+), 55 deletions(-)

-- 
2.23.0



[PATCH v2 7/9] iommu/arm-smmu-v3: Allow building as a module

2019-11-08 Thread Will Deacon
By removing the redundant call to 'pci_request_acs()' we can allow the
ARM SMMUv3 driver to be built as a module.

Signed-off-by: Will Deacon 
---
 drivers/iommu/Kconfig   | 2 +-
 drivers/iommu/arm-smmu-v3.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..7583d47fc4d5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -388,7 +388,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
  config.
 
 config ARM_SMMU_V3
-   bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
+   tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
depends on ARM64
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2ad8e2ca0583..d54ceb80c28d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2733,6 +2733,7 @@ static struct iommu_ops arm_smmu_ops = {
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = arm_smmu_put_resv_regions,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
+   .owner  = THIS_MODULE,
 };
 
 /* Probing and initialisation functions */
@@ -3657,7 +3658,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 
 #ifdef CONFIG_PCI
if (pci_bus_type.iommu_ops != &arm_smmu_ops) {
-   pci_request_acs();
ret = bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
if (ret)
return ret;
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 4/9] drivers/iommu: Take a ref to the IOMMU driver prior to ->add_device()

2019-11-08 Thread Will Deacon
To avoid accidental removal of an active IOMMU driver module, take a
reference to the driver module in 'iommu_probe_device()' immediately
prior to invoking the '->add_device()' callback and hold it until
after the device has been removed by '->remove_device()'.
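
In other words the reference spans the whole time the device stays bound,
not just the callback itself. A compressed sketch of the pairing (the real
code, including the dev_param handling, is in the hunks below):

/* Sketch only: mirrors iommu_probe_device()/iommu_release_device(). */
static int probe_sketch(struct device *dev, const struct iommu_ops *ops)
{
	int ret;

	if (!try_module_get(ops->owner))	/* pin the driver module */
		return -EINVAL;

	ret = ops->add_device(dev);
	if (ret)
		module_put(ops->owner);		/* nothing left to pin */

	return ret;
}

static void release_sketch(struct device *dev, const struct iommu_ops *ops)
{
	ops->remove_device(dev);
	module_put(ops->owner);			/* unpin once the device is gone */
}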

Suggested-by: Joerg Roedel 
Signed-off-by: Will Deacon 
---
 drivers/iommu/iommu.c | 19 +--
 include/linux/iommu.h |  2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c1aadb570145..4bfecfbbe2cf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -185,10 +186,21 @@ int iommu_probe_device(struct device *dev)
if (!iommu_get_dev_param(dev))
return -ENOMEM;
 
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_free_dev_param;
+   }
+
ret = ops->add_device(dev);
if (ret)
-   iommu_free_dev_param(dev);
+   goto err_module_put;
+
+   return 0;
 
+err_module_put:
+   module_put(ops->owner);
+err_free_dev_param:
+   iommu_free_dev_param(dev);
return ret;
 }
 
@@ -199,7 +211,10 @@ void iommu_release_device(struct device *dev)
if (dev->iommu_group)
ops->remove_device(dev);
 
-   iommu_free_dev_param(dev);
+   if (dev->iommu_param) {
+   module_put(ops->owner);
+   iommu_free_dev_param(dev);
+   }
 }
 
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 29bac5345563..d9dab5a3e912 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -245,6 +245,7 @@ struct iommu_iotlb_gather {
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @owner: Driver module providing these ops
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -308,6 +309,7 @@ struct iommu_ops {
 struct iommu_page_response *msg);
 
unsigned long pgsize_bitmap;
+   struct module *owner;
 };
 
 /**
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 6/9] Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"

2019-11-08 Thread Will Deacon
This reverts commit c07b6426df922d21a13a959cf785d46e9c531941.

Let's get the SMMUv3 driver building as a module, which means putting
back some dead code that we used to carry.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8da93e730d6f..2ad8e2ca0583 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -21,8 +21,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -384,10 +383,6 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
-/*
- * not really modular, but the easiest way to keep compat with existing
- * bootargs behaviour is to continue using module_param_named here.
- */
 static bool disable_bypass = 1;
 module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
@@ -3683,25 +3678,37 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
return 0;
 }
 
-static void arm_smmu_device_shutdown(struct platform_device *pdev)
+static int arm_smmu_device_remove(struct platform_device *pdev)
 {
struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
arm_smmu_device_disable(smmu);
+
+   return 0;
+}
+
+static void arm_smmu_device_shutdown(struct platform_device *pdev)
+{
+   arm_smmu_device_remove(pdev);
 }
 
 static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,smmu-v3", },
{ },
 };
+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 
 static struct platform_driver arm_smmu_driver = {
.driver = {
.name   = "arm-smmu-v3",
.of_match_table = of_match_ptr(arm_smmu_of_match),
-   .suppress_bind_attrs = true,
},
.probe  = arm_smmu_device_probe,
+   .remove = arm_smmu_device_remove,
.shutdown = arm_smmu_device_shutdown,
 };
-builtin_platform_driver(arm_smmu_driver);
+module_platform_driver(arm_smmu_driver);
+
+MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations");
+MODULE_AUTHOR("Will Deacon ");
+MODULE_LICENSE("GPL v2");
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 9/9] iommu/arm-smmu: Allow building as a module

2019-11-08 Thread Will Deacon
By conditionally dropping support for the legacy binding and exporting
the newly introduced 'arm_smmu_impl_init()' function we can allow the
ARM SMMU driver to be built as a module.

Signed-off-by: Will Deacon 
---
 drivers/iommu/Kconfig| 14 +-
 drivers/iommu/Makefile   |  3 ++-
 drivers/iommu/arm-smmu.c | 55 
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7583d47fc4d5..fc55f7ba0d18 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -350,7 +350,7 @@ config SPAPR_TCE_IOMMU
 
 # ARM IOMMU support
 config ARM_SMMU
-   bool "ARM Ltd. System MMU (SMMU) Support"
+   tristate "ARM Ltd. System MMU (SMMU) Support"
depends on (ARM64 || ARM) && MMU
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
@@ -362,6 +362,18 @@ config ARM_SMMU
  Say Y here if your SoC includes an IOMMU device implementing
  the ARM SMMU architecture.
 
+config ARM_SMMU_LEGACY_DT_BINDINGS
+   bool "Support the legacy \"mmu-masters\" devicetree bindings"
+   depends on ARM_SMMU=y && OF
+   help
+ Support for the badly designed and deprecated "mmu-masters"
+ devicetree bindings. This allows some DMA masters to attach
+ to the SMMU but does not provide any support via the DMA API.
+ If you're lucky, you might be able to get VFIO up and running.
+
+ If you say Y here then you'll make me very sad. Instead, say N
+ and move your firmware to the utopian future that was 2016.
+
 config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
bool "Default to disabling bypass on ARM SMMU v1 and v2"
depends on ARM_SMMU
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f926e73..b52a03d87fc3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,8 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu-mod.o
+arm-smmu-mod-objs += arm-smmu.o arm-smmu-impl.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 53bbe0663b9e..9eb52410d016 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -125,6 +125,12 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct platform_driver arm_smmu_driver;
+static struct iommu_ops arm_smmu_ops;
+
+#ifdef CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS
+static void arm_smmu_bus_init(void);
+
 static struct device_node *dev_get_dev_node(struct device *dev)
 {
if (dev_is_pci(dev)) {
@@ -160,9 +166,6 @@ static int __find_legacy_master_phandle(struct device *dev, void *data)
return err == -ENOENT ? 0 : err;
 }
 
-static struct platform_driver arm_smmu_driver;
-static struct iommu_ops arm_smmu_ops;
-
 static int arm_smmu_register_legacy_master(struct device *dev,
   struct arm_smmu_device **smmu)
 {
@@ -214,6 +217,27 @@ static int arm_smmu_register_legacy_master(struct device *dev,
return err;
 }
 
+/*
+ * With the legacy DT binding in play, we have no guarantees about
+ * probe order, but then we're also not doing default domains, so we can
+ * delay setting bus ops until we're sure every possible SMMU is ready,
+ * and that way ensure that no add_device() calls get missed.
+ */
+static int arm_smmu_legacy_bus_init(void)
+{
+   if (using_legacy_binding)
+   arm_smmu_bus_init();
+   return 0;
+}
+device_initcall_sync(arm_smmu_legacy_bus_init);
+#else
+static int arm_smmu_register_legacy_master(struct device *dev,
+  struct arm_smmu_device **smmu)
+{
+   return -ENODEV;
+}
+#endif /* CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS */
+
 static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
 {
int idx;
@@ -1566,6 +1590,7 @@ static struct iommu_ops arm_smmu_ops = {
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = arm_smmu_put_resv_regions,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
+   .owner  = THIS_MODULE,
 };
 
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
@@ -1960,8 +1985,10 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 
legacy_binding = of_find_property(dev->of_node, "mmu-masters", NULL);
if (legacy_binding && !using_generic_binding) {
-   if (!using_legacy_binding)
-   pr_notice("deprecated \"mmu-masters\" 

[PATCH v2 8/9] Revert "iommu/arm-smmu: Make arm-smmu explicitly non-modular"

2019-11-08 Thread Will Deacon
This reverts commit addb672f200f4e99368270da205320b83efe01a0.

Let's get the SMMU driver building as a module, which means putting
back some dead code that we used to carry.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7c503a6bc585..53bbe0663b9e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -27,8 +27,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -59,10 +58,6 @@
 #define MSI_IOVA_LENGTH0x10
 
 static int force_stage;
-/*
- * not really modular, but the easiest way to keep compat with existing
- * bootargs behaviour is to continue using module_param() here.
- */
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
"Force SMMU mappings to be installed at a particular stage of 
translation. A value of '1' or '2' forces the corresponding stage. All other 
values are ignored (i.e. no stage is forced). Note that selecting a specific 
stage will disable support for nested translation.");
@@ -1878,6 +1873,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
{ },
 };
+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 
 #ifdef CONFIG_ACPI
 static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
@@ -2165,12 +2161,12 @@ static int arm_smmu_legacy_bus_init(void)
 }
 device_initcall_sync(arm_smmu_legacy_bus_init);
 
-static void arm_smmu_device_shutdown(struct platform_device *pdev)
+static int arm_smmu_device_remove(struct platform_device *pdev)
 {
struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
if (!smmu)
-   return;
+   return -ENODEV;
 
if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
dev_err(&pdev->dev, "removing device with active domains!\n");
@@ -2186,6 +2182,12 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev)
clk_bulk_disable(smmu->num_clks, smmu->clks);
 
clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+   return 0;
+}
+
+static void arm_smmu_device_shutdown(struct platform_device *pdev)
+{
+   arm_smmu_device_remove(pdev);
 }
 
 static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
@@ -2235,12 +2237,16 @@ static const struct dev_pm_ops arm_smmu_pm_ops = {
 
 static struct platform_driver arm_smmu_driver = {
.driver = {
-   .name   = "arm-smmu",
-   .of_match_table = of_match_ptr(arm_smmu_of_match),
-   .pm = &arm_smmu_pm_ops,
-   .suppress_bind_attrs= true,
+   .name   = "arm-smmu",
+   .of_match_table = of_match_ptr(arm_smmu_of_match),
+   .pm = &arm_smmu_pm_ops,
},
.probe  = arm_smmu_device_probe,
+   .remove = arm_smmu_device_remove,
.shutdown = arm_smmu_device_shutdown,
 };
-builtin_platform_driver(arm_smmu_driver);
+module_platform_driver(arm_smmu_driver);
+
+MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
+MODULE_AUTHOR("Will Deacon ");
+MODULE_LICENSE("GPL v2");
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 5/9] iommu/of: Take a ref to the IOMMU driver during ->of_xlate()

2019-11-08 Thread Will Deacon
Ensure that we hold a reference to the IOMMU driver module while calling
the '->of_xlate()' callback during early device probing.

Signed-off-by: Will Deacon 
---
 drivers/iommu/of_iommu.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 78faa9f73a91..25491403a0bd 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,16 +90,16 @@ static int of_iommu_xlate(struct device *dev,
 {
const struct iommu_ops *ops;
struct fwnode_handle *fwnode = &iommu_spec->np->fwnode;
-   int err;
+   int ret;
 
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
!of_device_is_available(iommu_spec->np))
return NO_IOMMU;
 
-   err = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
-   if (err)
-   return err;
+   ret = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
+   if (ret)
+   return ret;
/*
 * The otherwise-empty fwspec handily serves to indicate the specific
 * IOMMU device we're waiting for, which will be useful if we ever get
@@ -107,7 +108,12 @@ static int of_iommu_xlate(struct device *dev,
if (!ops)
return driver_deferred_probe_check_state(dev);
 
-   return ops->of_xlate(dev, iommu_spec);
+   if (!try_module_get(ops->owner))
+   return -ENODEV;
+
+   ret = ops->of_xlate(dev, iommu_spec);
+   module_put(ops->owner);
+   return ret;
 }
 
 struct of_pci_iommu_alias_info {
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 2/9] iommu/of: Request ACS from the PCI core when configuring IOMMU linkage

2019-11-08 Thread Will Deacon
To avoid having to export 'pci_request_acs()' to modular IOMMU drivers,
move the call into the 'of_dma_configure()' path in a similar manner to
the way in which ACS is configured when probing via ACPI/IORT.

Signed-off-by: Will Deacon 
---
 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 614a93aa5305..78faa9f73a91 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -177,6 +177,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
.np = master_np,
};
 
+   pci_request_acs();
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, &info);
} else if (dev_is_fsl_mc(dev)) {
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 1/9] drivers/iommu: Export core IOMMU API symbols to permit modular drivers

2019-11-08 Thread Will Deacon
Building IOMMU drivers as modules requires that the core IOMMU API
symbols are exported as GPL symbols.

Signed-off-by: Will Deacon 
---
 drivers/iommu/iommu-sysfs.c | 5 +
 drivers/iommu/iommu.c   | 8 
 2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/iommu-sysfs.c b/drivers/iommu/iommu-sysfs.c
index e436ff813e7e..99869217fbec 100644
--- a/drivers/iommu/iommu-sysfs.c
+++ b/drivers/iommu/iommu-sysfs.c
@@ -87,6 +87,7 @@ int iommu_device_sysfs_add(struct iommu_device *iommu,
put_device(iommu->dev);
return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_device_sysfs_add);
 
 void iommu_device_sysfs_remove(struct iommu_device *iommu)
 {
@@ -94,6 +95,8 @@ void iommu_device_sysfs_remove(struct iommu_device *iommu)
device_unregister(iommu->dev);
iommu->dev = NULL;
 }
+EXPORT_SYMBOL_GPL(iommu_device_sysfs_remove);
+
 /*
  * IOMMU drivers can indicate a device is managed by a given IOMMU using
  * this interface.  A link to the device will be created in the "devices"
@@ -119,6 +122,7 @@ int iommu_device_link(struct iommu_device *iommu, struct device *link)
 
return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_device_link);
 
 void iommu_device_unlink(struct iommu_device *iommu, struct device *link)
 {
@@ -128,3 +132,4 @@ void iommu_device_unlink(struct iommu_device *iommu, struct device *link)
sysfs_remove_link(&link->kobj, "iommu");
sysfs_remove_link_from_group(&iommu->dev->kobj, "devices", dev_name(link));
 }
+EXPORT_SYMBOL_GPL(iommu_device_unlink);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d658c7c6a2ab..c1aadb570145 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -141,6 +141,7 @@ int iommu_device_register(struct iommu_device *iommu)
spin_unlock(&iommu_device_lock);
return 0;
 }
+EXPORT_SYMBOL_GPL(iommu_device_register);
 
 void iommu_device_unregister(struct iommu_device *iommu)
 {
@@ -148,6 +149,7 @@ void iommu_device_unregister(struct iommu_device *iommu)
list_del(&iommu->list);
spin_unlock(&iommu_device_lock);
 }
+EXPORT_SYMBOL_GPL(iommu_device_unregister);
 
 static struct iommu_param *iommu_get_dev_param(struct device *dev)
 {
@@ -886,6 +888,7 @@ struct iommu_group *iommu_group_ref_get(struct iommu_group *group)
kobject_get(group->devices_kobj);
return group;
 }
+EXPORT_SYMBOL_GPL(iommu_group_ref_get);
 
 /**
  * iommu_group_put - Decrement group reference
@@ -1259,6 +1262,7 @@ struct iommu_group *generic_device_group(struct device *dev)
 {
return iommu_group_alloc();
 }
+EXPORT_SYMBOL_GPL(generic_device_group);
 
 /*
  * Use standard PCI bus topology, isolation features, and DMA alias quirks
@@ -1326,6 +1330,7 @@ struct iommu_group *pci_device_group(struct device *dev)
/* No shared group found, allocate new */
return iommu_group_alloc();
 }
+EXPORT_SYMBOL_GPL(pci_device_group);
 
 /* Get the IOMMU group for device on fsl-mc bus */
 struct iommu_group *fsl_mc_device_group(struct device *dev)
@@ -1338,6 +1343,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
group = iommu_group_alloc();
return group;
 }
+EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
@@ -1406,6 +1412,7 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 
return group;
 }
+EXPORT_SYMBOL(iommu_group_get_for_dev);
 
 struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 {
@@ -2185,6 +2192,7 @@ struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
region->type = type;
return region;
 }
+EXPORT_SYMBOL_GPL(iommu_alloc_resv_region);
 
 static int
 request_default_domain_for_dev(struct device *dev, unsigned long type)
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 3/9] PCI: Export pci_ats_disabled() as a GPL symbol to modules

2019-11-08 Thread Will Deacon
Building drivers for ATS-aware IOMMUs as modules requires access to
pci_ats_disabled(). Export it as a GPL symbol to get things working.

Acked-by: Bjorn Helgaas 
Signed-off-by: Will Deacon 
---
 drivers/pci/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a97e2571a527..4fbe5b576dd8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -123,6 +123,7 @@ bool pci_ats_disabled(void)
 {
return pcie_ats_disabled;
 }
+EXPORT_SYMBOL_GPL(pci_ats_disabled);
 
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



[PATCH v2 0/9] iommu: Permit modular builds of ARM SMMU[v3] drivers

2019-11-08 Thread Will Deacon
Hi all,

This is version two of the patches I previously posted here:

  https://lore.kernel.org/lkml/20191030145112.19738-1-w...@kernel.org/

Changes since v1 include:

  * Build single "arm-smmu-mod.ko" module for the Arm SMMU driver
  * Hold a reference to the IOMMU driver module across {add,remove}_device()
  * Take a reference to the IOMMU driver module during of_xlate()
  * Added Bjorn's ack on the PCI export patch

Please note that I haven't been able to test this properly, since I don't
currently have access to any Arm SMMU hardware.

Cheers,

Will

Cc: Jean-Philippe Brucker 
Cc: Jordan Crouse 
Cc: John Garry 
Cc: Bjorn Helgaas 
Cc: Saravana Kannan 
Cc: "Isaac J. Manjarres" 
Cc: Robin Murphy 
Cc: Lorenzo Pieralisi 
Cc: Joerg Roedel 

--->8

Will Deacon (9):
  drivers/iommu: Export core IOMMU API symbols to permit modular drivers
  iommu/of: Request ACS from the PCI core when configuring IOMMU linkage
  PCI: Export pci_ats_disabled() as a GPL symbol to modules
  drivers/iommu: Take a ref to the IOMMU driver prior to ->add_device()
  iommu/of: Take a ref to the IOMMU driver during ->of_xlate()
  Revert "iommu/arm-smmu: Make arm-smmu-v3 explicitly non-modular"
  iommu/arm-smmu-v3: Allow building as a module
  Revert "iommu/arm-smmu: Make arm-smmu explicitly non-modular"
  iommu/arm-smmu: Allow building as a module

 drivers/iommu/Kconfig   | 16 ++-
 drivers/iommu/Makefile  |  3 +-
 drivers/iommu/arm-smmu-v3.c | 27 +++-
 drivers/iommu/arm-smmu.c| 87 ++---
 drivers/iommu/iommu-sysfs.c |  5 +++
 drivers/iommu/iommu.c   | 27 +++-
 drivers/iommu/of_iommu.c| 17 +---
 drivers/pci/pci.c   |  1 +
 include/linux/iommu.h   |  2 +
 9 files changed, 130 insertions(+), 55 deletions(-)

-- 
2.24.0.rc1.363.gb1bccd3e3d-goog



Re: [RFC v2 2/3] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)

2019-11-08 Thread Alex Williamson
On Fri, 8 Nov 2019 12:23:41 +
"Liu, Yi L"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Friday, November 8, 2019 6:07 AM
> > To: Liu, Yi L 
> > Subject: Re: [RFC v2 2/3] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > 
> > On Wed, 6 Nov 2019 13:27:26 +
> > "Liu, Yi L"  wrote:
> >   
> > > > From: Alex Williamson 
> > > > Sent: Wednesday, November 6, 2019 7:36 AM
> > > > To: Liu, Yi L 
> > > > Subject: Re: [RFC v2 2/3] vfio/type1: 
> > > > VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > > >
> > > > On Thu, 24 Oct 2019 08:26:22 -0400
> > > > Liu Yi L  wrote:
> > > >  
> > > > > This patch adds VFIO_IOMMU_PASID_REQUEST ioctl which aims
> > > > > to passdown PASID allocation/free request from the virtual
> > > > > iommu. This is required to get PASID managed in system-wide.
> > > > >
> > > > > Cc: Kevin Tian 
> > > > > Signed-off-by: Liu Yi L 
> > > > > Signed-off-by: Yi Sun 
> > > > > Signed-off-by: Jacob Pan 
> > > > > ---
> > > > >  drivers/vfio/vfio_iommu_type1.c | 114  
> > > >   
> > > > >  include/uapi/linux/vfio.h   |  25 +
> > > > >  2 files changed, 139 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c  
> > b/drivers/vfio/vfio_iommu_type1.c  
> > > > > index cd8d3a5..3d73a7d 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -2248,6 +2248,83 @@ static int vfio_cache_inv_fn(struct device 
> > > > > *dev,  
> > void  
> > > > *data)  
> > > > >   return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
> > > > >  }
> > > > >
> > > > > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > > > > +  int min_pasid,
> > > > > +  int max_pasid)
> > > > > +{
> > > > > + int ret;
> > > > > + ioasid_t pasid;
> > > > > + struct mm_struct *mm = NULL;
> > > > > +
> > > > > + mutex_lock(&iommu->lock);
> > > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > > + ret = -EINVAL;
> > > > > + goto out_unlock;
> > > > > + }
> > > > > + mm = get_task_mm(current);
> > > > > + /* Track ioasid allocation owner by mm */
> > > > > + pasid = ioasid_alloc((struct ioasid_set *)mm, min_pasid,
> > > > > + max_pasid, NULL);  
> > > >
> > > > Are we sure we want to tie this to the task mm vs perhaps the
> > > > vfio_iommu pointer?  
> > >
> > > Here we want to have a kind of per-VM mark, which can be used to do
> > > ownership check on whether a pasid is held by a specific VM. This is
> > > very important to prevent across VM affect. vfio_iommu pointer is
> > > competent for vfio as vfio is both pasid alloc requester and pasid
> > > consumer. e.g. vfio requests pasid alloc from ioasid and also it will
> > > invoke bind_gpasid(). vfio can either check ownership before invoking
> > > bind_gpasid() or pass vfio_iommu pointer to iommu driver. But in future,
> > > there may be other modules which are just consumers of pasid. And they
> > > also want to do ownership check for a pasid. Then, it would be hard for
> > > them as they are not the pasid alloc requester. So here better to have
> > > a system wide structure to perform as the per-VM mark. task mm looks
> > > to be much competent.  
> > 
> > Ok, so it's intentional to have a VM-wide token.  Elsewhere in the
> > type1 code (vfio_dma_do_map) we record the task_struct per dma mapping
> > so that we can get the task mm as needed.  Would the task_struct
> > pointer provide any advantage?  
> 
> I think we may use task_struct pointer to make type1 code consistent.
> How do you think?

If it has the same utility, sure.
 
> > Also, an overall question, this provides userspace with pasid alloc and
> > free ioctls, (1) what prevents a userspace process from consuming every
> > available pasid, and (2) if the process exits or crashes without
> > freeing pasids, how are they recovered aside from a reboot?  
> 
> For question (1), I think we only need to worry about a malicious
> userspace process. As vfio usage is under privileged mode, we may
> be safe on it so far.

No, where else do we ever make this assumption?  vfio requires a
privileged entity to configure the system for vfio, bind devices for
user use, and grant those devices to the user, but the usage of the
device is always assumed to be by an unprivileged user.  It is
absolutely not acceptable to require a privileged user.  It's vfio's
responsibility to protect the system from the user.

> However, we may need to introduce a kind of credit
> mechanism to protect it. I've thought about it, but have no good idea yet.
> I'd be happy to hear from you.

It's a limited system resource and it's unclear how many might
reasonably be used by a user.  I don't have an easy answer.
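
One possible shape for the credit mechanism mentioned above is a simple
per-container cap checked at allocation time. Purely illustrative: the
limit, the pasid_count field and both helpers below are made-up names, not
part of this series:

#define VFIO_PASID_QUOTA	1000	/* arbitrary per-container limit */

/* Callers are assumed to hold iommu->lock. */
static int vfio_pasid_try_charge(struct vfio_iommu *iommu)
{
	if (iommu->pasid_count >= VFIO_PASID_QUOTA)
		return -ENOSPC;

	iommu->pasid_count++;
	return 0;
}

static void vfio_pasid_uncharge(struct vfio_iommu *iommu)
{
	iommu->pasid_count--;
}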

> For question (2), I think we need to reclaim the allocated pasids when
> the vfio container fd 

Re: [PATCH 5/7] iommu/arm-smmu-v3: Allow building as a module

2019-11-08 Thread Will Deacon
Hi Jean-Philippe,

On Mon, Nov 04, 2019 at 08:15:24PM +0100, Jean-Philippe Brucker wrote:
> On Thu, Oct 31, 2019 at 03:42:47PM +, Will Deacon wrote:
> > > Sorry for the stupid question, but what prevents the iommu module from
> > > being unloaded when there are active users? There are no symbol
> > > dependencies to endpoint device drivers, because the interface is only
> > > exposed through the iommu-api, right? Is some sort of manual module
> > > reference counting needed?
> > 
> > Generally, I think unloading the IOMMU driver module while there are
> > active users is a pretty bad idea, much like unbinding the driver via
> > /sys in the same situation would also be fairly daft. However, I *think*
> > the code in __device_release_driver() tries to deal with this by
> > iterating over the active consumers and ->remove()ing them first.
> 
> > I'm without hardware access at the moment, so I haven't been able to
> > test this myself. We could nobble the module_exit() hook, but there's
> > still the "force unload" option depending on the .config.
> 
> Shame that we can't completely prevent module unloading, because handling
> rmmod cleanly is tricky.
> 
> On module unload we also need to tidy up the bus->iommu_ops installed by
> bus_set_iommu(), and remove the IOMMU groups (and probably other leaks I
> missed). I have a solution for the bus->iommu_ops, which is simply adding
> a bus_unset_iommu() counterpart with a refcount, but it doesn't deal with
> the IOMMU groups cleanly. If there are multiple IOMMU instances managing
> one bus, then we should only remove the IOMMU groups belonging to the
> instance that is being removed.

Hmm, but all of those IOMMU instances must be driven by the same driver,
right, since bus_set_iommu() can only take one set of callbacks for a given
bus? In which case, removing the driver module effectively removes all
instances of the IOMMU for that bus and I think we're ok.

If we couple that with Joerg's suggestion to take a reference to the driver
module in add_device(), then I think that actually it's harmless to leave
the bus ops installed and the groups should be sorted too. It means it's
pretty difficult to unload the module, but that's probably not a bad thing.

I'll post a v2 shortly...

> I'll think about this more, but the simple solution is attached if you
> want to test. It at least works with a single IOMMU now:
> 
> $ modprobe virtio-iommu
> [   25.180965] virtio_iommu virtio0: input address: 64 bits
> [   25.181437] virtio_iommu virtio0: page mask: 0xf000
> [   25.214493] virtio-pci :00:03.0: Adding to iommu group 0
> [   25.233252] virtio-pci :00:03.0: enabling device ( -> 0003)
> [   25.334810] e1000e :00:02.0: Adding to iommu group 1
> [   25.348997] e1000e :00:02.0: enabling device ( -> 0002)
> ... net test etc
> 
> $ rmmod virtio-iommu
> [   34.084816] e1000e: eth1 NIC Link is Down
> [   34.212152] pci :00:02.0: Removing from iommu group 1
> [   34.250558] pci :00:03.0: Removing from iommu group 0
> [   34.261570] virtio_iommu virtio0: device removed
> 
> $ modprobe virtio-iommu
> [   34.828982] virtio_iommu virtio0: input address: 64 bits
> [   34.829442] virtio_iommu virtio0: page mask: 0xf000
> [   34.844576] virtio-pci :00:03.0: Adding to iommu group 0
> [   34.916449] e1000e :00:02.0: Adding to iommu group 1
> 
> Thanks,
> Jean

> From 5437fcaabe1d4671e2dc5b90b7898c0bf698111b Mon Sep 17 00:00:00 2001
> From: Jean-Philippe Brucker 
> Date: Mon, 4 Nov 2019 15:52:36 +0100
> Subject: [PATCH] iommu: Add bus_unset_iommu()
> 
> Let modular IOMMU drivers undo bus_set_iommu(). Keep track of bus
> registrations with a list and refcount, and remove the iommu_ops from
> the bus when there are no IOMMU providers anymore.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 101 ++
>  include/linux/iommu.h |   1 +
>  2 files changed, 84 insertions(+), 18 deletions(-)

To be honest, I think we should be trying to move *away* from the bus-ops
abstraction rather than extending it. We already don't need it for DMA
domains on arm64, and I think it's really just a bit of a wart now because
iommu_domain_alloc() takes a 'struct bus_type *' as its argument.

Will


Re: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

2019-11-08 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
> With PASID granular translation type set to 0x11b, the translation
> result from the first level (FL) is also subject to a second-level (SL)
> page table translation. This mode is used for SVA virtualization,
> where FL performs guest virtual to guest physical translation and
> SL performs guest physical to host physical translation.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-pasid.c | 207 
> 
>  drivers/iommu/intel-pasid.h |  12 +++
>  2 files changed, 219 insertions(+)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index ffbd416ed3b8..f846a907cfcf 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
>   pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
>  }
>  
> +/*
> + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emt(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
> +}
> +
> +/*
> + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pat(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);
> +}
> +
> +/*
> + * Setup the Cache Disable (CD) field (Bit 89)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_cd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 25, 1);
should be pasid_set_bits(&pe->val[1], 1 << 25, 1 << 25);
and the same for the individual bit settings below.

A macro could be introduced that takes the spec-level bit offset (up to
511) and the field size, automatically selects the right pe->val[n] and
converts the offset into a 64-bit one. I think that would improve
readability when checking against the spec.

Not related to this patch, but it may be worth AND-ing the "bits" value
with the mask, so a bad value cannot overwrite other fields?
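
Something along these lines, as a rough sketch (the macro name is made up,
pasid_set_bits() is the helper from the patch):

/* bitpos is the spec-level bit position (0-511), width the field size. */
#define pasid_entry_set_field(pe, bitpos, width, value)                 \
        pasid_set_bits(&(pe)->val[(bitpos) >> 6],                       \
                       GENMASK_ULL(((bitpos) & 63) + (width) - 1,       \
                                   (bitpos) & 63),                      \
                       (u64)(value) << ((bitpos) & 63))

e.g. the CD bit (89) would become pasid_entry_set_field(pe, 89, 1, 1),
which resolves to val[1] bit 25 as in the fix above.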

> +}
> +
> +/*
> + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emte(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> +}
> +
> +/*
> + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_eafe(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 7, 1);
> +}
> +
> +/*
> + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pcd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> +}
> +
> +/*
> + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pwt(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> +}
> +
>  static void
>  pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
>   u16 did, int pasid)
> @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>  
>   return 0;
>  }
> +
> +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + struct iommu_gpasid_bind_data_vtd *pasid_data)
> +{
> + /*
> +  * Not all guest PASID table entry fields are passed down during bind,
> +  * here we only set up the ones that are dependent on guest settings.
> +  * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
> +  * therefore not set. Other fields, such as snoop related, are set based
> +  * on host needs regardless of  guest settings.
> +  */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> + if (!ecap_srs(iommu->ecap)) {
> + pr_err("No supervisor request support on %s\n",
> +iommu->name);
> + return -EINVAL;
> + }
> + pasid_set_sre(pte);
> + }
> +
> + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && 
> ecap_eafs(iommu->ecap))
> + pasid_set_eafe(pte);
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> + pasid_set_emte(pte);
> + pasid_set_emt(pte, pasid_data->emt);
> + }
> +
> + /*
> +  * Memory type is only applicable to devices inside processor coherent
> +  * domain. PCIe devices are not included. We can skip the rest of the
> +  * flags if IOMMU does not support MTS.
> +  */
> + if (!ecap_mts(iommu->ecap)) {
> + pr_info("%s

[RESEND PATCH v8 2/3] uacce: add uacce driver

2019-11-08 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an IO device, which
shares only data content rather than addresses.
Because the address space is unified, the hardware and the user space of
a process can use the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the
queue file into user space, the process can submit requests directly to
the hardware without a syscall into kernel space.
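
As a rough illustration of the intended user-space flow (the device node
name, mapping size and offset below are made up; the real UAPI defines
them):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        /* One queue per open of the accelerator's chrdev. */
        int fd = open("/dev/hisi_zip-0", O_RDWR);
        if (fd < 0)
                return 1;

        /* Map a queue region; requests are then written to it directly,
         * with no further syscalls on the fast path. */
        void *q = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (q == MAP_FAILED) {
                close(fd);
                return 1;
        }

        /* ... fill in and ring requests through the mapped region ... */

        munmap(q, 4096);
        close(fd);
        return 0;
}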

The IOMMU core only tracks mm<->device bonds at the moment, because it
only needs to handle IOTLB invalidation and PASID table entries. However
uacce needs a finer granularity since multiple queues from the same
device can be bound to an mm. When the mm exits, all bound queues must
be stopped so that the IOMMU can safely clear the PASID table entry and
reallocate the PASID.

An intermediate struct uacce_mm links uacce devices and queues.
Note that an mm may be bound to multiple devices but an uacce_mm
structure only ever belongs to a single device, because we don't need
anything more complex (if multiple devices are bound to one mm, then
we'll create one uacce_mm for each bond).

uacce_device --+-- uacce_mm --+-- uacce_queue
               |              '-- uacce_queue
               |
               '-- uacce_mm --+-- uacce_queue
                              +-- uacce_queue
                              '-- uacce_queue
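
Roughly, the bookkeeping described above could look like this (field names
are illustrative, not necessarily the ones used in uacce.c):

struct uacce_mm {
        struct mm_struct        *mm;            /* the bound address space      */
        int                     pasid;          /* PASID backing this bond      */
        struct list_head        queues;         /* uacce_queue list for this mm */
        struct list_head        list;           /* entry in the device's mm list */
};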

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |  53 +++
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile|   1 +
 drivers/misc/uacce/Kconfig   |  13 +
 drivers/misc/uacce/Makefile  |   2 +
 drivers/misc/uacce/uacce.c   | 633 +++
 include/linux/uacce.h| 155 +++
 include/uapi/misc/uacce/uacce.h  |  36 ++
 8 files changed, 894 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce 
b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..20a6ada
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,53 @@
+What:   /sys/class/uacce//id
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce//api
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct 
driver
+
+What:   /sys/class/uacce//flags
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in 
uacce.h
+
+What:   /sys/class/uacce//available_instances
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce//algorithms
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator, separated by new 
line.
+
+What:   /sys/class/uacce//region_mmio_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of mmio region queue file
+
+What:   /sys/class/uacce//region_dus_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of dus region queue file
+
+What:   /sys/class/uacce//numa_distance
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce//node_id
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 

[RESEND PATCH v8 3/3] crypto: hisilicon - register zip engine to uacce

2019-11-08 Thread Zhangfei Gao
Register qm to the uacce framework for the user-space crypto driver

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
 drivers/crypto/hisilicon/qm.c   | 256 ++--
 drivers/crypto/hisilicon/qm.h   |  11 ++
 drivers/crypto/hisilicon/zip/zip_main.c |  38 ++---
 include/uapi/misc/uacce/hisi_qm.h   |  23 +++
 4 files changed, 289 insertions(+), 39 deletions(-)
 create mode 100644 include/uapi/misc/uacce/hisi_qm.h

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index a8ed6990..bf8442d 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -9,6 +9,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "qm.h"
 
 /* eq/aeq irq enable */
@@ -465,17 +468,22 @@ static void qm_cq_head_update(struct hisi_qp *qp)
 
 static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm)
 {
-   struct qm_cqe *cqe = qp->cqe + qp->qp_status.cq_head;
-
-   if (qp->req_cb) {
-   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
-   dma_rmb();
-   qp->req_cb(qp, qp->sqe + qm->sqe_size * cqe->sq_head);
-   qm_cq_head_update(qp);
-   cqe = qp->cqe + qp->qp_status.cq_head;
-   qm_db(qm, qp->qp_id, QM_DOORBELL_CMD_CQ,
- qp->qp_status.cq_head, 0);
-   atomic_dec(&qp->qp_status.used);
+   struct qm_cqe *cqe;
+
+   if (qp->event_cb) {
+   qp->event_cb(qp);
+   } else {
+   cqe = qp->cqe + qp->qp_status.cq_head;
+
+   if (qp->req_cb) {
+   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
+   dma_rmb();
+   qp->req_cb(qp, qp->sqe + qm->sqe_size *
+  cqe->sq_head);
+   qm_cq_head_update(qp);
+   cqe = qp->cqe + qp->qp_status.cq_head;
+   atomic_dec(&qp->qp_status.used);
+   }
}
 
/* set c_flag */
@@ -1271,7 +1279,7 @@ static int qm_qp_ctx_cfg(struct hisi_qp *qp, int qp_id, 
int pasid)
  * @qp: The qp we want to start to run.
  * @arg: Accelerator specific argument.
  *
- * After this function, qp can receive request from user. Return qp_id if
+ * After this function, qp can receive request from user. Return 0 if
  * successful, Return -EBUSY if failed.
  */
 int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
@@ -1316,7 +1324,7 @@ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long 
arg)
 
dev_dbg(dev, "queue %d started\n", qp_id);
 
-   return qp_id;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(hisi_qm_start_qp);
 
@@ -1397,6 +1405,213 @@ static void hisi_qm_cache_wb(struct hisi_qm *qm)
}
 }
 
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+   wake_up_interruptible(&qp->uacce_q->wait);
+}
+
+static int hisi_qm_get_available_instances(struct uacce_device *uacce)
+{
+   int i, ret;
+   struct hisi_qm *qm = uacce->priv;
+
+   read_lock(&qm->qps_lock);
+   for (i = 0, ret = 0; i < qm->qp_num; i++)
+   if (!qm->qp_array[i])
+   ret++;
+   read_unlock(&qm->qps_lock);
+
+   return ret;
+}
+
+static int hisi_qm_uacce_get_queue(struct uacce_device *uacce,
+  unsigned long arg,
+  struct uacce_queue *q)
+{
+   struct hisi_qm *qm = uacce->priv;
+   struct hisi_qp *qp;
+   u8 alg_type = 0;
+
+   qp = hisi_qm_create_qp(qm, alg_type);
+   if (IS_ERR(qp))
+   return PTR_ERR(qp);
+
+   q->priv = qp;
+   q->uacce = uacce;
+   qp->uacce_q = q;
+   qp->event_cb = qm_qp_event_notifier;
+   qp->pasid = arg;
+
+   return 0;
+}
+
+static void hisi_qm_uacce_put_queue(struct uacce_queue *q)
+{
+   struct hisi_qp *qp = q->priv;
+
+   hisi_qm_cache_wb(qp->qm);
+   hisi_qm_release_qp(qp);
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_uacce_mmap(struct uacce_queue *q,
+ struct vm_area_struct *vma,
+ struct uacce_qfile_region *qfr)
+{
+   struct hisi_qp *qp = q->priv;
+   struct hisi_qm *qm = qp->qm;
+   size_t sz = vma->vm_end - vma->vm_start;
+   struct pci_dev *pdev = qm->pdev;
+   struct device *dev = &pdev->dev;
+   unsigned long vm_pgoff;
+   int ret;
+
+   switch (qfr->type) {
+   case UACCE_QFRT_MMIO:
+   if (qm->ver == QM_HW_V2) {
+   if (sz > PAGE_SIZE * (QM_DOORBELL_PAGE_NR +
+   QM_DOORBELL_SQ_CQ_BASE_V2 / PAGE_SIZE))
+   return -EINVAL;
+   } else {
+   if (sz > PAGE_SIZE * QM_DOORBELL_PAGE_NR)
+

[RESEND PATCH v8 0/3] Add uacce module for Accelerator

2019-11-08 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an IO device, which
shares data content rather than addresses.
Because the address space is unified, the hardware and the user space of
a process can use the same virtual addresses when communicating.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables IO-side page faults and PASID support.
We keep verifying against Jean's sva/current branch [2]
and against Eric's SMMUv3 nested stage patches [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc4-uacce-v8

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v8

References:
[1] http://jpbrucker.net/sva/
[2] 
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v8:
Address some comments from Jonathan
Merge Jean's patch, using uacce_mm instead of pid for sva_exit

v7:
As suggested by Jean and Jerome
Only consider sva case and remove unused dma apis for the first patch.
Also add mm_exit for sva and vm_ops.close etc


v6: https://lkml.org/lkml/2019/10/16/231
Change sys qfrs_size to different file, suggested by Jonathan
Fix crypto daily build issue and based on crypto code base, also 5.4-rc1.

v5: https://lkml.org/lkml/2019/10/14/74
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4: https://lkml.org/lkml/2019/9/17/116
Based on 5.4-rc1
Considering other drivers integrating uacce: if uacce is not compiled in,
uacce_register() returns an error and uacce_unregister() is a no-op.
Simplify the uacce flags: UACCE_DEV_SVA.
Address Greg's comments:
fix the state machine, remove syslog messages that user space could trigger, etc.

v3: https://lkml.org/lkml/2019/9/2/990
As recommended by Greg, use struct uacce_device instead of struct uacce,
and keep a struct cdev * inside struct uacce_device; as a result, the
cdev can be released on its own when its refcount drops to 0.
The two structures are thus decoupled and each maintains its own lifetime.
Also add dev.release for put_device.
(A sketch of this pattern is included after the change history below.)

v2: https://lkml.org/lkml/2019/8/28/565
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1: https://lkml.org/lkml/2019/8/14/277
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/
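
As a reference for the v3 rework above, the decoupled-lifetime pattern
could be sketched like this (field and function names are illustrative):

struct uacce_device {
        struct device   dev;    /* refcounted; freed from dev.release     */
        struct cdev     *cdev;  /* chrdev, holds its own kobject refcount */
        /* ... */
};

static void uacce_dev_release(struct device *dev)
{
        struct uacce_device *uacce = container_of(dev, struct uacce_device, dev);

        kfree(uacce);           /* runs only after the last put_device()  */
}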


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications;
the latter need less control-path facility and more (or more specialized)
ALUs. That is why more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced
these days to gain better performance or power efficiency for particular
applications.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications but has its own cons:
it changes the ISA permanently;
all state elements must be saved when the process is switched out,
yet most data-bound processors have a huge set of state elements;
and it makes the kernel scheduler more complex.

The second is the accelerator.
It is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
calling a function or a thread running with the FPU.
The context is bound to the accelerator itself,
so the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" versus a
co-processor or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to support
this scenario. Its first step is to make sure the accelerator and the
process can share the same address space, so the accelerator ISA can
directly address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses,
and so it differs from the other DMA libraries.

In the future, we may add more facilities to support linking an
accelerator library into the main application, or managing the
accelerator context as a special thread.
But either way, this is a solid starting point for a new processor
to be used as an "accelerator", since address sharing is the essential
requirement.


The Fork Scenario
=
For a process with allocated qu

[RESEND PATCH v8 1/3] uacce: Add documents for uacce

2019-11-08 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between an accelerator and a process.

This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 159 +++
 1 file changed, 159 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst 
b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..733521a
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+-
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
+provide Shared Virtual Addressing (SVA) between accelerators and processes.
+So accelerator can access any data structure of the main cpu.
+This differs from the data sharing between cpu and io device, which share
+only data content rather than address.
+Because of the unified address, hardware and user space of process can
+share the same virtual address in the communication.
+Uacce takes the hardware accelerator as a heterogeneous processor, while
+IOMMU share the same CPU page tables and as a result the same translation
+from va to pa.
+
+    +--------------------------+      +--------------------------+
+    |  User application (CPU)  |      |   Hardware Accelerator   |
+    +--------------------------+      +--------------------------+
+                 |                                  |
+                 | va                               | va
+                 v                                  v
+        +----------------+                 +----------------+
+        |      MMU       |                 |     IOMMU      |
+        +----------------+                 +----------------+
+                 |                                  |
+                 | pa                               | pa
+                 v                                  v
+        +--------------------------------------------------+
+        |                      Memory                      |
+        +--------------------------------------------------+
+
+
+
+Architecture
+
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+The uacce device, built around the IOMMU SVA API, can access multiple
+address spaces, including the one without PASID.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+     +--------------------+   user API   +--------------------+
+     | WarpDrive library  | -----------> |    user driver     |
+     +--------------------+              +--------------------+
+               |                                   |
+               | queue fd                          |
+               v                                   |
+     +--------------------+    +---------+         | mmap memory
+     | Other framework    |    |  uacce  |         | r/w interface
+     | crypto/nic/others  |    +---------+         |
+     +--------------------+         |              |
+               | register           | register     |
+               v                    v              |
+     +-----------------+      +----------+         |
+     |  Device Driver  |      |  IOMMU   |         |
+     +-----------------+      +----------+         |

[PATCH v8 2/3] uacce: add uacce driver

2019-11-08 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an IO device, which
shares only data content rather than addresses.
Because the address space is unified, the hardware and the user space of
a process can use the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the
queue file into user space, the process can submit requests directly to
the hardware without a syscall into kernel space.

The IOMMU core only tracks mm<->device bonds at the moment, because it
only needs to handle IOTLB invalidation and PASID table entries. However
uacce needs a finer granularity since multiple queues from the same
device can be bound to an mm. When the mm exits, all bound queues must
be stopped so that the IOMMU can safely clear the PASID table entry and
reallocate the PASID.

An intermediate struct uacce_mm links uacce devices and queues.
Note that an mm may be bound to multiple devices but an uacce_mm
structure only ever belongs to a single device, because we don't need
anything more complex (if multiple devices are bound to one mm, then
we'll create one uacce_mm for each bond).

uacce_device --+-- uacce_mm --+-- uacce_queue
               |              '-- uacce_queue
               |
               '-- uacce_mm --+-- uacce_queue
                              +-- uacce_queue
                              '-- uacce_queue

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |  53 +++
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile|   1 +
 drivers/misc/uacce/Kconfig   |  13 +
 drivers/misc/uacce/Makefile  |   2 +
 drivers/misc/uacce/uacce.c   | 634 +++
 include/linux/uacce.h| 155 +++
 include/uapi/misc/uacce/uacce.h  |  36 ++
 8 files changed, 895 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce 
b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..20a6ada
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,53 @@
+What:   /sys/class/uacce//id
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce//api
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct 
driver
+
+What:   /sys/class/uacce//flags
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in 
uacce.h
+
+What:   /sys/class/uacce//available_instances
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce//algorithms
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator, separated by new 
line.
+
+What:   /sys/class/uacce//region_mmio_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of mmio region queue file
+
+What:   /sys/class/uacce//region_dus_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of dus region queue file
+
+What:   /sys/class/uacce//numa_distance
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce//node_id
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 

[PATCH v8 3/3] crypto: hisilicon - register zip engine to uacce

2019-11-08 Thread Zhangfei Gao
Register qm to the uacce framework for the user-space crypto driver

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
 drivers/crypto/hisilicon/qm.c   | 256 ++--
 drivers/crypto/hisilicon/qm.h   |  11 ++
 drivers/crypto/hisilicon/zip/zip_main.c |  38 ++---
 include/uapi/misc/uacce/hisi_qm.h   |  23 +++
 4 files changed, 289 insertions(+), 39 deletions(-)
 create mode 100644 include/uapi/misc/uacce/hisi_qm.h

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index a8ed6990..bf8442d 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -9,6 +9,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "qm.h"
 
 /* eq/aeq irq enable */
@@ -465,17 +468,22 @@ static void qm_cq_head_update(struct hisi_qp *qp)
 
 static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm)
 {
-   struct qm_cqe *cqe = qp->cqe + qp->qp_status.cq_head;
-
-   if (qp->req_cb) {
-   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
-   dma_rmb();
-   qp->req_cb(qp, qp->sqe + qm->sqe_size * cqe->sq_head);
-   qm_cq_head_update(qp);
-   cqe = qp->cqe + qp->qp_status.cq_head;
-   qm_db(qm, qp->qp_id, QM_DOORBELL_CMD_CQ,
- qp->qp_status.cq_head, 0);
-   atomic_dec(&qp->qp_status.used);
+   struct qm_cqe *cqe;
+
+   if (qp->event_cb) {
+   qp->event_cb(qp);
+   } else {
+   cqe = qp->cqe + qp->qp_status.cq_head;
+
+   if (qp->req_cb) {
+   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
+   dma_rmb();
+   qp->req_cb(qp, qp->sqe + qm->sqe_size *
+  cqe->sq_head);
+   qm_cq_head_update(qp);
+   cqe = qp->cqe + qp->qp_status.cq_head;
+   atomic_dec(&qp->qp_status.used);
+   }
}
 
/* set c_flag */
@@ -1271,7 +1279,7 @@ static int qm_qp_ctx_cfg(struct hisi_qp *qp, int qp_id, 
int pasid)
  * @qp: The qp we want to start to run.
  * @arg: Accelerator specific argument.
  *
- * After this function, qp can receive request from user. Return qp_id if
+ * After this function, qp can receive request from user. Return 0 if
  * successful, Return -EBUSY if failed.
  */
 int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
@@ -1316,7 +1324,7 @@ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long 
arg)
 
dev_dbg(dev, "queue %d started\n", qp_id);
 
-   return qp_id;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(hisi_qm_start_qp);
 
@@ -1397,6 +1405,213 @@ static void hisi_qm_cache_wb(struct hisi_qm *qm)
}
 }
 
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+   wake_up_interruptible(&qp->uacce_q->wait);
+}
+
+static int hisi_qm_get_available_instances(struct uacce_device *uacce)
+{
+   int i, ret;
+   struct hisi_qm *qm = uacce->priv;
+
+   read_lock(&qm->qps_lock);
+   for (i = 0, ret = 0; i < qm->qp_num; i++)
+   if (!qm->qp_array[i])
+   ret++;
+   read_unlock(&qm->qps_lock);
+
+   return ret;
+}
+
+static int hisi_qm_uacce_get_queue(struct uacce_device *uacce,
+  unsigned long arg,
+  struct uacce_queue *q)
+{
+   struct hisi_qm *qm = uacce->priv;
+   struct hisi_qp *qp;
+   u8 alg_type = 0;
+
+   qp = hisi_qm_create_qp(qm, alg_type);
+   if (IS_ERR(qp))
+   return PTR_ERR(qp);
+
+   q->priv = qp;
+   q->uacce = uacce;
+   qp->uacce_q = q;
+   qp->event_cb = qm_qp_event_notifier;
+   qp->pasid = arg;
+
+   return 0;
+}
+
+static void hisi_qm_uacce_put_queue(struct uacce_queue *q)
+{
+   struct hisi_qp *qp = q->priv;
+
+   hisi_qm_cache_wb(qp->qm);
+   hisi_qm_release_qp(qp);
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_uacce_mmap(struct uacce_queue *q,
+ struct vm_area_struct *vma,
+ struct uacce_qfile_region *qfr)
+{
+   struct hisi_qp *qp = q->priv;
+   struct hisi_qm *qm = qp->qm;
+   size_t sz = vma->vm_end - vma->vm_start;
+   struct pci_dev *pdev = qm->pdev;
+   struct device *dev = &pdev->dev;
+   unsigned long vm_pgoff;
+   int ret;
+
+   switch (qfr->type) {
+   case UACCE_QFRT_MMIO:
+   if (qm->ver == QM_HW_V2) {
+   if (sz > PAGE_SIZE * (QM_DOORBELL_PAGE_NR +
+   QM_DOORBELL_SQ_CQ_BASE_V2 / PAGE_SIZE))
+   return -EINVAL;
+   } else {
+   if (sz > PAGE_SIZE * QM_DOORBELL_PAGE_NR)
+

[PATCH v8 0/3] Add uacce module for Accelerator

2019-11-08 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an IO device, which
shares data content rather than addresses.
Because the address space is unified, the hardware and the user space of
a process can use the same virtual addresses when communicating.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables IO-side page faults and PASID support.
We keep verifying against Jean's sva/current branch [2]
and against Eric's SMMUv3 nested stage patches [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc4-uacce-v7

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v7

References:
[1] http://jpbrucker.net/sva/
[2] 
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v8:
Address some comments from Jonathan
Merge Jean's patch, using uacce_mm instead of pid for sva_exit

v7:
As suggested by Jean and Jerome
Only consider sva case and remove unused dma apis for the first patch.
Also add mm_exit for sva and vm_ops.close etc


v6: https://lkml.org/lkml/2019/10/16/231
Change sys qfrs_size to different file, suggested by Jonathan
Fix crypto daily build issue and based on crypto code base, also 5.4-rc1.

v5: https://lkml.org/lkml/2019/10/14/74
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4: https://lkml.org/lkml/2019/9/17/116
Based on 5.4-rc1
Considering other drivers integrating uacce: if uacce is not compiled in,
uacce_register() returns an error and uacce_unregister() is a no-op.
Simplify the uacce flags: UACCE_DEV_SVA.
Address Greg's comments:
fix the state machine, remove syslog messages that user space could trigger, etc.

v3: https://lkml.org/lkml/2019/9/2/990
As recommended by Greg, use struct uacce_device instead of struct uacce,
and keep a struct cdev * inside struct uacce_device; as a result, the
cdev can be released on its own when its refcount drops to 0.
The two structures are thus decoupled and each maintains its own lifetime.
Also add dev.release for put_device.

v2: https://lkml.org/lkml/2019/8/28/565
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1: https://lkml.org/lkml/2019/8/14/277
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications;
the latter need less control-path facility and more (or more specialized)
ALUs. That is why more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced
these days to gain better performance or power efficiency for particular
applications.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications but has its own cons:
it changes the ISA permanently;
all state elements must be saved when the process is switched out,
yet most data-bound processors have a huge set of state elements;
and it makes the kernel scheduler more complex.

The second is the accelerator.
It is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
calling a function or a thread running with the FPU.
The context is bound to the accelerator itself,
so the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" versus a
co-processor or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to support
this scenario. Its first step is to make sure the accelerator and the
process can share the same address space, so the accelerator ISA can
directly address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses,
and so it differs from the other DMA libraries.

In the future, we may add more facilities to support linking an
accelerator library into the main application, or managing the
accelerator context as a special thread.
But either way, this is a solid starting point for a new processor
to be used as an "accelerator", since address sharing is the essential
requirement.


The Fork Scenario
=
For a process with allocated qu

[PATCH v8 1/3] uacce: Add documents for uacce

2019-11-08 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between an accelerator and a process.

This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 159 +++
 1 file changed, 159 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst 
b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..733521a
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+-
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
+provide Shared Virtual Addressing (SVA) between accelerators and processes.
+So accelerator can access any data structure of the main cpu.
+This differs from the data sharing between cpu and io device, which share
+only data content rather than address.
+Because of the unified address, hardware and user space of process can
+share the same virtual address in the communication.
+Uacce takes the hardware accelerator as a heterogeneous processor, while
+IOMMU share the same CPU page tables and as a result the same translation
+from va to pa.
+
+    +--------------------------+      +--------------------------+
+    |  User application (CPU)  |      |   Hardware Accelerator   |
+    +--------------------------+      +--------------------------+
+                 |                                  |
+                 | va                               | va
+                 v                                  v
+        +----------------+                 +----------------+
+        |      MMU       |                 |     IOMMU      |
+        +----------------+                 +----------------+
+                 |                                  |
+                 | pa                               | pa
+                 v                                  v
+        +--------------------------------------------------+
+        |                      Memory                      |
+        +--------------------------------------------------+
+
+
+
+Architecture
+
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+The uacce device, built around the IOMMU SVA API, can access multiple
+address spaces, including the one without PASID.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+     +--------------------+   user API   +--------------------+
+     | WarpDrive library  | -----------> |    user driver     |
+     +--------------------+              +--------------------+
+               |                                   |
+               | queue fd                          |
+               v                                   |
+     +--------------------+    +---------+         | mmap memory
+     | Other framework    |    |  uacce  |         | r/w interface
+     | crypto/nic/others  |    +---------+         |
+     +--------------------+         |              |
+               | register           | register     |
+               v                    v              |
+     +-----------------+      +----------+         |
+     |  Device Driver  |      |  IOMMU   |         |
+     +-----------------+      +----------+         |

RE: [RFC v2 2/3] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)

2019-11-08 Thread Liu, Yi L
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, November 8, 2019 6:07 AM
> To: Liu, Yi L 
> Subject: Re: [RFC v2 2/3] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)
> 
> On Wed, 6 Nov 2019 13:27:26 +
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Wednesday, November 6, 2019 7:36 AM
> > > To: Liu, Yi L 
> > > Subject: Re: [RFC v2 2/3] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > >
> > > On Thu, 24 Oct 2019 08:26:22 -0400
> > > Liu Yi L  wrote:
> > >
> > > > This patch adds VFIO_IOMMU_PASID_REQUEST ioctl which aims
> > > > to passdown PASID allocation/free request from the virtual
> > > > iommu. This is required to get PASID managed in system-wide.
> > > >
> > > > Cc: Kevin Tian 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Yi Sun 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 114
> > > 
> > > >  include/uapi/linux/vfio.h   |  25 +
> > > >  2 files changed, 139 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> > > > index cd8d3a5..3d73a7d 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -2248,6 +2248,83 @@ static int vfio_cache_inv_fn(struct device *dev,
> void
> > > *data)
> > > > return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
> > > >  }
> > > >
> > > > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > > > +int min_pasid,
> > > > +int max_pasid)
> > > > +{
> > > > +   int ret;
> > > > +   ioasid_t pasid;
> > > > +   struct mm_struct *mm = NULL;
> > > > +
> > > > +   mutex_lock(&iommu->lock);
> > > > +   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > +   ret = -EINVAL;
> > > > +   goto out_unlock;
> > > > +   }
> > > > +   mm = get_task_mm(current);
> > > > +   /* Track ioasid allocation owner by mm */
> > > > +   pasid = ioasid_alloc((struct ioasid_set *)mm, min_pasid,
> > > > +   max_pasid, NULL);
> > >
> > > Are we sure we want to tie this to the task mm vs perhaps the
> > > vfio_iommu pointer?
> >
> > Here we want to have a kind of per-VM mark, which can be used to do
> > ownership check on whether a pasid is held by a specific VM. This is
> > very important to prevent across VM affect. vfio_iommu pointer is
> > competent for vfio as vfio is both pasid alloc requester and pasid
> > consumer. e.g. vfio requests pasid alloc from ioasid and also it will
> > invoke bind_gpasid(). vfio can either check ownership before invoking
> > bind_gpasid() or pass vfio_iommu pointer to iommu driver. But in future,
> > there may be other modules which are just consumers of pasid. And they
> > also want to do ownership check for a pasid. Then, it would be hard for
> > them as they are not the pasid alloc requester. So here better to have
> > a system wide structure to perform as the per-VM mark. task mm looks
> > to be much competent.
> 
> Ok, so it's intentional to have a VM-wide token.  Elsewhere in the
> type1 code (vfio_dma_do_map) we record the task_struct per dma mapping
> so that we can get the task mm as needed.  Would the task_struct
> pointer provide any advantage?

I think we could use the task_struct pointer to keep the type1 code
consistent. What do you think?

> Also, an overall question, this provides userspace with pasid alloc and
> free ioctls, (1) what prevents a userspace process from consuming every
> available pasid, and (2) if the process exits or crashes without
> freeing pasids, how are they recovered aside from a reboot?

For question (1), I think we only need to worry about a malicious
userspace process. As vfio usage requires privilege, we may be safe so
far. However, we may need to introduce a kind of credit mechanism to
protect against this. I've thought about it but have no good idea yet;
I would be happy to hear your thoughts.
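
One possibility, just as a sketch: a simple per-container quota. The
pasid_count field and VFIO_PASID_QUOTA limit below are hypothetical and
not part of this RFC:

static int vfio_iommu_charge_pasid(struct vfio_iommu *iommu)
{
        /* Called with iommu->lock held, before ioasid_alloc(). */
        if (iommu->pasid_count >= VFIO_PASID_QUOTA)
                return -ENOSPC;

        iommu->pasid_count++;
        return 0;
}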

For question (2), I think we need to reclaim the allocated PASIDs when
the vfio container fd is released, just as vfio does for the domain
mappings. I haven't added that yet, but I can add it in the next version
if you think it would make the PASID alloc/free flow more sound.
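
Roughly, the reclaim on container release could look like this; struct
vfio_pasid and the pasid_list below are hypothetical bookkeeping, not in
this RFC:

static void vfio_iommu_free_all_pasids(struct vfio_iommu *iommu)
{
        struct vfio_pasid *vp, *tmp;

        list_for_each_entry_safe(vp, tmp, &iommu->pasid_list, next) {
                ioasid_free(vp->pasid);
                list_del(&vp->next);
                kfree(vp);
        }
}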

> > > > +   if (pasid == INVALID_IOASID) {
> > > > +   ret = -ENOSPC;
> > > > +   goto out_unlock;
> > > > +   }
> > > > +   ret = pasid;
> > > > +out_unlock:
> > > > +   mutex_unlock(&iommu->lock);
> 
> What does holding this lock protect?  That the vfio_iommu remains
> backed by an iommu during this operation, even though we don't do
> anything to release allocated pasids when that iommu backing is removed?

Yes, it is unnecessary to hold the lock here, at least for the operations
in this patch. Will remove it. :-)

> > > > +   if (mm)
> > > > +   mmput(mm);
> >

Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

2019-11-08 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> Make use of generic IOASID code to manage PASID allocation,
> free, and lookup. Replace Intel specific code.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 12 ++--
>  drivers/iommu/intel-pasid.c | 36 
>  drivers/iommu/intel-svm.c   | 39 +++
>  3 files changed, 29 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ced1d89ef977..2ea09b988a23 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct dmar_domain 
> *domain,
>   domain->auxd_refcnt--;
>  
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>  }
>  
>  static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct dmar_domain 
> *domain,
>   if (domain->default_pasid <= 0) {
>   int pasid;
>  
> - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> -  pci_max_pasids(to_pci_dev(dev)),
> -  GFP_KERNEL);
> - if (pasid <= 0) {
> + /* No private data needed for the default pasid */
> + pasid = ioasid_alloc(NULL, PASID_MIN, 
> pci_max_pasids(to_pci_dev(dev)) - 1,
> + NULL);
> + if (pasid == INVALID_IOASID) {
>   pr_err("Can't allocate default pasid\n");
>   return -ENODEV;
>   }
> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct dmar_domain 
> *domain,
>   spin_unlock(&iommu->lock);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>  
>   return ret;
>  }
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d81e857d2b25..e79d680fe300 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -26,42 +26,6 @@
>   */
>  static DEFINE_SPINLOCK(pasid_lock);
>  u32 intel_pasid_max_id = PASID_MAX;
> -static DEFINE_IDR(pasid_idr);
> -
> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> -{
> - int ret, min, max;
> -
> - min = max_t(int, start, PASID_MIN);
> - max = min_t(int, end, intel_pasid_max_id);
> -
> - WARN_ON(in_interrupt());
> - idr_preload(gfp);
> - spin_lock(&pasid_lock);
> - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> - spin_unlock(&pasid_lock);
> - idr_preload_end();
> -
> - return ret;
> -}
> -
> -void intel_pasid_free_id(int pasid)
> -{
> - spin_lock(&pasid_lock);
> - idr_remove(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -}
> -
> -void *intel_pasid_lookup_id(int pasid)
> -{
> - void *p;
> -
> - spin_lock(&pasid_lock);
> - p = idr_find(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -
> - return p;
> -}
>  
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
>  {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..a9a7f85a09bc 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "intel-pasid.h"
> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, 
> int flags, struct svm_dev_
>   if (pasid_max > intel_pasid_max_id)
>   pasid_max = intel_pasid_max_id;
>  
> - /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> - ret = intel_pasid_alloc_id(svm,
> -!!cap_caching_mode(iommu->cap),
> -pasid_max - 1, GFP_KERNEL);
> - if (ret < 0) {
> + /* Do not use PASID 0, reserved for RID to PASID */
> + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> + pasid_max - 1, svm);
pasid_max - 1 is inclusive, whereas the max param of intel_pasid_alloc_id()
is exclusive, right? If you fixed an issue here, you could mention it in
the commit message.
> + if (svm->pasid == INVALID_IOASID) {
>   kfree(svm);
>   kfree(sdev);
> + ret = ENOSPC;
This should be -ENOSPC.
Nit: in 2/11, vcmd_alloc_pasid() returned -ENOMEM.
>   goto out;
>   }
> - svm->pasid = ret;
>   svm->notifier.ops = &intel_mmuops;
>   svm->mm = mm;
>   svm->flags = flags;
> @@ -337,7 +337,7 @@ int intel_svm

Re: [PATCH 5/7] iommu/arm-smmu-v3: Allow building as a module

2019-11-08 Thread Will Deacon
Hi Joerg,

On Tue, Nov 05, 2019 at 01:15:08PM +0100, Joerg Roedel wrote:
> On Thu, Oct 31, 2019 at 03:42:47PM +, Will Deacon wrote:
> > Generally, I think unloading the IOMMU driver module while there are
> > active users is a pretty bad idea, much like unbinding the driver via
> > /sys in the same situation would also be fairly daft. However, I *think*
> > the code in __device_release_driver() tries to deal with this by
> > iterating over the active consumers and ->remove()ing them first.
> > 
> > I'm without hardware access at the moment, so I haven't been able to
> > test this myself. We could nobble the module_exit() hook, but there's
> > still the "force unload" option depending on the .config.
> 
> Okay, but besides the force-unload case, can we prevent accidential
> unloading by taking a reference to the module in add_device() and release
> it in remove_device()?

That's probably a sensible starting point, yes. In conjunction with the
patch from Jean-Philippe to introduce bus_unset_iommu(), we might have
a fighting chance of getting this to work.

I'll spin a v2.

Thanks!

Will
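
For reference, the kind of refcounting Joerg suggests could look roughly
like this (sketch only; it assumes struct iommu_ops grows an .owner field,
which is not in the tree today):

static int iommu_bind_driver(struct device *dev, const struct iommu_ops *ops)
{
        if (!try_module_get(ops->owner))
                return -ENODEV;         /* driver module is going away */

        return ops->add_device(dev);
}

static void iommu_unbind_driver(struct device *dev, const struct iommu_ops *ops)
{
        ops->remove_device(dev);
        module_put(ops->owner);
}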
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

2019-11-08 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
> 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/Kconfig   |  1 +
>  drivers/iommu/intel-iommu.c | 67 
> +
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 70 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index fd50ddbf..961fe5795a90 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
>   bool "Support for Shared Virtual Memory with Intel IOMMU"
>   depends on INTEL_IOMMU && X86
>   select PCI_PASID
> + select IOASID
>   select MMU_NOTIFIER
>   help
> Shared Virtual Memory (SVM) provides a facility for devices
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3f974919d3bd..ced1d89ef977 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   if (ecap_prs(iommu->ecap))
>   intel_svm_finish_prq(iommu);
>   }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>  #endif
>  }
>  
> @@ -4910,6 +4913,44 @@ static int __init probe_acpi_namespace_devices(void)
>   return 0;
>  }
>  
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + /*
> +  * VT-d virtual command interface always uses the full 20 bit
> +  * PASID range. Host can partition guest PASID range based on
> +  * policies but it is out of guest's control.
> +  */
> + if (min < PASID_MIN || max > intel_pasid_max_id)
> + return INVALID_IOASID;

> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> +  * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +  * We can only free the PASID when all the devices are unbond.
> +  */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
>  int __init intel_iommu_init(void)
>  {
>   int ret = -ENODEV;
> @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
>  "%s", iommu->name);
>   iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
>   iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> + pr_info("Register custom PASID allocator\n");
> + /*
> +  * Register a custom ASID allocator if we are running
> +  * in a guest, the purpose is to have a system wide 
> PASID
> +  * namespace among all PASID users.
> +  * There can be multiple vIOMMUs in each guest but only
> +  * one allocator is active. All vIOMMU allocators will
> +  * eventually be calling the same host allocator.
> +  */
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + ret = 
> ioasid_register_allocator(&iommu->pasid_allocator);
> + if (ret) {
> + pr_warn("Custom PASID allocator registeration 
> failed\n");
nit: registration
> + /*
> +  * Disable scalable mode on this IOMMU if there
> +  * is no custom allocator. Mixing SM capable 
> vIOMMU
> +  * and non-SM vIOMMU are not supported: 
nit: "is not supported". But I guess you will reshape it according to the
previous comments.
> +  */
> + intel_iommu_sm = 0;
> + }
> + }
> +#endif
>   }
>  
>   bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linu

Re: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

2019-11-08 Thread Auger Eric
Hi Jacob,
On 10/24/19 9:54 PM, Jacob Pan wrote:
> From: Lu Baolu 
> 
> Enabling IOMMU in a guest requires communication with the host
> driver for certain aspects. Use of PASID ID to enable Shared Virtual
> Addressing (SVA) requires managing PASID's in the host. VT-d 3.0 spec
> provides a Virtual Command Register (VCMD) to facilitate this.
> Writes to this register in the guest are trapped by QEMU which
> proxies the call to the host driver.
> 
> This virtual command interface consists of a capability register,
> a virtual command register, and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
> 
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command interface.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Jacob Pan 
> Reviewed-by: Eric Auger 
> ---
>  drivers/iommu/intel-pasid.c | 56 
> +
>  drivers/iommu/intel-pasid.h | 13 ++-
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 040a445be300..d81e857d2b25 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
>   return p;
>  }
>  
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + int ret = 0;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + *pasid = VCMD_VRSP_RESULT(res);
> + break;
> + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> + pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> + ret = -ENOMEM;
> + break;
> + default:
> + ret = -ENODEV;
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +
> + return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + break;
> + case VCMD_VRSP_SC_INVALID_PASID:
> + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> + break;
> + default:
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +}
> +
>  /*
>   * Per device pasid table management:
>   */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index fc8cd8f17de1..e413e884e685 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -23,6 +23,16 @@
>  #define is_pasid_enabled(entry)  (((entry)->lo >> 3) & 0x1)
>  #define get_pasid_dir_size(entry)    (1 << ((((entry)->lo >> 9) & 0x7) + 7))
>  
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC   0x1
> +#define VCMD_CMD_FREE0x2
> +#define VCMD_VRSP_IP 0x1
> +#define VCMD_VRSP_SC(e)  (((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS 0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL  1
> +#define VCMD_VRSP_SC_INVALID_PASID   1
> +#define VCMD_VRSP_RESULT(e)  (((e) >> 8) & 0xf)
nit: the PASID is 20 bits, but the result field is 56 bits wide; the mask
could cover the full field in case a new command is added later on.
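For instance (sketch only, not part of the patch), covering bits 63:8:

#define VCMD_VRSP_RESULT(e)     (((e) >> 8) & GENMASK_ULL(55, 0))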
> +
>  /*
>   * Domain ID reserved for pasid entries programmed for first-level
>   * only and pass-through transfer modes.
> @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>  struct device *dev, int pasid);
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>struct device *dev, int pasid);
> -
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index

Re: [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register

2019-11-08 Thread Auger Eric
Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/dmar.c| 1 +
>  include/linux/intel-iommu.h | 4 
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..49bb7d76e646 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -950,6 +950,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 
> phys_addr)
>   warn_invalid_dmar(phys_addr, " returns all ones");
>   goto unmap;
>   }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>  
>   /* the registers might be more than one page */
>   map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ed11ef594378..2e1bed9b7eef 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -186,6 +186,9 @@
>  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
>  #define ecap_sc_support(e)   ((e >> 7) & 0x1) /* Snooping Control */
>  
> +/* Virtual command interface capabilities */
> +#define vccap_pasid(v)   ((v & DMA_VCS_PAS)) /* PASID allocation 
> */
> +
>  /* IOTLB_REG */
>  #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>  #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -520,6 +523,7 @@ struct intel_iommu {
>   u64 reg_size; /* size of hw register set */
>   u64 cap;
>   u64 ecap;
> + u64 vccap;
>   u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
>   raw_spinlock_t  register_lock; /* protect register handling */
>   int seq_id; /* sequence id of the iommu */
> 

With DMA_VCS_PAS moved into this patch, as pointed out by Kevin, or with
vccap_pasid() moved to patch 3, feel free to add

Reviewed-by: Eric Auger 

Eric

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread David Woodhouse
On Fri, 2019-11-08 at 08:47 +, Zeng, Jason wrote:
> > -Original Message-
> > From: David Woodhouse 
> > Sent: Friday, November 8, 2019 3:54 PM
> > To: Deepa Dinamani ; j...@8bytes.org; linux-
> > ker...@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org; Zeng, Jason ;
> > Tian, Kevin 
> > Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown
> > 
> > On Thu, 2019-11-07 at 12:59 -0800, Deepa Dinamani wrote:
> > > The intel-iommu driver assumes that the iommu state is
> > > cleaned up at the start of the new kernel.
> > > But, when we try to kexec boot something other than the
> > > Linux kernel, the cleanup cannot be relied upon.
> > > Hence, cleanup before we go down for reboot.
> > > 
> > > Keeping the cleanup at initialization also, in case BIOS
> > > leaves the IOMMU enabled.
> > > 
> > > I considered turning off iommu only during kexec reboot,
> > > but a clean shutdown seems always a good idea. But if
> > > someone wants to make it conditional, we can do that.
> > 
> > This is going to break things for the VMM live update scheme that Jason
> > presented at KVM Forum, isn't it?
> > 
> > In that case we rely on the IOMMU still operating during the
> > transition.
> 
> For VMM live update case, we should be able to detect and bypass
> the shutdown that Deepa introduced here, so keep IOMMU still operating?

Is that a 'yes' to Deepa's "if someone wants to make it conditional, we
can do that" ? 
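
For reference, the conditional variant under discussion could look roughly
like this (the iommu_skip_shutdown knob is hypothetical and not part of
Deepa's patch):

static void intel_iommu_shutdown(void)
{
        struct dmar_drhd_unit *drhd;
        struct intel_iommu *iommu = NULL;

        if (no_iommu || dmar_disabled || iommu_skip_shutdown)
                return;

        down_write(&dmar_global_lock);
        for_each_iommu(iommu, drhd)
                iommu_disable_translation(iommu);
        up_write(&dmar_global_lock);
}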



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu