Re: [PATCH 2/2] dma-mapping: force unencrypted devices are always addressing limited

2019-11-27 Thread h...@lst.de
On Wed, Nov 27, 2019 at 06:22:57PM +, Thomas Hellstrom wrote:
> >  bool dma_addressing_limited(struct device *dev)
> >  {
> > +   if (force_dma_unencrypted(dev))
> > +   return true;
> > return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> > dma_get_required_mask(dev);
> >  }
> 
> Any chance to have the case
> 
> (swiotlb_force == SWIOTLB_FORCE)
> 
> also included?

We have a hard time handling that in generic code.  Do we have any
good use case for SWIOTLB_FORCE now that we have force_dma_unencrypted?
I'd love to be able to get rid of it.


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-11-27 Thread Christoph Hellwig
On Wed, Nov 27, 2019 at 02:11:28PM -0800, David Rientjes wrote:
> So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic 
> even when the DMA needs to be unencrypted for SEV.  Christoph's suggestion 
> was to wire up dmapool in kernel/dma/remap.c for this.  Is that necessary 
> to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or 
> can we do it within the DMA API itself so it's transparent to the driver?

It needs to be transparent to the driver.  Lots of drivers use GFP_ATOMIC
dma allocations, and all of them are broken on SEV setups currently.
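
For illustration, a minimal sketch of the pattern being referred to
(hypothetical driver code with invented names; not from this thread):

    #include <linux/dmapool.h>
    #include <linux/gfp.h>
    #include <linux/spinlock.h>

    /* Allocate a DMA descriptor while holding a spinlock.
     * dma_pool_alloc() is expected to honor GFP_ATOMIC and never
     * sleep here -- yet on SEV guests the unencrypted-DMA path
     * currently can. */
    static void *queue_alloc_desc(struct dma_pool *pool, spinlock_t *lock,
                                  dma_addr_t *dma)
    {
            void *desc;
            unsigned long flags;

            spin_lock_irqsave(lock, flags);
            desc = dma_pool_alloc(pool, GFP_ATOMIC, dma); /* must not block */
            spin_unlock_irqrestore(lock, flags);
            return desc;
    }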


[PATCH v2 3/8] iommu/vt-d: Implement second level page table ops

2019-11-27 Thread Lu Baolu
This adds the implementation of page table callbacks for
the second level page table.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 81 +
 1 file changed, 81 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7752ff299cb5..96ead4e3395a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -413,6 +413,7 @@ int for_each_device_domain(int (*fn)(struct 
device_domain_info *info,
 }
 
 const struct iommu_ops intel_iommu_ops;
+static const struct pgtable_ops second_lvl_pgtable_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
 {
@@ -1720,6 +1721,7 @@ static struct dmar_domain *alloc_domain(int flags)
domain->nid = NUMA_NO_NODE;
domain->flags = flags;
domain->has_iotlb_device = false;
+   domain->ops = &second_lvl_pgtable_ops;
INIT_LIST_HEAD(&domain->devices);
 
return domain;
@@ -2334,6 +2336,85 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
return 0;
 }
 
+static int second_lvl_domain_map_range(struct dmar_domain *domain,
+  unsigned long iova, phys_addr_t paddr,
+  size_t size, int prot)
+{
+   return __domain_mapping(domain, iova >> VTD_PAGE_SHIFT, NULL,
+   paddr >> VTD_PAGE_SHIFT,
+   aligned_nrpages(paddr, size), prot);
+}
+
+static struct page *
+second_lvl_domain_unmap_range(struct dmar_domain *domain,
+ unsigned long iova, size_t size)
+{
+   unsigned long start_pfn, end_pfn, nrpages;
+
+   start_pfn = mm_to_dma_pfn(IOVA_PFN(iova));
+   nrpages = aligned_nrpages(iova, size);
+   end_pfn = start_pfn + nrpages - 1;
+
+   return dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
+  domain->pgd, 0, start_pfn, end_pfn, NULL);
+}
+
+static phys_addr_t
+second_lvl_domain_iova_to_phys(struct dmar_domain *domain,
+  unsigned long iova)
+{
+   struct dma_pte *pte;
+   int level = 0;
+   u64 phys = 0;
+
+   pte = pfn_to_dma_pte(domain, iova >> VTD_PAGE_SHIFT, &level);
+   if (pte)
+   phys = dma_pte_addr(pte);
+
+   return phys;
+}
+
+static void
+second_lvl_domain_flush_tlb_range(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ unsigned long addr, size_t size,
+ bool ih)
+{
+   unsigned long pages = aligned_nrpages(addr, size);
+   u16 did = domain->iommu_did[iommu->seq_id];
+   unsigned int mask;
+
+   if (pages) {
+   mask = ilog2(__roundup_pow_of_two(pages));
+   addr &= (u64)-1 << (VTD_PAGE_SHIFT + mask);
+   } else {
+   mask = MAX_AGAW_PFN_WIDTH;
+   addr = 0;
+   }
+
+   /*
+* Fallback to domain selective flush if no PSI support or the size is
+* too big.
+* PSI requires page size to be 2 ^ x, and the base address is naturally
+* aligned to the size
+*/
+   if (!pages || !cap_pgsel_inv(iommu->cap) ||
+   mask > cap_max_amask_val(iommu->cap))
+   iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+   else
+   iommu->flush.iotlb_inv(iommu, did, addr | ((int)ih << 6),
+  mask, DMA_TLB_PSI_FLUSH);
+
+   iommu_flush_dev_iotlb(domain, addr, mask);
+}
+
+static const struct pgtable_ops second_lvl_pgtable_ops = {
+   .map_range  = second_lvl_domain_map_range,
+   .unmap_range= second_lvl_domain_unmap_range,
+   .iova_to_phys   = second_lvl_domain_iova_to_phys,
+   .flush_tlb_range= second_lvl_domain_flush_tlb_range,
+};
+
 static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
  struct scatterlist *sg, unsigned long phys_pfn,
  unsigned long nr_pages, int prot)
-- 
2.17.1



[PATCH v2 6/8] iommu/vt-d: Implement first level page table ops

2019-11-27 Thread Lu Baolu
This adds the implementation of page table callbacks for
the first level page table.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a314892ee72b..695a7a5fbe8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -414,6 +414,7 @@ int for_each_device_domain(int (*fn)(struct 
device_domain_info *info,
 }
 
 const struct iommu_ops intel_iommu_ops;
+static const struct pgtable_ops first_lvl_pgtable_ops;
 static const struct pgtable_ops second_lvl_pgtable_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
@@ -2330,6 +2331,61 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
return 0;
 }
 
+static int first_lvl_domain_map_range(struct dmar_domain *domain,
+ unsigned long iova, phys_addr_t paddr,
+ size_t size, int prot)
+{
+   return first_lvl_map_range(domain, PAGE_ALIGN(iova),
+  round_up(iova + size, PAGE_SIZE),
+  PAGE_ALIGN(paddr), prot);
+}
+
+static struct page *
+first_lvl_domain_unmap_range(struct dmar_domain *domain,
+unsigned long iova, size_t size)
+{
+   return first_lvl_unmap_range(domain, PAGE_ALIGN(iova),
+round_up(iova + size, PAGE_SIZE));
+}
+
+static phys_addr_t
+first_lvl_domain_iova_to_phys(struct dmar_domain *domain,
+ unsigned long iova)
+{
+   return first_lvl_iova_to_phys(domain, iova);
+}
+
+static void
+first_lvl_domain_flush_tlb_range(struct dmar_domain *domain,
+struct intel_iommu *iommu,
+unsigned long iova, size_t size, bool ih)
+{
+   unsigned long pages = aligned_nrpages(iova, size);
+   u16 did = domain->iommu_did[iommu->seq_id];
+   unsigned int mask;
+
+   if (pages) {
+   mask = ilog2(__roundup_pow_of_two(pages));
+   iova &= (u64)-1 << (VTD_PAGE_SHIFT + mask);
+   } else {
+   mask = MAX_AGAW_PFN_WIDTH;
+   iova = 0;
+   pages = -1;
+   }
+
+   iommu->flush.p_iotlb_inv(iommu, did, domain->default_pasid,
+iova, pages, ih);
+
+   iommu_flush_dev_iotlb(domain, iova, mask);
+}
+
+static const struct pgtable_ops first_lvl_pgtable_ops = {
+   .map_range  = first_lvl_domain_map_range,
+   .unmap_range= first_lvl_domain_unmap_range,
+   .iova_to_phys   = first_lvl_domain_iova_to_phys,
+   .flush_tlb_range= first_lvl_domain_flush_tlb_range,
+};
+
 static int second_lvl_domain_map_range(struct dmar_domain *domain,
   unsigned long iova, phys_addr_t paddr,
   size_t size, int prot)
-- 
2.17.1



[PATCH v2 8/8] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr

2019-11-27 Thread Lu Baolu
This adds the Intel VT-d specific callback for setting the
DOMAIN_ATTR_NESTING domain attribute. It is necessary
to let the VT-d driver know that the domain represents
a virtual machine which requires the IOMMU hardware to
support nested translation mode. Return success if the
IOMMU hardware supports nested mode, otherwise failure.
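
For illustration, a sketch of how a caller would be expected to request
nesting (hedged: the allocation and error handling here are invented,
and this driver's callback ignores the data argument):

    /* Request nested mode on a newly allocated, not yet attached
     * unmanaged domain. This must happen before any device is
     * attached, since the callback rejects non-empty domains. */
    struct iommu_domain *dom = iommu_domain_alloc(&pci_bus_type);

    if (dom && iommu_domain_set_attr(dom, DOMAIN_ATTR_NESTING, NULL))
            pr_warn("nested translation not supported\n");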

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68b2f98ecd65..ee717dcb9644 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -308,6 +308,12 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN  BIT(1)
 
+/*
+ * Domain represents a virtual machine which demands iommu nested
+ * translation mode support.
+ */
+#define DOMAIN_FLAG_NESTING_MODE   BIT(2)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -5929,6 +5935,24 @@ static inline bool iommu_pasid_support(void)
return ret;
 }
 
+static inline bool nested_mode_support(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = true;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+   ret = false;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -6305,10 +6329,42 @@ static bool intel_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }
 
+static int
+intel_iommu_domain_set_attr(struct iommu_domain *domain,
+   enum iommu_attr attr, void *data)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   unsigned long flags;
+   int ret = 0;
+
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   spin_lock_irqsave(&device_domain_lock, flags);
+   if (nested_mode_support() &&
+   list_empty(&dmar_domain->devices)) {
+   dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
+   dmar_domain->ops = &first_lvl_pgtable_ops;
+   } else {
+   ret = -ENODEV;
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
.domain_free= intel_iommu_domain_free,
+   .domain_set_attr= intel_iommu_domain_set_attr,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
-- 
2.17.1



[PATCH v2 7/8] iommu/vt-d: Identify domains using first level page table

2019-11-27 Thread Lu Baolu
This checks whether a domain should use the first level page
table for map/unmap and, if so, attaches the domain to the
device in first level translation mode.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Cc: Sanjay Kumar 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 63 +++--
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 695a7a5fbe8e..68b2f98ecd65 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -417,6 +417,11 @@ const struct iommu_ops intel_iommu_ops;
 static const struct pgtable_ops first_lvl_pgtable_ops;
 static const struct pgtable_ops second_lvl_pgtable_ops;
 
+static inline bool domain_pgtable_first_lvl(struct dmar_domain *domain)
+{
+   return domain->ops == _lvl_pgtable_ops;
+}
+
 static bool translation_pre_enabled(struct intel_iommu *iommu)
 {
return (iommu->flags & VTD_FLAG_TRANS_PRE_ENABLED);
@@ -1702,6 +1707,44 @@ static bool first_lvl_5lp_support(void)
return first_level_5lp_supported;
 }
 
+/*
+ * Check and return whether first level is used by default for
+ * DMA translation.
+ */
+static bool first_level_by_default(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   static int first_level_support = -1;
+
+   if (likely(first_level_support != -1))
+   return first_level_support;
+
+   first_level_support = 1;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) {
+   first_level_support = 0;
+   break;
+   }
+#ifdef CONFIG_X86
+   /*
+* Currently we don't support paging mode mismatching.
+* Could be turned on later if there is a case.
+*/
+   if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+   !cap_5lp_support(iommu->cap)) {
+   first_level_support = 0;
+   break;
+   }
+#endif /* #ifdef CONFIG_X86 */
+   }
+   rcu_read_unlock();
+
+   return first_level_support;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
struct dmar_domain *domain;
@@ -1714,7 +1757,10 @@ static struct dmar_domain *alloc_domain(int flags)
domain->nid = NUMA_NO_NODE;
domain->flags = flags;
domain->has_iotlb_device = false;
-   domain->ops = &second_lvl_pgtable_ops;
+   if (first_level_by_default())
+   domain->ops = &first_lvl_pgtable_ops;
+   else
+   domain->ops = &second_lvl_pgtable_ops;
domain->first_lvl_5lp = first_lvl_5lp_support();
spin_lock_init(&domain->page_table_lock);
INIT_LIST_HEAD(&domain->devices);
@@ -2710,6 +2756,11 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain,
dev, PASID_RID2PASID);
+   else if (domain_pgtable_first_lvl(domain))
+   ret = intel_pasid_setup_first_level(iommu, dev,
+   domain->pgd, PASID_RID2PASID,
+   domain->iommu_did[iommu->seq_id],
+   PASID_FLAG_SUPERVISOR_MODE);
else
ret = intel_pasid_setup_second_level(iommu, domain,
dev, PASID_RID2PASID);
@@ -5597,8 +5648,14 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
goto attach_failed;
 
/* Setup the PASID entry for mediated devices: */
-   ret = intel_pasid_setup_second_level(iommu, domain, dev,
-domain->default_pasid);
+   if (domain_pgtable_first_lvl(domain))
+   ret = intel_pasid_setup_first_level(iommu, dev,
+   domain->pgd, domain->default_pasid,
+   domain->iommu_did[iommu->seq_id],
+   PASID_FLAG_SUPERVISOR_MODE);
+   else
+   ret = intel_pasid_setup_second_level(iommu, domain, dev,
+domain->default_pasid);
if (ret)
goto table_failed;
spin_unlock(&iommu->lock);
-- 
2.17.1



[PATCH v2 5/8] iommu/vt-d: Add first level page table interfaces

2019-11-27 Thread Lu Baolu
This adds functions to manipulate first level page tables
which could be used by a scalable mode capable IOMMU unit.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/intel-iommu.c|  33 +++
 drivers/iommu/intel-pgtable.c  | 376 +
 include/linux/intel-iommu.h|  33 ++-
 include/trace/events/intel_iommu.h |  60 +
 5 files changed, 502 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/intel-pgtable.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 35d17094fe3b..aa04f4c3ae26 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -18,7 +18,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
-obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 66f76f6df2c2..a314892ee72b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1670,6 +1670,37 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }
 
+/* First level 5-level paging support */
+static bool first_lvl_5lp_support(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   static int first_level_5lp_supported = -1;
+
+   if (likely(first_level_5lp_supported != -1))
+   return first_level_5lp_supported;
+
+   first_level_5lp_supported = 1;
+#ifdef CONFIG_X86
+   /* Match IOMMU first level and CPU paging mode */
+   if (!cpu_feature_enabled(X86_FEATURE_LA57)) {
+   first_level_5lp_supported = 0;
+   return first_level_5lp_supported;
+   }
+#endif /* #ifdef CONFIG_X86 */
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!cap_5lp_support(iommu->cap)) {
+   first_level_5lp_supported = 0;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return first_level_5lp_supported;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
struct dmar_domain *domain;
@@ -1683,6 +1714,8 @@ static struct dmar_domain *alloc_domain(int flags)
domain->flags = flags;
domain->has_iotlb_device = false;
domain->ops = &second_lvl_pgtable_ops;
+   domain->first_lvl_5lp = first_lvl_5lp_support();
+   spin_lock_init(&domain->page_table_lock);
INIT_LIST_HEAD(&domain->devices);
 
return domain;
diff --git a/drivers/iommu/intel-pgtable.c b/drivers/iommu/intel-pgtable.c
new file mode 100644
index ..4a26d08a7570
--- /dev/null
+++ b/drivers/iommu/intel-pgtable.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * intel-pgtable.c - Intel IOMMU page table manipulation library
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+#define pr_fmt(fmt) "DMAR: " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * first_lvl_map: Map a range of IO virtual address to physical addresses.
+ */
+#ifdef CONFIG_X86
+#define pgtable_populate(domain, nm)   \
+do {   \
+   void *__new = alloc_pgtable_page(domain->nid);  \
+   if (!__new) \
+   return -ENOMEM; \
+   smp_wmb();  \
+   spin_lock(&(domain)->page_table_lock);  \
+   if (nm ## _present(*nm)) {  \
+   free_pgtable_page(__new);   \
+   } else {\
+   set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\
+   domain_flush_cache(domain, nm, sizeof(nm##_t)); \
+   }   \
+   spin_unlock(&(domain)->page_table_lock);\
+} while (0)
+
+static int
+first_lvl_map_pte_range(struct dmar_domain *domain, pmd_t *pmd,
+   unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
+{
+   pte_t *pte, *first_pte;
+   u64 pfn;
+
+   pfn = phys_addr >> PAGE_SHIFT;
+   if (unlikely(pmd_none(*pmd)))
+   pgtable_populate(domain, pmd);
+
+   first_pte = pte = pte_offset_kernel(pmd, addr);
+
+   do {
+   

[PATCH v2 0/8] Use 1st-level for DMA remapping

2019-11-27 Thread Lu Baolu
Intel VT-d in scalable mode supports two types of page tables
for DMA translation: the first level page table and the second
level page table. The first level page table uses the same
format as the CPU page table, while the second level page table
remains compatible with previous formats. The software can
choose either of them for DMA remapping according to the use
case.

This patchset aims to move IOVA (I/O Virtual Address) translation
to 1st-level page table in scalable mode. This will simplify vIOMMU
(IOMMU simulated by VM hypervisor) design by using the two-stage
translation, a.k.a. nested mode translation.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out GIOVA->GPA mappings
and write them to a shadowed page table, which will be used by the
physical IOMMU. Each time mappings are created or destroyed in the
vIOMMU, the simulation software has to intervene, so that changes to
GIOVA->GPA mappings are shadowed to the host.


 .---.
 |  vIOMMU   |
 |---| ..
 |   |IOTLB flush trap |QEMU|
 .---. (map/unmap) ||
 |GIOVA->GPA |>|..  |
 '---' || GIOVA->HPA |  |
 |   | |''  |
 '---' ||
   ||
   ''
|
<
|
v VFIO/IOMMU API
  .---.
  |  pIOMMU   |
  |---|
  |   |
  .---.
  |GIOVA->HPA |
  '---'
  |   |
  '---'

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over the 1st-level page table to store GIOVA->GPA mappings in the
vIOMMU, 2) binding the vIOMMU 1st level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach makes
the vIOMMU design simpler and more efficient, as we only need to flush
the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping in
the vIOMMU is torn down.

 .---.
 |  vIOMMU   |
 |---| .---.
 |   |IOTLB flush trap |   QEMU|
 .---.(unmap)  |---|
 |GIOVA->GPA |>|   |
 '---' '---'
 |   |   |
 '---'   |
   <--
   |  VFIO/IOMMU  
   |  cache invalidation and  
   | guest gpd bind interfaces
   v
 .---.
 |  pIOMMU   |
 |---|
 .---.
 |GIOVA->GPA |<---First level
 '---'
 | GPA->HPA  |<---Second level
 '---'
 '---'

This patch set includes two parts. The first part implements the
per-domain page table abstraction, which makes the page table
difference transparent to the various map/unmap APIs. The second part
applies the first level page table for IOVA translation unless the
DOMAIN_ATTR_NESTING domain attribute has been set, which indicates
that nested mode is in use.

Based-on-idea-by: Ashok Raj 
Based-on-idea-by: Kevin Tian 
Based-on-idea-by: Liu Yi L 
Based-on-idea-by: Jacob Pan 
Based-on-idea-by: Sanjay Kumar 
Based-on-idea-by: Lu Baolu 

Change log:

 v1->v2
 - The first series was posted here
   https://lkml.org/lkml/2019/9/23/297
 - Use per domain page table ops to handle different page tables.
 - Use first level for DMA remapping by default on both bare metal
   and vm guest.
 - Code refine according to code review comments for v1.

Lu Baolu (8):
  iommu/vt-d: Add per domain page table ops
  iommu/vt-d: Move domain_flush_cache helper into header
  iommu/vt-d: Implement second level page table ops
  iommu/vt-d: Apply per domain second level page table ops
  iommu/vt-d: Add first level page table interfaces
  iommu/vt-d: Implement first level page table ops
  iommu/vt-d: Identify domains using first level page table
  iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr

 drivers/iommu/Makefile |   2 +-
 drivers/iommu/intel-iommu.c| 412 +++--
 drivers/iommu/intel-pgtable.c  | 376 ++
 include/linux/intel-iommu.h|  64 -
 include/trace/events/intel_iommu.h |  60 +
 5 files changed, 837 insertions(+), 77 deletions(-)
 create mode 100644 

[PATCH v2 2/8] iommu/vt-d: Move domain_flush_cache helper into header

2019-11-27 Thread Lu Baolu
So that it can be used in other source files as well.
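
For example (a sketch of the intended out-of-file use; compare the
pgtable_populate() helper added later in this series), a caller that
updates a page table entry and flushes it on non-coherent hardware:

    set_pte(pte, entry);
    domain_flush_cache(domain, pte, sizeof(*pte));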

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 7 ---
 include/linux/intel-iommu.h | 7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 677e82a828f0..7752ff299cb5 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -833,13 +833,6 @@ static struct intel_iommu *device_to_iommu(struct device 
*dev, u8 *bus, u8 *devf
return iommu;
 }
 
-static void domain_flush_cache(struct dmar_domain *domain,
-  void *addr, int size)
-{
-   if (!domain->iommu_coherency)
-   clflush_cache_range(addr, size);
-}
-
 static int device_context_mapped(struct intel_iommu *iommu, u8 bus, u8 devfn)
 {
struct context_entry *context;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index e8bfe7466ebb..9b259756057b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -681,6 +681,13 @@ static inline int first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }
 
+static inline void
+domain_flush_cache(struct dmar_domain *domain, void *addr, int size)
+{
+   if (!domain->iommu_coherency)
+   clflush_cache_range(addr, size);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev 
*dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
-- 
2.17.1



[PATCH v2 4/8] iommu/vt-d: Apply per domain second level page table ops

2019-11-27 Thread Lu Baolu
This applies per domain page table ops to various domain
mapping and unmapping interfaces.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 118 
 1 file changed, 52 insertions(+), 66 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 96ead4e3395a..66f76f6df2c2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -80,6 +80,7 @@
 #define IOVA_START_PFN (1)
 
 #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT)
+#define PFN_ADDR(pfn)  ((pfn) << PAGE_SHIFT)
 
 /* page table handling */
 #define LEVEL_STRIDE   (9)
@@ -1153,8 +1154,8 @@ static struct page *domain_unmap(struct dmar_domain 
*domain,
BUG_ON(start_pfn > last_pfn);
 
/* we don't need lock here; nobody else touches the iova range */
-   freelist = dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-  domain->pgd, 0, start_pfn, last_pfn, 
NULL);
+   freelist = domain->ops->unmap_range(domain, PFN_ADDR(start_pfn),
+   PFN_ADDR(last_pfn - start_pfn + 1));
 
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -1484,39 +1485,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
- struct dmar_domain *domain,
- unsigned long pfn, unsigned int pages,
- int ih, int map)
-{
-   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
-   uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
-   u16 did = domain->iommu_did[iommu->seq_id];
-
-   BUG_ON(pages == 0);
-
-   if (ih)
-   ih = 1 << 6;
-   /*
-* Fallback to domain selective flush if no PSI support or the size is
-* too big.
-* PSI requires page size to be 2 ^ x, and the base address is naturally
-* aligned to the size
-*/
-   if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
-   iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
-   else
-   iommu->flush.iotlb_inv(iommu, did, addr | ih,
-  mask, DMA_TLB_PSI_FLUSH);
-
-   /*
-* In caching mode, changes of pages from non-present to present require
-* flush. However, device IOTLB doesn't need to be flushed in this case.
-*/
-   if (!cap_caching_mode(iommu->cap) || !map)
-   iommu_flush_dev_iotlb(domain, addr, mask);
-}
-
 /* Notification for newly created mappings */
 static inline void __mapping_notify_one(struct intel_iommu *iommu,
struct dmar_domain *domain,
@@ -1524,7 +1492,8 @@ static inline void __mapping_notify_one(struct 
intel_iommu *iommu,
 {
/* It's a non-present to present mapping. Only flush if caching mode */
if (cap_caching_mode(iommu->cap))
-   iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
+   domain->ops->flush_tlb_range(domain, iommu, PFN_ADDR(pfn),
+PFN_ADDR(pages), 0);
else
iommu_flush_write_buffer(iommu);
 }
@@ -1536,16 +1505,8 @@ static void iommu_flush_iova(struct iova_domain *iovad)
 
domain = container_of(iovad, struct dmar_domain, iovad);
 
-   for_each_domain_iommu(idx, domain) {
-   struct intel_iommu *iommu = g_iommus[idx];
-   u16 did = domain->iommu_did[iommu->seq_id];
-
-   iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
-
-   if (!cap_caching_mode(iommu->cap))
-   iommu_flush_dev_iotlb(get_iommu_domain(iommu, did),
- 0, MAX_AGAW_PFN_WIDTH);
-   }
+   for_each_domain_iommu(idx, domain)
+   domain->ops->flush_tlb_range(domain, g_iommus[idx], 0, 0, 0);
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2419,13 +2380,43 @@ static int domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
  struct scatterlist *sg, unsigned long phys_pfn,
  unsigned long nr_pages, int prot)
 {
-   int iommu_id, ret;
struct intel_iommu *iommu;
+   int iommu_id, ret;
 
/* Do the real mapping first */
-   ret = __domain_mapping(domain, iov_pfn, sg, phys_pfn, nr_pages, prot);
-   if (ret)
-   return ret;
+   if (!sg) {
+   ret = domain->ops->map_range(domain, PFN_ADDR(iov_pfn),
+PFN_ADDR(phys_pfn),
+PFN_ADDR(nr_pages),
+   

[PATCH v2 1/8] iommu/vt-d: Add per domain page table ops

2019-11-27 Thread Lu Baolu
The Intel VT-d in scalable mode supports two types of
page tables for DMA translation: the first level page
table and the second level page table. The IOMMU driver
can choose one of them for DMA remapping according
to the use case. The first level page table uses the same
format as the CPU page table, while the second level page
table remains compatible with previous formats.

This abstracts the page tables used in the Intel IOMMU driver
by defining a per domain page table ops structure which
contains callbacks for the various page table operations.
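
As a minimal sketch of how the abstraction is meant to be consumed (the
wrapper below is invented for illustration; the real call sites appear
in patch 4/8):

    /* Dispatch through the per-domain ops instead of hard-coding
     * the second level page table walker: */
    static int domain_map(struct dmar_domain *domain, unsigned long iova,
                          phys_addr_t paddr, size_t size, int prot)
    {
            return domain->ops->map_range(domain, iova, paddr, size, prot);
    }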

Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 326146a36dbf..e8bfe7466ebb 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -499,6 +499,28 @@ struct context_entry {
u64 hi;
 };
 
+struct dmar_domain;
+
+/*
+ * struct pgtable_ops - page table ops
+ * @map_range: map a physically contiguous memory region to iova
+ * @unmap_range: unmap a physically contiguous memory region
+ * @iova_to_phys: return the physical address mapped to @iova
+ * @flush_tlb_range: flush the tlb caches as the result of map or unmap
+ */
+struct pgtable_ops {
+   int (*map_range)(struct dmar_domain *domain,
+unsigned long iova, phys_addr_t paddr,
+size_t size, int prot);
+   struct page *(*unmap_range)(struct dmar_domain *domain,
+   unsigned long iova, size_t size);
+   phys_addr_t (*iova_to_phys)(struct dmar_domain *domain,
+   unsigned long iova);
+   void (*flush_tlb_range)(struct dmar_domain *domain,
+   struct intel_iommu *iommu,
+   unsigned long iova, size_t size, bool ih);
+};
+
 struct dmar_domain {
int nid;/* node id */
 
@@ -517,8 +539,10 @@ struct dmar_domain {
struct list_head auxd;  /* link to device's auxiliary list */
struct iova_domain iovad;   /* iova's that belong to this domain */
 
+   /* page table used by this domain */
struct dma_pte  *pgd;   /* virtual address */
int gaw;/* max guest address width */
+   const struct pgtable_ops *ops;  /* page table ops */
 
/* adjusted guest address width, 0 is level 2 30-bit */
int agaw;
-- 
2.17.1



How to read BIOS memory from the IOMMU

2019-11-27 Thread Aaron Gray
Hi, I am interested in writing a program to verify BIOS integrity via
SHA512. Most modern BIOSes seem to be accessible through the IOMMU. I'm
wondering if there is a device driver for the IOMMU, or whether one
could be created, for the purposes of accessing BIOS memory.

If anyone has any leads or advice or even can show me how to do this
in code via example/pseudo code I would appreciate it.

Regards,
Aaron

-- 
Aaron Gray

Independent Open Source Software Engineer, Computer Language
Researcher, Information Theorist, and amateur computer scientist.


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-11-27 Thread David Rientjes via iommu
On Wed, 18 Sep 2019, Christoph Hellwig wrote:

> On Tue, Sep 17, 2019 at 06:41:02PM +, Lendacky, Thomas wrote:
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev 
> > > *dev)
> > >   dev->admin_tagset.timeout = ADMIN_TIMEOUT;
> > >   dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> > >   dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> > > - dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> > > + dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> > > +   BLK_MQ_F_BLOCKING;
> > 
> > I think you want to only set the BLK_MQ_F_BLOCKING if the DMA is required
> > to be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
> > from a module. Is there a DMA API that could be called to get that info?
> 
> The DMA API must support non-blocking calls, and various drivers rely
> on that.  So we need to provide that even for the SEV case.  If the
> actual blocking can't be made to work we'll need to wire up the DMA
> pool in kernel/dma/remap.c for it (and probably move it to separate
> file).
> 

Resurrecting this thread from a couple months ago because it appears that 
this is still an issue with 5.4 guests.

dma_pool_alloc(), regardless of whether mem_flags allows blocking or not, 
can always sleep if the device's DMA must be unencrypted and 
mem_encrypt_active() == true.  We know this because vm_unmap_aliases() can 
always block.
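
Roughly, the call chain behind that claim (as I read the 5.4 sources;
the exact path varies by architecture and config):

    dma_pool_alloc(pool, GFP_ATOMIC, &handle)
      -> pool_alloc_page()               /* when the pool has no free block */
        -> dma_alloc_coherent()
          -> dma_direct_alloc()
            -> set_memory_decrypted()    /* when force_dma_unencrypted(dev) */
              -> ... -> vm_unmap_aliases()   /* which can block */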

NVMe's setup of PRPs and SGLs uses dma_pool_alloc(GFP_ATOMIC) but when 
this is a SEV-enabled guest this allocation may block due to the 
possibility of allocating DMA coherent memory through dma_direct_alloc().

It seems like one solution would be to add significant latency by doing 
BLK_MQ_F_BLOCKING if force_dma_unencrypted() is true for the device but 
this comes with significant downsides.

So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic 
even when the DMA needs to be unencrypted for SEV.  Christoph's suggestion 
was to wire up dmapool in kernel/dma/remap.c for this.  Is that necessary 
to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or 
can we do it within the DMA API itself so it's transparent to the driver?

Thomas/Brijesh: separately, it seems the use of set_memory_encrypted() or 
set_memory_decrypted() must be possible without blocking; is this only an 
issue from the DMA API point of view or can it be done elsewhere?


Re: [PATCH 2/2] dma-mapping: force unencrypted devices are always addressing limited

2019-11-27 Thread Thomas Hellstrom via iommu
Hi,

On Wed, 2019-11-27 at 15:40 +0100, Christoph Hellwig wrote:
> Devices that are forced to DMA through unencrypted bounce buffers
> need to be treated as if they are addressing limited.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/mapping.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 1dbe6d725962..f6c35b53d996 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -416,6 +416,8 @@ EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
>   */
>  bool dma_addressing_limited(struct device *dev)
>  {
> + if (force_dma_unencrypted(dev))
> + return true;
>   return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
>   dma_get_required_mask(dev);
>  }

Any chance to have the case

(swiotlb_force == SWIOTLB_FORCE)

also included?
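
I.e., something along these lines (a sketch of the request only, not a
posted patch):

    bool dma_addressing_limited(struct device *dev)
    {
            if (force_dma_unencrypted(dev) ||
                swiotlb_force == SWIOTLB_FORCE)
                    return true;
            return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
                    dma_get_required_mask(dev);
    }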

Otherwise for the series

Reviewed-by: Thomas Hellström 

 


Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne
On Wed, 2019-11-27 at 19:06 +, Robin Murphy wrote:
> On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:
> > > On 26/11/2019 12:51 pm, Leon Romanovsky wrote:
> > > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:
> > > > > Some users need to make sure their rounding function accepts and
> > > > > returns
> > > > > 64bit long variables regardless of the architecture. Sadly
> > > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> > > > > new generic 64bit variant of the function and cleanup rougue custom
> > > > > implementations.
> > > > 
> > > > Is it possible to create general roundup/rounddown_pow_two() which will
> > > > work correctly for any type of variables, instead of creating special
> > > > variant for every type?
> > > 
> > > In fact, that is sort of the case already - roundup_pow_of_two() itself
> > > wraps ilog2() such that the constant case *is* type-independent. And
> > > since ilog2() handles non-constant values anyway, might it be reasonable
> > > to just take the strongly-typed __roundup_pow_of_two() helper out of the
> > > loop as below?
> > > 
> > > Robin
> > > 
> > 
> > That looks way better that's for sure. Some questions.
> > 
> > > ->8-
> > > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > > index 83a4a3ca3e8a..e825f8a6e8b5 100644
> > > --- a/include/linux/log2.h
> > > +++ b/include/linux/log2.h
> > > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> > > */
> > >#define roundup_pow_of_two(n)  \
> > >(  \
> > > - __builtin_constant_p(n) ? ( \
> > > - (n == 1) ? 1 :  \
> > > - (1UL << (ilog2((n) - 1) + 1))   \
> > > -) :  \
> > > - __roundup_pow_of_two(n) \
> > > + (__builtin_constant_p(n) && (n == 1)) ? \
> > > + 1 : (1UL << (ilog2((n) - 1) + 1))   \
> > 
> > Then here you'd have to use ULL instead of UL, right? I want my 64bit value
> > everywhere regardless of the CPU arch. The downside is that would affect
> > performance to some extent (i.e. returning a 64bit value where you used to
> > have
> > a 32bit one)?
> 
> True, although it's possible that 1ULL might result in the same codegen 
> if the compiler can see that the result is immediately truncated back to 
> long anyway. Or at worst, I suppose "(typeof(n))1" could suffice, 
> however ugly. Either way, this diff was only an illustration rather than 
> a concrete proposal, but it might be an interesting diversion to 
> investigate.
> 
> On that note, though, you should probably be using ULL in your current 
> patch too.

I actually meant to, the fix got lost. Thanks for pointing it out.

As I see Leon also likes this, I'll try out this implementation and come back
with some results.

Regards,
Nicolas




Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Leon Romanovsky
On Wed, Nov 27, 2019 at 07:06:12PM +, Robin Murphy wrote:
> On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:
> > > On 26/11/2019 12:51 pm, Leon Romanovsky wrote:
> > > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:
> > > > > Some users need to make sure their rounding function accepts and 
> > > > > returns
> > > > > 64bit long variables regardless of the architecture. Sadly
> > > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> > > > > new generic 64bit variant of the function and clean up rogue custom
> > > > > implementations.
> > > >
> > > > Is it possible to create general roundup/rounddown_pow_two() which will
> > > > work correctly for any type of variables, instead of creating special
> > > > variant for every type?
> > >
> > > In fact, that is sort of the case already - roundup_pow_of_two() itself
> > > wraps ilog2() such that the constant case *is* type-independent. And
> > > since ilog2() handles non-constant values anyway, might it be reasonable
> > > to just take the strongly-typed __roundup_pow_of_two() helper out of the
> > > loop as below?
> > >
> > > Robin
> > >
> >
> > That looks way better that's for sure. Some questions.
> >
> > > ->8-
> > > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > > index 83a4a3ca3e8a..e825f8a6e8b5 100644
> > > --- a/include/linux/log2.h
> > > +++ b/include/linux/log2.h
> > > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> > > */
> > >#define roundup_pow_of_two(n)  \
> > >(  \
> > > - __builtin_constant_p(n) ? ( \
> > > - (n == 1) ? 1 :  \
> > > - (1UL << (ilog2((n) - 1) + 1))   \
> > > -) :  \
> > > - __roundup_pow_of_two(n) \
> > > + (__builtin_constant_p(n) && (n == 1)) ? \
> > > + 1 : (1UL << (ilog2((n) - 1) + 1))   \
> >
> > Then here you'd have to use ULL instead of UL, right? I want my 64bit value
> > everywhere regardless of the CPU arch. The downside is that would affect
> > performance to some extent (i.e. returning a 64bit value where you used to 
> > have
> > a 32bit one)?
>
> True, although it's possible that 1ULL might result in the same codegen if
> the compiler can see that the result is immediately truncated back to long
> anyway. Or at worst, I suppose "(typeof(n))1" could suffice, however ugly.
> Either way, this diff was only an illustration rather than a concrete
> proposal, but it might be an interesting diversion to investigate.
>
> On that note, though, you should probably be using ULL in your current patch
> too.
>
> > Also, what about callers to this function on platforms with 32bit 'unsigned
> > longs' that happen to input a 64bit value into this. IIUC we'd have a 
> > change of
> > behaviour.
>
> Indeed, although the change in such a case would be "start getting the
> expected value instead of nonsense", so it might very well be welcome ;)

Agree, if code overflowed with 32 bits before this change, the code was already
broken. At least now, it won't overflow.
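
A concrete example of the overflow in question, assuming a 32-bit
unsigned long (the values are made up for illustration):

    u64 n = 0x123456789ULL;                     /* needs 33 bits */
    unsigned long a = __roundup_pow_of_two(n);  /* n silently truncated */
    u64 b = 1ULL << (ilog2(n - 1) + 1);         /* 0x200000000, as expected */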

>
> Robin.


Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Robin Murphy

On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote:

On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:

On 26/11/2019 12:51 pm, Leon Romanovsky wrote:

On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
new generic 64bit variant of the function and clean up rogue custom
implementations.


Is it possible to create general roundup/rounddown_pow_two() which will
work correctly for any type of variables, instead of creating special
variant for every type?


In fact, that is sort of the case already - roundup_pow_of_two() itself
wraps ilog2() such that the constant case *is* type-independent. And
since ilog2() handles non-constant values anyway, might it be reasonable
to just take the strongly-typed __roundup_pow_of_two() helper out of the
loop as below?

Robin



That looks way better that's for sure. Some questions.


->8-
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..e825f8a6e8b5 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
*/
   #define roundup_pow_of_two(n)\
   (\
-   __builtin_constant_p(n) ? ( \
-   (n == 1) ? 1 :  \
-   (1UL << (ilog2((n) - 1) + 1)) \
-  ) :  \
-   __roundup_pow_of_two(n) \
+   (__builtin_constant_p(n) && (n == 1)) ? \
+   1 : (1UL << (ilog2((n) - 1) + 1)) \


Then here you'd have to use ULL instead of UL, right? I want my 64bit value
everywhere regardless of the CPU arch. The downside is that would affect
performance to some extent (i.e. returning a 64bit value where you used to have
a 32bit one)?


True, although it's possible that 1ULL might result in the same codegen 
if the compiler can see that the result is immediately truncated back to 
long anyway. Or at worst, I suppose "(typeof(n))1" could suffice, 
however ugly. Either way, this diff was only an illustration rather than 
a concrete proposal, but it might be an interesting diversion to 
investigate.


On that note, though, you should probably be using ULL in your current 
patch too.



Also, what about callers to this function on platforms with 32bit 'unsigned
longs' that happen to input a 64bit value into this. IIUC we'd have a change of
behaviour.


Indeed, although the change in such a case would be "start getting the 
expected value instead of nonsense", so it might very well be welcome ;)


Robin.


Re: [PATCH 1/3] iommu: match the original algorithm

2019-11-27 Thread Qian Cai



> On Nov 27, 2019, at 1:01 PM, John Garry  wrote:
> 
> I haven't gone into the details, but this patch alone is giving this:
> 
> root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see 
> /sys/kernel/debug/kmemleak)
> 
> root@(none)$ cat /sys/kernel/debug/kmemleak
> unreferenced object 0x002339843000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
>  hex dump (first 32 bytes):
>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>  backtrace:
>[<1d2710bf>] kmem_cache_alloc+0x188/0x260
>[] init_iova_domain+0x1e8/0x2a8
>[<2646fc92>] iommu_setup_dma_ops+0x200/0x710
>[] arch_setup_dma_ops+0x80/0x128
>[<994e1e43>] acpi_dma_configure+0x11c/0x140
>[] pci_dma_configure+0xe0/0x108
>[] really_probe+0x210/0x548
>[<87884b1b>] driver_probe_device+0x7c/0x148
>[<10af2936>] device_driver_attach+0x94/0xa0
>[] __driver_attach+0xa4/0x110
>[] bus_for_each_dev+0xe8/0x158
>[] driver_attach+0x30/0x40
>[<3cf39ba8>] bus_add_driver+0x234/0x2f0
>[<43830a45>] driver_register+0xbc/0x1d0
>[] __pci_register_driver+0xb0/0xc8
>[] sas_v3_pci_driver_init+0x20/0x28
> unreferenced object 0x002339844000 (size 2048):
>  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
> 
> [snip]
> 
> And I don't feel like continuing until it's resolved

Thanks for taking a hit from this before me. It is frustrating that people tend
not to test their patches properly with things like kmemleak.
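
For anyone reproducing John's report: with CONFIG_DEBUG_KMEMLEAK=y, a
scan can be triggered and the report read back by hand (standard
kmemleak usage, matching John's log above):

    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak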


Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne
On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:
> On 26/11/2019 12:51 pm, Leon Romanovsky wrote:
> > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:
> > > Some users need to make sure their rounding function accepts and returns
> > > 64bit long variables regardless of the architecture. Sadly
> > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> > > new generic 64bit variant of the function and clean up rogue custom
> > > implementations.
> > 
> > Is it possible to create general roundup/rounddown_pow_two() which will
> > work correctly for any type of variables, instead of creating special
> > variant for every type?
> 
> In fact, that is sort of the case already - roundup_pow_of_two() itself 
> wraps ilog2() such that the constant case *is* type-independent. And 
> since ilog2() handles non-constant values anyway, might it be reasonable 
> to just take the strongly-typed __roundup_pow_of_two() helper out of the 
> loop as below?
> 
> Robin
> 

That looks way better that's for sure. Some questions.

> ->8-
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..e825f8a6e8b5 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>*/
>   #define roundup_pow_of_two(n)   \
>   (   \
> - __builtin_constant_p(n) ? ( \
> - (n == 1) ? 1 :  \
> - (1UL << (ilog2((n) - 1) + 1))   \
> -) :  \
> - __roundup_pow_of_two(n) \
> + (__builtin_constant_p(n) && (n == 1)) ? \
> + 1 : (1UL << (ilog2((n) - 1) + 1))   \

Then here you'd have to use ULL instead of UL, right? I want my 64bit value
everywhere regardless of the CPU arch. The downside is that would affect
performance to some extent (i.e. returning a 64bit value where you used to have
a 32bit one)?

Also, what about callers to this function on platforms with 32bit 'unsigned
longs' that happen to input a 64bit value into this. IIUC we'd have a change of
behaviour.

Regards,
Nicolas




Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Robin Murphy

On 26/11/2019 12:51 pm, Leon Romanovsky wrote:

On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
new generic 64bit variant of the function and cleanup rougue custom
implementations.


Is it possible to create general roundup/rounddown_pow_two() which will
work correctly for any type of variables, instead of creating special
variant for every type?


In fact, that is sort of the case already - roundup_pow_of_two() itself 
wraps ilog2() such that the constant case *is* type-independent. And 
since ilog2() handles non-constant values anyway, might it be reasonable 
to just take the strongly-typed __roundup_pow_of_two() helper out of the 
loop as below?


Robin

->8-
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..e825f8a6e8b5 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  */
 #define roundup_pow_of_two(n)  \
 (  \
-   __builtin_constant_p(n) ? ( \
-   (n == 1) ? 1 :  \
-   (1UL << (ilog2((n) - 1) + 1)) \
-  ) :  \
-   __roundup_pow_of_two(n) \
+   (__builtin_constant_p(n) && (n == 1)) ? \
+   1 : (1UL << (ilog2((n) - 1) + 1)) \
  )

 /**


Re: [PATCH 1/3] iommu: match the original algorithm

2019-11-27 Thread John Garry

On 21/11/2019 00:13, Cong Wang wrote:

The IOVA cache algorithm implemented in IOMMU code does not
exactly match the original algorithm described in the paper.

Particularly, it doesn't need to free the loaded empty magazine
when trying to put it back to global depot.

This patch makes it exactly match the original algorithm.



I haven't gone into the details, but this patch alone is giving this:

root@(none)$ [  123.857024] kmemleak: 8 new suspected memory leaks (see 
/sys/kernel/debug/kmemleak)


root@(none)$ cat /sys/kernel/debug/kmemleak
unreferenced object 0x002339843000 (size 2048):
  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<1d2710bf>] kmem_cache_alloc+0x188/0x260
[] init_iova_domain+0x1e8/0x2a8
[<2646fc92>] iommu_setup_dma_ops+0x200/0x710
[] arch_setup_dma_ops+0x80/0x128
[<994e1e43>] acpi_dma_configure+0x11c/0x140
[] pci_dma_configure+0xe0/0x108
[] really_probe+0x210/0x548
[<87884b1b>] driver_probe_device+0x7c/0x148
[<10af2936>] device_driver_attach+0x94/0xa0
[] __driver_attach+0xa4/0x110
[] bus_for_each_dev+0xe8/0x158
[] driver_attach+0x30/0x40
[<3cf39ba8>] bus_add_driver+0x234/0x2f0
[<43830a45>] driver_register+0xbc/0x1d0
[] __pci_register_driver+0xb0/0xc8
[] sas_v3_pci_driver_init+0x20/0x28
unreferenced object 0x002339844000 (size 2048):
  comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s)

[snip]

And I don't feel like continuing until it's resolved

Thanks,
John


Cc: Joerg Roedel 
Signed-off-by: Cong Wang 
---
  drivers/iommu/iova.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 41c605b0058f..92f72a85e62a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -900,7 +900,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
  
  	if (!iova_magazine_full(cpu_rcache->loaded)) {

can_insert = true;
-   } else if (!iova_magazine_full(cpu_rcache->prev)) {
+   } else if (iova_magazine_empty(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
can_insert = true;
} else {
@@ -909,8 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (new_mag) {
spin_lock(&rcache->lock);
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
-   rcache->depot[rcache->depot_size++] =
-   cpu_rcache->loaded;
+   swap(rcache->depot[rcache->depot_size], 
cpu_rcache->prev);
+   swap(cpu_rcache->prev, cpu_rcache->loaded);
+   rcache->depot_size++;
} else {
mag_to_free = cpu_rcache->loaded;
}
@@ -963,14 +964,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache 
*rcache,
  
  	if (!iova_magazine_empty(cpu_rcache->loaded)) {

has_pfn = true;
-   } else if (!iova_magazine_empty(cpu_rcache->prev)) {
+   } else if (iova_magazine_full(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
has_pfn = true;
} else {
spin_lock(&rcache->lock);
if (rcache->depot_size > 0) {
-   iova_magazine_free(cpu_rcache->loaded);
-   cpu_rcache->loaded = 
rcache->depot[--rcache->depot_size];
+   swap(rcache->depot[rcache->depot_size - 1], 
cpu_rcache->prev);
+   swap(cpu_rcache->prev, cpu_rcache->loaded);
+   rcache->depot_size--;
has_pfn = true;
}
spin_unlock(&rcache->lock);





Re: [PATCH v3 1/2] PCI: Add parameter nr_devfns to pci_add_dma_alias

2019-11-27 Thread Logan Gunthorpe



On 2019-11-27 6:27 a.m., James Sewart wrote:
>   * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
>   * which is used to program permissible bus-devfn source addresses for DMA
> @@ -5873,8 +5874,12 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>   * cannot be left as a userspace activity).  DMA aliases should therefore
>   * be configured via quirks, such as the PCI fixup header quirk.
>   */
> -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
> +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, int nr_devfns)
>  {
> + int devfn_to = devfn_from + nr_devfns - 1;
> +
> + BUG_ON(nr_devfns < 1);

Why not just make nr_devfns unsigned and do nothing if it's zero? It
might also be worth checking that nr_devfns + devfn_from is less than
U8_MAX... But I'd probably avoid the BUG_ON and just truncate it.
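
Something like this, perhaps (a sketch of the suggestion, not a tested
patch):

    void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from,
                           unsigned int nr_devfns)
    {
            int devfn_to;

            if (!nr_devfns)
                    return;

            /* Truncate instead of BUG_ON(): an alias range cannot
             * extend past devfn 255. */
            nr_devfns = min_t(unsigned int, nr_devfns,
                              U8_MAX - devfn_from + 1);
            devfn_to = devfn_from + nr_devfns - 1;

            /* ... existing body unchanged ... */
    }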

Logan


Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne
On Tue, 2019-11-26 at 10:19 +0100, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> new generic 64bit variant of the function and clean up rogue custom
> implementations.
> 
> Signed-off-by: Nicolas Saenz Julienne 

Small nit: I corrected the patch subject for the next version.

linux/log2.h: Add roundup/rounddown_pow_two_u64() family of functions

Note the change here:  

Regards,
Nicolas




Re: [PATCH v3 0/7] Raspberry Pi 4 PCIe support

2019-11-27 Thread Nicolas Saenz Julienne
Hi Bjorn,

On Tue, 2019-11-26 at 15:50 -0600, Bjorn Helgaas wrote:
> On Tue, Nov 26, 2019 at 10:19:38AM +0100, Nicolas Saenz Julienne wrote:
> > This series aims at providing support for Raspberry Pi 4's PCIe
> > controller, which is also shared with the Broadcom STB family of
> > devices.
> > Jim Quinlan (3):
> >   dt-bindings: PCI: Add bindings for brcmstb's PCIe device
> >   PCI: brcmstb: add Broadcom STB PCIe host controller driver
> >   PCI: brcmstb: add MSI capability
> 
> Please update these subjects to match the others, i.e., capitalize
> "Add".  Also, I think "Add MSI capability" really means "Add support
> for MSI ..."; in PCIe terms the "MSI Capability" is a structure in
> config space and it's there whether the OS supports it or not.
> 
> No need to repost just for this.

Noted, I'll update them.

Regards,
Nicolas



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()

2019-11-27 Thread Robin Murphy

On 27/11/2019 2:16 pm, Thierry Reding wrote:
[...]

Nevermind that, I figured out that I was missing the initialization of
some of the S2CR variables. I've got something that I think is working
now, though I don't know yet how to go about cleaning up the initial
mapping and "recycling" it.

I'll clean things up a bit, run some more tests and post a new patch
that can serve as a basis for discussion.


I'm a little puzzled by the smmu->identity domain - disregarding the 
fact that it's not actually used by the given diff ;) - if isolation is 
the reason for not simply using a bypass S2CR for the window between 
reset and the relevant .add_device call where the default domain proper 
comes in[1], then surely exposing the union of memory regions to the 
union of all associated devices isn't all that desirable either.


Either way, I'll give you the pre-emptive warning that this is the SMMU 
in the way of my EFI framebuffer ;)


...
arm-smmu 7fb2.iommu: 1 context banks (1 stage-2 only)
...

Robin.

[1] the fact that it currently depends on probe order whether getting 
that .add_device call requires a driver probing for the device is an 
error as discussed elsewhere, and will get fixed separately, I promise.



Re: [PATCH] iommu/arm-smmu: support SMMU module probing from the IORT

2019-11-27 Thread John Garry

On 25/11/2019 16:04, Lorenzo Pieralisi wrote:

On Fri, Nov 22, 2019 at 06:41:25PM +0100, Ard Biesheuvel wrote:

Add support for SMMU drivers built as modules to the ACPI/IORT device
probing path, by deferring the probe of the master if the SMMU driver is
known to exist but has not been loaded yet. Given that the IORT code
registers a platform device for each SMMU that it discovers, we can
easily trigger the udev based autoloading of the SMMU drivers by making
the platform device identifier part of the module alias.

Signed-off-by: Ard Biesheuvel 
---
  drivers/acpi/arm64/iort.c   | 4 ++--
  drivers/iommu/arm-smmu-v3.c | 1 +
  drivers/iommu/arm-smmu.c| 1 +
  3 files changed, 4 insertions(+), 2 deletions(-)


I think it is best if Will picks this up and adds it to the
series that modularizes the SMMU drivers:

Acked-by: Lorenzo Pieralisi 


Tested-by: John Garry  # only manual smmu ko loading




diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 5a7551d060f2..a696457a9b11 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -850,9 +850,9 @@ static inline bool iort_iommu_driver_enabled(u8 type)
  {
switch (type) {
case ACPI_IORT_NODE_SMMU_V3:
-   return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+   return IS_ENABLED(CONFIG_ARM_SMMU_V3);
case ACPI_IORT_NODE_SMMU:
-   return IS_BUILTIN(CONFIG_ARM_SMMU);
+   return IS_ENABLED(CONFIG_ARM_SMMU);
default:
pr_warn("IORT node type %u does not describe an SMMU\n", type);
return false;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7669beafc493..bf6a1e8eb9b0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3733,4 +3733,5 @@ module_platform_driver(arm_smmu_driver);
  
  MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations");
  MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu-v3");
  MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d55acc48aee3..db5106b0955b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2292,4 +2292,5 @@ module_platform_driver(arm_smmu_driver);
  
  MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
  MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu");
  MODULE_LICENSE("GPL v2");
--
2.20.1
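
For context, the deferral this change enables follows the existing
pattern in iort.c's iort_iommu_xlate(), roughly like this (paraphrased
from memory, so treat it as a sketch rather than the exact code):

	ops = iommu_ops_from_fwnode(iort_fwnode);
	/*
	 * No ops registered yet: either the SMMU driver has not probed
	 * or it is not enabled at all.  With IS_ENABLED() above, a
	 * modular SMMU driver now yields -EPROBE_DEFER for its masters
	 * instead of -ENODEV, so they retry once the module loads.
	 */
	if (!ops)
		return iort_iommu_driver_enabled(node->type) ?
		       -EPROBE_DEFER : -ENODEV;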







Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-27 Thread Christian Zigotzky

On 27 November 2019 at 07:56 am, Mike Rapoport wrote:


Maybe we'll simply force bottom up allocation before calling
swiotlb_init()? Anyway, it's the last memblock allocation.


diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 62f74b1b33bd..771e6cf7e2b9 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -286,14 +286,15 @@ void __init mem_init(void)
 	/*
 	 * book3s is limited to 16 page sizes due to encoding this in
 	 * a 4-bit field for slices.
 	 */
 	BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
 
 #ifdef CONFIG_SWIOTLB
+	memblock_set_bottom_up(true);
 	swiotlb_init(0);
 #endif
 
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
 	set_max_mapnr(max_pfn);
 	memblock_free_all();
  
  

Hello Mike,

I tested the latest Git kernel with your new patch today. My PCI TV card 
works without any problems.


Thanks,
Christian
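
For anyone following along: the quoted hunk works because memblock
normally allocates top-down, so the SWIOTLB bounce buffer ended up in
high memory that the addressing-limited SCSI board cannot reach.  A
restatement of the hunk with the reasoning spelled out (sketch only,
same calls as the patch above):

#ifdef CONFIG_SWIOTLB
	/*
	 * Switch memblock to bottom-up allocation so the SWIOTLB
	 * bounce buffer lands in low memory the card can address.
	 * Per the quote above this is the last memblock allocation,
	 * so the mode change should have no other effect.
	 */
	memblock_set_bottom_up(true);
	swiotlb_init(0);
#endif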


make dma_addressing_limited work for memory encryption setups

2019-11-27 Thread Christoph Hellwig
Hi all,

this little series fixes dma_addressing_limited to return true for
systems that use bounce buffers due to memory encryption.


[PATCH 1/2] dma-mapping: move dma_addressing_limited out of line

2019-11-27 Thread Christoph Hellwig
This function isn't used in the fast path, and moving it out of line
will reduce include clutter with the next change.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 14 +-
 kernel/dma/mapping.c| 15 +++
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index c4d8741264bd..94ef74ecd18a 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -687,19 +687,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
return dma_set_mask_and_coherent(dev, mask);
 }
 
-/**
- * dma_addressing_limited - return if the device is addressing limited
- * @dev:   device to check
- *
- * Return %true if the devices DMA mask is too small to address all memory in
- * the system, else %false.  Lack of addressing bits is the prime reason for
- * bounce buffering, but might not be the only one.
- */
-static inline bool dma_addressing_limited(struct device *dev)
-{
-   return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
-   dma_get_required_mask(dev);
-}
+bool dma_addressing_limited(struct device *dev);
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 12ff766ec1fa..1dbe6d725962 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -405,3 +405,18 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+/**
+ * dma_addressing_limited - return if the device is addressing limited
+ * @dev:   device to check
+ *
+ * Return %true if the devices DMA mask is too small to address all memory in
+ * the system, else %false.  Lack of addressing bits is the prime reason for
+ * bounce buffering, but might not be the only one.
+ */
+bool dma_addressing_limited(struct device *dev)
+{
+   return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
+   dma_get_required_mask(dev);
+}
+EXPORT_SYMBOL_GPL(dma_addressing_limited);
-- 
2.20.1



[PATCH 2/2] dma-mapping: force unencrypted devices are always addressing limited

2019-11-27 Thread Christoph Hellwig
Devices that are forced to DMA through unencrypted bounce buffers
need to be treated as if they are addressing limited.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/mapping.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 1dbe6d725962..f6c35b53d996 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -416,6 +416,8 @@ EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
  */
 bool dma_addressing_limited(struct device *dev)
 {
+   if (force_dma_unencrypted(dev))
+   return true;
return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
dma_get_required_mask(dev);
 }
-- 
2.20.1
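
As a usage illustration (hypothetical driver code, not part of this
patch): with this change a driver probing in a memory-encrypted guest
sees dma_addressing_limited() return true and can scale back up front.
"my_dev" and MY_BOUNCE_MAX_SECTORS are made-up names:

	/* every mapping will likely bounce; cap transfer sizes */
	if (dma_addressing_limited(&pdev->dev))
		my_dev->max_sectors = min_t(u32, my_dev->max_sectors,
					    MY_BOUNCE_MAX_SECTORS);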



Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()

2019-11-27 Thread Thierry Reding
On Wed, Nov 27, 2019 at 01:16:54PM +0100, Thierry Reding wrote:
> On Tue, Sep 17, 2019 at 06:59:50PM +0100, Will Deacon wrote:
> > On Mon, Sep 02, 2019 at 04:52:45PM +0200, Thierry Reding wrote:
> > > On Mon, Sep 02, 2019 at 03:22:35PM +0100, Robin Murphy wrote:
> > > > On 29/08/2019 12:14, Thierry Reding wrote:
> > > > > From: Thierry Reding 
> > > > > 
> > > > > For device tree nodes, use the standard of_iommu_get_resv_regions()
> > > > > implementation to obtain the reserved memory regions associated with a
> > > > > device.
> > > > 
> > > > This covers the window between iommu_probe_device() setting up a default
> > > > domain and the device's driver finally probing and taking control, but
> > > > iommu_probe_device() represents the point that the IOMMU driver first 
> > > > knows
> > > > about this device - there's still a window from whenever the IOMMU 
> > > > driver
> > > > itself probed up to here where the "unidentified" traffic may have 
> > > > already
> > > > been disrupted. Some IOMMU drivers have no option but to make the 
> > > > necessary
> > > > configuration during their own probe routine, at which point a struct 
> > > > device
> > > > for the display/etc. endpoint may not even exist yet.
> > > 
> > > Yeah, I think I'm actually running into this issue with the ARM SMMU
> > > driver. The above works fine with the Tegra SMMU driver, though, because
> > > it doesn't touch the SMMU configuration until a device is attached to a
> > > domain.
> > > 
> > > For anything earlier than iommu_probe_device(), I don't see a way of
> > > doing this generically. I've been working on a prototype to make these
> > > reserved memory regions early on for ARM SMMU but I've been failing so
> > > far. I think it would possibly work if we just switched the default for
> > > stream IDs to be "bypass" if they have any devices that have reserved
> > > memory regions, but again, this isn't quite working (yet).
> > 
> > I think we should avoid the use of "bypass" outside of the IOMMU probe()
> > routine if at all possible, since it leaves the thing wide open if we don't
> > subsequently probe the master.
> > 
> > Why can't we initialise a page-table early for StreamIDs with these
> > reserved regions, and then join that up later on if we see a device with
> > one of those StreamIDs attaching to a DMA domain? I suppose the nasty
> > case would be attaching to a domain that already has other masters
> > attached to it. Can we forbid that somehow for these devices? Otherwise,
> > I think we'd have to transiently switch to bypass whilst switching page
> > table.
> > 
> > What problems did you run into trying to implement this?
> 
> I picked this up again and was trying to make this work with your
> suggestion. Below is a rough draft and it seems to be working to some
> degree. Here's an extract of the log when I run this on Jetson TX2:
> 
>   [3.755328] arm-smmu 1200.iommu: probing hardware configuration...
>   [3.762187] arm-smmu 1200.iommu: SMMUv2 with:
>   [3.767137] arm-smmu 1200.iommu: stage 1 translation
>   [3.772806] arm-smmu 1200.iommu: stage 2 translation
>   [3.778471] arm-smmu 1200.iommu: nested translation
>   [3.784050] arm-smmu 1200.iommu: stream matching with 128 register groups
>   [3.791651] arm-smmu 1200.iommu: 64 context banks (0 stage-2 only)
>   [3.798603] arm-smmu 1200.iommu: Supported page sizes: 0x61311000
>   [3.805460] arm-smmu 1200.iommu: Stage-1: 48-bit VA -> 48-bit IPA
>   [3.812310] arm-smmu 1200.iommu: Stage-2: 48-bit IPA -> 48-bit PA
>   [3.819159] arm-smmu 1200.iommu: > arm_smmu_setup_identity(smmu=0001eabcec80)
>   [3.827373] arm-smmu 1200.iommu:   identity domain: 0001eaf8cae8 (ops: 0x0)
>   [3.835614] arm-smmu 1200.iommu:   np: /ethernet@249
>   [3.841635] arm-smmu 1200.iommu:   np: /sdhci@340
>   [3.847315] arm-smmu 1200.iommu:   np: /sdhci@342
>   [3.852990] arm-smmu 1200.iommu:   np: /sdhci@344
>   [3.858683] arm-smmu 1200.iommu:   np: /sdhci@346
>   [3.864370] arm-smmu 1200.iommu:   np: /hda@351
>   [3.869897] arm-smmu 1200.iommu:   np: /usb@353
>   [3.875421] arm-smmu 1200.iommu:   np: /pcie@10003000
>   [3.881109] arm-smmu 1200.iommu:   np: /host1x@13e0
>   [3.887012] arm-smmu 1200.iommu:   np: /host1x@13e0/display-hub@1520/display@1520
>   [3.896344] arm-smmu 1200.iommu: region: /reserved-memory/framebuffer@9607c000
>   [3.904707] arm-smmu 1200.iommu:   [mem 0x9607c000-0x9687bfff]
>   [3.915719] arm-smmu 1200.iommu: /iommu@1200: 1 arguments
>   [3.922487] arm-smmu 1200.iommu:   0: 0009
>   [

[PATCH v3 2/2] PCI: Add DMA alias quirk for PLX PEX NTB

2019-11-27 Thread James Sewart via iommu
The PLX PEX NTB forwards DMA transactions using Requester ID's that
don't exist as PCI devices. The devfn for a transaction is used as an
index into a lookup table storing the origin of a transaction on the
other side of the bridge.

This patch aliases all possible devfn's to the NTB device so that any
transaction coming in is governed by the mappings for the NTB.

Reviewed-by: Logan Gunthorpe 
Signed-off-by: James Sewart 
---
 drivers/pci/quirks.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0f3f5afc73fd..3a67049ca630 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5315,6 +5315,21 @@ SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
 SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
 SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
 
+/*
+ * PLX NTB uses devfn proxy IDs to move TLPs between NT endpoints. These IDs
+ * are used to forward responses to the originator on the other side of the
+ * NTB. Alias all possible IDs to the NTB to permit access when the IOMMU is
+ * turned on.
+ */
+static void quirk_plx_ntb_dma_alias(struct pci_dev *pdev)
+{
+   pci_info(pdev, "Setting PLX NTB proxy ID aliases\n");
+   /* PLX NTB may use all 256 devfns */
+   pci_add_dma_alias(pdev, 0, 256);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b0, quirk_plx_ntb_dma_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b1, quirk_plx_ntb_dma_alias);
+
 /*
  * On Lenovo Thinkpad P50 SKUs with a Nvidia Quadro M1000M, the BIOS does
  * not always reset the secondary Nvidia GPU between reboots if the system
-- 
2.24.0
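
For context, the mask set by pci_add_dma_alias() is consumed by
pci_for_each_dma_alias() when the IOMMU layer programs its tables,
roughly like this (paraphrased from drivers/pci/search.c; a sketch,
not part of this patch):

	if (unlikely(pdev->dma_alias_mask)) {
		u8 devfn;

		/* visit every aliased requester ID on this bus */
		for_each_set_bit(devfn, pdev->dma_alias_mask, U8_MAX) {
			ret = fn(pdev, PCI_DEVID(pdev->bus->number, devfn),
				 data);
			if (ret)
				return ret;
		}
	}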



[PATCH v3 1/2] PCI: Add parameter nr_devfns to pci_add_dma_alias

2019-11-27 Thread James Sewart via iommu
pci_add_dma_alias can now be used to create a dma alias for a range of
devfns.

Signed-off-by: James Sewart 
---
 drivers/pci/pci.c| 21 -
 drivers/pci/quirks.c | 14 +++---
 include/linux/pci.h  |  2 +-
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a97e2571a527..71dbabf5ee3d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5857,7 +5857,8 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
 /**
  * pci_add_dma_alias - Add a DMA devfn alias for a device
  * @dev: the PCI device for which alias is added
- * @devfn: alias slot and function
+ * @devfn_from: alias slot and function
+ * @nr_devfns: Number of subsequent devfns to alias
  *
  * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
  * which is used to program permissible bus-devfn source addresses for DMA
@@ -5873,8 +5874,12 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
  * cannot be left as a userspace activity).  DMA aliases should therefore
  * be configured via quirks, such as the PCI fixup header quirk.
  */
-void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
+void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, int nr_devfns)
 {
+   int devfn_to = devfn_from + nr_devfns - 1;
+
+   BUG_ON(nr_devfns < 1);
+
if (!dev->dma_alias_mask)
dev->dma_alias_mask = bitmap_zalloc(U8_MAX, GFP_KERNEL);
if (!dev->dma_alias_mask) {
@@ -5882,9 +5887,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
return;
}
 
-   set_bit(devfn, dev->dma_alias_mask);
-   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
-PCI_SLOT(devfn), PCI_FUNC(devfn));
+   bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns);
+
+   if (nr_devfns == 1)
+   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from));
+   else
+   pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from),
+   PCI_SLOT(devfn_to), PCI_FUNC(devfn_to));
 }
 
 bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 320255e5e8f8..0f3f5afc73fd 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3932,7 +3932,7 @@ int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 static void quirk_dma_func0_alias(struct pci_dev *dev)
 {
if (PCI_FUNC(dev->devfn) != 0)
-   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0), 1);
 }
 
 /*
@@ -3946,7 +3946,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe476, quirk_dma_func0_alias);
 static void quirk_dma_func1_alias(struct pci_dev *dev)
 {
if (PCI_FUNC(dev->devfn) != 1)
-   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1));
+   pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1), 1);
 }
 
 /*
@@ -4031,7 +4031,7 @@ static void quirk_fixed_dma_alias(struct pci_dev *dev)
 
id = pci_match_id(fixed_dma_alias_tbl, dev);
if (id)
-   pci_add_dma_alias(dev, id->driver_data);
+   pci_add_dma_alias(dev, id->driver_data, 1);
 }
 
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ADAPTEC2, 0x0285, quirk_fixed_dma_alias);
@@ -4073,9 +4073,9 @@ DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias);
  */
 static void quirk_mic_x200_dma_alias(struct pci_dev *pdev)
 {
-   pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0));
-   pci_add_dma_alias(pdev, PCI_DEVFN(0x11, 0x0));
-   pci_add_dma_alias(pdev, PCI_DEVFN(0x12, 0x3));
+   pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0), 1);
+   pci_add_dma_alias(pdev, PCI_DEVFN(0x11, 0x0), 1);
+   pci_add_dma_alias(pdev, PCI_DEVFN(0x12, 0x3), 1);
 }
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
@@ -5273,7 +5273,7 @@ static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
pci_dbg(pdev,
"Aliasing Partition %d Proxy ID %02x.%d\n",
pp, PCI_SLOT(devfn), PCI_FUNC(devfn));
-   pci_add_dma_alias(pdev, devfn);
+   pci_add_dma_alias(pdev, devfn, 1);
}
}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1a6cf19eac2d..f51bdda49e9a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2323,7 +2323,7 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
-void pci_add_dma_alias(struct pci_dev *dev, u8 devfn);
+void pci_add_dma_alias(struct pci_dev 

Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()

2019-11-27 Thread Thierry Reding
On Tue, Sep 17, 2019 at 06:59:50PM +0100, Will Deacon wrote:
> On Mon, Sep 02, 2019 at 04:52:45PM +0200, Thierry Reding wrote:
> > On Mon, Sep 02, 2019 at 03:22:35PM +0100, Robin Murphy wrote:
> > > On 29/08/2019 12:14, Thierry Reding wrote:
> > > > From: Thierry Reding 
> > > > 
> > > > For device tree nodes, use the standard of_iommu_get_resv_regions()
> > > > implementation to obtain the reserved memory regions associated with a
> > > > device.
> > > 
> > > This covers the window between iommu_probe_device() setting up a default
> > > domain and the device's driver finally probing and taking control, but
> > > iommu_probe_device() represents the point that the IOMMU driver first 
> > > knows
> > > about this device - there's still a window from whenever the IOMMU driver
> > > itself probed up to here where the "unidentified" traffic may have already
> > > been disrupted. Some IOMMU drivers have no option but to make the 
> > > necessary
> > > configuration during their own probe routine, at which point a struct 
> > > device
> > > for the display/etc. endpoint may not even exist yet.
> > 
> > Yeah, I think I'm actually running into this issue with the ARM SMMU
> > driver. The above works fine with the Tegra SMMU driver, though, because
> > it doesn't touch the SMMU configuration until a device is attached to a
> > domain.
> > 
> > For anything earlier than iommu_probe_device(), I don't see a way of
> > doing this generically. I've been working on a prototype to make these
> > reserved memory regions early on for ARM SMMU but I've been failing so
> > far. I think it would possibly work if we just switched the default for
> > stream IDs to be "bypass" if they have any devices that have reserved
> > memory regions, but again, this isn't quite working (yet).
> 
> I think we should avoid the use of "bypass" outside of the IOMMU probe()
> routine if at all possible, since it leaves the thing wide open if we don't
> subsequently probe the master.
> 
> Why can't we initialise a page-table early for StreamIDs with these
> reserved regions, and then join that up later on if we see a device with
> one of those StreamIDs attaching to a DMA domain? I suppose the nasty
> case would be attaching to a domain that already has other masters
> attached to it. Can we forbid that somehow for these devices? Otherwise,
> I think we'd have to transiently switch to bypass whilst switching page
> table.
> 
> What problems did you run into trying to implement this?

I picked this up again and was trying to make this work with your
suggestion. Below is a rough draft and it seems to be working to some
degree. Here's an extract of the log when I run this on Jetson TX2:

[3.755328] arm-smmu 1200.iommu: probing hardware configuration...
[3.762187] arm-smmu 1200.iommu: SMMUv2 with:
[3.767137] arm-smmu 1200.iommu: stage 1 translation
[3.772806] arm-smmu 1200.iommu: stage 2 translation
[3.778471] arm-smmu 1200.iommu: nested translation
[3.784050] arm-smmu 1200.iommu: stream matching with 128 register groups
[3.791651] arm-smmu 1200.iommu: 64 context banks (0 stage-2 only)
[3.798603] arm-smmu 1200.iommu: Supported page sizes: 0x61311000
[3.805460] arm-smmu 1200.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[3.812310] arm-smmu 1200.iommu: Stage-2: 48-bit IPA -> 48-bit PA
[3.819159] arm-smmu 1200.iommu: > arm_smmu_setup_identity(smmu=0001eabcec80)
[3.827373] arm-smmu 1200.iommu:   identity domain: 0001eaf8cae8 (ops: 0x0)
[3.835614] arm-smmu 1200.iommu:   np: /ethernet@249
[3.841635] arm-smmu 1200.iommu:   np: /sdhci@340
[3.847315] arm-smmu 1200.iommu:   np: /sdhci@342
[3.852990] arm-smmu 1200.iommu:   np: /sdhci@344
[3.858683] arm-smmu 1200.iommu:   np: /sdhci@346
[3.864370] arm-smmu 1200.iommu:   np: /hda@351
[3.869897] arm-smmu 1200.iommu:   np: /usb@353
[3.875421] arm-smmu 1200.iommu:   np: /pcie@10003000
[3.881109] arm-smmu 1200.iommu:   np: /host1x@13e0
[3.887012] arm-smmu 1200.iommu:   np: /host1x@13e0/display-hub@1520/display@1520
[3.896344] arm-smmu 1200.iommu: region: /reserved-memory/framebuffer@9607c000
[3.904707] arm-smmu 1200.iommu:   [mem 0x9607c000-0x9687bfff]
[3.915719] arm-smmu 1200.iommu: /iommu@1200: 1 arguments
[3.922487] arm-smmu 1200.iommu:   0: 0009
[3.927888] arm-smmu 1200.iommu:   SID: 0009 MASK: 7f80
[3.934132] arm-smmu 1200.iommu:   found index: 0
[3.939840] arm-smmu 1200.iommu:   np: 

Re: [PATCH v3 09/14] iommu/arm-smmu: Prevent forced unbinding of Arm SMMU drivers

2019-11-27 Thread John Garry

On 27/11/2019 11:04, John Garry wrote:

On 26/11/2019 20:27, Saravana Kannan wrote:

On Tue, Nov 26, 2019 at 1:13 AM John Garry  wrote:


On 21/11/2019 11:49, Will Deacon wrote:
Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation,
since it will likely lead to catastrophic failure for any DMA devices
mastering through the SMMU being unbound. When the driver then attempts
to "handle" the fatal faults, it's very easy to trip over dead data
structures, leading to use-after-free.

On John's machine, he reports that the machine was "unusable" due to
loss of the storage controller following a forced unbind of the SMMUv3
driver:

    | # cd ./bus/platform/drivers/arm-smmu-v3
    | # echo arm-smmu-v3.0.auto > unbind
    | hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
    | platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x0146
    | [hwprod 0x0146, hwcons 0x]

Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs"
to true.


This seems a reasonable approach for now.

BTW, I'll give this series a spin this week, which again looks to be
your iommu/module branch, excluding the new IORT patch.




Hi Saravana,


Is this on a platform where of_devlink creates device links between
the iommu device and its suppliers? I'm guessing no? Because device
links should force unbinding of all the consumers before unbinding the
supplier.


I'm only really interested in ACPI, TBH.



Looks like it'll still allow the supplier to unbind if the consumers
don't allow unbinding. Is that the case here?


So just unbinding the driver from a device does not delete the device 
nor exit the device from its IOMMU group - so we keep the reference to 
the SMMU ko. As such, I don't know how to realistically test unloading 
the SMMU ko when we have platform devices involved. Maybe someone can 
enlighten me...


But I could do it on our D06 dev board, where all devices behind the 
SMMUs are PCI based:


--- Initial state ---

root@(none)$ dmesg | grep Adding
[   30.271801] pcieport :00:00.0: Adding to iommu group 0
[   30.296088] pcieport :00:04.0: Adding to iommu group 1
[   30.322234] pcieport :00:08.0: Adding to iommu group 2
[   30.335641] pcieport :00:0c.0: Adding to iommu group 3
[   30.343114] pcieport :00:10.0: Adding to iommu group 4
[   30.355650] pcieport :00:12.0: Adding to iommu group 5
[   30.366794] pcieport :7c:00.0: Adding to iommu group 6
[   30.377993] hns3 :7d:00.0: Adding to iommu group 7
[   31.861957] hns3 :7d:00.1: Adding to iommu group 8
[   33.313967] hns3 :7d:00.2: Adding to iommu group 9
[   33.436029] hns3 :7d:00.3: Adding to iommu group 10
[   33.555935] hns3 :bd:00.0: Adding to iommu group 11
[   35.143851] pcieport :74:00.0: Adding to iommu group 12
[   35.150736] pcieport :80:00.0: Adding to iommu group 13
[   35.158910] pcieport :80:08.0: Adding to iommu group 14
[   35.166860] pcieport :80:0c.0: Adding to iommu group 15
[   35.174813] pcieport :80:10.0: Adding to iommu group 16
[   35.182854] pcieport :bc:00.0: Adding to iommu group 17
[   35.189702] pcieport :b4:00.0: Adding to iommu group 18
[   35.196445] hisi_sas_v3_hw :74:02.0: Adding to iommu group 19
[   39.410693] ahci :74:03.0: Adding to iommu group 20
root@(none)$ lsmod
Not tainted
arm_smmu_v3 40960 21 - Live 0x88c6

--- Start removing devices ---

root@(none)$  echo 1 > ./sys/devices/pci:00/:00:00.0/remove
[   55.567808] pci_bus :01: busn_res: [bus 01] is released
[   55.573514] pci :00:00.0: Removing from iommu group 0
root@(none)$  echo 1 > ./sys/devices/pci:00/:00:04.0/remove
[   61.767425] pci_bus :02: busn_res: [bus 02] is released
[   61.773132] pci :00:04.0: Removing from iommu group 1
root@(none)$ echo 1 > ./sys/devices/pci:00/:00:04.0/remove
sh: ./sys/devices/pci:00/:00:04.0/remove: No such file or directory
root@(none)$  echo 1 > ./sys/devices/pci:00/:00:08.0/remove
[   75.635417] pci_bus :03: busn_res: [bus 03] is released
[   75.641124] pci :00:08.0: Removing from iommu group 2
root@(none)$ echo 1 > ./sys/devices/pci:00/:00:0c.0/remove
[   81.587419] pci_bus :04: busn_res: [bus 04] is released
[   81.593110] pci :00:0c.0: Removing from iommu group 3
root@(none)$  echo 1 > ./sys/devices/pci:00/:00:10.0/remove
[   85.607605] pci_bus :05: busn_res: [bus 05] is released
[   85.613300] pci :00:10.0: Removing from iommu group 4
root@(none)$ echo 1 > ./sys/devices/pci:00/:00:12.0/remove
[   92.731421] pci_bus :06: busn_res: [bus 06] is released
[   92.737125] pci :00:12.0: Removing from iommu group 5
root@(none)$ echo 1 > ./sys/devices/pci:7c/:7c:00.0/remove
[  102.286726] pci :7d:00.0: Removing from iommu group 7
[  102.294157] pci :7d:00.1: Removing from iommu group 8
[  102.301634] pci :7d:00.2: Removing from iommu group 9
[  102.308973] pci 

Re: [PATCH v3 09/14] iommu/arm-smmu: Prevent forced unbinding of Arm SMMU drivers

2019-11-27 Thread John Garry

On 26/11/2019 20:27, Saravana Kannan wrote:

On Tue, Nov 26, 2019 at 1:13 AM John Garry  wrote:


On 21/11/2019 11:49, Will Deacon wrote:

Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation,
since it will likely lead to catastrophic failure for any DMA devices
mastering through the SMMU being unbound. When the driver then attempts
to "handle" the fatal faults, it's very easy to trip over dead data
structures, leading to use-after-free.

On John's machine, he reports that the machine was "unusable" due to
loss of the storage controller following a forced unbind of the SMMUv3
driver:

| # cd ./bus/platform/drivers/arm-smmu-v3
| # echo arm-smmu-v3.0.auto > unbind
| hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
| platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x0146
| [hwprod 0x0146, hwcons 0x]

Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs"
to true.


This seems a reasonable approach for now.

BTW, I'll give this series a spin this week, which again looks to be
your iommu/module branch, excluding the new IORT patch.




Hi Saravana,


Is this on a platform where of_devlink creates device links between
the iommu device and its suppliers? I'm guessing no? Because device
links should force unbinding of all the consumers before unbinding the
supplier.


I'm only really interested in ACPI, TBH.



Looks like it'll still allow the supplier to unbind if the consumers
don't allow unbinding. Is that the case here?


So just unbinding the driver from a device does not delete the device 
nor exit the device from its IOMMU group - so we keep the reference to 
the SMMU ko. As such, I don't know how to realistically test unloading 
the SMMU ko when we have platform devices involved. Maybe someone can 
enlighten me...


Thanks,
John


Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-11-27 Thread Christoph Hellwig
On Wed, Nov 27, 2019 at 08:56:25AM +0200, Mike Rapoport wrote:
> Maybe we'll simply force bottom up allocation before calling
> swiotlb_init()? Anyway, it's the last memblock allocation.

That should work, but I don't think it is the proper fix.  The underlying
issue here is that ZONE_DMA/DMA32 sizing is something that needs to
be propagated to memblock and dma-direct as it is based around addressing
limitations.  But our zone initialization is such a mess that we
can't just reuse a variable.  Nicolas has started to clean some of
this up, but we need to clean that whole zone initialization mess up
a lot more.