[PATCH 0/1] iommu/vt-d: Fixes for v5.18-rc3

2022-04-09 Thread Lu Baolu
Hi Joerg,

One fix is queued for v5.18. It aims to fix:

 - Calculate a feasible mask for non-aligned page-selective
   IOTLB invalidation.

Please consider it for the iommu/fix branch.

Best regards,
Lu Baolu

David Stevens (1):
  iommu/vt-d: Calculate mask for non-aligned flushes

 drivers/iommu/intel/iommu.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

-- 
2.25.1



[PATCH 1/1] iommu/vt-d: Calculate mask for non-aligned flushes

2022-04-09 Thread Lu Baolu
From: David Stevens 

Calculate the appropriate mask for non-size-aligned page-selective
invalidation. Since PSI uses the mask value to mask out the lower order
bits of the target address, properly flushing the IOTLB requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region. This is not normally an issue because iova.c
always allocates IOVAs that are aligned to their size. However, IOVAs
which come from other sources (e.g. userspace via VFIO) may not be
aligned.

To properly flush the IOTLB, both the start and end pfns need to be
equal after applying the mask. This means the most efficient mask to
use is the index of the lowest bit that is equal where all higher bits
are also equal. For example, if pfn=0x17f and pages=3, then
end_pfn=0x181, so the smallest mask we can use is 8. Any differences
above the highest bit of pages are due to carrying, so by xnor'ing pfn
and end_pfn and then masking out the lower order bits based on pages, we
get 0xff00, where the index of the lowest set bit is the mask we want to
use.
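
As a standalone illustration (not part of the patch), the following
small userspace program mirrors the mask computation added below;
roundup_pow_of_two() and __builtin_ctzl() stand in for the kernel's
__roundup_pow_of_two(), ilog2() and __ffs():

#include <stdio.h>

/* Userspace stand-in for the kernel's __roundup_pow_of_two();
 * assumes pages >= 1. */
static unsigned long roundup_pow_of_two(unsigned long pages)
{
	unsigned long v = 1;

	while (v < pages)
		v <<= 1;
	return v;
}

/* Mask (in bits) for a PSI flush covering [pfn, pfn + pages). */
static unsigned int psi_mask(unsigned long pfn, unsigned long pages)
{
	unsigned long aligned_pages = roundup_pow_of_two(pages);
	unsigned long bitmask = aligned_pages - 1;
	unsigned int mask = __builtin_ctzl(aligned_pages); /* ilog2() */

	if (bitmask & pfn) {
		unsigned long end_pfn = pfn + pages - 1;
		unsigned long shared_bits = ~(pfn ^ end_pfn) & ~bitmask;

		/* __builtin_ctzl() stands in for the kernel's __ffs(). */
		mask = shared_bits ? __builtin_ctzl(shared_bits) : 64;
	}
	return mask;
}

int main(void)
{
	/* The example above: pfn=0x17f, pages=3 prints "mask = 8". */
	printf("mask = %u\n", psi_mask(0x17f, 3));
	return 0;
}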

Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking")
Cc: sta...@vger.kernel.org
Signed-off-by: David Stevens 
Reviewed-by: Kevin Tian 
Link: https://lore.kernel.org/r/20220401022430.1262215-1-steve...@google.com
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942..0ea47e17b379 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1588,7 +1588,8 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
  unsigned long pfn, unsigned int pages,
  int ih, int map)
 {
-   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
+   unsigned int aligned_pages = __roundup_pow_of_two(pages);
+   unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
u16 did = domain->iommu_did[iommu->seq_id];
 
@@ -1600,10 +1601,30 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
if (domain_use_first_level(domain)) {
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
} else {
+   unsigned long bitmask = aligned_pages - 1;
+
+   /*
+* PSI masks the low order bits of the base address. If the
+* address isn't aligned to the mask, then compute a mask value
+* needed to ensure the target range is flushed.
+*/
+   if (unlikely(bitmask & pfn)) {
+   unsigned long end_pfn = pfn + pages - 1, shared_bits;
+
+   /*
+* Since end_pfn <= pfn + bitmask, the only way bits
+* higher than bitmask can differ in pfn and end_pfn is
+* by carrying. This means after masking out bitmask,
+* high bits starting with the first set bit in
+* shared_bits are all equal in both pfn and end_pfn.
+*/
+   shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
+   mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
+   }
+
/*
 * Fallback to domain selective flush if no PSI support or
-* the size is too big. PSI requires page size to be 2 ^ x,
-* and the base address is naturally aligned to the size.
+* the size is too big.
 */
if (!cap_pgsel_inv(iommu->cap) ||
mask > cap_max_amask_val(iommu->cap))
-- 
2.25.1



[PATCH 1/1] iommu/vt-d: Change return type of dmar_insert_one_dev_info()

2022-04-09 Thread Lu Baolu
dmar_insert_one_dev_info() returns the passed-in domain on success and
NULL on failure, which doesn't make much sense: callers only check
whether the call succeeded. Change the return type to an integer error
code so failures can be propagated directly.
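
As a hedged illustration (not part of the patch), a hypothetical caller
of the reworked helper can propagate the errno-style result directly
instead of comparing domain pointers; the real caller-side change is in
the domain_add_dev_info() hunk below:

/* Hypothetical caller; the function name here is for illustration only. */
static int example_attach_dev(struct intel_iommu *iommu, u8 bus, u8 devfn,
			      struct device *dev, struct dmar_domain *domain)
{
	int ret;

	ret = dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
	if (ret)
		dev_err(dev, "failed to insert device info: %d\n", ret);

	return ret;
}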

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d24e6da33a60..5682f3de205d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2474,10 +2474,9 @@ static bool dev_is_real_dma_subdevice(struct device *dev)
   pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
 }
 
-static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
-   int bus, int devfn,
-   struct device *dev,
-   struct dmar_domain *domain)
+static int dmar_insert_one_dev_info(struct intel_iommu *iommu, int bus,
+   int devfn, struct device *dev,
+   struct dmar_domain *domain)
 {
struct device_domain_info *info = dev_iommu_priv_get(dev);
unsigned long flags;
@@ -2490,7 +2489,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
	spin_unlock(&iommu->lock);
if (ret) {
	spin_unlock_irqrestore(&device_domain_lock, flags);
-   return NULL;
+   return -ENODEV;
}
	list_add(&info->link, &domain->devices);
	spin_unlock_irqrestore(&device_domain_lock, flags);
@@ -2501,7 +2500,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (ret) {
dev_err(dev, "PASID table allocation failed\n");
dmar_remove_one_dev_info(dev);
-   return NULL;
+   return -ENOMEM;
}
 
/* Setup the PASID entry for requests without PASID: */
@@ -2519,17 +2518,17 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (ret) {
dev_err(dev, "Setup RID2PASID failed\n");
dmar_remove_one_dev_info(dev);
-   return NULL;
+   return -ENODEV;
}
}
 
if (domain_context_mapping(domain, dev)) {
dev_err(dev, "Domain context map failed\n");
dmar_remove_one_dev_info(dev);
-   return NULL;
+   return -ENODEV;
}
 
-   return domain;
+   return 0;
 }
 
 static int iommu_domain_identity_map(struct dmar_domain *domain,
@@ -2607,7 +2606,6 @@ static int __init si_domain_init(int hw)
 
 static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
 {
-   struct dmar_domain *ndomain;
struct intel_iommu *iommu;
u8 bus, devfn;
 
@@ -2615,11 +2613,7 @@ static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
if (!iommu)
return -ENODEV;
 
-   ndomain = dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
-   if (ndomain != domain)
-   return -EBUSY;
-
-   return 0;
+   return dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
 }
 
 static bool device_has_rmrr(struct device *dev)
-- 
2.25.1



Re: [PATCH v2 3/4] iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE

2022-04-09 Thread Lu Baolu

On 2022/4/7 23:23, Jason Gunthorpe wrote:

While the comment was correct that this flag was intended to convey the
block no-snoop support in the IOMMU, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.

Now that the Intel driver is using enforce_cache_coherency() update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE.  Fix the Intel driver to return true since IOMMU_CACHE always
works.

The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.
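
Roughly, those users follow this pattern (a sketch under the assumption
that the v5.18 iommu_capable()/iommu_attach_device()/iommu_map()
signatures are used; it is not code from either driver):

#include <linux/iommu.h>

static int attach_and_map_coherent(struct device *dev,
				   struct iommu_domain *domain,
				   unsigned long iova, phys_addr_t paddr,
				   size_t size)
{
	int ret;

	/* Userspace cannot synchronize caches for these mappings, so
	 * refuse to proceed unless IOMMU_CACHE is honored. */
	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY))
		return -EPERM;

	ret = iommu_attach_device(domain, dev);
	if (ret)
		return ret;

	return iommu_map(domain, iova, paddr, size,
			 IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
}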

Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/intel/iommu.c | 2 +-
  include/linux/iommu.h   | 3 +--
  2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8f3674e997df06..14ba185175e9ec 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4556,7 +4556,7 @@ static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
  static bool intel_iommu_capable(enum iommu_cap cap)
  {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
-   return domain_update_iommu_snooping(NULL);
+   return true;
if (cap == IOMMU_CAP_INTR_REMAP)
return irq_remapping_enabled == 1;
  
diff --git a/include/linux/iommu.h b/include/linux/iommu.h

index fe4f24c469c373..fd58f7adc52796 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -103,8 +103,7 @@ static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
  }
  
  enum iommu_cap {

-   IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
-  transactions */
+   IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU_CACHE is supported */
IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
  };


Reviewed-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH v2 2/4] vfio: Move the Intel no-snoop control off of IOMMU_CACHE

2022-04-09 Thread Lu Baolu

On 2022/4/8 16:16, Tian, Kevin wrote:

From: Jason Gunthorpe 
Sent: Thursday, April 7, 2022 11:24 PM

IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be
cache coherent" and is used by the DMA API. The definition allows for
special non-coherent DMA to exist - ie processing of the no-snoop flag
in PCIe TLPs - so long as this behavior is opt-in by the device driver.

The flag is mainly used by the DMA API to synchronize the IOMMU setting
with the expected cache behavior of the DMA master. eg based on
dev_is_dma_coherent() in some case.

For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be
cache coherent' which has the practical effect of causing the IOMMU to
ignore the no-snoop bit in a PCIe TLP.

x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag.

Instead use the new domain op enforce_cache_coherency() which causes
every IOPTE created in the domain to have the no-snoop blocking
behavior.

Reconfigure VFIO to always use IOMMU_CACHE and call
enforce_cache_coherency() to operate the special Intel behavior.

Remove the IOMMU_CACHE test from Intel IOMMU.

Ultimately VFIO plumbs the result of enforce_cache_coherency() back
into the x86 platform code through kvm_arch_register_noncoherent_dma()
which controls if the WBINVD instruction is available in the guest. No
other arch implements kvm_arch_register_noncoherent_dma().
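
The reworked flow can be sketched as follows (an illustration rather
than the actual vfio_iommu_type1 code; it assumes the new op is exposed
through a helper named iommu_enforce_cache_coherency(), which may not
match the final naming in this series):

#include <linux/iommu.h>

static int vfio_like_map(struct iommu_domain *domain, struct device *dev,
			 unsigned long iova, phys_addr_t paddr, size_t size,
			 bool *noncoherent_dma)
{
	int ret;

	ret = iommu_attach_device(domain, dev);
	if (ret)
		return ret;

	/* IOMMU_CACHE is now used unconditionally. */
	ret = iommu_map(domain, iova, paddr, size,
			IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
	if (ret)
		return ret;

	/*
	 * No-snoop blocking is a separate opt-in. If it cannot be
	 * enforced, the caller reports noncoherent DMA to KVM via
	 * kvm_arch_register_noncoherent_dma() so WBINVD stays available
	 * in the guest.
	 */
	*noncoherent_dma = !iommu_enforce_cache_coherency(domain);
	return 0;
}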

Signed-off-by: Jason Gunthorpe 


Reviewed-by: Kevin Tian 

btw as discussed in the last version, it is not necessary to recalculate
snoop control globally with this new approach. Will follow up to
clean it up after this series is merged.


Agreed. But it also requires enforce_cache_coherency() to be called
only after the domain has been attached to a device, just as VFIO does.

Anyway, for this change in iommu/vt-d:

Reviewed-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH v2 1/4] iommu: Introduce the domain op enforce_cache_coherency()

2022-04-09 Thread Lu Baolu

On 2022/4/8 16:05, Tian, Kevin wrote:

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d0014..1f930c0c225d94 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -540,6 +540,7 @@ struct dmar_domain {
u8 has_iotlb_device: 1;
	u8 iommu_coherency: 1;		/* indicate coherency of iommu access */
	u8 iommu_snooping: 1;		/* indicate snooping control feature */
+   u8 enforce_no_snoop : 1;/* Create IOPTEs with snoop control */

it reads like no_snoop is the result of the enforcement... Probably
force_snooping better matches the intention here.


+1

The other changes in iommu/vt-d look good to me.

Reviewed-by: Lu Baolu 

Best regards,
baolu