[PATCH] iommu: add null pointer check

2022-03-28 Thread cgel . zte
From: Lv Ruyi 

kmem_cache_zalloc() is a memory allocation function which can return NULL
when the allocation fails. Add a null pointer check to avoid dereferencing
a null pointer.

Reported-by: Zeal Robot 
Signed-off-by: Lv Ruyi 
---
 drivers/iommu/fsl_pamu_domain.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 69a4a62dc3b9..43849c027298 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -152,6 +152,10 @@ static void attach_device(struct fsl_dma_domain *dma_domain, int liodn, struct device *dev)
}
 
info = kmem_cache_zalloc(iommu_devinfo_cache, GFP_ATOMIC);
+   if (!info) {
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   return;
+   }
 
info->dev = dev;
info->liodn = liodn;
-- 
2.25.1



[PATCH RFC v2 11/11] iommu: Rename iommu-sva-lib.{c,h}

2022-03-28 Thread Lu Baolu
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as they contain all the code
for the SVA implementation in the iommu core.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/{iommu-sva-lib.h => iommu-sva.h}  | 0
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 drivers/iommu/intel/iommu.c | 2 +-
 drivers/iommu/intel/svm.c   | 2 +-
 drivers/iommu/io-pgfault.c  | 2 +-
 drivers/iommu/{iommu-sva-lib.c => iommu-sva.c}  | 2 +-
 drivers/iommu/iommu.c   | 2 +-
 drivers/iommu/Makefile  | 2 +-
 9 files changed, 8 insertions(+), 8 deletions(-)
 rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (100%)
 rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva.h
similarity index 100%
rename from drivers/iommu/iommu-sva-lib.h
rename to drivers/iommu/iommu-sva.h
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 5591321f9e11..a13543aa9df4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -9,7 +9,7 @@
 #include 
 
 #include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
 #include "../../io-pgtable-arm.h"
 
 struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 543d3ef1c102..ca2bd17eec41 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -31,7 +31,7 @@
 #include 
 
 #include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
 
 static bool disable_bypass = true;
 module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1d6a62512bca..57a3d54c414b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,7 @@
 #include 
 
 #include "../irq_remapping.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
 #include "pasid.h"
 #include "cap_audit.h"
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c76b99b46626..b6b83ca705d5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -25,7 +25,7 @@
 
 #include "pasid.h"
 #include "perf.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 static void intel_svm_drain_prq(struct device *dev, u32 pasid);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 159e4f107fe3..5468993c8c6c 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
 
 /**
  * struct iopf_queue - IO Page Fault queue
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva.c
similarity index 99%
rename from drivers/iommu/iommu-sva-lib.c
rename to drivers/iommu/iommu-sva.c
index a53574f9559a..dbe4b9958e48 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
 
 static DEFINE_MUTEX(iommu_sva_lock);
 static DECLARE_IOASID_SET(iommu_sva_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6b51ead9d63b..f4a5d6351421 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
 
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..c1763476162b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
-- 
2.25.1



[PATCH RFC v2 10/11] iommu: Per-domain I/O page fault handling

2022-03-28 Thread Lu Baolu
Tweak the I/O page fault handling framework to route page faults to the
domain and call the page fault handler retrieved from the domain. This
makes it possible for the I/O page fault handling framework to serve more
usage scenarios, as long as they have an IOMMU domain and install a page
fault handler in it. This also refactors the SVA implementation to be the
first consumer of the per-domain page fault handling model.
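
For illustration, a minimal sketch of what a consumer of this model could
look like, using only the fields this patch adds to struct iommu_domain.
The handler and helper names below are made up for the example and are not
part of the patch:

static enum iommu_page_response_code
example_iopf_handler(struct iommu_fault *fault, void *data)
{
	/* Resolve fault->prm.addr for fault->prm.pasid against @data here. */
	return IOMMU_PAGE_RESP_SUCCESS;
}

static void example_install_iopf_handler(struct iommu_domain *domain,
					 void *data)
{
	/* io-pgfault.c calls back through these two new fields. */
	domain->fault_handler = example_iopf_handler;
	domain->fault_data = data;
}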

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h |  4 ++
 drivers/iommu/iommu-sva-lib.h |  1 -
 drivers/iommu/io-pgfault.c| 70 +++---
 drivers/iommu/iommu-sva-lib.c | 71 +++
 4 files changed, 72 insertions(+), 74 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7e30b88d7bef..729d602fcba5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -51,6 +51,8 @@ struct iommu_sva_cookie;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
+typedef enum iommu_page_response_code (*iommu_domain_iopf_handler_t)
+   (struct iommu_fault *, void *);
 
 struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped*/
@@ -102,6 +104,8 @@ struct iommu_domain {
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
struct iommu_sva_cookie *sva_cookie;
+   iommu_domain_iopf_handler_t fault_handler;
+   void *fault_data;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 95d51f748dff..abef83ac1b2d 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -8,7 +8,6 @@
 #include 
 #include 
 
-struct mm_struct *iommu_sva_find(ioasid_t pasid);
 struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain);
 
 /* I/O Page fault */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..159e4f107fe3 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,62 +69,6 @@ static int iopf_complete_group(struct device *dev, struct 
iopf_fault *iopf,
return iommu_page_response(dev, &resp);
 }
 
-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
-   vm_fault_t ret;
-   struct mm_struct *mm;
-   struct vm_area_struct *vma;
-   unsigned int access_flags = 0;
-   unsigned int fault_flags = FAULT_FLAG_REMOTE;
-   struct iommu_fault_page_request *prm = &iopf->fault.prm;
-   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
-   if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
-   return status;
-
-   mm = iommu_sva_find(prm->pasid);
-   if (IS_ERR_OR_NULL(mm))
-   return status;
-
-   mmap_read_lock(mm);
-
-   vma = find_extend_vma(mm, prm->addr);
-   if (!vma)
-   /* Unmapped area */
-   goto out_put_mm;
-
-   if (prm->perm & IOMMU_FAULT_PERM_READ)
-   access_flags |= VM_READ;
-
-   if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
-   access_flags |= VM_WRITE;
-   fault_flags |= FAULT_FLAG_WRITE;
-   }
-
-   if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
-   access_flags |= VM_EXEC;
-   fault_flags |= FAULT_FLAG_INSTRUCTION;
-   }
-
-   if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
-   fault_flags |= FAULT_FLAG_USER;
-
-   if (access_flags & ~vma->vm_flags)
-   /* Access fault */
-   goto out_put_mm;
-
-   ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
-   status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
-   IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
-   mmap_read_unlock(mm);
-   mmput(mm);
-
-   return status;
-}
-
 static void iopf_handle_group(struct work_struct *work)
 {
struct iopf_group *group;
@@ -134,12 +78,24 @@ static void iopf_handle_group(struct work_struct *work)
group = container_of(work, struct iopf_group, work);
 
list_for_each_entry_safe(iopf, next, &group->faults, list) {
+   struct iommu_domain *domain;
+
+   domain = iommu_get_domain_for_dev_pasid(group->dev,
+   iopf->fault.prm.pasid);
+
+   if (!domain || !domain->fault_handler)
+   status = IOMMU_PAGE_RESP_INVALID;
+
/*
 * For the moment, errors are sticky: don't handle subsequent
 * faults in the group if there is an error.
 */
if (status == IOMMU_PAGE_RESP_SUCCESS)
-   status = iopf_handle_single(iopf);
+   status = domain->fault_handler(&iopf->fault,

[PATCH RFC v2 09/11] iommu: Remove SVA related callbacks from iommu ops

2022-03-28 Thread Lu Baolu
These ops have been replaced with the attach/detach_dev_pasid domain
ops. There's no need for them anymore. Remove them to avoid dead code.

Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h   |  4 --
 include/linux/iommu.h |  8 ---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 17 ---
 drivers/iommu/iommu-sva-lib.h |  1 -
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 41 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 --
 drivers/iommu/intel/iommu.c   |  3 --
 drivers/iommu/intel/svm.c | 49 ---
 drivers/iommu/iommu-sva-lib.c |  4 +-
 9 files changed, 2 insertions(+), 128 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index c14283137fb5..7ce12590ce0f 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,10 +738,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 
*bus, u8 *devfn);
 extern void intel_svm_check(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
-void *drvdata);
-void intel_svm_unbind(struct iommu_sva *handle);
-u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
 extern const struct iommu_domain_ops intel_svm_domain_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 11c4d99e122d..7e30b88d7bef 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -213,9 +213,6 @@ struct iommu_iotlb_gather {
  * @dev_has/enable/disable_feat: per device entries to check/enable/disable
  *   iommu specific features.
  * @dev_feat_enabled: check enabled feature
- * @sva_bind: Bind process address space to device
- * @sva_unbind: Unbind process address space from device
- * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
  * @def_domain_type: device default domain type, return value:
  * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
@@ -249,11 +246,6 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
 
-   struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
-   void (*sva_unbind)(struct iommu_sva *handle);
-   u32 (*sva_get_pasid)(struct iommu_sva *handle);
-
int (*page_response)(struct device *dev,
 struct iommu_fault_event *evt,
 struct iommu_page_response *msg);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7631c00fdcbd..2513309ec0db 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,10 +754,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
-void arm_smmu_sva_unbind(struct iommu_sva *handle);
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
 int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
  struct device *dev, ioasid_t id);
@@ -794,19 +790,6 @@ static inline bool arm_smmu_master_iopf_supported(struct 
arm_smmu_master *master
return false;
 }
 
-static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
-{
-   return ERR_PTR(-ENODEV);
-}
-
-static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
-
-static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
-   return IOMMU_PASID_INVALID;
-}
-
 static inline void arm_smmu_sva_notifier_synchronize(void) {}
 
 static inline int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 1a71218b07f5..95d51f748dff 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -8,7 +8,6 @@
 #include 
 #include 
 
-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
 struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain);
 
diff --git 

[PATCH RFC v2 08/11] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-03-28 Thread Lu Baolu
The existing iommu SVA interfaces are implemented by calling the
SVA-specific iommu ops provided by the IOMMU drivers. There's no need
for any SVA-specific ops in the iommu_ops vector anymore, as we can
achieve this through the generic attach/detach_dev_pasid domain ops.

This refactors the implementation of the IOMMU SVA interfaces to use
the attach/detach_dev_pasid ops and aligns it with the concept of the
iommu domain. Put the new SVA code in the SVA-related file in order
to make it self-contained.
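
The driver-facing flow is unchanged by this refactoring. As a hedged
sketch (the device-specific PASID programming is only indicated by a
comment, and example_sva_use is a made-up name):

static int example_sva_use(struct device *dev, struct mm_struct *mm)
{
	struct iommu_sva *handle;
	u32 pasid;

	handle = iommu_sva_bind_device(dev, mm, NULL);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	/* ... program @pasid into the device and submit work ... */

	iommu_sva_unbind_device(handle);
	return 0;
}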

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h |  51 +---
 drivers/iommu/iommu-sva-lib.c | 110 +-
 drivers/iommu/iommu.c |  92 
 3 files changed, 138 insertions(+), 115 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a46285488a57..11c4d99e122d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -629,7 +629,12 @@ struct iommu_fwspec {
  * struct iommu_sva - handle to a device-mm bond
  */
 struct iommu_sva {
-   struct device   *dev;
+   struct device   *dev;
+   ioasid_tpasid;
+   struct iommu_domain *domain;
+   /* Link to sva domain's bonds list */
+   struct list_headnode;
+   refcount_t  users;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
@@ -672,12 +677,6 @@ int iommu_dev_enable_feature(struct device *dev, enum 
iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
 bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
 
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
-   struct mm_struct *mm,
-   void *drvdata);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
-
 int iommu_device_use_default_domain(struct device *dev);
 void iommu_device_unuse_default_domain(struct device *dev);
 
@@ -1018,21 +1017,6 @@ iommu_dev_disable_feature(struct device *dev, enum 
iommu_dev_features feat)
return -ENODEV;
 }
 
-static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
-{
-   return NULL;
-}
-
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
-}
-
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
-   return IOMMU_PASID_INVALID;
-}
-
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
return NULL;
@@ -1085,6 +1069,29 @@ iommu_put_domain_for_dev_pasid(struct iommu_domain 
*domain)
 }
 #endif /* CONFIG_IOMMU_API */
 
+#ifdef CONFIG_IOMMU_SVA
+struct iommu_sva *iommu_sva_bind_device(struct device *dev,
+   struct mm_struct *mm,
+   void *drvdata);
+void iommu_sva_unbind_device(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+#else /* CONFIG_IOMMU_SVA */
+static inline struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   return NULL;
+}
+
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+}
+
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+   return IOMMU_PASID_INVALID;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
 /**
  * iommu_map_sgtable - Map the given buffer to the IOMMU domain
  * @domain:The IOMMU domain to perform the mapping
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 78820be23f15..1b45b7d01836 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -17,6 +17,7 @@ struct iommu_sva_cookie {
struct mm_struct *mm;
ioasid_t pasid;
refcount_t users;
+   struct list_head bonds;
 };
 
 /**
@@ -101,6 +102,7 @@ iommu_sva_alloc_domain(struct device *dev, struct mm_struct 
*mm)
cookie->mm = mm;
cookie->pasid = mm->pasid;
refcount_set(&cookie->users, 1);
+   INIT_LIST_HEAD(&cookie->bonds);
domain->type = IOMMU_DOMAIN_SVA;
domain->sva_cookie = cookie;
curr = xa_store(&sva_domain_array, mm->pasid, domain, GFP_KERNEL);
@@ -118,6 +120,7 @@ iommu_sva_alloc_domain(struct device *dev, struct mm_struct 
*mm)
 static void iommu_sva_free_domain(struct iommu_domain *domain)
 {
xa_erase(&sva_domain_array, domain->sva_cookie->pasid);
+   WARN_ON(!list_empty(&domain->sva_cookie->bonds));
kfree(domain->sva_cookie);
domain->ops->free(domain);
 }
@@ -137,7 +140,7 @@ void iommu_sva_domain_put_user(struct iommu_domain *domain)
iommu_sva_free_domain(domain);
 }
 
-static __maybe_unused struct iommu_domain *
+static struct iommu_domain *
 iommu_sva_get_domain(struct device *dev, struct mm_struct *mm)
 {
struct iommu_domain *domain;
@@ -158,3 +161,108 @@ struct 

[PATCH RFC v2 07/11] arm-smmu-v3/sva: Add SVA domain support

2022-03-28 Thread Lu Baolu
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 14 +++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 42 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 21 ++
 3 files changed, 77 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..7631c00fdcbd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -759,6 +759,10 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, 
struct mm_struct *mm,
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
+int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id);
+void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t id);
 #else /* CONFIG_ARM_SMMU_V3_SVA */
 static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 {
@@ -804,5 +808,15 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva 
*handle)
 }
 
 static inline void arm_smmu_sva_notifier_synchronize(void) {}
+
+static inline int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t id)
+{
+   return -ENODEV;
+}
+
+static inline void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+struct device *dev,
+ioasid_t id) {}
 #endif /* CONFIG_ARM_SMMU_V3_SVA */
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 22ddd05bbdcd..ce229820086d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -534,3 +534,45 @@ void arm_smmu_sva_notifier_synchronize(void)
 */
mmu_notifier_synchronize();
 }
+
+int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+   int ret = 0;
+   struct iommu_sva *handle;
+   struct mm_struct *mm = iommu_sva_domain_mm(domain);
+
+   if (domain->type != IOMMU_DOMAIN_SVA || !mm)
+   return -EINVAL;
+
+   mutex_lock(&sva_lock);
+   handle = __arm_smmu_sva_bind(dev, mm);
+   if (IS_ERR(handle))
+   ret = PTR_ERR(handle);
+   mutex_unlock(&sva_lock);
+
+   return ret;
+}
+
+void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t id)
+{
+   struct arm_smmu_bond *bond = NULL, *t;
+   struct mm_struct *mm = iommu_sva_domain_mm(domain);
+   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+   mutex_lock(&sva_lock);
+   list_for_each_entry(t, &master->bonds, list) {
+   if (t->mm == mm) {
+   bond = t;
+   break;
+   }
+   }
+
+   if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+   list_del(&bond->list);
+   arm_smmu_mmu_notifier_put(bond->smmu_mn);
+   kfree(bond);
+   }
+   mutex_unlock(&sva_lock);
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index afc63fce6107..bd80de0bad98 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1995,10 +1995,31 @@ static bool arm_smmu_capable(enum iommu_cap cap)
}
 }
 
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+   kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+   .attach_dev_pasid   = arm_smmu_sva_attach_dev_pasid,
+   .detach_dev_pasid   = arm_smmu_sva_detach_dev_pasid,
+   .free   = arm_smmu_sva_domain_free,
+};
+
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
struct arm_smmu_domain *smmu_domain;
 
+   if (type == IOMMU_DOMAIN_SVA) {
+   struct iommu_domain *domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (domain)
+   domain->ops = &arm_smmu_sva_domain_ops;
+
+   return domain;
+   }
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
-- 
2.25.1



[PATCH RFC v2 06/11] iommu/vt-d: Add SVA domain support

2022-03-28 Thread Lu Baolu
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
---
 include/linux/intel-iommu.h |  1 +
 drivers/iommu/intel/iommu.c | 10 ++
 drivers/iommu/intel/svm.c   | 37 +
 3 files changed, 48 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d00..c14283137fb5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -744,6 +744,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
 u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+extern const struct iommu_domain_ops intel_svm_domain_ops;
 
 struct intel_svm_dev {
struct list_head list;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c1b91bce1530..5eae7cf9bee5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4318,6 +4318,16 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   case IOMMU_DOMAIN_SVA:
+   dmar_domain = alloc_domain(type);
+   if (!dmar_domain)
+   return NULL;
+   domain = &dmar_domain->domain;
+   domain->ops = &intel_svm_domain_ops;
+
+   return domain;
+#endif /* CONFIG_INTEL_IOMMU_SVM */
default:
return NULL;
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ee5ecde5b318..8f59dd641b2d 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -932,3 +932,40 @@ int intel_svm_page_response(struct device *dev,
mutex_unlock(&pasid_mutex);
return ret;
 }
+
+static int intel_svm_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct mm_struct *mm = iommu_sva_domain_mm(domain);
+   struct intel_iommu *iommu = info->iommu;
+   struct iommu_sva *sva;
+   int ret = 0;
+
+   mutex_lock(&pasid_mutex);
+   sva = intel_svm_bind_mm(iommu, dev, mm);
+   if (IS_ERR(sva))
+   ret = PTR_ERR(sva);
+   mutex_unlock(&pasid_mutex);
+
+   return ret;
+}
+
+static void intel_svm_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid)
+{
+   mutex_lock(&pasid_mutex);
+   intel_svm_unbind_mm(dev, pasid);
+   mutex_unlock(&pasid_mutex);
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+   kfree(domain);
+}
+
+const struct iommu_domain_ops intel_svm_domain_ops = {
+   .attach_dev_pasid   = intel_svm_attach_dev_pasid,
+   .detach_dev_pasid   = intel_svm_detach_dev_pasid,
+   .free   = intel_svm_domain_free,
+};
-- 
2.25.1



[PATCH RFC v2 05/11] iommu/vt-d: Remove SVM_FLAG_SUPERVISOR_MODE support

2022-03-28 Thread Lu Baolu
The current in-kernel supervisor PASID support is based on the SVA
machinery in SVA lib. The binding between a kernel PASID and kernel
mapping has many flaws. Remove SVM_FLAG_SUPERVISOR_MODE support.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 53 +--
 1 file changed, 12 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d1c42dfae6ca..ee5ecde5b318 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned 
int pasid,
return 0;
 }
 
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
-unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
 {
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct 
mm_struct *mm,
 
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
-  struct mm_struct *mm,
-  unsigned int flags)
+  struct mm_struct *mm)
 {
struct device_domain_info *info = dev_iommu_priv_get(dev);
unsigned long iflags, sflags;
@@ -341,22 +339,18 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
 
svm->pasid = mm->pasid;
svm->mm = mm;
-   svm->flags = flags;
INIT_LIST_HEAD_RCU(&svm->devs);
 
-   if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
-   svm->notifier.ops = &intel_mmuops;
-   ret = mmu_notifier_register(&svm->notifier, mm);
-   if (ret) {
-   kfree(svm);
-   return ERR_PTR(ret);
-   }
+   svm->notifier.ops = &intel_mmuops;
+   ret = mmu_notifier_register(&svm->notifier, mm);
+   if (ret) {
+   kfree(svm);
+   return ERR_PTR(ret);
}
 
ret = pasid_private_add(svm->pasid, svm);
if (ret) {
-   if (svm->notifier.ops)
-   mmu_notifier_unregister(&svm->notifier, mm);
+   mmu_notifier_unregister(&svm->notifier, mm);
kfree(svm);
return ERR_PTR(ret);
}
@@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
}
 
/* Setup the pasid table: */
-   sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
-   PASID_FLAG_SUPERVISOR_MODE : 0;
-   sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+   sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
sflags |= PASID_FLAG_PAGE_SNOOP;
spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
@@ -411,8 +403,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
kfree(sdev);
 free_svm:
if (list_empty(&svm->devs)) {
-   if (svm->notifier.ops)
-   mmu_notifier_unregister(&svm->notifier, mm);
+   mmu_notifier_unregister(&svm->notifier, mm);
pasid_private_remove(mm->pasid);
kfree(svm);
}
@@ -822,37 +813,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-   unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
 
-   if (drvdata)
-   flags = *(unsigned int *)drvdata;
-
-   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
-   if (!ecap_srs(iommu->ecap)) {
-   dev_err(dev, "%s: Supervisor PASID not supported\n",
-   iommu->name);
-   return ERR_PTR(-EOPNOTSUPP);
-   }
-
-   if (mm) {
-   dev_err(dev, "%s: Supervisor PASID with user provided 
mm\n",
-   iommu->name);
-   return ERR_PTR(-EINVAL);
-   }
-
-   mm = &init_mm;
-   }
-
mutex_lock(&pasid_mutex);
-   ret = intel_svm_alloc_pasid(dev, mm, flags);
+   ret = intel_svm_alloc_pasid(dev, mm);
if (ret) {
mutex_unlock(&pasid_mutex);
return ERR_PTR(ret);
}
 
-   sva = intel_svm_bind_mm(iommu, dev, mm, 

[PATCH RFC v2 04/11] iommu: Add attach/detach_dev_pasid domain ops

2022-03-28 Thread Lu Baolu
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not
limited to):

 - SVA (Shared Virtual Address)
 - kernel DMA with PASID
 - hardware-assist mediated device

This adds a pair of common domain ops for this purpose and adds some
common helpers to attach/detach a domain to/from a {device, PASID} and
get/put the domain attached to {device, PASID}.
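
As a hedged sketch of the new helpers from a caller's perspective (the
choice of domain and PASID is owner-specific, and example_use_dev_pasid
is a made-up name):

static int example_use_dev_pasid(struct iommu_domain *domain,
				 struct device *dev, ioasid_t pasid)
{
	int ret;

	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		return ret;

	/* DMA tagged with @pasid now translates through @domain. */

	iommu_detach_device_pasid(domain, dev, pasid);
	return 0;
}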

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h | 36 ++
 drivers/iommu/iommu.c | 88 +++
 2 files changed, 124 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 29c4c2edd706..a46285488a57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -269,6 +269,8 @@ struct iommu_ops {
  * struct iommu_domain_ops - domain specific operations
  * @attach_dev: attach an iommu domain to a device
  * @detach_dev: detach an iommu domain from a device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
  * @map: map a physically contiguous memory region to an iommu domain
  * @map_pages: map a physically contiguous set of pages of the same size to
  * an iommu domain.
@@ -286,6 +288,10 @@ struct iommu_ops {
 struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+   int (*attach_dev_pasid)(struct iommu_domain *domain,
+   struct device *dev, ioasid_t id);
+   void (*detach_dev_pasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t id);
 
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -679,6 +685,14 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, 
void *owner);
 void iommu_group_release_dma_owner(struct iommu_group *group);
 bool iommu_group_dma_owner_claimed(struct iommu_group *group);
 
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
+void iommu_put_domain_for_dev_pasid(struct iommu_domain *domain);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1047,6 +1061,28 @@ static inline bool iommu_group_dma_owner_claimed(struct 
iommu_group *group)
 {
return false;
 }
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid)
+{
+   return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+   return NULL;
+}
+
+static inline void
+iommu_put_domain_for_dev_pasid(struct iommu_domain *domain)
+{
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9fb8a5b4491e..8163ad7f6902 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 
+#include "iommu-sva-lib.h"
+
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
@@ -38,6 +40,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+   struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -631,6 +634,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+   xa_init(&group->pasid_array);
 
ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
if (ret < 0) {
@@ -3173,3 +3177,87 @@ bool iommu_group_dma_owner_claimed(struct iommu_group 
*group)
return user;
 }
 EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+   struct iommu_group *group;
+   int ret = -EINVAL;
+   void *curr;
+
+   if (!domain->ops->attach_dev_pasid)
+   return -EINVAL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return -ENODEV;
+
+   mutex_lock(&group->mutex);
+   /*
+* To keep things simple, we currently don't support IOMMU groups
+* with more than one device. Existing SVA-capable systems 

[PATCH RFC v2 03/11] iommu/sva: Add iommu_domain type for SVA

2022-03-28 Thread Lu Baolu
Add a new iommu domain type, IOMMU_DOMAIN_SVA, to represent an I/O page
table that is shared with the CPU's host virtual address space. Add some
helpers to get and put an SVA domain, and implement SVA domain life cycle
management.
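
A hedged sketch of the intended life cycle, mirroring the internal helpers
this patch adds (one SVA domain per mm, shared by reference count; the
lookup step is simplified and the example_* names are made up):

static struct iommu_domain *example_sva_domain_get(struct device *dev,
						   struct mm_struct *mm)
{
	struct iommu_domain *domain;

	/* Reuse the mm's live SVA domain if one exists ... */
	domain = xa_load(&sva_domain_array, mm->pasid);
	if (domain && iommu_sva_domain_get_user(domain))
		return domain;

	/* ... otherwise allocate a fresh one with users == 1. */
	return iommu_sva_alloc_domain(dev, mm);
}

static void example_sva_domain_put(struct iommu_domain *domain)
{
	/* The last put erases the xarray entry and frees domain and cookie. */
	iommu_sva_domain_put_user(domain);
}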

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h |  7 +++
 drivers/iommu/iommu-sva-lib.h | 10 
 drivers/iommu/iommu-sva-lib.c | 89 +++
 3 files changed, 106 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 36f43af0af53..29c4c2edd706 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
 struct iommu_dma_cookie;
+struct iommu_sva_cookie;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -64,6 +65,9 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_PT  (1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
 
+#define __IOMMU_DOMAIN_SHARED  (1U << 4)  /* Page table shared from CPU  */
+#define __IOMMU_DOMAIN_HOST_VA (1U << 5)  /* Host CPU virtual address */
+
 /*
  * This are the possible domain-types
  *
@@ -86,6 +90,8 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA_FQ(__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SHARED |\
+__IOMMU_DOMAIN_HOST_VA)
 
 struct iommu_domain {
unsigned type;
@@ -95,6 +101,7 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+   struct iommu_sva_cookie *sva_cookie;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1a71218b07f5 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -10,6 +10,7 @@
 
 int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
+struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain);
 
 /* I/O Page fault */
 struct device;
@@ -26,6 +27,8 @@ int iopf_queue_flush_dev(struct device *dev);
 struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
+bool iommu_sva_domain_get_user(struct iommu_domain *domain);
+void iommu_sva_domain_put_user(struct iommu_domain *domain);
 
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
@@ -63,5 +66,12 @@ static inline int iopf_queue_discard_partial(struct 
iopf_queue *queue)
 {
return -ENODEV;
 }
+
+static inline bool iommu_sva_domain_get_user(struct iommu_domain *domain)
+{
+   return false;
+}
+
+static inline void iommu_sva_domain_put_user(struct iommu_domain *domain) { }
 #endif /* CONFIG_IOMMU_SVA */
 #endif /* _IOMMU_SVA_LIB_H */
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..78820be23f15 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -3,12 +3,21 @@
  * Helpers for IOMMU drivers implementing SVA
  */
 #include 
+#include 
+#include 
 #include 
 
 #include "iommu-sva-lib.h"
 
 static DEFINE_MUTEX(iommu_sva_lock);
 static DECLARE_IOASID_SET(iommu_sva_pasid);
+static DEFINE_XARRAY_ALLOC(sva_domain_array);
+
+struct iommu_sva_cookie {
+   struct mm_struct *mm;
+   ioasid_t pasid;
+   refcount_t users;
+};
 
 /**
  * iommu_sva_alloc_pasid - Allocate a PASID for the mm
@@ -69,3 +78,83 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+static struct iommu_domain *
+iommu_sva_alloc_domain(struct device *dev, struct mm_struct *mm)
+{
+   struct bus_type *bus = dev->bus;
+   struct iommu_sva_cookie *cookie;
+   struct iommu_domain *domain;
+   void *curr;
+
+   if (!bus || !bus->iommu_ops)
+   return NULL;
+
+   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   if (!cookie)
+   return NULL;
+
+   domain = bus->iommu_ops->domain_alloc(IOMMU_DOMAIN_SVA);
+   if (!domain)
+   goto err_domain_alloc;
+
+   cookie->mm = mm;
+   cookie->pasid = mm->pasid;
+   refcount_set(&cookie->users, 1);
+   domain->type = IOMMU_DOMAIN_SVA;
+   domain->sva_cookie = cookie;
+   curr = xa_store(&sva_domain_array, mm->pasid, domain, GFP_KERNEL);
+   if (xa_err(curr))
+   goto err_xa_store;
+
+   return domain;
+err_xa_store:
+   domain->ops->free(domain);
+err_domain_alloc:
+   kfree(cookie);
+   return NULL;
+}
+
+static void iommu_sva_free_domain(struct 

[PATCH RFC v2 02/11] iommu: Add iommu_group_singleton_lockdown()

2022-03-28 Thread Lu Baolu
Some of the interfaces in the IOMMU core require that only a single
kernel device driver controls the device in the IOMMU group. The
existing method is to check the device count in the IOMMU group in
the interfaces. This is unreliable because any device added to the
IOMMU group later breaks this assumption without notifying the driver
using the interface. This adds iommu_group_singleton_lockdown(), which
checks the requirement and locks down the IOMMU group with only a single
device driver bound.
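
A hedged sketch of the intended calling convention inside iommu.c (the
check must run under group->mutex, and example_singleton_only_operation
is a made-up name):

static int example_singleton_only_operation(struct iommu_group *group)
{
	int ret = 0;

	mutex_lock(&group->mutex);
	if (!iommu_group_singleton_lockdown(group))
		ret = -EINVAL;
	/* On success the group stays locked down until ownership is dropped. */
	mutex_unlock(&group->mutex);

	return ret;
}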

Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c42ece25854..9fb8a5b4491e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,7 @@ struct iommu_group {
struct list_head entry;
unsigned int owner_cnt;
void *owner;
+   bool singleton_lockdown;
 };
 
 struct group_device {
@@ -968,15 +969,16 @@ void iommu_group_remove_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_group_remove_device);
 
-static int iommu_group_device_count(struct iommu_group *group)
+/* Callers should hold the group->mutex. */
+static bool iommu_group_singleton_lockdown(struct iommu_group *group)
 {
-   struct group_device *entry;
-   int ret = 0;
-
-   list_for_each_entry(entry, &group->devices, list)
-   ret++;
+   if (group->owner_cnt != 1 ||
+   group->domain != group->default_domain ||
+   group->owner)
+   return false;
+   group->singleton_lockdown = true;
 
-   return ret;
+   return true;
 }
 
 static int __iommu_group_for_each_dev(struct iommu_group *group, void *data,
@@ -1936,7 +1938,7 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 */
mutex_lock(&group->mutex);
ret = -EINVAL;
-   if (iommu_group_device_count(group) != 1)
+   if (!iommu_group_singleton_lockdown(group))
goto out_unlock;
 
ret = __iommu_attach_group(domain, group);
@@ -1979,7 +1981,7 @@ void iommu_detach_device(struct iommu_domain *domain, 
struct device *dev)
return;
 
mutex_lock(&group->mutex);
-   if (iommu_group_device_count(group) != 1) {
+   if (!iommu_group_singleton_lockdown(group)) {
WARN_ON(1);
goto out_unlock;
}
@@ -2745,7 +2747,7 @@ iommu_sva_bind_device(struct device *dev, struct 
mm_struct *mm, void *drvdata)
 * affected by the problems that required IOMMU groups (lack of ACS
 * isolation, device ID aliasing and other hardware issues).
 */
-   if (iommu_group_device_count(group) != 1)
+   if (!iommu_group_singleton_lockdown(group))
goto out_unlock;
 
handle = ops->sva_bind(dev, mm, drvdata);
@@ -2842,7 +2844,7 @@ static int iommu_change_dev_def_domain(struct iommu_group 
*group,
 * case) that has already acquired some of the device locks and might be
 * waiting for T1 to release other device locks.
 */
-   if (iommu_group_device_count(group) != 1) {
+   if (!iommu_group_singleton_lockdown(group)) {
dev_err_ratelimited(prev_dev, "Cannot change default domain: 
Group has more than one device\n");
ret = -EINVAL;
goto out;
@@ -2975,7 +2977,7 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
 * 2. Get struct *dev which is needed to lock device
 */
mutex_lock(&group->mutex);
-   if (iommu_group_device_count(group) != 1) {
+   if (!iommu_group_singleton_lockdown(group)) {
mutex_unlock(&group->mutex);
pr_err_ratelimited("Cannot change default domain: Group has 
more than one device\n");
return -EINVAL;
@@ -3050,6 +3052,7 @@ int iommu_device_use_default_domain(struct device *dev)
mutex_lock(&group->mutex);
if (group->owner_cnt) {
if (group->domain != group->default_domain ||
+   group->singleton_lockdown ||
group->owner) {
ret = -EBUSY;
goto unlock_out;
@@ -3084,6 +3087,9 @@ void iommu_device_unuse_default_domain(struct device *dev)
if (!WARN_ON(!group->owner_cnt))
group->owner_cnt--;
 
+   if (!group->owner_cnt)
+   group->singleton_lockdown = false;
+
mutex_unlock(&group->mutex);
iommu_group_put(group);
 }
-- 
2.25.1



[PATCH RFC v2 01/11] iommu: Add pasid_bits field in struct dev_iommu

2022-03-28 Thread Lu Baolu
Use this field to save the pasid/ssid bits that a device is able to
support with its IOMMU hardware. It is a generic attribute of a device,
and lifting it into the per-device dev_iommu struct makes it possible
to allocate a PASID for a device without calling into the IOMMU drivers.
Any iommu driver which supports PASID-related features should set this
field before the features are enabled on the devices.
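
As a hedged sketch of how core code could consume the new field when
bounding PASID allocation (the helper name is illustrative, not part of
this patch):

static ioasid_t example_dev_pasid_max(struct device *dev)
{
	if (!dev->iommu || !dev->iommu->pasid_bits)
		return 0;

	/* pasid_bits is the width; the largest usable PASID is 2^bits - 1. */
	return (1U << dev->iommu->pasid_bits) - 1;
}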

Signed-off-by: Lu Baolu 
---
 include/linux/iommu.h   | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
 drivers/iommu/intel/iommu.c | 5 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6ef2df258673..36f43af0af53 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -368,6 +368,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void*priv;
+   unsigned intpasid_bits;
 };
 
 int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..afc63fce6107 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
master->stall_enabled = true;
 
+   dev->iommu->pasid_bits = master->ssid_bits;
+
return >iommu;
 
 err_free_master:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6f7485c44a4b..c1b91bce1530 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4587,8 +4587,11 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
 
-   if (features >= 0)
+   if (features >= 0) {
info->pasid_supported = features | 1;
+   dev->iommu->pasid_bits =
+   fls(pci_max_pasids(pdev)) - 1;
+   }
}
 
if (info->ats_supported && ecap_prs(iommu->ecap) &&
-- 
2.25.1



[PATCH RFC v2 00/11] iommu: SVA and IOPF refactoring

2022-03-28 Thread Lu Baolu
Hi,

The former part of this series refactors the IOMMU SVA code by assigning
an SVA type of iommu_domain to a shared virtual address and replacing
sva_bind/unbind iommu ops with attach/detach_dev_pasid domain ops.

The latter part changes the existing I/O page fault handling framework
from only serving SVA to a generic one. Any driver or component could
handle the I/O page faults for its domain in its own way by installing
an I/O page fault handler.

This series overlaps with another series posted here [1]. For the
convenience of review, I included all relevant patches in this series.
We will solve the overlap problem later.

This series is also available on github here [2].

[1] 
https://lore.kernel.org/lkml/20220315050713.2000518-1-jacob.jun@linux.intel.com/
[2] https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v2

Please help review and suggest.

Best regards,
baolu

Change log:
v1:
 - 
https://lore.kernel.org/linux-iommu/20220320064030.2936936-1-baolu...@linux.intel.com/
 - Initial post.

v2:
 - Add sva domain life cycle management to avoid race between unbind and
   page fault handling.
 - Use a single domain for each mm.
 - Return a single sva handler for the same binding.
 - Add a new helper to meet singleton group requirement.
 - Rework the SVA domain allocation for arm smmu v3 driver and move the
   pasid_bit initialization to device probe.
 - Drop the patch "iommu: Handle IO page faults directly".
 - Add mmget_not_zero(mm) in SVA page fault handler.

Lu Baolu (11):
  iommu: Add pasid_bits field in struct dev_iommu
  iommu: Add iommu_group_singleton_lockdown()
  iommu/sva: Add iommu_domain type for SVA
  iommu: Add attach/detach_dev_pasid domain ops
  iommu/vt-d: Remove SVM_FLAG_SUPERVISOR_MODE support
  iommu/vt-d: Add SVA domain support
  arm-smmu-v3/sva: Add SVA domain support
  iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
  iommu: Remove SVA related callbacks from iommu ops
  iommu: Per-domain I/O page fault handling
  iommu: Rename iommu-sva-lib.{c,h}

 include/linux/intel-iommu.h   |   5 +-
 include/linux/iommu.h | 107 --
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  25 +-
 .../iommu/{iommu-sva-lib.h => iommu-sva.h}|  12 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  85 ++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  28 +-
 drivers/iommu/intel/iommu.c   |  20 +-
 drivers/iommu/intel/svm.c | 135 +++-
 drivers/iommu/io-pgfault.c|  72 +---
 drivers/iommu/iommu-sva-lib.c |  71 
 drivers/iommu/iommu-sva.c | 307 ++
 drivers/iommu/iommu.c | 208 ++--
 drivers/iommu/Makefile|   2 +-
 13 files changed, 655 insertions(+), 422 deletions(-)
 rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (80%)
 delete mode 100644 drivers/iommu/iommu-sva-lib.c
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.25.1



Re: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd

2022-03-28 Thread Jason Wang
On Mon, Mar 28, 2022 at 8:23 PM Jason Gunthorpe  wrote:
>
> On Mon, Mar 28, 2022 at 09:53:27AM +0800, Jason Wang wrote:
> > To me, it looks more easier to not answer this question by letting
> > userspace know about the change,
>
> That is not backwards compatible, so I don't think it helps unless we
> say if you open /dev/vfio/vfio you get old behavior and if you open
> /dev/iommu you get new...

Actually, this is one way to go. Trying to behave exactly like type1
might not be easy.

>
> Nor does it answer if I can fix RDMA or not :\
>

vDPA has a backend feature negotiation, then actually, userspace can
tell vDPA to go with the new accounting approach. Not sure RDMA can do
the same.

Thanks

> So we really do need to know what exactly is the situation here.
>
> Jason
>



[Patch v1] iommu: arm-smmu: Use arm-smmu-nvidia impl for Tegra234

2022-03-28 Thread Ashish Mhetre via iommu
Tegra234 has 2 pairs of ARM MMU-500 instances. Each pair is used
together and should be programmed identically.
Add the compatible string of the Tegra234 iommu nodes in
arm_smmu_impl_init() so that the arm-smmu-nvidia implementation will be
used for programming these SMMU instances.

Signed-off-by: Ashish Mhetre 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 2c25cce38060..658f3cc83278 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -211,7 +211,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
-   if (of_device_is_compatible(np, "nvidia,tegra194-smmu") ||
+   if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
+   of_device_is_compatible(np, "nvidia,tegra194-smmu") ||
of_device_is_compatible(np, "nvidia,tegra186-smmu"))
return nvidia_smmu_impl_init(smmu);
 
-- 
2.17.1



Re: Bug report: VFIO map/unmap mem subject to race and DMA data goes to incorrect page (4.18.0)

2022-03-28 Thread Alex Williamson
On Mon, 28 Mar 2022 12:14:51 -0700
"Daniel F. Smith"  wrote:

> Hi Alex,
> 
> Answers to questions I can answer are in-line.  First an apology
> though---the machine with the FPGA board is 1000 miles remote, and I don't
> have root access.  It's unlikely I will be able to do kernel patch testing.
> 
> 
> Alex Williamson scribed the following, on or around Fri, Mar 25, 2022 at 
> 04:10:22PM -0600:
> > Hi Daniel,
> >   
> ...
> >
> > Coherency possibly.
> > 
> > There's a possible coherency issue at the compare depending on the
> > IOMMU capabilities which could affect whether DMA is coherent to memory
> > or requires an explicit flush.  I'm a little suspicious whether dmar0
> > is really the IOMMU controlling this device since you mention a 39bit
> > IOVA space, which is more typical of Intel client platforms which can
> > also have integrated graphics which often have a dedicated IOMMU at
> > dmar0 that isn't necessarily representative of the other IOMMUs in the
> > system, especially with regard to snoop-control.  Each dmar lists the
> > managed devices under it in sysfs to verify.  Support for snoop-control
> > would be identified in the ecap register rather than the cap register.
> > VFIO can also report coherency via the VFIO_DMA_CC_IOMMU extension
> > reported by VFIO_CHECK_EXTENSION ioctl.  
> 
> $ cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap
> d2008c40660462
> $ cat /sys/devices/virtual/iommu/dmar0/intel-iommu/ecap
> f050da
> $ lscpu | grep Model
> Model:   165
> Model name:  Intel(R) Xeon(R) W-1290P CPU @ 3.70GHz
> $ ls -l /sys/devices/virtual/iommu/dmar0/devices | wc -l
> 24
> $ ... ioctl(container_fd, VFIO_CHECK_EXTENSION, VFIO_DMA_CC_IOMMU)
> 0

Your ecap register reports bit 7 (Snoop Control) set, which should mean
that VT-d is enforcing coherency regardless of no-snoop transactions.
I suspect maybe the different result from the ioctl could be from
testing this extension before the IOMMU has been set for the
container(?)
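
(For reference, a quick standalone check of that bit, assuming ECAP bit 7
is Snoop Control and using the value reported above:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t ecap = 0xf050da;	/* value read from sysfs above */

	/* Bit 7 = Snoop Control; this prints "set" for 0xf050da. */
	printf("SC: %s\n", (ecap >> 7) & 1 ? "set" : "clear");
	return 0;
}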

> What are the implications of having no "IOMMU enforces DMA cache
> coherence"?  On this machine there is no access to a PCIe bus analyzer, but
> it's very unlikely that the TLPs would have NoSnoop set.

There's also bit 11 (Enable No Snoop) that could be cleared in the PCI
device control register, which would theoretically prevent the device
from using no-snoop TLPs.
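
A hedged sketch of what clearing that bit from a kernel driver could look
like (whether doing so is appropriate for this FPGA/QDMA device is a
separate question):

#include <linux/pci.h>

static void example_disable_no_snoop(struct pci_dev *pdev)
{
	/* Clear Enable No Snoop (bit 11) in the PCIe Device Control register. */
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);
}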
 
> Is there a good way to tell which IOMMU I'm using?

Which DMAR?  Like this for example:

$ readlink -f /sys/bus/pci/devices/:04:00.0/iommu
/sys/devices/virtual/iommu/dmar3

Your listing of devices piped to wc would also reciprocally list the
device in that output.  With 24 devices there's a fair chance that
dmar0 is the only one used.

> (I did think it was strange that the IOMMU in this machine cannot handle
> enough bits for mapping IOVA==VMA.  The test code is running in a podman
> container, but (naively) I wouldn't expect that to make a difference.)

Single socket Xeon processors like this tend to share more similarities
to consumer desktop processors than to the "Scalable" line of Xeons.

FWIW, there's a proposal[1] for a new, shared userspace IOMMU interface
that includes an option for the kernel to allocate IOVAs for these
cases.

[1]https://lore.kernel.org/all/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com/

> > However, CPU coherency might lead to a miscompare, but not necessarily a
> > miscompare matching the previous iteration.  Still, for completeness
> > let's make sure this isn't a gap in the test programming making invalid
> > assumptions about CPU/DMA coherency.
> > 
> > The fact that randomizing the IOVA provides a workaround though might
> > suggest something relative to the IOMMU page table coherency.  But for
> > the new mmap target to have the data from the previous iteration, the
> > IOMMU PTE would need to be stale on read, but correct on write in order
> > to land back in your new mmap.  That seems peculiar.  Are we sure the
> > FPGA device isn't caching the value at the IOVA or using any sort of
> > IOTLB caching such as ATS that might not be working correctly?  
> 
> I cannot say for certain what the FPGA caches, if anything.  The IP for that
> part is closed (search for Xilinx PG302 QDMA).  It should (!) be
> well-tested... oh for an analyzer!
> 
> > > Suggestion: Document issue when using fixed IOVA, or fix if security
> > > is a concern.  
> > 
> > I don't know that there's enough information here to make any
> > conclusions.  Here are some further questions:
> > 
> >  * What size mappings are being used, both for the mmap and the VFIO
> >MAP/UNMAP operations.  
> 
> The test would often fail switching from an 8KB allocation to 12KB where the
> VMA would grow down by a page.  The mmap() always returned a 4KB aligned
> VMA, and the requested mmap() size was always an exact number of 4KB pages. 
> The VFIO map operations were always on the full extent of the mmap'd memory
> (likely makes Baulu's patch moot in this case).
> 
> A typical (not 

Re: [PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-28 Thread Jacob Pan
Hi BaoLu,

On Fri, 18 Mar 2022 20:43:54 +0800, Lu Baolu 
wrote:

> On 2022/3/15 13:07, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/dma-iommu.c | 65 +++
> >   include/linux/dma-iommu.h |  7 +
> >   include/linux/iommu.h |  9 ++
> >   3 files changed, 81 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index b22034975301..d0ff1a34b1b6 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> > IOMMU_DMA_MSI_COOKIE,
> >   };
> >   
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >   struct iommu_dma_cookie {
> > enum iommu_dma_cookie_type  type;
> > union {
> > @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> >   }
> >   
> > +/**
> > + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> > + * @dev:   Device to be enabled
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA map, the supervisor PASID will point to the same
> > IOVA
> > + * page table.
> > + *
> > + * @return the kernel PASID to be used for DMA or INVALID_IOASID on
> > failure  
> 
> The comment on the return value should be rephrased according to the
> real code.
> 
yes, will do.

> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +   max = iommu_get_dev_pasid_max(dev);
> > +   if (!max)
> > +   return -EINVAL;
> > +
> > +   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID)
> > +   return -ENOMEM;
> > +
> > +   ret = dom->ops->attach_dev_pasid(dom, dev, id);
> > +   if (ret) {
> > +   ioasid_put(id);
> > +   return ret;
> > +   }
> > +   *pasid = id;
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_enable_pasid_dma);
> > +
> > +/**
> > + * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
> > + * @dev:   Device's PASID DMA to be disabled
> > + *
> > + * It is the device driver's responsibility to ensure no more incoming
> > DMA
> > + * requests with the kernel PASID before calling this function. IOMMU
> > driver
> > + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared
> > and
> > + * drained.
> > + *
> > + * @return 0 on success or error code on failure  
> 
> Ditto.
> 
same

> > + */
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +
> > +   /* TODO: check the given PASID is within the ioasid_set */
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom->ops->detach_dev_pasid)
> > +   return;
> > +   dom->ops->detach_dev_pasid(dom, dev, pasid);
> > +   ioasid_put(pasid);
> > +}
> > +EXPORT_SYMBOL(iommu_disable_pasid_dma);
> > +
> >   /**
> >* iommu_dma_get_resv_regions - Reserved region driver helper
> >* @dev: Device from iommu_get_resv_regions()
> > diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> > index 24607dc3c2ac..e6cb9b52a420 100644
> > --- a/include/linux/dma-iommu.h
> > +++ b/include/linux/dma-iommu.h
> > @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain
> > *domain); int iommu_get_msi_cookie(struct iommu_domain *domain,
> > dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain
> > *domain); 
> > +/*
> > + * For devices that can do DMA request with PASID, setup a system
> > PASID.
> > + * Address modes (IOVA, PA) are selected by the platform code.
> > + */  
> 
> No need for a comment here. Or move it to the function if need.
> 
right, this comment is obsolete. will remove.

> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> > +void 
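
As a usage note, an illustrative sketch only, using the interface as posted
in this series (error handling trimmed, 'pdev' stands for the driver's
struct pci_dev):

	ioasid_t pasid;
	int ret;

	ret = iommu_enable_pasid_dma(&pdev->dev, &pasid);
	if (ret)
		return ret;	/* fall back to plain RID-based DMA */

	/*
	 * Program 'pasid' into the device's work submission path, then keep
	 * using the regular DMA API (dma_map_*(), dma_alloc_*()) unchanged.
	 */

	/* ... later, once DMA with this PASID has been quiesced ... */
	iommu_disable_pasid_dma(&pdev->dev, pasid);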

Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-28 Thread Jacob Pan
Hi Kevin,

On Fri, 18 Mar 2022 05:33:38 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan
> > Sent: Thursday, March 17, 2022 5:02 AM
> > 
> > Hi Kevin,
> > 
> > On Wed, 16 Mar 2022 07:41:34 +, "Tian, Kevin" 
> > wrote:
> >   
> > > > From: Jason Gunthorpe 
> > > > Sent: Tuesday, March 15, 2022 10:33 PM
> > > >
> > > > On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:  
> > > > > + /*
> > > > > +  * Each domain could have multiple devices attached with
> > > > > shared or  
> > > > per  
> > > > > +  * device PASIDs. At the domain level, we keep track of
> > > > > unique PASIDs  
> > > > and  
> > > > > +  * device user count.
> > > > > +  * E.g. If a domain has two devices attached, device A
> > > > > has PASID 0, 1;
> > > > > +  * device B has PASID 0, 2. Then the domain would have
> > > > > PASID 0, 1, 2.
> > > > > +  */  
> > > >
> > > > A 2d array of xarray's seems like a poor data structure for this
> > > > task.  
> > >  
> > Perhaps i mis-presented here, I am not using 2D array. It is an 1D
> > xarray for domain PASIDs only. Then I use the existing device list in
> > each domain, adding another xa to track per-device-domain PASIDs.  
> > > besides that it also doesn't work when we support per-device PASID
> > > allocation in the future. In that case merging device PASIDs together
> > > is conceptually wrong.
> > >  
> > Sorry, could you elaborate? If we do per-dev PASID allocation, we could
> > use the ioasid_set for each pdev, right?  
> 
> My point is simply about the comment above which says the domain
> will have PASID 0, 1, 2 when there is [devA, PASID0] and [devB, PASID0].
> You can maintain a single  PASID list only when it's globally allocated
> cross devices. otherwise this has to be a tuple including device and
> PASID.
> 
Got you, you are right that we don't want to limit this to a globally allocated scheme.
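
So the per-device tracking would have to carry the tuple, conceptually
something like this (purely illustrative, names are made up):

	struct dev_pasid_info {			/* hypothetical, sketch only */
		struct device	*dev;		/* attaching device */
		ioasid_t	pasid;		/* PASID used by that device */
		struct list_head link;		/* on the domain's device list */
	};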

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

2022-03-28 Thread Alex Williamson
On Mon, 28 Mar 2022 16:47:49 -0300
Jason Gunthorpe  wrote:

> On Mon, Mar 28, 2022 at 03:57:53PM -0300, Jason Gunthorpe wrote:
> 
> > So, currently AMD and Intel have exactly the same HW feature with a
> > different kAPI..  
> 
> I fixed it like below and made the ordering changes Kevin pointed
> to. Will send next week after the merge window:
> 
> 527e438a974a06 iommu: Delete IOMMU_CAP_CACHE_COHERENCY
> 5cbc8603ffdf20 vfio: Move the Intel no-snoop control off of IOMMU_CACHE
> ebc961f93d1af3 iommu: Introduce the domain op enforce_cache_coherency()
> 79c52a2bb1e60b vfio: Require that devices support DMA cache coherence
> 02168f961b6a75 iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with 
> dev_is_dma_coherent()
> 
> '79c can be avoided, we'd just drive IOMMU_CACHE off of
> dev_is_dma_coherent() - but if we do that I'd like to properly
> document the arch/iommu/platform/kvm combination that is using this..

We can try to enforce dev_is_dma_coherent(), as you note it's not going
to affect any x86 users.  arm64 is the only obviously relevant arch that
defines ARCH_HAS_SYNC_DMA_FOR_{DEVICE,CPU} but the device.dma_coherent
setting comes from ACPI/OF firmware, so someone from ARM land will need
to shout if this is an issue.  I think we'd need to back off and go
with documentation if a broken use case shows up.  Thanks,

Alex

 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 3c0ac3c34a7f9a..f144eb9fea8e31 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2269,6 +2269,12 @@ static int amd_iommu_def_domain_type(struct device 
> *dev)
>   return 0;
>  }
>  
> +static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> +{
> + /* IOMMU_PTE_FC is always set */
> + return true;
> +}
> +
>  const struct iommu_ops amd_iommu_ops = {
>   .capable = amd_iommu_capable,
>   .domain_alloc = amd_iommu_domain_alloc,
> @@ -2291,6 +2297,7 @@ const struct iommu_ops amd_iommu_ops = {
>   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
>   .iotlb_sync = amd_iommu_iotlb_sync,
>   .free   = amd_iommu_domain_free,
> + .enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
>   }
>  };
> 
> Thanks,
> Jason
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

2022-03-28 Thread Jason Gunthorpe via iommu
On Mon, Mar 28, 2022 at 03:57:53PM -0300, Jason Gunthorpe wrote:

> So, currently AMD and Intel have exactly the same HW feature with a
> different kAPI..

I fixed it like below and made the ordering changes Kevin pointed
to. Will send next week after the merge window:

527e438a974a06 iommu: Delete IOMMU_CAP_CACHE_COHERENCY
5cbc8603ffdf20 vfio: Move the Intel no-snoop control off of IOMMU_CACHE
ebc961f93d1af3 iommu: Introduce the domain op enforce_cache_coherency()
79c52a2bb1e60b vfio: Require that devices support DMA cache coherence
02168f961b6a75 iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with 
dev_is_dma_coherent()

'79c can be avoided, we'd just drive IOMMU_CACHE off of
dev_is_dma_coherent() - but if we do that I'd like to properly
document the arch/iommu/platform/kvm combination that is using this..

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3c0ac3c34a7f9a..f144eb9fea8e31 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2269,6 +2269,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
return 0;
 }
 
+static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+{
+   /* IOMMU_PTE_FC is always set */
+   return true;
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -2291,6 +2297,7 @@ const struct iommu_ops amd_iommu_ops = {
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
.iotlb_sync = amd_iommu_iotlb_sync,
.free   = amd_iommu_domain_free,
+   .enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
}
 };

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Bug report: VFIO map/unmap mem subject to race and DMA data goes to incorrect page (4.18.0)

2022-03-28 Thread Daniel F. Smith
Hi Alex,

Answers to questions I can answer are in-line.  First an apology
though---the machine with the FPGA board is 1000 miles remote, and I don't
have root access.  It's unlikely I will be able to do kernel patch testing.


Alex Williamson scribed the following, on or around Fri, Mar 25, 2022 at 
04:10:22PM -0600:
> Hi Daniel,
> 
...
>
> Coherency possibly.
> 
> There's a possible coherency issue at the compare depending on the
> IOMMU capabilities which could affect whether DMA is coherent to memory
> or requires an explicit flush.  I'm a little suspicious whether dmar0
> is really the IOMMU controlling this device since you mention a 39bit
> IOVA space, which is more typical of Intel client platforms which can
> also have integrated graphics which often have a dedicated IOMMU at
> dmar0 that isn't necessarily representative of the other IOMMUs in the
> system, especially with regard to snoop-control.  Each dmar lists the
> managed devices under it in sysfs to verify.  Support for snoop-control
> would be identified in the ecap register rather than the cap register.
> VFIO can also report coherency via the VFIO_DMA_CC_IOMMU extension
> reported by VFIO_CHECK_EXTENSION ioctl.

$ cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap
d2008c40660462
$ cat /sys/devices/virtual/iommu/dmar0/intel-iommu/ecap
f050da
$ lscpu | grep Model
Model:   165
Model name:  Intel(R) Xeon(R) W-1290P CPU @ 3.70GHz
$ ls -l /sys/devices/virtual/iommu/dmar0/devices | wc -l
24
$ ... ioctl(container_fd, VFIO_CHECK_EXTENSION, VFIO_DMA_CC_IOMMU)
0

What are the implications of having no "IOMMU enforces DMA cache
conherence"?  On this machine there is no access to a PCIe bus analyzer, but
it's very unlikely that the TLPs would have NoSnoop set.

Is there a good way to tell what IOMMU I'm using?

(I did think it was strange that the IOMMU in this machine cannot handle
enough bits for mapping IOVA==VMA.  The test code is running in a podman
container, but (naively) I wouldn't expect that to make a difference.)

> However, CPU coherency might lead to a miscompare, but not necessarily a
> miscompare matching the previous iteration.  Still, for completeness
> let's make sure this isn't a gap in the test programming making invalid
> assumptions about CPU/DMA coherency.
> 
> The fact that randomizing the IOVA provides a workaround though might
> suggest something relative to the IOMMU page table coherency.  But for
> the new mmap target to have the data from the previous iteration, the
> IOMMU PTE would need to be stale on read, but correct on write in order
> to land back in your new mmap.  That seems peculiar.  Are we sure the
> FPGA device isn't caching the value at the IOVA or using any sort of
> IOTLB caching such as ATS that might not be working correctly?

I cannot say for certain what the FPGA caches, if anything.  The IP for that
part is closed (search for Xilinx PG302 QDMA).  It should (!) be
well-tested... oh for an analyzer!

> > Suggestion: Document issue when using fixed IOVA, or fix if security
> > is a concern.
> 
> I don't know that there's enough information here to make any
> conclusions.  Here are some further questions:
> 
>  * What size mappings are being used, both for the mmap and the VFIO
>MAP/UNMAP operations.

The test would often fail switching from an 8KB allocation to 12KB where the
VMA would grow down by a page.  The mmap() always returned a 4KB aligned
VMA, and the requested mmap() size was always an exact number of 4KB pages. 
The VFIO map operations were always on the full extent of the mmap'd memory
(likely makes Baolu's patch moot in this case).

A typical (not consistent) syndrome would be:
  1st page: ok
  2nd page: previous mmap'd data.
  3rd page: ok
We saw the issue on transfers both to and from the card.  I.e., we placed a
memory block in the FPGA that we could interrogate when data were corrupted.

(And as mentioned, just changing the IOVA fixed this issue.)

>  * If the above is venturing into super page support (2MB), does the
>vfio_iommu_type1 module option disable_hugepages=1 affect the
>results.

N/A.

>  * Along the same lines, does the kernel command line option
>intel_iommu=sp_off produce different results.

Would this affect small pages?

>  * Does this behavior also occur on upstream kernels (ie. v5.17)?

Unknown, and (unfortunately) untestable at present.

>  * Do additional CPU cache flushes in the test program produce different
>results?

We did a number of experiments using combinations of MAP_LOCKED, mlock(),
barrier(), _mm_clflush().  They all affected reliability of the test
(through timing?), but all ultimately failed.  I'm happy to try other
flushes that can be achieved in non-root user space!

>  * Is this a consumer available FPGA device that others might be able
>to reproduce this issue?  I've always wanted such a device for
>testing, but also we can't rule out that the FPGA itself or its
>programming is the source 

Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

2022-03-28 Thread Jason Gunthorpe via iommu
On Mon, Mar 28, 2022 at 11:17:23AM -0600, Alex Williamson wrote:
> On Thu, 24 Mar 2022 10:46:22 -0300
> Jason Gunthorpe  wrote:
> 
> > On Thu, Mar 24, 2022 at 07:25:03AM +, Tian, Kevin wrote:
> > 
> > > Based on that here is a quick tweak of the force-snoop part (not 
> > > compiled).  
> > 
> > I liked your previous idea better, that IOMMU_CAP_CACHE_COHERENCY
> > started out OK but got weird. So lets fix it back to the way it was.
> > 
> > How about this:
> > 
> > https://github.com/jgunthorpe/linux/commits/intel_no_snoop
> > 
> > b11c19a4b34c2a iommu: Move the Intel no-snoop control off of IOMMU_CACHE
> > 5263947f9d5f36 vfio: Require that device support DMA cache coherence
> 
> I have some issues with the argument here:
> 
>   This will block device/platform/iommu combinations that do not
>   support cache coherent DMA - but these never worked anyhow as VFIO
>   did not expose any interface to perform the required cache
>   maintenance operations.
> 
> VFIO never intended to provide such operations, it only tried to make
> the coherence of the device visible to userspace such that it can
> perform operations via other means, for example via KVM.  The "never
> worked" statement here seems false.

VFIO is generic. I expect if DPDK connects to VFIO then it will work
properly. That is definitely not the case today when
dev_is_dma_coherent() is false. This is what the paragraph is talking
about.

Remember, x86 wires dev_is_dma_coherent() to true, so this above
remark is not related to anything about x86.

We don't have a way in VFIO to negotiate that 'vfio can only be used
with kvm' so I hope no cases like that really do exist :( Do you know
of any?

> Commit b11c19a4b34c2a also appears to be a behavioral change.  AIUI
> vfio_domain.enforce_cache_coherency would only be set on Intel VT-d
> where snoop-control is supported, this translates to KVM emulating
> coherency instructions everywhere except VT-d w/ snoop-control.

It seems so.

> My understanding of AMD-Vi is that no-snoop TLPs are always coherent, so
> this would trigger unnecessary wbinvd emulation on those platforms.  

I look in the AMD manual and it looks like it works the same as intel
with a dedicated IOPTE bit:

  #define IOMMU_PTE_FC (1ULL << 60)

https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf Pg 79:

 FC: Force Coherent. Software uses the FC bit in the PTE to indicate
 the source of the upstream coherent attribute state for an
 untranslated DMA transaction. 1 = the IOMMU sets the coherent attribute
 state in the upstream request. 0 = the IOMMU passes on the coherent
 attribute state from the originating request. Device internal
 address/page table translations are considered "untranslated accesses"
 by IOMMU. The FC state is returned in the ATS response to the device
 endpoint via the state of the (N)oSnoop bit.

So, currently AMD and Intel have exactly the same HW feature with a
different kAPI..

I would say it is wrong that AMD creates kernel owned domains for the
DMA-API to use that do not support snoop.

> don't know if other archs need similar, but it seems we're changing
> polarity wrt no-snoop TLPs from "everyone is coherent except this case
> on Intel" to "everyone is non-coherent except this opposite case on
> Intel".

Yes. We should not assume no-snoop blocking is a HW feature without
explicit knowledge that it is.

From a kAPI compat perspective IOMMU_CAP_CACHE_COHERENCY
only has two impacts:
 - Only on x86 arch it controls kvm_arch_register_noncoherent_dma()
 - It triggers IOMMU_CACHE
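
(For reference, the second point is roughly this in today's
vfio_iommu_type1 attach path, paraphrasing from memory:)

	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
		domain->prot |= IOMMU_CACHE;	/* ask for snooped mappings */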

If we look at the list of places where IOMMU_CAP_CACHE_COHERENCY is set:

 drivers/iommu/intel/iommu.c
   Must have IOMMU_CACHE set/clear to control no-snoop blocking

 drivers/iommu/amd/iommu.c
   Always sets its no-snoop block, inconsistent with Intel

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 drivers/iommu/arm/arm-smmu/arm-smmu.c
 drivers/iommu/arm/arm-smmu/qcom_iommu.c
   Must have IOMMU_CACHE set, ARM arch has no
   kvm_arch_register_noncoherent_dma()

   From what I could tell in the manuals and the prior discussion
   SMMU doesn't block no-snoop.

   ie ARM lies about IOMMU_CAP_CACHE_COHERENCY because it needs
   IOMMU_CACHE set to work.

 drivers/iommu/fsl_pamu_domain.c
 drivers/iommu/s390-iommu.c
   Ignore IOMMU_CACHE, arch has no kvm_arch_register_noncoherent_dma()

   No idea if the HW blocks no-snoop or not, but it doesn't matter.

So other than AMD, it is OK to change the sense, and it makes it clearer
for future driver authors what they are expected to do with this.

Thanks,
Jason
___
iommu mailing list

[PATCH V3 1/2] Documentation: x86: Add documentation for AMD IOMMU

2022-03-28 Thread Alex Deucher via iommu
Add preliminary documentation for AMD IOMMU.

Signed-off-by: Alex Deucher 
---

V2: Incorporate feedback from Robin to clarify IOMMU vs DMA engine (e.g.,
a device) and document proper DMA API.  Also correct the fact that
the AMD IOMMU is not limited to managing PCI devices.
v3: Fix spelling and rework text as suggested by Vasant

 Documentation/x86/amd-iommu.rst   | 69 +++
 Documentation/x86/index.rst   |  1 +
 Documentation/x86/intel-iommu.rst |  2 +-
 3 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/x86/amd-iommu.rst

diff --git a/Documentation/x86/amd-iommu.rst b/Documentation/x86/amd-iommu.rst
new file mode 100644
index ..3b1fb8fec168
--- /dev/null
+++ b/Documentation/x86/amd-iommu.rst
@@ -0,0 +1,69 @@
+=
+AMD IOMMU Support
+=
+
+The architecture spec can be obtained from the below location.
+
+https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
+
+This guide gives a quick cheat sheet for some basic understanding.
+
+Some Keywords
+
+- IVRS - I/O Virtualization Reporting Structure
+- IVDB - I/O Virtualization Definition Block
+- IVHD - I/O Virtualization Hardware Definition
+- IOVA - I/O Virtual Address.
+
+Basic stuff
+---
+
+ACPI enumerates and lists the different IOMMUs on the platform, and
+device scope relationships between devices and which IOMMU controls
+them.
+
+What is IVRS?
+-
+
+The architecture defines an ACPI-compatible data structure called an I/O
+Virtualization Reporting Structure (IVRS) that is used to convey information
+related to I/O virtualization to system software.  The IVRS describes the
+configuration and capabilities of the IOMMUs contained in the platform as
+well as information about the devices that each IOMMU virtualizes.
+
+The IVRS provides information about the following:
+- IOMMUs present in the platform including their capabilities and proper 
configuration
+- System I/O topology relevant to each IOMMU
+- Peripheral devices that cannot be otherwise enumerated
+- Memory regions used by SMI/SMM, platform firmware, and platform hardware. 
These are
+generally exclusion ranges to be configured by system software.
+
+How is IOVA generated?
+--
+
+Well behaved drivers call dma_map_*() calls before sending command to device
+that needs to perform DMA. Once DMA is completed and mapping is no longer
+required, driver performs dma_unmap_*() calls to unmap the region.
+
+Fault reporting
+---
+
+When errors are reported, the IOMMU signals via an interrupt. The fault
+reason and device that caused it is printed on the console.
+
+Boot Message Sample
+---
+
+Something like this gets printed indicating presence of the IOMMU.
+
+   iommu: Default domain type: Translated
+   iommu: DMA domain TLB invalidation policy: lazy mode
+
+Fault reporting
+^^^
+
+::
+
+   AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xc02000 
flags=0x]
+   AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 
address=0xc02000 flags=0x]
+
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..15711134eb68 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -22,6 +22,7 @@ x86-specific Documentation
mtrr
pat
intel-iommu
+   amd-iommu
intel_txt
amd-memory-encryption
pti
diff --git a/Documentation/x86/intel-iommu.rst 
b/Documentation/x86/intel-iommu.rst
index 099f13d51d5f..4d3391c7bd3f 100644
--- a/Documentation/x86/intel-iommu.rst
+++ b/Documentation/x86/intel-iommu.rst
@@ -1,5 +1,5 @@
 ===
-Linux IOMMU Support
+Intel IOMMU Support
 ===
 
 The architecture spec can be obtained from the below location.
-- 
2.35.1
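
As an aside, the dma_map_*()/dma_unmap_*() pattern referred to under "How is
IOVA generated?" is, in sketch form (generic, hypothetical driver code with
error handling omitted):

	dma_addr_t iova = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, iova))
		return -ENOMEM;

	/* ... hand 'iova' to the device, wait for the DMA to complete ... */

	dma_unmap_single(dev, iova, len, DMA_FROM_DEVICE);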

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V3 2/2] Documentation: x86: Clarify Intel IOMMU documentation

2022-03-28 Thread Alex Deucher via iommu
Based on feedback from Robin on the initial AMD IOMMU
documentation, fix up the Intel documentation to
clarify IOMMU vs device and modern DMA API.

Signed-off-by: Alex Deucher 
---

V2: Fix spelling error in commit message
Rework ACPI section as suggested by Dave Hansen

 Documentation/x86/intel-iommu.rst | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86/intel-iommu.rst 
b/Documentation/x86/intel-iommu.rst
index 4d3391c7bd3f..17d8497e506b 100644
--- a/Documentation/x86/intel-iommu.rst
+++ b/Documentation/x86/intel-iommu.rst
@@ -19,9 +19,8 @@ Some Keywords
 Basic stuff
 ---
 
-ACPI enumerates and lists the different DMA engines in the platform, and
-device scope relationships between PCI devices and which DMA engine  controls
-them.
+ACPI enumerates both the IOMMUs in the platform and which IOMMU
+controls a specific PCI device.
 
 What is RMRR?
 -
@@ -36,9 +35,9 @@ unity mappings for these regions for these devices to access 
these regions.
 How is IOVA generated?
 --
 
-Well behaved drivers call pci_map_*() calls before sending command to device
+Well behaved drivers call dma_map_*() calls before sending command to device
 that needs to perform DMA. Once DMA is completed and mapping is no longer
-required, device performs a pci_unmap_*() calls to unmap the region.
+required, device performs a dma_unmap_*() calls to unmap the region.
 
 The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
 device has its own domain (hence protection). Devices under p2p bridges
@@ -68,7 +67,7 @@ address from PCI MMIO ranges so they are not allocated for 
IOVA addresses.
 
 Fault reporting
 ---
-When errors are reported, the DMA engine signals via an interrupt. The fault
+When errors are reported, the IOMMU signals via an interrupt. The fault
 reason and device that caused it with fault reason is printed on console.
 
 See below for sample.
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V3 0/2] x86 IOMMU Documentation updates

2022-03-28 Thread Alex Deucher via iommu
This was originally just a patch to add an AMD IOMMU
documentation page, but grew into some cleanup of the
Intel IOMMU documentation page.

v2: AMD documentation rework
Add Intel Updates
v3: Further documentation reworks

Alex Deucher (2):
  Documentation: x86: Add documentation for AMD IOMMU
  Documentation: x86: Clarify Intel IOMMU documentation

 Documentation/x86/amd-iommu.rst   | 69 +++
 Documentation/x86/index.rst   |  1 +
 Documentation/x86/intel-iommu.rst | 13 +++---
 3 files changed, 76 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/x86/amd-iommu.rst

-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

2022-03-28 Thread Alex Williamson
On Thu, 24 Mar 2022 10:46:22 -0300
Jason Gunthorpe  wrote:

> On Thu, Mar 24, 2022 at 07:25:03AM +, Tian, Kevin wrote:
> 
> > Based on that here is a quick tweak of the force-snoop part (not compiled). 
> >  
> 
> I liked your previous idea better, that IOMMU_CAP_CACHE_COHERENCY
> started out OK but got weird. So lets fix it back to the way it was.
> 
> How about this:
> 
> https://github.com/jgunthorpe/linux/commits/intel_no_snoop
> 
> b11c19a4b34c2a iommu: Move the Intel no-snoop control off of IOMMU_CACHE
> 5263947f9d5f36 vfio: Require that device support DMA cache coherence

I have some issues with the argument here:

  This will block device/platform/iommu combinations that do not
  support cache coherent DMA - but these never worked anyhow as VFIO
  did not expose any interface to perform the required cache
  maintenance operations.

VFIO never intended to provide such operations, it only tried to make
the coherence of the device visible to userspace such that it can
perform operations via other means, for example via KVM.  The "never
worked" statement here seems false.

Commit b11c19a4b34c2a also appears to be a behavioral change.  AIUI
vfio_domain.enforce_cache_coherency would only be set on Intel VT-d
where snoop-control is supported, this translates to KVM emulating
coherency instructions everywhere except VT-d w/ snoop-control.

My understanding of AMD-Vi is that no-snoop TLPs are always coherent, so
this would trigger unnecessary wbinvd emulation on those platforms.  I
don't know if other archs need similar, but it seems we're changing
polarity wrt no-snoop TLPs from "everyone is coherent except this case
on Intel" to "everyone is non-coherent except this opposite case on
Intel".  Thanks,

Alex

> eab4b381c64a30 iommu: Restore IOMMU_CAP_CACHE_COHERENCY to its original 
> meaning
> 2752e12bed48f6 iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with 
> dev_is_dma_coherent()
> 
> If you like it could you take it from here?
> 
> Thanks,
> Jason
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd

2022-03-28 Thread Jason Gunthorpe via iommu
On Mon, Mar 28, 2022 at 02:14:26PM +0100, Sean Mooney wrote:
> On Mon, 2022-03-28 at 09:53 +0800, Jason Wang wrote:
> > On Thu, Mar 24, 2022 at 7:46 PM Jason Gunthorpe  wrote:
> > > 
> > > On Thu, Mar 24, 2022 at 11:50:47AM +0800, Jason Wang wrote:
> > > 
> > > > It's simply because we don't want to break existing userspace. [1]
> > > 
> > > I'm still waiting to hear what exactly breaks in real systems.
> > > 
> > > As I explained this is not a significant change, but it could break
> > > something in a few special scenarios.
> > > 
> > > Also the one place we do have ABI breaks is security, and ulimit is a
> > > security mechanism that isn't working right. So we do clearly need to
> > > understand *exactly* what real thing breaks - if anything.
> > > 
> > > Jason
> > > 
> > 
> > To tell the truth, I don't know. I remember that Openstack may do some
> > accounting so adding Sean for more comments. But we really can't imagine
> > openstack is the only userspace that may use this.
> sorry, there is a lot of context to this discussion; I have tried to read back
> the thread but I may have missed part of it.

Thanks Sean, this is quite interesting, though I'm not sure it
entirely reached the question

> tl;dr openstack does not currently track locked/pinned memory per
> user or per vm because we have no idea when libvirt will request it
> or how much is needed per device. when ulimits are configured today
> for nova/openstack it's done at the qemu user level outside of
> openstack in our installer tooling.  e.g. in tripleo the ulimits
> would be set on the nova_libvirt container to constrain all vms
> spawned, not per vm/process.

So, today, you expect the ulimit to be machine wide, like if your
machine has 1TB of memory you'd set the ulimit at 0.9 TB and you'd
like the stuff under it to limit memory pinning to 0.9TB globally for all
qemus?

To be clear it doesn't work that way today at all, you might as well
just not bother setting ulimit to anything less than unlimited at the
openstack layer.

>hard_limit
>
>The optional hard_limit element is the maximum memory the
>guest can use. The units for this value are kibibytes
>(i.e. blocks of 1024 bytes). Users of QEMU and KVM are strongly
>advised not to set this limit as domain may get killed by the
>kernel if the guess is too low, and determining the memory needed
>for a process to run is an undecidable problem; that said, if you
>already set locked in memory backing because your workload
>demands it, you'll have to take into account the specifics of
>your deployment and figure out a value for hard_limit that is
>large enough to support the memory requirements of your guest,
>but small enough to protect your host against a malicious guest
>locking all memory.

And hard_limit is the ulimit that Alex was talking about?

So now we switched from talking about global per-user things to
per-qemu-instance things?

> we could not figure out how to automatically compute a hard_limit in
> nova that would work for everyone, and we felt exposing this to our
> users/operators was a bit of a cop out when they likely can't calculate
> that properly either.

Not surprising..

> As a result we can't actually account for them
> today when scheduling workloads to a host. I'm not sure this would
> change even if you exposed new user space apis unless we had a way
> to inspect each VF to know how much locked memory that VF would need
> to lock?

We are not talking about a new uAPI we are talking about changing the
meaning of the existing ulimit. You can see it in your message above,
at the openstack level you were talking about global limits and then
at the libvirt level you are talking about per-qemu limits.

In the kernel both of these are being used by the same control and one
of the users is wrong.

The kernel consensus is that the ulimit is per-user and is used by all
kernel entities consistently

Currently vfio is different and uses it per-process and effectively
has its own private bucket.

When you talk about VDPA you start to see the problems here because
VPDA use a different accounting from VFIO. If you run VFIO and VDPA
together then you should need 2x the ulimit, but today you only need
1x because they don't share accounting buckets.
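
For example (numbers made up): with a 1 GiB locked-memory ulimit, a process
could pin 1 GiB through VFIO (charged to its per-process bucket) and another
1 GiB through VDPA (charged to the per-user bucket); each check passes on its
own even though 2 GiB are now pinned.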

This also means the ulimit doesn't actually work the way it is
supposed to.

The question is how to fix it, if we do fix it, how much cares that
things work differently.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd

2022-03-28 Thread Sean Mooney
On Mon, 2022-03-28 at 09:53 +0800, Jason Wang wrote:
> On Thu, Mar 24, 2022 at 7:46 PM Jason Gunthorpe  wrote:
> > 
> > On Thu, Mar 24, 2022 at 11:50:47AM +0800, Jason Wang wrote:
> > 
> > > It's simply because we don't want to break existing userspace. [1]
> > 
> > I'm still waiting to hear what exactly breaks in real systems.
> > 
> > As I explained this is not a significant change, but it could break
> > something in a few special scenarios.
> > 
> > Also the one place we do have ABI breaks is security, and ulimit is a
> > security mechanism that isn't working right. So we do clearly need to
> > understand *exactly* what real thing breaks - if anything.
> > 
> > Jason
> > 
> 
> To tell the truth, I don't know. I remember that Openstack may do some
> accounting so adding Sean for more comments. But we really can't imagine
> openstack is the only userspace that may use this.
sorry, there is a lot of context to this discussion; I have tried to read back the
thread but I may have missed part of it.

tl;dr openstack does not currently track locked/pinned memory per user or per vm
because we have no idea when libvirt will request it or how much is needed per
device. when ulimits are configured today for nova/openstack it's done at the
qemu user level outside of openstack in our installer tooling, e.g. in tripleo
the ulimits would be set on the nova_libvirt container to constrain all vms
spawned, not per vm/process.

full response below
---

openstack's history with locked/pinned/unswappable memory is a bit complicated.
we currently only request locked memory explicitly in 2 cases directly
https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/virt/libvirt/driver.py#L5769-L5784
when the administrator configures the vm flavor to request amd's SEV feature or
configures the flavor for realtime scheduling priority.
I say explicitly as libvirt invented a request for locked/pinned pages implicitly
for sriov VFs and a number of other cases which we were not aware of. this only
became apparent when we went to add vdpa support to openstack and libvirt did not
make that implicit request and we had to fall back to requesting realtime
instances as a workaround.

nova/openstack does have the ability to generate the libvirt xml element that
configures hard and soft limits
https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/virt/libvirt/config.py#L2559-L2590
however it is only ever used in our test code
https://github.com/openstack/nova/search?q=LibvirtConfigGuestMemoryTune

the description of hard_limit in the libvirt docs strongly discourages its use
with a small caveat for locked memory
https://libvirt.org/formatdomain.html#memory-tuning

   hard_limit
   
   The optional hard_limit element is the maximum memory the guest can use. 
The units for this value are kibibytes (i.e. blocks of 1024 bytes). Users
   of QEMU and KVM are strongly advised not to set this limit as domain may get 
killed by the kernel if the guess is too low, and determining the memory
   needed for a process to run is an undecidable problem; that said, if you 
already set locked in memory backing because your workload demands it, you'll
   have to take into account the specifics of your deployment and figure out a 
value for hard_limit that is large enough to support the memory
   requirements of your guest, but small enough to protect your host against a 
malicious guest locking all memory.
   
we could not figure out how to automatically compute a hard_limit in nova that
would work for everyone and we felt exposing this to our
users/operators was a bit of a cop out when they likely can't calculate that
properly either. As a result we can't actually account for them today when
scheduling workloads to a host. I'm not sure this would change even if you
exposed new user space apis unless we
had a way to inspect each VF to know how much locked memory that VF would need
to lock? same for vdpa devices,
mdevs etc. cloud systems don't normally have quotas on "locked" memory used
transitively via passthrough devices so even if we had this info
it's not immediately apparent how we would consume it without altering our existing
quotas. Openstack is a self-service cloud platform
where end users can upload their own workload images so it's basically impossible for
the operator of the cloud to know how much memory to set the hard limit
to without setting it overly large in most cases. from a management application
point of view we currently have no insight into how
memory will be pinned in the kernel or when libvirt will invent additional
requests for pinned/locked memory or how large they are.

instead of going down that route operators are encouraged to use ulimit to set a
global limit on the amount of memory the nova/qemu user can use.
while nova/openstack supports multi-tenancy we do not expose that multi-tenancy
to the underlying hypervisor hosts. the agents are typically
deployed as the 

Re: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd

2022-03-28 Thread Jason Gunthorpe via iommu
On Mon, Mar 28, 2022 at 09:53:27AM +0800, Jason Wang wrote:
> To me, it looks more easier to not answer this question by letting
> userspace know about the change,

That is not backwards compatible, so I don't think it helps unless we
say if you open /dev/vfio/vfio you get old behavior and if you open
/dev/iommu you get new...

Nor does it answer if I can fix RDMA or not :\

So we really do need to know what exactly is the situation here.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread Halil Pasic
On Sun, 27 Mar 2022 17:30:01 -0700
Linus Torvalds  wrote:

> On Sun, Mar 27, 2022 at 4:52 PM Halil Pasic  wrote:
> >
> > I have no intention of pursuing this.  When fixing the information leak,
> > I happened to realize, that a somewhat similar situation can emerge when
> > mappings are reused. It seemed like an easy fix, so I asked the swiotlb
> > maintainers, and they agreed. It ain't my field of expertise, and the
> > drivers I'm interested in don't need this functionality.  
> 
> Ok.
> 
> That said, I think you are putting yourself down when you said in an
> earlier email that you aren't very knowledgeable in this area.
> 
> I think the fact that you *did* think of this other similar situation
> is actually very interesting, and it's something people probably
> _haven't_ been thinking about.

Thank you!

> 
> So I think your first commit fixes the straightforward and common case
> where you do that "map / partial dma / unmap" case.
> 
> And that straightforward case is probably all that the disk IO case
> ever really triggers, which is presumably why those "drivers I'm
> interested in don't need this functionality" don't need anything else?
> 

I agree.

> And yes, your second commit didn't work, but hey, whatever. The whole
> "multiple operations on the same double buffering allocation"
> situation is something I don't think people have necessarily thought
> about enough.
> 
> And by that I don't mean you. I mean very much the whole history of
> our dma mapping code.
> 

I agree. We are in the process of catching up! :) My idea was to aid
the process as a relatively naive pair of eyes: somebody who didn't read any
data sheets describing non-cache-coherent DMA, and never programmed
a DMA. It is a fairly common problem that, for the very knowledgeable,
certain things seem obvious, self-explanatory or trivial, but for the
less knowledgeable they are not. And knowledge can create bias.

> I then get opinionated and probably too forceful, but please don't
> take it as being about you - it's about just my frustration with that
> code - and if it comes off too negative then please accept my
> apologies.

I have to admit, I did feel a little uncomfortable, and I did look for
an exit strategy. I do believe that people in your position do have to
occasionally get forceful, and even abrasive to maintain efficiency. I
try to not ignore the social aspect of things, but I do get carried away
occasionally.

Your last paragraph especially is very encouraging and welcome. Thank
you!

Regards,
Halil

[..]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread Johannes Berg
On Mon, 2022-03-28 at 11:50 +0200, Johannes Berg wrote:
> No I worded that badly - the direction isn't useless, but thinking of it
> in terms of a buffer property rather than data movement is inaccurate.
> So then if we need something else to indicate how data was expected to
> be moved, the direction argument becomes useless, since it's not a
> buffer property but rather a temporal thing on a specific place that
> expected certain data movement.
> 

Yeah, umm. I should've read the whole thread from the weekend first, sorry
for the noise.

johannes
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread Johannes Berg
On Mon, 2022-03-28 at 11:48 +0200, Johannes Berg wrote:
> 
> However, this basically means that the direction argument to the flush
> APIs are completely useless, and we do have to define something
> new/else...
> 

No I worded that badly - the direction isn't useless, but thinking of it
in terms of a buffer property rather than data movement is inaccurate.
So then if we need something else to indicate how data was expected to
be moved, the direction argument becomes useless, since it's not a
buffer property but rather a temporal thing on a specific place that
expected certain data movement.

johannes
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread Johannes Berg
On Sun, 2022-03-27 at 05:15 +0200, Halil Pasic wrote:

> 
> The key here is "sync_sg API, all the parameters must be the same
> as those passed into the single mapping API", but I have to admit,
> I don't understand the *single* in here.
> 

Hah. So I wasn't imagining things after all.

However, as the rest of the thread arrives, this still means it's all
broken ... :)

> The intended meaning of the
> last sentence is that one can do partial sync by choosing
> dma_handle_sync, size_sync such that dma_handle_mapping <= dma_handle_sync
> < dma_handle_mapping + size_mapping and dma_handle_sync + size_sync <=
> dma_handle_mapping + size_mapping. But the direction has to remain the
> same.

Right.
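
i.e., as I read it, a partial sync of an existing mapping would look
something like this, with the direction left unchanged (sketch):

	/* whole buffer mapped once ... */
	dma_addr_t handle = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);

	/* ... but only 'len' bytes at 'offset' are synced back to the CPU */
	dma_sync_single_range_for_cpu(dev, handle, offset, len,
				      DMA_FROM_DEVICE);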

> BTW, the current documented definition of the direction is about the
> data transfer direction between memory and the device, and how the CPU
> is interacting with the memory is not in scope. A quote form the
> documentation.
> 
> """
> === =
> DMA_NONEno direction (used for debugging)
> DMA_TO_DEVICE   data is going from the memory to the device
> DMA_FROM_DEVICE data is coming from the device to the memory
> DMA_BIDIRECTIONAL   direction isn't known
> === =
> """
> (Documentation/core-api/dma-api.rst)
> 
> My feeling is, that re-defining the dma direction is not a good idea. But
> I don't think my opinion has much weight here.

However, this basically means that the direction argument to the flush
APIs are completely useless, and we do have to define something
new/else...

johannes
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Remove redundant check

2022-03-28 Thread Vasant Hegde via iommu
Joerg,

Ping.

-Vasant

On 3/14/2022 12:32 PM, Vasant Hegde via iommu wrote:
> smatch static checker warning:
>   drivers/iommu/amd/init.c:1989 amd_iommu_init_pci()
>   warn: duplicate check 'ret' (previous on line 1978)
> 
> Reported-by: Dan Carpenter 
> Fixes: 06687a03805e ("iommu/amd: Improve error handling for 
> amd_iommu_init_pci")
> Signed-off-by: Vasant Hegde 
> ---
>  drivers/iommu/amd/init.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 2586e589e54e..8ed1f86fe93d 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2137,8 +2137,7 @@ static int __init amd_iommu_init_pci(void)
>   for_each_iommu(iommu)
>   iommu_flush_all_caches(iommu);
>  
> - if (!ret)
> - print_iommu_info();
> + print_iommu_info();
>  
>  out:
>   return ret;

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Bug report: VFIO map/unmap mem subject to race and DMA data goes to incorrect page (4.18.0)

2022-03-28 Thread Lu Baolu

Hi Daniel,

On 2022/3/26 6:10, Alex Williamson wrote:

Hi Daniel,

On Fri, 25 Mar 2022 13:06:40 -0700
"Daniel F. Smith"  wrote:


This email is to document an insidious (incorrect data, no error or warning)
VFIO bug found when using the Intel IOMMU to perform DMA transfers; and the
associated workaround.

There may be security implications (unsure).

/sys/devices/virtual/iommu/dmar0/intel-iommu/version: 1:0
/sys/devices/virtual/iommu/dmar0/intel-iommu/cap: d2008c40660462
Linux x.ibm.com 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Tue Mar 8 12:56:54 EST 
2022 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.5 (Ootpa)

In our testing of VFIO DMA to an FPGA card in rootless mode, we discovered a
glitch where DMA data are transferred to/from the incorrect page.  It
appears timing based.  Under some specific conditions the test could trigger
the bug every loop.  Sometimes the bug would only emerge after 20+ minutes
of testing.

Basics of test:
Get memory with mmap(anonymous): size can change.
VFIO_IOMMU_MAP_DMA with a block of memory, fixed IOVA.
Fill memory with pattern.
Do DMA transfer to FPGA from memory at IOVA.
Do DMA transfer from FPGA to memory at IOVA+offset.
Compare memory to ensure match.  Miscompare is bug.
VFIO_IOMMU_UNMAP_DMA
unmap()
Repeat.

Using the fixed IOVA address* caused sporadic memory miscompares.  The
nature of the miscompares is that the received data was mixed with pages
that had been returned by mmap in a *previous* loop.

Workaround: Randomizing the IOVA eliminated the memory miscompares.

Hypothesis/conjecture: Possible race condition in UNMAP_DMA such that pages
can be released/munlocked *after* the MAP_DMA with the same IOVA has
occurred.


Coherency possibly.

There's a possible coherency issue at the compare depending on the
IOMMU capabilities which could affect whether DMA is coherent to memory
or requires an explicit flush.  I'm a little suspicious whether dmar0
is really the IOMMU controlling this device since you mention a 39bit
IOVA space, which is more typical of Intel client platforms which can
also have integrated graphics which often have a dedicated IOMMU at
dmar0 that isn't necessarily representative of the other IOMMUs in the
system, especially with regard to snoop-control.  Each dmar lists the
managed devices under it in sysfs to verify.  Support for snoop-control
would be identified in the ecap register rather than the cap register.
VFIO can also report coherency via the VFIO_DMA_CC_IOMMU extension
reported by VFIO_CHECK_EXTENSION ioctl.

However, CPU coherency might lead to a miscompare, but not necessarily a
miscompare matching the previous iteration.  Still, for completeness
let's make sure this isn't a gap in the test programming making invalid
assumptions about CPU/DMA coherency.

The fact that randomizing the IOVA provides a workaround though might
suggest something relative to the IOMMU page table coherency.  But for
the new mmap target to have the data from the previous iteration, the
IOMMU PTE would need to be stale on read, but correct on write in order
to land back in your new mmap.  That seems peculiar.  Are we sure the
FPGA device isn't caching the value at the IOVA or using any sort of
IOTLB caching such as ATS that might not be working correctly?


Suggestion: Document issue when using fixed IOVA, or fix if security
is a concern.


I don't know that there's enough information here to make any
conclusions.  Here are some further questions:

  * What size mappings are being used, both for the mmap and the VFIO
MAP/UNMAP operations.

  * If the above is venturing into super page support (2MB), does the
vfio_iommu_type1 module option disable_hugepages=1 affect the
results.

  * Along the same lines, does the kernel command line option
intel_iommu=sp_off produce different results.

  * Does this behavior also occur on upstream kernels (ie. v5.17)?

  * Do additional CPU cache flushes in the test program produce different
results?

  * Is this a consumer available FPGA device that others might be able
to reproduce this issue?  I've always wanted such a device for
testing, but also we can't rule out that the FPGA itself or its
programming is the source of the miscompare.

 From the vfio perspective, UNMAP_DMA should first unmap the pages at
the IOMMU to prevent device access before unpinning the pages.  We do
make use of batch unmapping to reduce iotlb flushing, but the result is
expected to be that the IOMMU PTE entries are invalidated before the
UNMAP_DMA operation completes.  A stale IOVA would not be expected or
correct operation.  Thanks,

Alex



As another suggestion, can you please try a patch posted here?

https://lore.kernel.org/linux-iommu/20220322063555.1422042-1-steve...@google.com/

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [PATCH v2] iommu/vt-d: calculate mask for non-aligned flushes

2022-03-28 Thread Lu Baolu

Hi David,

On 2022/3/22 14:35, David Stevens wrote:

From: David Stevens 

Calculate the appropriate mask for non-size-aligned page selective
invalidation. Since psi uses the mask value to mask out the lower order
bits of the target address, properly flushing the iotlb requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region.  This is not normally an issue because iova.c
always allocates iovas that are aligned to their size. However, iovas
which come from other sources (e.g. userspace via VFIO) may not be
aligned.


This is a bug fix, right? Can you please add "Fixes" and "Cc stable" tags?



Signed-off-by: David Stevens 
---
v1 -> v2:
  - Calculate an appropriate mask for non-size-aligned iovas instead
of falling back to domain selective flush.

  drivers/iommu/intel/iommu.c | 27 ---
  1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5b196cfe9ed2..ab2273300346 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1717,7 +1717,8 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
  unsigned long pfn, unsigned int pages,
  int ih, int map)
  {
-   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
+   unsigned int aligned_pages = __roundup_pow_of_two(pages);
+   unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
u16 did = domain->iommu_did[iommu->seq_id];
  
@@ -1729,10 +1730,30 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,

if (domain_use_first_level(domain)) {
domain_flush_piotlb(iommu, domain, addr, pages, ih);
} else {
+   unsigned long bitmask = aligned_pages - 1;
+
+   /*
+* PSI masks the low order bits of the base address. If the
+* address isn't aligned to the mask, then compute a mask value
+* needed to ensure the target range is flushed.
+*/
+   if (unlikely(bitmask & pfn)) {
+   unsigned long end_pfn = pfn + pages - 1, shared_bits;
+
+   /*
+* Since end_pfn <= pfn + bitmask, the only way bits
+* higher than bitmask can differ in pfn and end_pfn is
+* by carrying. This means after masking out bitmask,
+* high bits starting with the first set bit in
+* shared_bits are all equal in both pfn and end_pfn.
+*/
+   shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
+   mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;


Can you please add some lines in the commit message to explain how this
magic line works? It's easier for people to understand it if you can
take a real example. :-)
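
For instance (made-up numbers, just to show the kind of walk-through I mean):
with pfn = 0x1003 and pages = 2, aligned_pages = 2 and bitmask = 0x1, so pfn
is not aligned; end_pfn = 0x1004 and pfn ^ end_pfn = 0x7, hence
shared_bits = ~0x7 & ~0x1 = ...fff8 and __ffs(shared_bits) = 3. mask then
becomes 3, i.e. an 8-page aligned flush starting at pfn 0x1000, which covers
both 0x1003 and 0x1004.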

Best regards,
baolu


+   }
+
/*
 * Fallback to domain selective flush if no PSI support or
-* the size is too big. PSI requires page size to be 2 ^ x,
-* and the base address is naturally aligned to the size.
+* the size is too big.
 */
if (!cap_pgsel_inv(iommu->cap) ||
mask > cap_max_amask_val(iommu->cap))

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread David Laight
From: Christoph Hellwig
> Sent: 28 March 2022 07:37
> 
> On Fri, Mar 25, 2022 at 11:46:09AM -0700, Linus Torvalds wrote:
> > I think my list of three different sync cases (not just two! It's not
> > just about whether to sync for the CPU or the device, it's also about
> > what direction the data itself is taking) is correct.
> >
> > But maybe I'm wrong.
> 
> At the high level you are correct.  It is all about which direction
> the data is taking.  That is the direction argument that all the
> map/unmap/sync call take.  The sync calls then just toggle the ownership.
> You seem to hate that ownership concept, but I don't see how things
> could work without that ownership concept as we're going to be in
> trouble without having that.  And yes, a peek operation could work in
> some cases, but it would have to be at the cache line granularity.

I don't think it is really 'ownership' but more about who has
write access.
Only one side can have write access (to a cache line [1]) at any
one time.

Read access is different.
You need a 'synchronise' action to pick up newly written data.
This might be a data copy, cache flush or cache invalidate.
It only needs to affect the area that needs to be read - not the
full buffer.
Partial cache flush/invalidate will almost certainly speed
up receipt of short network packets that are copied into a
new skb - leaving the old one mapped for another receive.

[1] The cache line size might be a property of the device
and dma subsystem, not just the cpu.
I have used hardware where the effective size was 1kB.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

2022-03-28 Thread Christoph Hellwig
On Fri, Mar 25, 2022 at 11:46:09AM -0700, Linus Torvalds wrote:
> I think my list of three different sync cases (not just two! It's not
> just about whether to sync for the CPU or the device, it's also about
> what direction the data itself is taking) is correct.
> 
> But maybe I'm wrong.

At the high level you are correct.  It is all about which direction
the data is taking.  That is the direction argument that all the
map/unmap/sync call take.  The sync calls then just toggle the ownership.
You seem to hate that ownership concept, but I don't see how things
could work without that ownership concept as we're going to be in
trouble without having that.  And yes, a peek operation could work in
some cases, but it would have to be at the cache line granularity.

arch/arc/mm/dma.c has a really good comment on how these transfers relate
to actual cache operations btw:

 *
 *  |   map  ==  for_device |   unmap ==  for_cpu
 *  |
 * TO_DEV   |   writebackwriteback  |   none  none
 * FROM_DEV |   invalidate   invalidate |   invalidate*   invalidate*
 * BIDIR|   writeback+invwriteback+inv  |   invalidateinvalidate
 *
 * [*] needed for CPU speculative prefetches
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu