[PATCH] swiotlb: Fix type of max_slots

2021-03-02 Thread Kunihiko Hayashi
After the refactoring, the type of max_slots changed from unsigned long to
unsigned int, while the return type of get_max_slots() and the type of the
4th argument of iommu_is_span_boundary() are still unsigned long. The
truncated max_slots value eventually trips the BUG_ON() assertion in
iommu_is_span_boundary().

Cc: Christoph Hellwig 
Fixes: 567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
Signed-off-by: Kunihiko Hayashi 
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 369e4c3..c10e855 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -534,7 +534,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
-   unsigned int max_slots = get_max_slots(boundary_mask);
+   unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int iotlb_align_mask =
dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/4] Misc vSVA fixes for VT-d

2021-03-02 Thread Jacob Pan
Hi Baolu et al,

This is a collection of SVA-related fixes.

ChangeLog:

v2:
- For guest SVA, call pasid_set_wpe directly w/o checking host CR0.wp
  (Review comments by Kevin T.)
- Added fixes tag

Thanks,

Jacob

Jacob Pan (4):
  iommu/vt-d: Enable write protect for supervisor SVM
  iommu/vt-d: Enable write protect propagation from guest
  iommu/vt-d: Reject unsupported page request modes
  iommu/vt-d: Calculate and set flags for handle_mm_fault

 drivers/iommu/intel/pasid.c | 29 +
 drivers/iommu/intel/svm.c   | 21 +
 include/uapi/linux/iommu.h  |  3 ++-
 3 files changed, 48 insertions(+), 5 deletions(-)

-- 
2.25.1



[PATCH v2 3/4] iommu/vt-d: Reject unsupported page request modes

2021-03-02 Thread Jacob Pan
When supervisor/privilege mode SVM is used, we bind init_mm.pgd with
a supervisor PASID. There should not be any page faults for init_mm.
Execution requests with DMA read are also not supported.

This patch checks the PRQ descriptor for both unsupported configurations
and rejects them with invalid responses.

Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")
Acked-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a1e4f58c54..ff7ae7cc17d5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1113,7 +1113,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
   ((unsigned long long *)req)[1]);
goto no_pasid;
}
-
+   /* We shall not receive page request for supervisor SVM */
+   if (req->pm_req && (req->rd_req | req->wr_req)) {
pr_err("Unexpected page request in Privilege Mode\n");
+   /* No need to find the matching sdev as for bad_req */
+   goto no_pasid;
+   }
+   /* DMA read with exec request is not supported. */
+   if (req->exe_req && req->rd_req) {
+   pr_err("Execution request not supported\n");
+   goto no_pasid;
+   }
if (!svm || svm->pasid != req->pasid) {
rcu_read_lock();
svm = ioasid_find(NULL, req->pasid, NULL);
-- 
2.25.1



[PATCH v2 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-03-02 Thread Jacob Pan
The write protect bit, when set, inhibits supervisor writes to read-only
pages. In guest supervisor shared virtual addressing (SVA), write protect
should be honored when the guest binds a supervisor PASID.

This patch extends the VT-d portion of the IOMMU UAPI to include the WP bit.
The WPE bit of the supervisor PASID entry will be set to match the CPU CR0.WP
bit.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 3 +++
 include/uapi/linux/iommu.h  | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0b7e0e726ade..b7e39239f539 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -763,6 +763,9 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
return -EINVAL;
}
pasid_set_sre(pte);
+   /* Enable write protect WP if guest requested */
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE)
+   pasid_set_wpe(pte);
}
 
if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 35d48843acd8..3a9164cc9937 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
 #define IOMMU_SVA_VTD_GPASID_PWT   (1 << 3) /* page-level write through */
 #define IOMMU_SVA_VTD_GPASID_EMTE  (1 << 4) /* extended mem type enable */
 #define IOMMU_SVA_VTD_GPASID_CD   (1 << 5) /* PASID-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
+#define IOMMU_SVA_VTD_GPASID_WPE   (1 << 6) /* Write protect enable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 7)
__u64 flags;
__u32 pat;
__u32 emt;
-- 
2.25.1



[PATCH v2 4/4] iommu/vt-d: Calculate and set flags for handle_mm_fault

2021-03-02 Thread Jacob Pan
Page requests originate from user page faults, so we should set
FAULT_FLAG_USER.

FAULT_FLAG_REMOTE indicates that we are walking an mm which is not
guaranteed to be current->mm and should not be subject to protection key
enforcement. Therefore, we should also set FAULT_FLAG_REMOTE to avoid
spurious faults when both SVM and protection keys (PKEY) are in use.

References: commit 1b2ee1266ea6 ("mm/core: Do not enforce PKEY permissions on remote mm access")
Reviewed-by: Raj Ashok 
Acked-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ff7ae7cc17d5..7bfd20a24a60 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1086,6 +1086,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
int head, tail, handled = 0;
+   unsigned int flags = 0;
 
/* Clear PPR bit before reading head/tail registers, to
 * ensure that we get a new interrupt if needed. */
@@ -1186,9 +1187,11 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (access_error(vma, req))
goto invalid;
 
-   ret = handle_mm_fault(vma, address,
- req->wr_req ? FAULT_FLAG_WRITE : 0,
- NULL);
+   flags = FAULT_FLAG_USER | FAULT_FLAG_REMOTE;
+   if (req->wr_req)
+   flags |= FAULT_FLAG_WRITE;
+
+   ret = handle_mm_fault(vma, address, flags, NULL);
if (ret & VM_FAULT_ERROR)
goto invalid;
 
-- 
2.25.1


[PATCH v2 1/4] iommu/vt-d: Enable write protect for supervisor SVM

2021-03-02 Thread Jacob Pan
Write protect bit, when set, inhibits supervisor writes to the read-only
pages. In supervisor shared virtual addressing (SVA), where page tables
are shared between CPU and DMA, IOMMU PASID entry WPE bit should match
CR0.WP bit in the CPU.
This patch sets WPE bit for supervisor PASIDs if CR0.WP is set.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0cceaabc3ce6..0b7e0e726ade 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -410,6 +410,15 @@ static inline void pasid_set_sre(struct pasid_entry *pe)
pasid_set_bits(&pe->val[2], 1 << 0, 1);
 }
 
+/*
+ * Setup the WPE(Write Protect Enable) field (Bit 132) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_wpe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 4, 1 << 4);
+}
+
 /*
  * Setup the P(Present) field (Bit 0) of a scalable mode PASID
  * entry.
@@ -553,6 +562,20 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
}
 }
 
+static inline int pasid_enable_wpe(struct pasid_entry *pte)
+{
+   unsigned long cr0 = read_cr0();
+
+   /* CR0.WP is normally set but just to be sure */
+   if (unlikely(!(cr0 & X86_CR0_WP))) {
+   pr_err_ratelimited("No CPU write protect!\n");
+   return -EINVAL;
+   }
+   pasid_set_wpe(pte);
+
+   return 0;
+};
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -584,6 +607,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EINVAL;
}
pasid_set_sre(pte);
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+
}
 
if (flags & PASID_FLAG_FL5LP) {
-- 
2.25.1



Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jacob Pan
Hi Jason,

On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe  wrote:

> On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> >  
> > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +   struct domain_capsule *dc = (struct domain_capsule *)data;
> > +   unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > + (void __user *)arg);  
> 
> This arg business is really tortured. The type should be set at the
> ioctl, not constantly passed down as unsigned long or, worse, void *.
> 
> And why is this passing a __user pointer deep into an iommu_* API??
> 
The idea was that the IOMMU UAPI (not API) is independent of VFIO or other
user driver frameworks. The design is documented here:
Documentation/userspace-api/iommu.rst
The IOMMU UAPI handles the type and sanitization of user-provided data.

Could you be more specific about your concerns?

> > +/**
> > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> > + * struct vfio_iommu_type1_nesting_op)
> > + *
> > + * This interface allows userspace to utilize the nesting IOMMU
> > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> > + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> > + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> > + * _REQUEST.
> > + *
> > + * @data[] types defined for each op:
> > + * +=+===+
> > + * | NESTING OP  |  @data[]  |
> > + * +=+===+
> > + * | BIND_PGTBL  |  struct iommu_gpasid_bind_data|
> > + * +-+---+
> > + * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
> > + *
> > +-+---+  
> 
> If the type is known why does the struct have a flex array?
> 
This will be extended to other types in the next patches.

> Jason


Thanks,

Jacob


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jason Gunthorpe
On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe  wrote:
> 
> > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > >  
> > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > + struct domain_capsule *dc = (struct domain_capsule *)data;
> > > + unsigned long arg = *(unsigned long *)dc->data;
> > > +
> > > + return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > +   (void __user *)arg);  
> > 
> > This arg business is really tortured. The type should be set at the
> > ioctl, not constantly passed down as unsigned long or, worse, void *.
> > 
> > And why is this passing a __user pointer deep into an iommu_* API??
> > 
> The idea was that the IOMMU UAPI (not API) is independent of VFIO or other
> user driver frameworks. The design is documented here:
> Documentation/userspace-api/iommu.rst
> The IOMMU UAPI handles the type and sanitization of user-provided data.

Why? If it is uapi it has defined types and those types should be
completely clear from the C code, not obfuscated.

I haven't looked at the design doc yet, but this is just a big red
flag: you shouldn't be tunneling one subsystem's uAPI through another
subsystem.

If you need to hook two subsystems together it should be more
directly, like VFIO takes in the IOMMU FD and 'registers' itself in
some way with the IOMMU then you can do the IOMMU actions through the
IOMMU FD and it can call back to VFIO as needed.

At least in this way we can swap VFIO for other things in the API.

Having every subsystem that wants to implement IOMMU also implement
tunneled ops seems very backwards.

> Could you be more specific about your concerns?

Avoid using unsigned long, void * and flex arrays to describe
concretely typed things.

Jason


Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-02 Thread Keqian Zhu
Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> On attach_pasid_table() we program STE S1 related info set
> by the guest into the actual physical STEs. At minimum
> we need to program the context descriptor GPA and compute
> whether the stage1 is translated/bypassed or aborted.
> 
> On detach, the stage 1 config is unset and the abort flag is
> unset.
> 
> Signed-off-by: Eric Auger 
> 
[...]

> +
> + /*
> +  * we currently support a single CD so s1fmt and s1dss
> +  * fields are also ignored
> +  */
> + if (cfg->pasid_bits)
> + goto out;
> +
> + smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
Only the "cdtab_dma" field of "cdcfg" is set, so we are not able to locate a
specific CD using arm_smmu_get_cd_ptr().

Maybe we'd better use a specialized function to fill the other fields of
"cdcfg", or add a sanity check in arm_smmu_get_cd_ptr() to prevent calling it
under nested mode?

As we currently just call arm_smmu_get_cd_ptr() during finalise_s1(), no
problem is found. Just a suggestion ;-)

Thanks,
Keqian


> + smmu_domain->s1_cfg.set = true;
> + smmu_domain->abort = false;
> + break;
> + default:
> + goto out;
> + }
> + spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> + list_for_each_entry(master, &smmu_domain->devices, domain_head)
> + arm_smmu_install_ste_for_dev(master);
> + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> + ret = 0;
> +out:
> + mutex_unlock(&smmu_domain->init_mutex);
> + return ret;
> +}
> +
> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_master *master;
> + unsigned long flags;
> +
> + mutex_lock(&smmu_domain->init_mutex);
> +
> + if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> + goto unlock;
> +
> + smmu_domain->s1_cfg.set = false;
> + smmu_domain->abort = false;
> +
> + spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> + list_for_each_entry(master, &smmu_domain->devices, domain_head)
> + arm_smmu_install_ste_for_dev(master);
> + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +unlock:
> + mutex_unlock(&smmu_domain->init_mutex);
> +}
> +
>  static bool arm_smmu_dev_has_feature(struct device *dev,
>enum iommu_dev_features feat)
>  {
> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>   .of_xlate   = arm_smmu_of_xlate,
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
> + .attach_pasid_table = arm_smmu_attach_pasid_table,
> + .detach_pasid_table = arm_smmu_detach_pasid_table,
>   .dev_has_feat   = arm_smmu_dev_has_feature,
>   .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>   .dev_enable_feat= arm_smmu_dev_enable_feature,
> 


[PATCH AUTOSEL 5.10 21/47] iommu/amd: Fix performance counter initialization

2021-03-02 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b ]

Certain AMD platforms enable power gating feature for IOMMU PMC,
which prevents the IOMMU driver from updating the counter while
trying to validate the PMC functionality in the init_iommu_perf_ctr().
This results in disabling PMC support and the following error message:

"AMD-Vi: Unable to read/write to IOMMU perf counter"

To work around this issue, temporarily disable power gating by programming
the counter source to a non-zero value while validating the counter, and
restore the prior state afterward.

Signed-off-by: Suravee Suthikulpanit 
Tested-by: Tj (Elloe Linux) 
Link: 
https://lore.kernel.org/r/20210208122712.5048-1-suravee.suthikulpa...@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd/init.c | 45 ++--
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c842545368fd..3c215f0a6052 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -254,6 +255,8 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1717,13 +1720,11 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
-static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
+   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
@@ -1731,17 +1732,39 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
amd_iommu_pc_present = true;
 
/* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
goto pc_false;
 
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
+   /*
+* Disable power gating by programming the performance counter
+* source to 20 (i.e. counts the reads and writes from/to IOMMU
+* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
+* which never get incremented during this init phase.
+* (Note: The event is also deprecated.)
+*/
+   val = 20;
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
goto pc_false;
 
+   /* Check if the performance counters can be written to */
+   val = 0xabcd;
+   for (retry = 5; retry; retry--) {
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
+   val2)
+   break;
+
+   /* Wait about 20 msec for power gating to disable and retry. */
+   msleep(20);
+   }
+
/* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   goto pc_false;
+
+   if (val != val2)
goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.30.1



[PATCH AUTOSEL 5.10 07/47] iommu/vt-d: Clear PRQ overflow only when PRQ is empty

2021-03-02 Thread Sasha Levin
From: Lu Baolu 

[ Upstream commit 28a77185f1cd0650b664f546141433a7a615 ]

It is incorrect to always clear PRO when it's set w/o first checking
whether the overflow condition has been cleared. Current code assumes
that if an overflow condition occurs it must have been cleared by earlier
loop. However since the code runs in a threaded context, the overflow
condition could occur even after setting the head to the tail under some
extreme condition. To be sane, we should read both head/tail again when
seeing a pending PRO and only clear PRO after all pending PRs have been
handled.

Suggested-by: Kevin Tian 
Signed-off-by: Lu Baolu 
Link: 
https://lore.kernel.org/linux-iommu/mwhpr11mb18862d2ea5bd432bf22d99a48c...@mwhpr11mb1886.namprd11.prod.outlook.com/
Link: 
https://lore.kernel.org/r/20210126080730.2232859-2-baolu...@linux.intel.com
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/svm.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 43f392d27d31..b200a3acc6ed 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1079,8 +1079,17 @@ prq_advance:
 * Clear the page request overflow bit and wake up all threads that
 * are waiting for the completion of this handling.
 */
-   if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO)
-   writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+   if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+   pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
+   iommu->name);
+   head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+   tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+   if (head == tail) {
+   writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+   pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
+   iommu->name);
+   }
+   }
 
if (!completion_done(&iommu->prq_complete))
complete(&iommu->prq_complete);
-- 
2.30.1



[Patch v8 06/10] iommu: Pass domain to sva_unbind_gpasid()

2021-03-02 Thread Liu Yi L
From: Yi Sun 

The current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may be attached to multiple aux-domains. Also, for guest unbind,
the PASID to be unbound should have been allocated to the VM. This check
requires knowing the ioasid_set associated with the domain.

So this interface needs to pass in the domain info. The iommu driver is
then able to know which domain will be used for the 2nd stage translation
of the nesting mode and is also able to do the PASID ownership check. This
patch passes @domain for the above reasons.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Yi Sun 
Signed-off-by: Liu Yi L 
---
v7 -> v8:
*) tweaked the commit message.

v6 -> v7:
*) correct the link for the details of modifying pasid prototype to be "u32".
*) hold off r-b from Eric Auger as there is modification in this patch, will
   seek r-b in this version.

v5 -> v6:
*) use "u32" prototype for @pasid.
*) add review-by from Eric Auger.

v2 -> v3:
*) pass in domain info only
*) use u32 for pasid instead of int type

v1 -> v2:
*) added in v2.
---
 drivers/iommu/intel/svm.c   | 3 ++-
 drivers/iommu/iommu.c   | 2 +-
 include/linux/intel-iommu.h | 3 ++-
 include/linux/iommu.h   | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 561d011c7287..7521b4aefd16 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -496,7 +496,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, u32 pasid)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct intel_svm_dev *sdev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d46f103a1e4b..822e485683ae 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2185,7 +2185,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
-   return domain->ops->sva_unbind_gpasid(dev, pasid);
+   return domain->ops->sva_unbind_gpasid(domain, dev, pasid);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 554aa946f142..aaf403966444 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -755,7 +755,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..4840217a590b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -299,7 +299,8 @@ struct iommu_ops {
int (*sva_bind_gpasid)(struct iommu_domain *domain,
struct device *dev, struct iommu_gpasid_bind_data *data);
 
-   int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
+   int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+struct device *dev, u32 pasid);
 
int (*def_domain_type)(struct device *dev);
 
-- 
2.25.1



[Patch v8 01/10] iommu: Report domain nesting info

2021-03-02 Thread Liu Yi L
IOMMUs that support nesting translation need to report the capability info
to userspace. It gives information about the requirements userspace needs
to implement, plus other features characterizing the physical
implementation.

This patch introduces a new IOMMU UAPI struct that gives information about
the nesting capabilities and features. This struct is supposed to be
returned by iommu_domain_get_attr() with the DOMAIN_ATTR_NESTING attribute
parameter, for a domain whose type has been set to nesting.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v7 -> v8:
*) add padding in struct iommu_nesting_info_vtd
*) describe future extension rules for struct iommu_nesting_info in iommu.rst.
*) remove SYSWIDE_PASID

v6 -> v7:
*) rephrase the commit message, replace the @data[] field in struct
   iommu_nesting_info with union per comments from Eric Auger.

v5 -> v6:
*) rephrase the feature notes per comments from Eric Auger.
*) rename @size of struct iommu_nesting_info to @argsz.

v4 -> v5:
*) address comments from Eric Auger.

v3 -> v4:
*) split the SMMU driver changes to be a separate patch
*) move the @addr_width and @pasid_bits from vendor specific
   part to generic part.
*) tweak the description for the @features field of struct
   iommu_nesting_info.
*) add description on the @data[] field of struct iommu_nesting_info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
*) reuse DOMAIN_ATTR_NESTING to get nesting info.
*) return an empty iommu_nesting_info for SMMU drivers per Jean's
   suggestion.
---
 Documentation/userspace-api/iommu.rst |  5 +-
 include/uapi/linux/iommu.h| 72 +++
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
index d3108c1519d5..ad06bb94aad5 100644
--- a/Documentation/userspace-api/iommu.rst
+++ b/Documentation/userspace-api/iommu.rst
@@ -26,6 +26,7 @@ supported user-kernel APIs are as follows:
 2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
 3. Invalidate IOMMU caches upon guest requests
 4. Report errors to the guest and serve page requests
+5. Read iommu_nesting_info from kernel
 
 Requirements
 
@@ -96,7 +97,9 @@ kernel. Simply recompiling existing code with newer kernel header should
 not be an issue in that only existing flags are used.
 
 IOMMU vendor driver should report the below features to IOMMU UAPI
-consumers (e.g. via VFIO).
+consumers (e.g. via VFIO). The feature list is passed by struct
+iommu_nesting_info. The future extension to this structure follows
+the rule defined in section "Extension Rules & Precautions".
 
 1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
 2. IOMMU_NESTING_FEAT_BIND_PGTBL
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e1d9e75f2c94..e924bfc091e8 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -338,4 +338,76 @@ struct iommu_gpasid_bind_data {
} vendor;
 };
 
+/*
+ * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info.
+ *
+ * @flags: VT-d specific flags. Currently reserved for future
+ * extension. must be set to 0.
+ * @cap_reg:   Describe basic capabilities as defined in VT-d capability
+ * register.
+ * @ecap_reg:  Describe the extended capabilities as defined in VT-d
+ * extended capability register.
+ */
+struct iommu_nesting_info_vtd {
+   __u32   flags;
+   __u8padding[12];
+   __u64   cap_reg;
+   __u64   ecap_reg;
+};
+
+/*
+ * struct iommu_nesting_info - Information for nesting-capable IOMMU.
+ *userspace should check it before using
+ *nesting capability.
+ *
+ * @argsz: size of the whole structure.
+ * @flags: currently reserved for future extension. must be set to 0.
+ * @format:PASID table entry format, the same definition as struct
+ * iommu_gpasid_bind_data @format.
+ * @features:  supported nesting features.
+ * @addr_width:the output addr width of first level/stage translation
+ * @pasid_bits:maximum supported PASID bits, 0 represents no PASID
+ * support.
+ * @vendor:vendor specific data, structure type can be deduced from
+ * @format field.
+ *
+ * +===+==+
+ * | feature   |  Notes   |
+ * +===+==+
+ * | BIND_PGTBL|  IOMMU vendor driver sets it to mandate userspace to |
+ * |   |  bind the first level/stage page table to associated |
+ * |   |  PASID (either the one specified in bind request or  |
+ * |   |  the default PASID of iommu domain), through IOMMU   |
+ * |   |  

[Patch v8 02/10] iommu/smmu: Report empty domain nesting info

2021-03-02 Thread Liu Yi L
Instead of returning a boolean for DOMAIN_ATTR_NESTING,
iommu_domain_get_attr() should return an iommu_nesting_info handle. For
now, return an empty nesting info struct, as true nesting is not yet
supported by the SMMUs.

Note: this patch just ensures there is no compile issue; to be functionally
ready for the ARM platform, the patches from Vivek Gautam in the link below
need to be applied.

https://lore.kernel.org/linux-iommu/20210212105859.8445-1-vivek.gau...@arm.com/

Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
v5 -> v6:
*) add review-by from Eric Auger.

v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 29 +++--
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..99ea3ee35826 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,6 +2449,32 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+   void *data)
+{
+   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+   unsigned int size;
+
+   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -ENODEV;
+
+   size = sizeof(struct iommu_nesting_info);
+
+   /*
+* if provided buffer size is smaller than expected, should
+* return 0 and also the expected buffer size to caller.
+*/
+   if (info->argsz < size) {
+   info->argsz = size;
+   return 0;
+   }
+
+   /* report an empty iommu_nesting_info for now */
+   memset(info, 0x0, size);
+   info->argsz = size;
+   return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
@@ -2458,8 +2484,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   return arm_smmu_domain_nesting_info(smmu_domain, data);
default:
return -ENODEV;
}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..d874c580ea80 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1481,6 +1481,32 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+   void *data)
+{
+   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+   unsigned int size;
+
+   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -ENODEV;
+
+   size = sizeof(struct iommu_nesting_info);
+
+   /*
+* if provided buffer size is smaller than expected, should
+* return 0 and also the expected buffer size to caller.
+*/
+   if (info->argsz < size) {
+   info->argsz = size;
+   return 0;
+   }
+
+   /* report an empty iommu_nesting_info for now */
+   memset(info, 0x0, size);
+   info->argsz = size;
+   return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
@@ -1490,8 +1516,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   return arm_smmu_domain_nesting_info(smmu_domain, data);
case DOMAIN_ATTR_IO_PGTABLE_CFG: {
struct io_pgtable_domain_attr *pgtbl_cfg = data;
*pgtbl_cfg = smmu_domain->pgtbl_cfg;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Liu Yi L
Nesting translation allows two-level/stage page tables, with the 1st
level for guest translations (e.g. GVA->GPA) and the 2nd level for host
translations (e.g. GPA->HPA). This patch adds an interface for binding
guest page tables to a PASID. This PASID must have been allocated by the
userspace before the binding request, e.g. allocated from /dev/ioasid.
As the bind data is parsed by the iommu abstract layer, this patch
doesn't have the ownership check against the PASID from userspace; that
is done in the iommu subsystem.
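The variable-size argument this op takes can be sketched as follows. The struct here is a hypothetical mirror of the `vfio_iommu_type1_nesting_op` layout this series adds (see its vfio.h hunk for the authoritative definition); the point is only how argsz covers the header plus the op-specific payload, which is how the kernel knows how much to copy from userspace:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the nesting-op header (see the vfio.h changes). */
struct nesting_op {
	uint32_t argsz;		/* total size: header + payload */
	uint32_t flags;		/* low bits select the op */
	uint8_t  data[];	/* op-specific payload, e.g. bind data */
};

#define NESTING_OP_BIND_PGTBL 0u	/* illustrative value */

/*
 * Fill the header for a bind request: argsz covers the header plus the
 * op-specific payload so the kernel can size its copy_from_user().
 */
static size_t pack_bind_op(struct nesting_op *op, const void *payload,
			   size_t payload_len)
{
	op->argsz = sizeof(*op) + payload_len;
	op->flags = NESTING_OP_BIND_PGTBL;
	memcpy(op->data, payload, payload_len);
	return op->argsz;
}
```

In real usage the packed buffer would then be handed to the VFIO container via ioctl(); that part is omitted here since it needs a live container fd.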

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v7 -> v8:
*) adapt to /dev/ioasid
*) address comments from Alex on v7.
*) adapt to latest iommu_sva_unbind_gpasid() implementation.
*) remove the OP check against VFIO_IOMMU_NESTING_OP_NUM as it's redundant
   to the default switch case in vfio_iommu_handle_pgtbl_op().

v6 -> v7:
*) introduced @user in struct domain_capsule to simplify the code per Eric's
   suggestion.
*) introduced VFIO_IOMMU_NESTING_OP_NUM for sanitizing op from userspace.
*) corrected the @argsz value of unbind_data in vfio_group_unbind_gpasid_fn().

v5 -> v6:
*) dropped vfio_find_nesting_group() and add vfio_get_nesting_domain_capsule().
   per comment from Eric.
*) use iommu_uapi_sva_bind/unbind_gpasid() and iommu_sva_unbind_gpasid() in
   linux/iommu.h for userspace operation and in-kernel operation.

v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   
https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guest page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 156 
 include/uapi/linux/vfio.h   |  35 +++
 2 files changed, 191 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3a5c84d4f19b..0044931b80dc 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -164,6 +164,34 @@ struct vfio_regions {
 
 #define WAITED 1
 
+struct domain_capsule {
+   struct vfio_group   *group;
+   struct iommu_domain *domain;
+   void*data;
+   /* set if @data contains a user pointer*/
+   booluser;
+};
+
+/* iommu->lock must be held */
+static int vfio_prepare_nesting_domain_capsule(struct vfio_iommu *iommu,
+  struct domain_capsule *dc)
+{
+   struct vfio_domain *domain;
+   struct vfio_group *group;
+
+   if (!iommu->nesting_info)
+   return -EINVAL;
+
+   domain = list_first_entry(&iommu->domain_list,
+ struct vfio_domain, next);
+   group = list_first_entry(&domain->group_list,
+struct vfio_group, next);
+   dc->group = group;
+   dc->domain = domain->domain;
+   dc->user = true;
+   return 0;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2607,6 +2635,51 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+ (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+
+   /*
+* dc->user is a toggle for the unbind operation. When user is
+* set, dc->data passes in a __user pointer and use of
+* iommu_uapi_sva_unbind_gpasid() is required, which will
+* copy the unbind data from the user buffer. When user is
+* clear, dc->data passes in a pasid which is going to be
+* unbound; no need to copy data from userspace.
+*/
+   if (dc->user) {
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   iommu_uapi_sva_unbind_gpasid(dc->domain,
+dev, (void __user *)arg);
+   } else {
+   ioasid_t pasid = 

[Patch v8 05/10] vfio/type1: Allow invalidating first-level/stage IOMMU cache

2021-03-02 Thread Liu Yi L
This patch provides an interface allowing the userspace to invalidate
IOMMU cache for first-level page table. It is required when the first
level IOMMU page table is not managed by the host kernel in the nested
translation setup.
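The op routing and the feature gate added here can be sketched in isolation. This is a simplified mock of the switch in `vfio_iommu_type1_nesting_op()` plus the `IOMMU_NESTING_FEAT_CACHE_INVLD` check in `vfio_iommu_invalidate_cache()`; the enum values and the feature bit are illustrative, not the real uAPI constants:

```c
#include <assert.h>

enum nesting_op_id {
	OP_BIND_PGTBL,
	OP_UNBIND_PGTBL,
	OP_CACHE_INVLD,		/* the op added by this patch */
};

#define FEAT_CACHE_INVLD 0x2u	/* stand-in for IOMMU_NESTING_FEAT_CACHE_INVLD */

/*
 * Mirror of the dispatch: unknown ops fail with -EINVAL (-22), and the
 * cache-invalidate op is only honoured when the IOMMU reported the
 * feature in its nesting info, otherwise -EOPNOTSUPP (-95).
 */
static int dispatch(int op, unsigned int features)
{
	switch (op) {
	case OP_BIND_PGTBL:
	case OP_UNBIND_PGTBL:
		return 0;	/* handled by the page-table path */
	case OP_CACHE_INVLD:
		if (!(features & FEAT_CACHE_INVLD))
			return -95;	/* -EOPNOTSUPP */
		return 0;
	default:
		return -22;	/* -EINVAL */
	}
}
```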

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Eric Auger 
Signed-off-by: Jacob Pan 
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
---
 drivers/vfio/vfio_iommu_type1.c | 38 +
 include/uapi/linux/vfio.h   |  3 +++
 2 files changed, 41 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0044931b80dc..86b6d8f9789a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3291,6 +3291,41 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+   struct domain_capsule *dc = (struct domain_capsule *)data;
+   unsigned long arg = *(unsigned long *)dc->data;
+
+   iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+   return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+   unsigned long arg)
+{
+   struct domain_capsule dc = { .data = &arg };
+   struct iommu_nesting_info *info;
+   int ret;
+
+   mutex_lock(&iommu->lock);
+   info = iommu->nesting_info;
+   if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+   ret = -EOPNOTSUPP;
+   goto out_unlock;
+   }
+
+   ret = vfio_prepare_nesting_domain_capsule(iommu, &dc);
+   if (ret)
+   goto out_unlock;
+
+   iommu_group_for_each_dev(dc.group->iommu_group, &dc,
+vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+   mutex_unlock(&iommu->lock);
+   return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
unsigned long arg)
 {
@@ -3313,6 +3348,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu 
*iommu,
case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
break;
+   case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+   ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
+   break;
default:
ret = -EINVAL;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 985e6cf4c52d..08b8d236dfee 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1245,6 +1245,8 @@ struct vfio_iommu_type1_dirty_bitmap_get {
  * +-+---+
  * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
  * +-+---+
+ * | CACHE_INVLD |  struct iommu_cache_invalidate_info   |
+ * +-+---+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -1258,6 +1260,7 @@ struct vfio_iommu_type1_nesting_op {
 enum {
VFIO_IOMMU_NESTING_OP_BIND_PGTBL,
VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL,
+   VFIO_IOMMU_NESTING_OP_CACHE_INVLD,
 };
 
 #define VFIO_IOMMU_NESTING_OP  _IO(VFIO_TYPE, VFIO_BASE + 18)
-- 
2.25.1



[Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace

2021-03-02 Thread Liu Yi L
This patch exports iommu nesting capability info to user space through
VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
bind page table, cache invalidation) and the vendor specific format
information for first level/stage page table that will be bound to.

The nesting info is available only after the container is set to NESTED type.
Current implementation imposes one limitation - one nesting container
should include at most one iommu group. The philosophy of vfio container
is having all groups/devices within the container share the same IOMMU
context. When vSVA is enabled, one IOMMU context could include one 2nd-
level address space and multiple 1st-level address spaces. While the
2nd-level address space is reasonably sharable by multiple groups, blindly
sharing 1st-level address spaces across all groups within the container
might instead break the guest expectation. In the future sub/super container
concept might be introduced to allow partial address space sharing within
an IOMMU context. But for now let's go with this restriction by requiring
singleton container for using nesting iommu features. Below link has the
related discussion about this decision.

https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/

This patch also changes the NESTING type container behaviour. Something
that would have succeeded before will now fail: Before this series, if
user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
if the SMMU didn't support stage-2, as the driver would have silently
fallen back on stage-1 mappings (which work exactly the same as stage-2
only since there was no nesting supported). After the series, we do check
for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and
the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
fix and completely harmless. Detail can be found in below link as well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
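Userspace consumes this nesting info through the VFIO capability chain returned by VFIO_IOMMU_GET_INFO. The walk over that chain can be sketched as below; `cap_header` matches the layout of `struct vfio_info_cap_header` from `<linux/vfio.h>` (id, version, next offset), while the buffer contents in the test are mock data, not real kernel output:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct vfio_info_cap_header in <linux/vfio.h>. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;	/* offset of next capability from start of info; 0 = end */
};

/*
 * Walk a capability chain the way userspace parses VFIO_IOMMU_GET_INFO
 * output: follow 'next' offsets until a matching id or the terminating 0.
 * Returns the offset of the matching capability, or 0 if not found.
 */
static uint32_t find_cap(const uint8_t *info, uint32_t first, uint16_t id)
{
	uint32_t off = first;

	while (off) {
		struct cap_header hdr;

		memcpy(&hdr, info + off, sizeof(hdr));	/* avoid unaligned access */
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

With the real ioctl, `first` comes from the `cap_offset` field of `struct vfio_iommu_type1_info` when the VFIO_IOMMU_INFO_CAPS flag is set.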

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
---
v7 -> v8:
*) tweak per Alex's comments against v7.
*) check "iommu->nesting_info->format == 0" in attach_group()

v6 -> v7:
*) using vfio_info_add_capability() for adding nesting cap per suggestion
   from Eric.

v5 -> v6:
*) address comments against v5 from Eric Auger.
*) don't report nesting cap to userspace if the nesting_info->format is
   invalid.

v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   cap is much "cheap", if needs extension in future, just define another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 102 +++-
 include/uapi/linux/vfio.h   |  19 ++
 2 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649..3a5c84d4f19b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-   struct list_headdomain_list;
-   struct list_headiova_list;
-   struct vfio_domain  *external_domain; /* domain for external user */
-   struct mutexlock;
-   struct rb_root  dma_list;
-   struct blocking_notifier_head notifier;
-   unsigned intdma_avail;
-   unsigned intvaddr_invalid_count;
-   uint64_tpgsize_bitmap;
-   uint64_tnum_non_pinned_groups;
-   wait_queue_head_t   vaddr_wait;
-   boolv2;
-   boolnesting;
-   booldirty_page_tracking;
-   boolpinned_page_dirty_scope;
-   boolcontainer_open;
+   struct list_headdomain_list;
+   struct list_headiova_list;
+   /* domain for external user */
+   struct vfio_domain  *external_domain;
+   struct mutexlock;
+   struct rb_root  dma_list;
+   struct blocking_notifier_head   notifier;
+   unsigned intdma_avail;
+   unsigned intvaddr_invalid_count;
+   uint64_tpgsize_bitmap;
+   uint64_tnum_non_pinned_groups;
+   wait_queue_head_t   vaddr_wait;
+   struct iommu_nesting_info   *nesting_info;
+   boolv2;
+   boolnesting;
+   booldirty_page_tracking;
+   bool

[Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs

2021-03-02 Thread Liu Yi L
Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


 .-------------.  .---------------------------.
 |   vIOMMU    |  | Guest process CR3, FL only|
 |             |  '---------------------------'
 .----------------/
 | PASID Entry |--- PASID cache flush -+
 '-------------'                       |
 |             |                       V
 |             |                CR3 in GPA
 '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
 .-------------.  .------------------------------------.
 |   pIOMMU    |  | Bind FL for GVA-GPA                |
 |             |  '------------------------------------'
 .----------------/  |
 | PASID Entry |     V (Nested xlate)
 '----------------\.----------------------------------.
 |             |   |SL for GPA-HPA, default domain    |
 |             |   '----------------------------------'
 '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

This patch series has been updated with regard to the discussion around PASID
allocation in v7 [1]. This series has removed the PASID allocation and
adapted to the /dev/ioasid solution [2]. Therefore the patches in this series
have been re-ordered. The Patch Overview is as below:

 1. reports IOMMU nesting info to userspace ( patch 0001, 0002, 0003, 0010)
 2. vfio support for binding guest page table to host (patch 0004)
 3. vfio support for IOMMU cache invalidation from VMs (patch 0005)
 4. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0006, 0007)
 5. expose PASID capability to VM (patch 0008)
 6. add doc for VFIO dual stage control (patch 0009)

The complete vSVA kernel upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. IOMMU-backed Mediated Device assignment
3. Page Request Services (PRS) support

This patchset is aiming for phase 1 and phase 2. It has a dependency on the
IOASID extension from Jacob Pan [3]. The complete set for the current vSVA
kernel and QEMU can be found in [4] and [5].

[1] 
https://lore.kernel.org/kvm/dm5pr11mb14351121729909028d6eb365c3...@dm5pr11mb1435.namprd11.prod.outlook.com/
[2] 
https://lore.kernel.org/linux-iommu/1614463286-97618-19-git-send-email-jacob.jun@linux.intel.com/
[3] 
https://lore.kernel.org/linux-iommu/1614463286-97618-1-git-send-email-jacob.jun@linux.intel.com/
[4] https://github.com/jacobpan/linux/tree/vsva-linux-5.12-rc1-v8
[5] https://github.com/luxis1999/qemu/tree/vsva_5.12_rc1_qemu_rfcv11

Regards,
Yi Liu

Changelog:
- Patch v7 -> Patch v8:
  a) removed the PASID allocation out of this series, it is covered by 
below patch:
 
https://lore.kernel.org/linux-iommu/1614463286-97618-19-git-send-email-jacob.jun@linux.intel.com/
  Patch v7: 
https://lore.kernel.org/kvm/1599734733-6431-1-git-send-email-yi.l@intel.com/

- Patch v6 -> Patch v7:
  a) drop [PATCH v6 01/15] of v6 as it's merged by Alex.
  b) rebase on Jacob's v8 IOMMU uapi enhancement and v2 IOASID 
extension patchset.
  c) Address comments against v6 from Alex and Eric.
  Patch v6: 
https://lore.kernel.org/kvm/1595917664-33276-1-git-send-email-yi.l@intel.com/

- Patch v5 -> Patch v6:
  a) Address comments against v5 from Eric.
  b) rebase on Jacob's v6 IOMMU uapi enhancement
  Patch v5: 
https://lore.kernel.org/kvm/1594552870-55687-1-git-send-email-yi.l@intel.com/

- Patch v4 -> Patch v5:
  a) Address comments against v4
  Patch v4: 
https://lore.kernel.org/kvm/1593861989-35920-1-git-send-email-yi.l@intel.com/

- Patch v3 -> Patch v4:
  a) Address comments against v3
  b) Add rb from Stefan on patch 14/15
  Patch v3: 
https://lore.kernel.org/kvm/1592988927-48009-1-git-send-email-yi.l@intel.com/

- Patch v2 -> Patch v3:
  a) Rebase on top of Jacob's v3 iommu uapi patchset
  b) Address comments from Kevin and Stefan Hajnoczi
  c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
  d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
  Patch v2: 

Re: [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace

2021-03-02 Thread Jason Gunthorpe
On Wed, Mar 03, 2021 at 04:35:38AM +0800, Liu Yi L wrote:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4bb162c1d649..3a5c84d4f19b 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
>"Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> - struct list_headdomain_list;
> - struct list_headiova_list;
> - struct vfio_domain  *external_domain; /* domain for external user */
> - struct mutexlock;
> - struct rb_root  dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned intdma_avail;
> - unsigned intvaddr_invalid_count;
> - uint64_tpgsize_bitmap;
> - uint64_tnum_non_pinned_groups;
> - wait_queue_head_t   vaddr_wait;
> - boolv2;
> - boolnesting;
> - booldirty_page_tracking;
> - boolpinned_page_dirty_scope;
> - boolcontainer_open;
> + struct list_headdomain_list;
> + struct list_headiova_list;
> + /* domain for external user */
> + struct vfio_domain  *external_domain;
> + struct mutexlock;
> + struct rb_root  dma_list;
> + struct blocking_notifier_head   notifier;
> + unsigned intdma_avail;
> + unsigned intvaddr_invalid_count;
> + uint64_tpgsize_bitmap;
> + uint64_tnum_non_pinned_groups;
> + wait_queue_head_t   vaddr_wait;
> + struct iommu_nesting_info   *nesting_info;
> + boolv2;
> + boolnesting;
> + booldirty_page_tracking;
> + boolpinned_page_dirty_scope;
> + boolcontainer_open;
>  };

I always hate seeing one line patches done like this. If you want to
re-indent you should remove the horizontal whitespace, not add an
unreadable amount more.

Also, Linus has been unhappy before to see lists of bool's in structs
due to the huge amount of memory they waste.
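The space cost behind that last point is easy to demonstrate. The sketch below (illustrative only, not a proposed patch) compares a struct with the five bools from vfio_iommu against a single flags word; the flag names are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Five separate bools occupy at least five bytes (plus any padding)... */
struct with_bools {
	bool v2, nesting, dirty_page_tracking,
	     pinned_page_dirty_scope, container_open;
};

/* ...while a single flags word packs the same state into one integer. */
struct with_flags {
	unsigned int flags;
};

/* Hypothetical flag bits for the packed form. */
#define F_V2                      (1u << 0)
#define F_NESTING                 (1u << 1)
#define F_DIRTY_PAGE_TRACKING     (1u << 2)
#define F_PINNED_PAGE_DIRTY_SCOPE (1u << 3)
#define F_CONTAINER_OPEN          (1u << 4)
```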

Jason


[PATCH AUTOSEL 5.11 25/52] iommu/amd: Fix performance counter initialization

2021-03-02 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b ]

Certain AMD platforms enable power gating feature for IOMMU PMC,
which prevents the IOMMU driver from updating the counter while
trying to validate the PMC functionality in the init_iommu_perf_ctr().
This results in disabling PMC support and the following error message:

"AMD-Vi: Unable to read/write to IOMMU perf counter"

To work around this issue, disable power gating temporarily by programming
the counter source to a non-zero value while validating the counter,
and restore the prior state afterward.
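The bounded retry the patch introduces can be modelled without hardware. Below, a mock counter register drops writes while "power gating" is active and starts accepting them after a few polls; `check_counter_writable()` mirrors the patch's retry loop (the register behaviour and retry count here are assumptions for illustration):

```c
#include <assert.h>

/*
 * Mock counter register: writes are silently dropped while power gating
 * is active, and gating lifts after a few polls, loosely mimicking the
 * platform behaviour described in the commit message.
 */
static int gated = 3;
static unsigned long long reg;

static void pc_write(unsigned long long val)
{
	if (gated) {
		gated--;	/* write ignored while gated */
		return;
	}
	reg = val;
}

/*
 * The patch's approach: retry the write/read-back check a bounded number
 * of times instead of declaring the counter broken on the first mismatch.
 * Returns 1 if the counter proved writable, 0 otherwise.
 */
static int check_counter_writable(void)
{
	int retry;

	for (retry = 5; retry; retry--) {
		pc_write(0xabcd);
		if (reg == 0xabcd)
			return 1;
		/* the real code sleeps ~20 ms here (msleep(20)) */
	}
	return 0;
}
```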

Signed-off-by: Suravee Suthikulpanit 
Tested-by: Tj (Elloe Linux) 
Link: 
https://lore.kernel.org/r/20210208122712.5048-1-suravee.suthikulpa...@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd/init.c | 45 ++--
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 83d8ab2aed9f..01da76dc1caa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -254,6 +255,8 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1712,13 +1715,11 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
-static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
+   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
@@ -1726,17 +1727,39 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
amd_iommu_pc_present = true;
 
/* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
goto pc_false;
 
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
+   /*
+* Disable power gating by programming the performance counter
+* source to 20 (i.e. counts the reads and writes from/to IOMMU
+* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
+* which never get incremented during this init phase.
+* (Note: The event is also deprecated.)
+*/
+   val = 20;
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
goto pc_false;
 
+   /* Check if the performance counters can be written to */
+   val = 0xabcd;
+   for (retry = 5; retry; retry--) {
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
+   val2)
+   break;
+
+   /* Wait about 20 msec for power gating to disable and retry. */
+   msleep(20);
+   }
+
/* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   goto pc_false;
+
+   if (val != val2)
goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.30.1



[PATCH AUTOSEL 5.11 09/52] iommu/vt-d: Clear PRQ overflow only when PRQ is empty

2021-03-02 Thread Sasha Levin
From: Lu Baolu 

[ Upstream commit 28a77185f1cd0650b664f546141433a7a615 ]

It is incorrect to always clear PRO when it's set w/o first checking
whether the overflow condition has been cleared. Current code assumes
that if an overflow condition occurs it must have been cleared by earlier
loop. However since the code runs in a threaded context, the overflow
condition could occur even after setting the head to the tail under some
extreme condition. To be sane, we should read both head/tail again when
seeing a pending PRO and only clear PRO after all pending PRs have been
handled.
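The corrected condition — only acknowledge the overflow once the queue has been fully drained — can be expressed in isolation. The mask value below is illustrative, not the real PRQ_RING_MASK; the function mirrors the head/tail re-read logic the fix adds:

```c
#include <assert.h>

#define PRQ_RING_MASK 0x7fffULL	/* illustrative mask, not the real value */

/*
 * Mirror of the fixed logic: only acknowledge the overflow (PRO) bit
 * when head == tail after re-reading both registers, i.e. every pending
 * page request has been drained. Returns 1 if the overflow may be
 * cleared now, 0 if handling must continue first.
 */
static int may_clear_overflow(unsigned long long head_reg,
			      unsigned long long tail_reg)
{
	unsigned long long head = head_reg & PRQ_RING_MASK;
	unsigned long long tail = tail_reg & PRQ_RING_MASK;

	return head == tail;
}
```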

Suggested-by: Kevin Tian 
Signed-off-by: Lu Baolu 
Link: 
https://lore.kernel.org/linux-iommu/mwhpr11mb18862d2ea5bd432bf22d99a48c...@mwhpr11mb1886.namprd11.prod.outlook.com/
Link: 
https://lore.kernel.org/r/20210126080730.2232859-2-baolu...@linux.intel.com
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/svm.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 18a9f05df407..b3bcd6dec93e 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1079,8 +1079,17 @@ prq_advance:
 * Clear the page request overflow bit and wake up all threads that
 * are waiting for the completion of this handling.
 */
-   if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO)
-   writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+   if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+   pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
+   iommu->name);
+   head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+   tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+   if (head == tail) {
+   writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+   pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
+   iommu->name);
+   }
+   }
 
if (!completion_done(&iommu->prq_complete))
complete(&iommu->prq_complete);
-- 
2.30.1



[PATCH AUTOSEL 5.4 15/33] iommu/amd: Fix performance counter initialization

2021-03-02 Thread Sasha Levin
From: Suravee Suthikulpanit 

[ Upstream commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b ]

Certain AMD platforms enable power gating feature for IOMMU PMC,
which prevents the IOMMU driver from updating the counter while
trying to validate the PMC functionality in the init_iommu_perf_ctr().
This results in disabling PMC support and the following error message:

"AMD-Vi: Unable to read/write to IOMMU perf counter"

To work around this issue, disable power gating temporarily by programming
the counter source to a non-zero value while validating the counter,
and restore the prior state afterward.

Signed-off-by: Suravee Suthikulpanit 
Tested-by: Tj (Elloe Linux) 
Link: 
https://lore.kernel.org/r/20210208122712.5048-1-suravee.suthikulpa...@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/amd_iommu_init.c | 45 +-
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 31d7e2d4f304..ad714ff375f8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -253,6 +254,8 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1672,13 +1675,11 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
-static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
+   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
@@ -1686,17 +1687,39 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
amd_iommu_pc_present = true;
 
/* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
goto pc_false;
 
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
+   /*
+* Disable power gating by programming the performance counter
+* source to 20 (i.e. counts the reads and writes from/to IOMMU
+* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
+* which never get incremented during this init phase.
+* (Note: The event is also deprecated.)
+*/
+   val = 20;
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
goto pc_false;
 
+   /* Check if the performance counters can be written to */
+   val = 0xabcd;
+   for (retry = 5; retry; retry--) {
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
+   val2)
+   break;
+
+   /* Wait about 20 msec for power gating to disable and retry. */
+   msleep(20);
+   }
+
/* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
+   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   goto pc_false;
+
+   if (val != val2)
goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.30.1



Re: [PATCH v3 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers

2021-03-02 Thread Robin Murphy

On 2021-02-25 17:51, Jordan Crouse wrote:

Call report_iommu_fault() to allow upper-level drivers to register their
own fault handlers.

Signed-off-by: Jordan Crouse 
---

  drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..0f3a9b5f3284 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
*dev)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
int idx = smmu_domain->cfg.cbndx;
+   int ret;
  
  	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);

if (!(fsr & ARM_SMMU_FSR_FAULT))
@@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
*dev)
iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
  
-	dev_err_ratelimited(smmu->dev,

-   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   ret = report_iommu_fault(domain, dev, iova,


Beware that "dev" here is not a struct device, so this isn't right. I'm 
not entirely sure what we *should* be passing here, since we can't 
easily attribute a context fault to a specific client device, and 
passing the IOMMU device seems a bit dubious too, so maybe just NULL?


Robin.


+   fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : 
IOMMU_FAULT_READ);
+
+   if (ret == -ENOSYS)
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
fsr, iova, fsynr, cbfrsynra, idx);
  
  	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);





Re: [PATCH v13 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-03-02 Thread Keqian Zhu
Hi Jean,

Reviewed-by: Keqian Zhu 

On 2021/3/2 17:26, Jean-Philippe Brucker wrote:
> When handling faults from the event or PRI queue, we need to find the
> struct device associated with a SID. Add a rb_tree to keep track of
> SIDs.
> 
> Acked-by: Jonathan Cameron 
> Reviewed-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 157 
>  2 files changed, 140 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index f985817c967a..7b15b7580c6e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -639,6 +639,15 @@ struct arm_smmu_device {
>  
>   /* IOMMU core code handle */
>   struct iommu_device iommu;
> +
> + struct rb_root  streams;
> + struct mutexstreams_mutex;
> +};
> +
> +struct arm_smmu_stream {
> + u32 id;
> + struct arm_smmu_master  *master;
> + struct rb_node  node;
>  };
>  
>  /* SMMU private data for each master */
> @@ -647,8 +656,8 @@ struct arm_smmu_master {
>   struct device   *dev;
>   struct arm_smmu_domain  *domain;
>   struct list_headdomain_head;
> - u32 *sids;
> - unsigned intnum_sids;
> + struct arm_smmu_stream  *streams;
> + unsigned intnum_streams;
>   boolats_enabled;
>   boolsva_enabled;
>   struct list_headbonds;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 7edce914c45e..d148bb6d4289 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -909,8 +909,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain 
> *smmu_domain,
>  
> 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.cfgi.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.cfgi.sid = master->streams[i].id;
> 			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
>   }
>   }
> @@ -1355,6 +1355,28 @@ static int arm_smmu_init_l2_strtab(struct 
> arm_smmu_device *smmu, u32 sid)
>   return 0;
>  }
>  
> +/* smmu->streams_mutex must be held */
> +__maybe_unused
> +static struct arm_smmu_master *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> + struct rb_node *node;
> + struct arm_smmu_stream *stream;
> +
> + node = smmu->streams.rb_node;
> + while (node) {
> + stream = rb_entry(node, struct arm_smmu_stream, node);
> + if (stream->id < sid)
> + node = node->rb_right;
> + else if (stream->id > sid)
> + node = node->rb_left;
> + else
> + return stream->master;
> + }
> +
> + return NULL;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -1588,8 +1610,8 @@ static int arm_smmu_atc_inv_master(struct 
> arm_smmu_master *master)
>  
> 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
> 		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
>   }
>  
> @@ -1632,8 +1654,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain 
> *smmu_domain, int ssid,
>   if (!master->ats_enabled)
>   continue;
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
> 			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
>   }
>   }
> @@ -2065,13 +2087,13 @@ static void arm_smmu_install_ste_for_dev(struct 
> arm_smmu_master *master)
>   int i, j;
>   struct arm_smmu_device *smmu = master->smmu;
>  
> - for (i = 0; i < master->num_sids; ++i) {
> - u32 sid = master->sids[i];
> + for (i = 0; i < master->num_streams; ++i) {
> + u32 sid = master->streams[i].id;
>   __le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
>  
>   /* Bridged PCI devices may end up with duplicated IDs */
>   for (j = 0; j < i; j++)
> - 

Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if iova search fails"

2021-03-02 Thread John Garry

On 01/03/2021 15:48, John Garry wrote:


While max32_alloc_size indirectly tracks the largest *contiguous*
available space, one of the ideas from which it grew was to simply keep

count of the total number of free PFNs. If you're really spending
significant time determining that the tree is full, as opposed to just
taking longer to eventually succeed, then it might be relatively
innocuous to tack on that semi-redundant extra accounting as a
self-contained quick fix for that worst case.


Anyway, we see ~50% throughput regression, which is intolerable. As seen
in [0], I put this down to the fact that we have so many IOVA requests
which exceed the rcache size limit, which means many RB tree accesses
for non-cacheble IOVAs, which are now slower.


I will attempt to prove this by increasing RCACHE RANGE, such that all 
IOVA sizes may be cached.


About this one, as expected, we restore performance by increasing the 
RCACHE RANGE.


Some figures:
Baseline v5.12-rc1

strict mode:
600K IOPs

Revert "iommu/iova: Retry from last rb tree node if iova search fails":
1215K

Increase IOVA RCACHE range 6 -> 10 (All IOVAs size requests now 
cacheable for this experiment):

1400K

Reduce LLDD max SGE count 124 -> 16:
1288K

non-strict mode
1650K

So ideally we can work towards something for which IOVAs of all size 
could be cached.


Cheers,
John


Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID

2021-03-02 Thread Jason Gunthorpe
On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
>  
> +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> +{
> + struct domain_capsule *dc = (struct domain_capsule *)data;
> + unsigned long arg = *(unsigned long *)dc->data;
> +
> + return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> +   (void __user *)arg);

This arg business is really tortured. The type should be set at the
ioctl, not constantly passed down as unsigned long or worse void *.

And why is this passing a __user pointer deep into an iommu_* API??

> +/**
> + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> + *   struct vfio_iommu_type1_nesting_op)
> + *
> + * This interface allows userspace to utilize the nesting IOMMU
> + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> + * _REQUEST.
> + *
> + * @data[] types defined for each op:
> + * +=+===+
> + * | NESTING OP  |  @data[]  |
> + * +=+===+
> + * | BIND_PGTBL  |  struct iommu_gpasid_bind_data|
> + * +-+---+
> + * | UNBIND_PGTBL|  struct iommu_gpasid_bind_data|
> + *
> +-+---+

If the type is known why does the struct have a flex array?

Jason


RE: [PATCH V4 00/18] IOASID extensions for guest SVA

2021-03-02 Thread Liu, Yi L
> From: Jacob Pan 
> Sent: Sunday, February 28, 2021 6:01 AM
>
> I/O Address Space ID (IOASID) core code was introduced in v5.5 as a generic
> kernel allocator service for both PCIe Process Address Space ID (PASID) and
> ARM SMMU's Substream ID. IOASIDs are used to associate DMA requests
> with virtual address spaces, including both host and guest.
> 
> In addition to providing basic ID allocation, ioasid_set was defined as a
> token that is shared by a group of IOASIDs. This set token can be used
> for permission checking, but lack some features to address the following
> needs by guest Shared Virtual Address (SVA).
> - Manage IOASIDs by group, group ownership, quota, etc.
> - State synchronization among IOASID users (e.g. IOMMU driver, KVM,
>   device drivers)
> - Non-identity guest-host IOASID mapping
> - Lifecycle management
> 
> This patchset introduces the following extensions as solutions to the
> problems above.
> - Redefine and extend IOASID set such that IOASIDs can be managed by
> groups/pools.
> - Add notifications for IOASID state synchronization
> - Extend reference counting for life cycle alignment among multiple users
> - Support ioasid_set private IDs, which can be used as guest IOASIDs
> - Add a new cgroup controller for resource distribution
> 
> Please refer to Documentation/admin-guide/cgroup-v1/ioasids.rst and
> Documentation/driver-api/ioasid.rst in the enclosed patches for more
> details.
> 
> Based on discussions on LKML[1], a direction change was made in v4 such
> that
> the user interfaces for IOASID allocation are extracted from VFIO
> subsystem. The proposed IOASID subsystem now consists of three
> components:
> 1. IOASID core[01-14]: provides APIs for allocation, pool management,
>   notifications, and refcounting.
> 2. IOASID cgroup controller[RFC 15-17]: manage resource distribution[2].
> 3. IOASID user[RFC 18]:  provides user allocation interface via /dev/ioasid
> 
> This patchset only included VT-d driver as users of some of the new APIs.
> VFIO and KVM patches are coming up to fully utilize the APIs introduced
> here.
>
> [1] https://lore.kernel.org/linux-iommu/1599734733-6431-1-git-send-email-
> yi.l@intel.com/
> [2] Note that ioasid quota management code can be removed once the
> IOASIDs cgroup is ratified.
> 
> You can find this series, VFIO, KVM, and IOASID user at:
> https://github.com/jacobpan/linux.git ioasid_v4
> (VFIO and KVM patches will be available at this branch when published.)

VFIO and QEMU series are listed below:

VFIO: 
https://lore.kernel.org/linux-iommu/20210302203545.436623-1-yi.l@intel.com/
QEMU: 
https://lore.kernel.org/qemu-devel/20210302203827.437645-1-yi.l@intel.com/T/#t

Regards,
Yi Liu



[Patch v8 07/10] vfio/type1: Add vSVA support for IOMMU-backed mdevs

2021-03-02 Thread Liu Yi L
In recent years, the mediated device pass-through framework (e.g.
vfio-mdev) has been used to achieve flexible device sharing across
domains (e.g. VMs). There are also hardware-assisted mediated
pass-through solutions from platform vendors, e.g. Intel VT-d scalable
mode, which supports Intel Scalable I/O Virtualization technology. Such
mdevs are called IOMMU-backed mdevs, as there is IOMMU-enforced DMA
isolation for them.
In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
concept, which means mdevs are protected by an iommu domain which is
auxiliary to the domain that the kernel driver primarily uses for DMA
API. Details can be found in the KVM presentation as below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with mdev.

Cc: Kevin Tian 
CC: Jacob Pan 
CC: Jun Tian 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Reviewed-by: Eric Auger 
---
v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed
---
 drivers/vfio/vfio_iommu_type1.c | 35 -
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 86b6d8f9789a..883a79f36c46 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2635,18 +2635,37 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu 
*iommu,
return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+   struct device *dev)
+{
+   if (group->mdev_group)
+   return vfio_mdev_get_iommu_device(dev);
+   else
+   return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *)dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+   return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device,
  (void __user *)arg);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
/*
 * dc->user is a toggle for the unbind operation. When user
@@ -2659,12 +2678,12 @@ static int vfio_dev_unbind_gpasid_fn(struct device 
*dev, void *data)
if (dc->user) {
unsigned long arg = *(unsigned long *)dc->data;
 
-   iommu_uapi_sva_unbind_gpasid(dc->domain,
-dev, (void __user *)arg);
+   iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device,
+(void __user *)arg);
} else {
ioasid_t pasid = *(ioasid_t *)dc->data;
 
-   iommu_sva_unbind_gpasid(dc->domain, dev, pasid);
+   iommu_sva_unbind_gpasid(dc->domain, iommu_device, pasid);
}
return 0;
 }
@@ -3295,8 +3314,14 @@ static int vfio_dev_cache_invalidate_fn(struct device 
*dev, void *data)
 {
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *)dc->data;
+   struct device *iommu_device;
+
+   iommu_device = vfio_get_iommu_device(dc->group, dev);
+   if (!iommu_device)
+   return -EINVAL;
 
-   iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+   iommu_uapi_cache_invalidate(dc->domain, iommu_device,
+   (void __user *)arg);
return 0;
 }
 
-- 
2.25.1



[Patch v8 08/10] vfio/pci: Expose PCIe PASID capability to userspace

2021-03-02 Thread Liu Yi L
This patch exposes the PCIe PASID capability to userspace and indicates
where to emulate this capability if userspace wants to further expose it
to a VM.

It only exposes the PASID capability for devices which have the PCIe
PASID extended structure in their configuration space. For VFs, userspace
is still unable to see this capability, as the SR-IOV spec forbids VFs
from implementing the PASID capability extended structure. Supporting
them is a TODO for the future. Related discussion can be found in the
link below:

https://lore.kernel.org/kvm/20200407095801.648b1...@w520.home/

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Reviewed-by: Eric Auger 
---
v7 -> v8:
*) refine the commit message and the subject.

v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) added in v2, but it was sent in a separate patchseries before
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index a402adee8a21..95b5478f51ac 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = 
{
[PCI_EXT_CAP_ID_LTR]=   PCI_EXT_CAP_LTR_SIZEOF,
[PCI_EXT_CAP_ID_SECPCI] =   0,  /* not yet */
[PCI_EXT_CAP_ID_PMUX]   =   0,  /* not yet */
-   [PCI_EXT_CAP_ID_PASID]  =   0,  /* not yet */
+   [PCI_EXT_CAP_ID_PASID]  =   PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.25.1



[Patch v8 09/10] vfio: Document dual stage control

2021-03-02 Thread Liu Yi L
From: Eric Auger 

The VFIO API was enhanced to support nested stage control: a bunch of
new ioctls and usage guideline.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Eric Auger 
Signed-off-by: Liu Yi L 
Reviewed-by: Stefan Hajnoczi 
---
v7 -> v8:
*) remove SYSWIDE_PASID description, point to /dev/ioasid when mentioning
   PASID allocation from host.

v6 -> v7:
*) tweak per Eric's comments.

v5 -> v6:
*) tweak per Eric's comments.

v3 -> v4:
*) add review-by from Stefan Hajnoczi

v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting is removed as this series doesn't cover them.
   Original version from Eric.
   https://lore.kernel.org/kvm/20200320161911.27494-12-eric.au...@redhat.com/
---
 Documentation/driver-api/vfio.rst | 77 +++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst 
b/Documentation/driver-api/vfio.rst
index f1a4d3c3ba0b..9ccf9d63b72f 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,83 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds
+to the ARM terminology while level corresponds to Intel's terminology.
+In the following text, we use either without distinction.
+
+This is useful when the guest is exposed to a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can
+use stage-1 (GIOVA -> GPA or GVA -> GPA), while the hypervisor uses
+stage-2 for VM isolation (GPA -> HPA).
+
+Under dual-stage translation, the guest gets ownership of the stage-1
+page tables or both the stage-1 configuration structures and page tables.
+This depends on the vendor: e.g. on Intel platforms, the guest owns the
+stage-1 page tables under nesting, while on ARM, the guest owns both the
+stage-1 configuration structures and page tables under nesting. The hypervisor
+owns the root configuration structure (for security reasons), including
+stage-2 configuration. This works as long as configuration structures
+and page table formats are compatible between the virtual IOMMU and the
+physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use the stage-2, leaving stage-1 available
+for guest usage.
+The stage-1 format and binding method are reported in nesting capability.
+(VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) through VFIO_IOMMU_GET_INFO:
+
+ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+The nesting cap info is available only after NESTING_IOMMU is selected.
+If the underlying IOMMU doesn't support nesting, VFIO_SET_IOMMU fails and
+userspace should try other IOMMU types. Details of the nesting cap info
+can be found in Documentation/userspace-api/iommu.rst.
+
+Binding the stage-1 page table to the IOMMU differs per platform. On Intel,
+the stage-1 page table info is mediated by the userspace for each PASID.
+On ARM, the userspace directly passes the GPA of the whole PASID table.
+Currently only Intel's binding (IOMMU_NESTING_FEAT_BIND_PGTBL) is
+supported:
+
+nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+When multiple stage-1 page tables are supported on a device, each page
+table is associated with a PASID (Process Address Space ID) to differentiate
+with each other. In such case, userspace should include PASID in the
+bind_data when issuing direct binding requests.
+
+PASID may be managed per-device or system-wide, which, again, depends on
+the IOMMU vendor. e.g. on Intel platforms, userspace *must* allocate a
+PASID from the host before attempting to bind a stage-1 page table; the
+allocation is done through the /dev/ioasid interface. For systems without
+/dev/ioasid, userspace should not go on to bind a page table, and such
+requests shall be failed by the kernel. For the usage of /dev/ioasid,
+please refer to the doc below:
+
+Documentation/userspace-api/ioasid.rst
+
+Once the stage-1 page table is bound to the IOMMU, the guest is allowed to
+fully manage its mapping at its disposal. The IOMMU walks nested stage-1
+and stage-2 page tables when serving DMA requests from assigned device, and
+may cache the stage-1 mapping in the IOTLB. When required (IOMMU_NESTING_
+FEAT_CACHE_INVLD), userspace *must* forward guest stage-1 invalidation to
+the host, so the IOTLB is invalidated:
+
+nesting_op->flags = 

[Patch v8 10/10] iommu/vt-d: Support reporting nesting capability info

2021-03-02 Thread Liu Yi L
This patch reports nesting info when iommu_domain_get_attr() is called with
DOMAIN_ATTR_NESTING on a domain with nesting set.

Cc: Kevin Tian 
CC: Jacob Pan 
Cc: Alex Williamson 
Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
v7 -> v8:
*) tweak per latest code base

v6 -> v7:
*) split the patch in v6 into two patches:
   [PATCH v7 15/16] iommu/vt-d: Only support nesting when nesting caps are 
consistent across iommu units
   [PATCH v7 16/16] iommu/vt-d: Support reporting nesting capability info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
---
 drivers/iommu/intel/cap_audit.h |  7 
 drivers/iommu/intel/iommu.c | 68 -
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/cap_audit.h b/drivers/iommu/intel/cap_audit.h
index 74cfccae0e81..787e98282a02 100644
--- a/drivers/iommu/intel/cap_audit.h
+++ b/drivers/iommu/intel/cap_audit.h
@@ -60,6 +60,13 @@
 #define ECAP_QI_MASK   BIT_ULL(1)
 #define ECAP_C_MASKBIT_ULL(0)
 
+/* Capabilities related to nested translation */
+#define VTD_CAP_MASK   (CAP_FL1GP_MASK | CAP_FL5LP_MASK)
+
+#define VTD_ECAP_MASK  (ECAP_PRS_MASK | ECAP_ERS_MASK | \
+ECAP_SRS_MASK | ECAP_EAFS_MASK | \
+ECAP_PASID_MASK)
+
 /*
  * u64 intel_iommu_cap_sanity, intel_iommu_ecap_sanity will be adjusted as each
  * IOMMU gets audited.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4409d86b4e18..f7432fb1c6ea 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5508,13 +5508,79 @@ static bool domain_use_flush_queue(void)
return r;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+   struct iommu_nesting_info *info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+   struct device_domain_info *domain_info;
+   struct iommu_nesting_info_vtd vtd;
+   unsigned int size;
+
+   if (!info)
+   return -EINVAL;
+
+   if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+   return -ENODEV;
+
+   size = sizeof(struct iommu_nesting_info);
+   /*
+* If the provided buffer size is smaller than expected, return 0
+* and report the expected buffer size to the caller.
+*/
+   if (info->argsz < size) {
+   info->argsz = size;
+   return 0;
+   }
+
+   /*
+* Arbitrarily select the first domain_info, as all nesting
+* related capabilities should be consistent across iommu
+* units.
+*/
+   domain_info = list_first_entry(&dmar_domain->devices,
+  struct device_domain_info, link);
+   cap &= domain_info->iommu->cap;
+   ecap &= domain_info->iommu->ecap;
+
+   info->addr_width = dmar_domain->gaw;
+   info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+   info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
+IOMMU_NESTING_FEAT_CACHE_INVLD;
+   info->pasid_bits = ilog2(intel_pasid_max_id);
+   memset(&info->padding, 0x0, 12);
+
+   vtd.flags = 0;
+   memset(&vtd, 0x0, 12);
+   vtd.cap_reg = cap & VTD_CAP_MASK;
+   vtd.ecap_reg = ecap & VTD_ECAP_MASK;
+
+   memcpy(&info->vendor.vtd, &vtd, sizeof(vtd));
+   return 0;
+}
+
 static int
 intel_iommu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
switch (domain->type) {
case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   {
+   struct iommu_nesting_info *info =
+   (struct iommu_nesting_info *)data;
+   unsigned long flags;
+   int ret;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   ret = intel_iommu_get_nesting_info(domain, info);
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   return ret;
+   }
+   default:
+   return -ENODEV;
+   }
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-- 
2.25.1



Re: [PATCH v6 08/12] fork: Clear PASID for new mm

2021-03-02 Thread Jean-Philippe Brucker
On Mon, Mar 01, 2021 at 03:00:11PM -0800, Jacob Pan wrote:
> > functionality is not a problem without this patch on x86. But I think
> I feel the reason that x86 doesn't care is that mm->pasid is not used
> unless bind_mm is called.

I think vt-d also maintains the global_svm_list, which tells whether a
PASID is already allocated for the mm. The SMMU driver relies only on
iommu_sva_alloc_pasid() for this.

> For the fork children even mm->pasid is non-zero,
> it has no effect since it is not loaded onto MSRs.
> Perhaps you could also add a check or WARN_ON(!mm->pasid) in load_pasid()?
> 
> > we do need to have this patch in the kernel because PASID is per addr
> > space and two addr spaces shouldn't have the same PASID.
> > 
> Agreed.
> 
> > Who will accept this patch?

It's not clear from get_maintainers.pl, but I guess it should go via
linux-mm. Since the list wasn't cc'd on the original patch, I resent it.

Thanks,
Jean


[PATCH v13 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-03-02 Thread Jean-Philippe Brucker
When handling faults from the event or PRI queue, we need to find the
struct device associated with a SID. Add a rb_tree to keep track of
SIDs.

Acked-by: Jonathan Cameron 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 157 
 2 files changed, 140 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a..7b15b7580c6e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -639,6 +639,15 @@ struct arm_smmu_device {
 
/* IOMMU core code handle */
struct iommu_device iommu;
+
+   struct rb_root  streams;
+   struct mutexstreams_mutex;
+};
+
+struct arm_smmu_stream {
+   u32 id;
+   struct arm_smmu_master  *master;
+   struct rb_node  node;
 };
 
 /* SMMU private data for each master */
@@ -647,8 +656,8 @@ struct arm_smmu_master {
struct device   *dev;
struct arm_smmu_domain  *domain;
struct list_headdomain_head;
-   u32 *sids;
-   unsigned intnum_sids;
+   struct arm_smmu_stream  *streams;
+   unsigned intnum_streams;
boolats_enabled;
boolsva_enabled;
struct list_headbonds;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7edce914c45e..d148bb6d4289 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -909,8 +909,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain 
*smmu_domain,
 
	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.cfgi.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.cfgi.sid = master->streams[i].id;
			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
}
}
@@ -1355,6 +1355,28 @@ static int arm_smmu_init_l2_strtab(struct 
arm_smmu_device *smmu, u32 sid)
return 0;
 }
 
+/* smmu->streams_mutex must be held */
+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+   struct rb_node *node;
+   struct arm_smmu_stream *stream;
+
+   node = smmu->streams.rb_node;
+   while (node) {
+   stream = rb_entry(node, struct arm_smmu_stream, node);
+   if (stream->id < sid)
+   node = node->rb_right;
+   else if (stream->id > sid)
+   node = node->rb_left;
+   else
+   return stream->master;
+   }
+
+   return NULL;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -1588,8 +1610,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master 
*master)
 
	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.atc.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.atc.sid = master->streams[i].id;
		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
}
 
@@ -1632,8 +1654,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain 
*smmu_domain, int ssid,
if (!master->ats_enabled)
continue;
 
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.atc.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.atc.sid = master->streams[i].id;
			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
}
}
@@ -2065,13 +2087,13 @@ static void arm_smmu_install_ste_for_dev(struct 
arm_smmu_master *master)
int i, j;
struct arm_smmu_device *smmu = master->smmu;
 
-   for (i = 0; i < master->num_sids; ++i) {
-   u32 sid = master->sids[i];
+   for (i = 0; i < master->num_streams; ++i) {
+   u32 sid = master->streams[i].id;
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
 
/* Bridged PCI devices may end up with duplicated IDs */
for (j = 0; j < i; j++)
-   if (master->sids[j] == sid)
+   if (master->streams[j].id == sid)
break;
if (j < i)
continue;
@@ -2345,11 +2367,101 @@ 

[PATCH v13 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

2021-03-02 Thread Jean-Philippe Brucker
Some devices manage I/O Page Faults (IOPF) themselves instead of relying
on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without
mandating IOMMU-managed IOPF. The other device drivers now need to first
enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA. Enabling
IOMMU_DEV_FEAT_IOPF on its own doesn't have any effect visible to the
device driver, it is used in combination with other features.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
Cc: Arnd Bergmann 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: Will Deacon 
Cc: Zhangfei Gao 
Cc: Zhou Wang 
---
 include/linux/iommu.h | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 16ce75693d83..45c4eb372f56 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -156,10 +156,24 @@ struct iommu_resv_region {
	enum iommu_resv_type	type;
 };
 
-/* Per device IOMMU features */
+/**
+ * enum iommu_dev_features - Per device IOMMU features
+ * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
+ * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
+ * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally
+ *  enabling %IOMMU_DEV_FEAT_SVA requires
+ *  %IOMMU_DEV_FEAT_IOPF, but some devices manage I/O Page
+ *  Faults themselves instead of relying on the IOMMU. When
+ *  supported, this feature must be enabled before and
+ *  disabled after %IOMMU_DEV_FEAT_SVA.
+ *
+ * Device drivers query whether a feature is supported using
+ * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
+ */
 enum iommu_dev_features {
-   IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
-   IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
+   IOMMU_DEV_FEAT_AUX,
+   IOMMU_DEV_FEAT_SVA,
+   IOMMU_DEV_FEAT_IOPF,
 };
 
 #define IOMMU_PASID_INVALID	(-1U)
-- 
2.30.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v13 06/10] iommu: Add a page fault handler

2021-03-02 Thread Jean-Philippe Brucker
Some systems allow devices to handle I/O Page Faults in the core mm. For
example systems implementing the PCIe PRI extension or Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
fault report API"). Add a page fault handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them
to IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, most commonly in an IRQ handler, the
IOMMU driver reports the fault using iommu_report_device_fault(), which
calls the registered handler. The page fault handler then calls the mm
fault handler, and reports either success or failure with
iommu_page_response(). After the handler succeeds, the hardware retries
the access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().

Reviewed-by: Eric Auger 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Makefile|   1 +
 drivers/iommu/iommu-sva-lib.h |  53 
 include/linux/iommu.h |   2 +
 drivers/iommu/io-pgfault.c| 461 ++
 4 files changed, 517 insertions(+)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 61bd30cd8369..60fafc23dee6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
+obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index b40990aef3fd..031155010ca8 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
min, ioasid_t max);
 void iommu_sva_free_pasid(struct mm_struct *mm);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
 
+/* I/O Page fault */
+struct device;
+struct iommu_fault;
+struct iopf_queue;
+
+#ifdef CONFIG_IOMMU_SVA_LIB
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_remove_device(struct iopf_queue *queue,
+struct device *dev);
+int iopf_queue_flush_dev(struct device *dev);
+struct iopf_queue *iopf_queue_alloc(const char *name);
+void iopf_queue_free(struct iopf_queue *queue);
+int iopf_queue_discard_partial(struct iopf_queue *queue);
+
+#else /* CONFIG_IOMMU_SVA_LIB */
+static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+   struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct iopf_queue *queue,
+  struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline struct iopf_queue *iopf_queue_alloc(const char *name)
+{
+   return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+   return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_SVA_LIB */
 #endif /* _IOMMU_SVA_LIB_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45c4eb372f56..86d688c4418f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -367,6 +367,7 @@ struct iommu_fault_param {
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @iopf_param: I/O Page Fault queue and data
  * @fwspec: IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
@@ -377,6 +378,7 @@ struct iommu_fault_param {
 struct dev_iommu {
struct mutex lock;
struct iommu_fault_param*fault_param;
+   struct iopf_device_param*iopf_param;
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
	void			*priv;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index ..1df8c1dcae77
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,461 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ */
+
+#include 
+#include 
+#include 

[PATCH v13 08/10] dt-bindings: document stall property for IOMMU masters

2021-03-02 Thread Jean-Philippe Brucker
On ARM systems, some platform devices behind an IOMMU may support stall,
which is the ability to recover from page faults. Let the firmware tell us
when a device supports stall.
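
As an illustration only, a stall-capable master might be described like this (the device and SMMU node names here are made up; pasid-num-bits is the existing property from this binding document):

```dts
smmu: iommu@2b400000 {
	compatible = "arm,smmu-v3";
	/* ... */
	#iommu-cells = <1>;
};

accel@59000000 {
	compatible = "vendor,stall-capable-accel";	/* hypothetical */
	iommus = <&smmu 0x40>;
	pasid-num-bits = <5>;
	dma-can-stall;
};
```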

Reviewed-by: Eric Auger 
Reviewed-by: Rob Herring 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/iommu/iommu.txt| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt 
b/Documentation/devicetree/bindings/iommu/iommu.txt
index 3c36334e4f94..26ba9e530f13 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -92,6 +92,24 @@ Optional properties:
   tagging DMA transactions with an address space identifier. By default,
   this is 0, which means that the device only has one address space.
 
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such IOMMU
+  accepts a limited number of simultaneous stalled transactions before
+  having to either put back-pressure on the master, or abort new faulting
+  transactions.
+
+  Firmware has to opt-in stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete before a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example the OS, in order to handle a stalled transaction,
+  may attempt to retrieve pages from secondary storage in a stalled
+  domain, leading to a deadlock.
+
 
 Notes:
 ==
-- 
2.30.1



[PATCH v13 09/10] ACPI/IORT: Enable stall support for platform devices

2021-03-02 Thread Jean-Philippe Brucker
Copy the "Stall supported" bit, that tells whether a named component
supports stall, into the dma-can-stall device property.

Acked-by: Jonathan Cameron 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/iort.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 3912a1f6058e..0828f70cb782 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -968,13 +968,15 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 
alias, void *data)
 static void iort_named_component_init(struct device *dev,
  struct acpi_iort_node *node)
 {
-   struct property_entry props[2] = {};
+   struct property_entry props[3] = {};
struct acpi_iort_named_component *nc;
 
nc = (struct acpi_iort_named_component *)node->node_data;
props[0] = PROPERTY_ENTRY_U32("pasid-num-bits",
  FIELD_GET(ACPI_IORT_NC_PASID_BITS,
nc->node_flags));
+   if (nc->node_flags & ACPI_IORT_NC_STALL_SUPPORTED)
+   props[1] = PROPERTY_ENTRY_BOOL("dma-can-stall");
 
if (device_add_properties(dev, props))
dev_warn(dev, "Could not add device properties\n");
-- 
2.30.1



[PATCH v13 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-03-02 Thread Jean-Philippe Brucker
The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCIe PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked
and the OS is given a chance to fix the page tables and retry the
transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler.
If the fault is recoverable, it will call us back to terminate or
continue the stall.

To use stall, device drivers need to enable IOMMU_DEV_FEAT_IOPF, which
initializes the fault queue for the device.

Tested-by: Zhangfei Gao 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  43 
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  59 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 196 +-
 3 files changed, 283 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7b15b7580c6e..59af0bbd2f7b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -354,6 +354,13 @@
 #define CMDQ_PRI_1_GRPID   GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP		GENMASK_ULL(13, 12)
 
+#define CMDQ_RESUME_0_RESP_TERM	0UL
+#define CMDQ_RESUME_0_RESP_RETRY   1UL
+#define CMDQ_RESUME_0_RESP_ABORT   2UL
+#define CMDQ_RESUME_0_RESP GENMASK_ULL(13, 12)
+#define CMDQ_RESUME_0_SID  GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_1_STAG GENMASK_ULL(15, 0)
+
 #define CMDQ_SYNC_0_CS GENMASK_ULL(13, 12)
 #define CMDQ_SYNC_0_CS_NONE	0
 #define CMDQ_SYNC_0_CS_IRQ 1
@@ -370,6 +377,25 @@
 
 #define EVTQ_0_ID  GENMASK_ULL(7, 0)
 
+#define EVT_ID_TRANSLATION_FAULT   0x10
+#define EVT_ID_ADDR_SIZE_FAULT 0x11
+#define EVT_ID_ACCESS_FAULT0x12
+#define EVT_ID_PERMISSION_FAULT0x13
+
+#define EVTQ_0_SSV (1UL << 11)
+#define EVTQ_0_SSID		GENMASK_ULL(31, 12)
+#define EVTQ_0_SID		GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG		GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL   (1UL << 31)
+#define EVTQ_1_PnU (1UL << 33)
+#define EVTQ_1_InD (1UL << 34)
+#define EVTQ_1_RnW (1UL << 35)
+#define EVTQ_1_S2  (1UL << 39)
+#define EVTQ_1_CLASS   GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ (1UL << 44)
+#define EVTQ_2_ADDR		GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA GENMASK_ULL(51, 12)
+
 /* PRI queue */
 #define PRIQ_ENT_SZ_SHIFT  4
 #define PRIQ_ENT_DWORDS		((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
@@ -464,6 +490,13 @@ struct arm_smmu_cmdq_ent {
enum pri_resp   resp;
} pri;
 
+   #define CMDQ_OP_RESUME  0x44
+   struct {
+   u32 sid;
+   u16 stag;
+   u8  resp;
+   } resume;
+
	#define CMDQ_OP_CMD_SYNC	0x46
struct {
u64 msiaddr;
@@ -522,6 +555,7 @@ struct arm_smmu_cmdq_batch {
 
 struct arm_smmu_evtq {
struct arm_smmu_queue   q;
+   struct iopf_queue   *iopf;
u32 max_stalls;
 };
 
@@ -659,7 +693,9 @@ struct arm_smmu_master {
struct arm_smmu_stream  *streams;
unsigned intnum_streams;
boolats_enabled;
+   boolstall_enabled;
boolsva_enabled;
+   booliopf_enabled;
struct list_headbonds;
unsigned intssid_bits;
 };
@@ -678,6 +714,7 @@ struct arm_smmu_domain {
 
struct io_pgtable_ops   *pgtbl_ops;
boolnon_strict;
+   boolstall_enabled;
atomic_tnr_ats_masters;
 
enum arm_smmu_domain_stage  stage;
@@ -719,6 +756,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
*master);
 bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
+bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
 struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
void *drvdata);
 void 

[PATCH v13 01/10] iommu: Fix comment for struct iommu_fwspec

2021-03-02 Thread Jean-Philippe Brucker
Commit 986d5ecc5699 ("iommu: Move fwspec->iommu_priv to struct
dev_iommu") removed iommu_priv from fwspec and commit 5702ee24182f
("ACPI/IORT: Check ATS capability in root complex nodes") added @flags.
Update the struct doc.

Acked-by: Jonathan Cameron 
Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..1d422bf722a1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -571,7 +571,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
  * struct iommu_fwspec - per-device IOMMU instance data
  * @ops: ops for this device's IOMMU
  * @iommu_fwnode: firmware handle for this device's IOMMU
- * @iommu_priv: IOMMU driver private data for this device
+ * @flags: IOMMU_FWSPEC_* flags
  * @num_pasid_bits: number of PASID bits supported by this device
  * @num_ids: number of associated device IDs
  * @ids: IDs which this device may present to the IOMMU
-- 
2.30.1



[PATCH v13 00/10] iommu: I/O page faults for SMMUv3

2021-03-02 Thread Jean-Philippe Brucker
Add stall support to the SMMUv3 driver, along with a common I/O Page
Fault handler.

Since v12 [1]:
* Fixed failure path of arm_smmu_insert_master(), in patch 07 (Keqian
  Zhu)
* In arm_smmu_handle_evt(), patch 10, don't report IPA field on stage-1
  faults, and report accurate fault reason (Eric Auger)
* Fix possible use-after-free in arm_smmu_handle_evt(), patch 10: if a
  master is removed while we handle its events, we could in theory
  dereference a freed master struct. Hold streams_mutex while using a
  master struct obtained with arm_smmu_find_master().


Future work regarding IOPF:
* Keep stall disabled by default, only enable it per CD when drivers
  request it [2][3].
* Add PRI support to SMMUv3.
* Route all recoverable faults through io-pgfault.c, so we can track
  partial faults better [4].
* Nested IOPF [5].

[1] 
https://lore.kernel.org/linux-iommu/20210127154322.3959196-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-iommu/22fa4120-eadf-20d5-0d0a-9935aa0f1...@hisilicon.com/
[3] https://lore.kernel.org/linux-iommu/YAhui7UOw7743shI@myrica/
[4] https://lore.kernel.org/kvm/YB0f5Yno9frihQq4@myrica/
[5] 
https://lore.kernel.org/linux-acpi/mwhpr11mb188653af6efa0e55de17815f8c...@mwhpr11mb1886.namprd11.prod.outlook.com/

Jean-Philippe Brucker (10):
  iommu: Fix comment for struct iommu_fwspec
  iommu/arm-smmu-v3: Use device properties for pasid-num-bits
  iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
  iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF
  uacce: Enable IOMMU_DEV_FEAT_IOPF
  iommu: Add a page fault handler
  iommu/arm-smmu-v3: Maintain a SID->device structure
  dt-bindings: document stall property for IOMMU masters
  ACPI/IORT: Enable stall support for platform devices
  iommu/arm-smmu-v3: Add stall support for platform devices

 drivers/iommu/Makefile|   1 +
 .../devicetree/bindings/iommu/iommu.txt   |  18 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  56 ++-
 drivers/iommu/iommu-sva-lib.h |  53 ++
 include/linux/iommu.h |  26 +-
 drivers/acpi/arm64/iort.c |  15 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  59 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 354 --
 drivers/iommu/intel/iommu.c   |  11 +-
 drivers/iommu/io-pgfault.c| 461 ++
 drivers/iommu/of_iommu.c  |   5 -
 drivers/misc/uacce/uacce.c|  39 +-
 12 files changed, 1024 insertions(+), 74 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

-- 
2.30.1



[PATCH v13 04/10] iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF

2021-03-02 Thread Jean-Philippe Brucker
Allow drivers to query and enable IOMMU_DEV_FEAT_IOPF, which amounts to
checking whether PRI is enabled.

Reviewed-by: Lu Baolu 
Signed-off-by: Jean-Philippe Brucker 
---
Cc: David Woodhouse 
Cc: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee0932307d64..c5c5fd444779 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5343,6 +5343,8 @@ static int siov_find_pci_dvsec(struct pci_dev *pdev)
 static bool
 intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat)
 {
+   struct device_domain_info *info = get_domain_info(dev);
+
if (feat == IOMMU_DEV_FEAT_AUX) {
int ret;
 
@@ -5357,13 +5359,13 @@ intel_iommu_dev_has_feat(struct device *dev, enum 
iommu_dev_features feat)
return !!siov_find_pci_dvsec(to_pci_dev(dev));
}
 
-   if (feat == IOMMU_DEV_FEAT_SVA) {
-   struct device_domain_info *info = get_domain_info(dev);
+   if (feat == IOMMU_DEV_FEAT_IOPF)
+   return info && info->pri_supported;
 
+   if (feat == IOMMU_DEV_FEAT_SVA)
return info && (info->iommu->flags & VTD_FLAG_SVM_CAPABLE) &&
info->pasid_supported && info->pri_supported &&
info->ats_supported;
-   }
 
return false;
 }
@@ -5374,6 +5376,9 @@ intel_iommu_dev_enable_feat(struct device *dev, enum 
iommu_dev_features feat)
if (feat == IOMMU_DEV_FEAT_AUX)
return intel_iommu_enable_auxd(dev);
 
+   if (feat == IOMMU_DEV_FEAT_IOPF)
+   return intel_iommu_dev_has_feat(dev, feat) ? 0 : -ENODEV;
+
if (feat == IOMMU_DEV_FEAT_SVA) {
struct device_domain_info *info = get_domain_info(dev);
 
-- 
2.30.1



[PATCH v13 05/10] uacce: Enable IOMMU_DEV_FEAT_IOPF

2021-03-02 Thread Jean-Philippe Brucker
The IOPF (I/O Page Fault) feature is now enabled independently from the
SVA feature, because some IOPF implementations are device-specific and
do not require IOMMU support for PCIe PRI or Arm SMMU stall.

Enable IOPF unconditionally when enabling SVA for now. In the future, if
a device driver implementing a uacce interface doesn't need IOPF
support, it will need to tell the uacce module, for example with a new
flag.

Acked-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
---
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Cc: Zhangfei Gao 
Cc: Zhou Wang 
---
 drivers/misc/uacce/uacce.c | 39 +-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index d07af4edfcac..6db7a98486ec 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -385,6 +385,33 @@ static void uacce_release(struct device *dev)
kfree(uacce);
 }
 
+static unsigned int uacce_enable_sva(struct device *parent, unsigned int flags)
+{
+   if (!(flags & UACCE_DEV_SVA))
+   return flags;
+
+   flags &= ~UACCE_DEV_SVA;
+
+   if (iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_IOPF))
+   return flags;
+
+   if (iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA)) {
+   iommu_dev_disable_feature(parent, IOMMU_DEV_FEAT_IOPF);
+   return flags;
+   }
+
+   return flags | UACCE_DEV_SVA;
+}
+
+static void uacce_disable_sva(struct uacce_device *uacce)
+{
+   if (!(uacce->flags & UACCE_DEV_SVA))
+   return;
+
+   iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA);
+   iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_IOPF);
+}
+
 /**
  * uacce_alloc() - alloc an accelerator
  * @parent: pointer of uacce parent device
@@ -404,11 +431,7 @@ struct uacce_device *uacce_alloc(struct device *parent,
if (!uacce)
return ERR_PTR(-ENOMEM);
 
-   if (flags & UACCE_DEV_SVA) {
-   ret = iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA);
-   if (ret)
-   flags &= ~UACCE_DEV_SVA;
-   }
+   flags = uacce_enable_sva(parent, flags);
 
uacce->parent = parent;
uacce->flags = flags;
@@ -432,8 +455,7 @@ struct uacce_device *uacce_alloc(struct device *parent,
return uacce;
 
 err_with_uacce:
-   if (flags & UACCE_DEV_SVA)
-   iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA);
+   uacce_disable_sva(uacce);
kfree(uacce);
return ERR_PTR(ret);
 }
@@ -487,8 +509,7 @@ void uacce_remove(struct uacce_device *uacce)
	mutex_unlock(&uacce->queues_lock);
 
/* disable sva now since no opened queues */
-   if (uacce->flags & UACCE_DEV_SVA)
-   iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA);
+   uacce_disable_sva(uacce);
 
if (uacce->cdev)
cdev_device_del(uacce->cdev, >dev);
-- 
2.30.1



[PATCH v13 02/10] iommu/arm-smmu-v3: Use device properties for pasid-num-bits

2021-03-02 Thread Jean-Philippe Brucker
The pasid-num-bits property shouldn't need a dedicated fwspec field,
it's a job for device properties. Add properties for IORT, and access
the number of PASID bits using device_property_read_u32().

Suggested-by: Robin Murphy 
Acked-by: Jonathan Cameron 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/iommu.h   |  2 --
 drivers/acpi/arm64/iort.c   | 13 +++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  3 ++-
 drivers/iommu/of_iommu.c|  5 -
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1d422bf722a1..16ce75693d83 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -572,7 +572,6 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
  * @ops: ops for this device's IOMMU
  * @iommu_fwnode: firmware handle for this device's IOMMU
  * @flags: IOMMU_FWSPEC_* flags
- * @num_pasid_bits: number of PASID bits supported by this device
  * @num_ids: number of associated device IDs
  * @ids: IDs which this device may present to the IOMMU
  */
@@ -580,7 +579,6 @@ struct iommu_fwspec {
const struct iommu_ops  *ops;
struct fwnode_handle*iommu_fwnode;
u32 flags;
-   u32 num_pasid_bits;
unsigned intnum_ids;
u32 ids[];
 };
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 2494138a6905..3912a1f6058e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -968,15 +968,16 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 
alias, void *data)
 static void iort_named_component_init(struct device *dev,
  struct acpi_iort_node *node)
 {
+   struct property_entry props[2] = {};
struct acpi_iort_named_component *nc;
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   if (!fwspec)
-   return;
 
nc = (struct acpi_iort_named_component *)node->node_data;
-   fwspec->num_pasid_bits = FIELD_GET(ACPI_IORT_NC_PASID_BITS,
-  nc->node_flags);
+   props[0] = PROPERTY_ENTRY_U32("pasid-num-bits",
+ FIELD_GET(ACPI_IORT_NC_PASID_BITS,
+   nc->node_flags));
+
+   if (device_add_properties(dev, props))
+   dev_warn(dev, "Could not add device properties\n");
 }
 
 static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..7edce914c45e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2392,7 +2392,8 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
}
}
 
-   master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+   device_property_read_u32(dev, "pasid-num-bits", &master->ssid_bits);
+   master->ssid_bits = min(smmu->ssid_bits, master->ssid_bits);
 
/*
 * Note that PASID must be enabled before, and disabled after ATS:
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e505b9130a1c..a9d2df001149 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -210,11 +210,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
 of_pci_iommu_init, &info);
} else {
err = of_iommu_configure_device(master_np, dev, id);
-
-   fwspec = dev_iommu_fwspec_get(dev);
-   if (!err && fwspec)
-   of_property_read_u32(master_np, "pasid-num-bits",
-   &fwspec->num_pasid_bits);
}
 
/*
-- 
2.30.1



[PATCH] mm/fork: Clear PASID for new mm

2021-03-02 Thread Jean-Philippe Brucker
From: Fenghua Yu 

When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Reviewed-by: Tony Luck 
Signed-off-by: Fenghua Yu 
Signed-off-by: Jean-Philippe Brucker 
---
This patch was part of the series introducing mm->pasid, but got lost
along the way [1]. It still makes sense to have it, because each address
space has a different PASID. And the IOMMU code in iommu_sva_alloc_pasid()
expects the pasid field of a new mm struct to be cleared.

[1] 
https://lore.kernel.org/linux-iommu/ydgh53acqht+t...@otcwcpicx3.sc.intel.com/
---
 include/linux/mm_types.h | 1 +
 kernel/fork.c| 8 
 2 files changed, 9 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0974ad501a47..6613b26a8894 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,7 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+#define INIT_PASID 0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..808af2cc8ab6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -994,6 +994,13 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_IOMMU_SUPPORT
+   mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1024,6 +1031,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+   mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-- 
2.30.1



Re: [PATCH v3 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers

2021-03-02 Thread Jordan Crouse
On Tue, Mar 02, 2021 at 12:17:24PM +, Robin Murphy wrote:
> On 2021-02-25 17:51, Jordan Crouse wrote:
> > Call report_iommu_fault() to allow upper-level drivers to register their
> > own fault handlers.
> > 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index d8c6bfde6a61..0f3a9b5f3284 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
> > *dev)
> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > struct arm_smmu_device *smmu = smmu_domain->smmu;
> > int idx = smmu_domain->cfg.cbndx;
> > +   int ret;
> > fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > if (!(fsr & ARM_SMMU_FSR_FAULT))
> > @@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, 
> > void *dev)
> > iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > -   dev_err_ratelimited(smmu->dev,
> > -   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
> > cbfrsynra=0x%x, cb=%d\n",
> > +   ret = report_iommu_fault(domain, dev, iova,
> 
> Beware that "dev" here is not a struct device, so this isn't right. I'm not
> entirely sure what we *should* be passing here, since we can't easily
> attribute a context fault to a specific client device, and passing the IOMMU
> device seems a bit dubious too, so maybe just NULL?

Agreed. The GPU doesn't use it and I doubt anything else would either since the
SMMU device is opaque to the leaf driver.

Jordan

> Robin.
> 
> > +   fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : 
> > IOMMU_FAULT_READ);
> > +
> > +   if (ret == -ENOSYS)
> > +   dev_err_ratelimited(smmu->dev,
> > +   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
> > cbfrsynra=0x%x, cb=%d\n",
> > fsr, iova, fsynr, cbfrsynra, idx);
> > arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> > 


Re: [PATCH v13 06/10] iommu: Add a page fault handler

2021-03-02 Thread Jacob Pan
Hi Jean-Philippe,

A few comments from the p.o.v. of converting VT-d to this framework, mostly
about potential optimization. I think VT-d SVA code will be able to use this
work.
+Ashok provided many insights.

FWIW,
Reviewed-by: Jacob Pan 

On Tue,  2 Mar 2021 10:26:42 +0100, Jean-Philippe Brucker
 wrote:

> Some systems allow devices to handle I/O Page Faults in the core mm. For
> example systems implementing the PCIe PRI extension or Arm SMMU stall
> model. Infrastructure for reporting these recoverable page faults was
> added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
> fault report API"). Add a page fault handler for host SVA.
> 
> IOMMU driver can now instantiate several fault workqueues and link them
> to IOPF-capable devices. Drivers can choose between a single global
> workqueue, one per IOMMU device, one per low-level fault queue, one per
> domain, etc.
> 
> When it receives a fault event, most commonly in an IRQ handler, the
> IOMMU driver reports the fault using iommu_report_device_fault(), which
> calls the registered handler. The page fault handler then calls the mm
> fault handler, and reports either success or failure with
> iommu_page_response(). After the handler succeeds, the hardware retries
> the access.
> 
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
> 
> Reviewed-by: Eric Auger 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva-lib.h |  53 
>  include/linux/iommu.h |   2 +
>  drivers/iommu/io-pgfault.c| 461 ++
>  4 files changed, 517 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 61bd30cd8369..60fafc23dee6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
> +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index b40990aef3fd..031155010ca8 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
>  void iommu_sva_free_pasid(struct mm_struct *mm);
>  struct mm_struct *iommu_sva_find(ioasid_t pasid);
>  
> +/* I/O Page fault */
> +struct device;
> +struct iommu_fault;
> +struct iopf_queue;
> +
> +#ifdef CONFIG_IOMMU_SVA_LIB
> +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
> +
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +int iopf_queue_remove_device(struct iopf_queue *queue,
> +  struct device *dev);
> +int iopf_queue_flush_dev(struct device *dev);
> +struct iopf_queue *iopf_queue_alloc(const char *name);
> +void iopf_queue_free(struct iopf_queue *queue);
> +int iopf_queue_discard_partial(struct iopf_queue *queue);
> +
> +#else /* CONFIG_IOMMU_SVA_LIB */
> +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> + struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct iopf_queue *queue,
> +struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> + return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +
> +static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> + return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA_LIB */
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 45c4eb372f56..86d688c4418f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -367,6 +367,7 @@ struct iommu_fault_param {
>   * struct dev_iommu - Collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @iopf_param:   I/O Page Fault queue and data
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
>   * @priv: IOMMU Driver private data
> @@ -377,6 +378,7 @@ struct iommu_fault_param {
>  struct dev_iommu {
>   struct mutex lock;
>   struct iommu_fault_param*fault_param;
> + struct 

Re: [PATCH] swiotlb: Fix type of max_slot

2021-03-02 Thread Konrad Rzeszutek Wilk

On 3/2/21 12:21 PM, Kunihiko Hayashi wrote:

After the refactoring, the type of max_slots changed from unsigned long
to unsigned int. The return type of get_max_slots() and the type of the
4th argument of iommu_is_span_boundary() now differ from the type of
max_slots, which can trigger the BUG_ON() assertion in
iommu_is_span_boundary().

Cc: Christoph Hellwig 
Fixes: 567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
Signed-off-by: Kunihiko Hayashi 


I think this is all good. Looking at Linus's master I see:


537 unsigned long max_slots = get_max_slots(boundary_mask);

?


---
  kernel/dma/swiotlb.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 369e4c3..c10e855 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -534,7 +534,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
-   unsigned int max_slots = get_max_slots(boundary_mask);
+   unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int iotlb_align_mask =
dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;





Re: [PATCH] swiotlb: Fix type of max_slot

2021-03-02 Thread Kunihiko Hayashi
On Tue, 2 Mar 2021 15:20:08 -0500
Konrad Rzeszutek Wilk  wrote:

> On 3/2/21 12:21 PM, Kunihiko Hayashi wrote:
> > After the refactoring, the type of max_slots changed from unsigned
> > long to unsigned int. The return type of get_max_slots() and the type
> > of the 4th argument of iommu_is_span_boundary() now differ from the
> > type of max_slots, which can trigger the BUG_ON() assertion in
> > iommu_is_span_boundary().
> >
> > Cc: Christoph Hellwig 
> > Fixes: 567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
> > Signed-off-by: Kunihiko Hayashi 
> 
> I think this is all good. Looking at Linus's master I see:
> 
> 
> 537 unsigned long max_slots = get_max_slots(boundary_mask);
> 
> ?

Thanks for the information, and sorry for that.

I found it in next-20210226:
567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")

And it has already been fixed in next-20210301:
26a7e094783d ("swiotlb: refactor swiotlb_tbl_map_single")

$ git diff 567d877f9a7d..26a7e094783d
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 381c24e..6962cb4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -496,7 +496,7 @@ static int find_slots(struct device *dev, size_t alloc_size)
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
-   unsigned int max_slots = get_max_slots(boundary_mask);
+   unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int nslots = nr_slots(alloc_size), stride = 1;
unsigned int index, wrap, count = 0, i;
unsigned long flags;

Thank you,

---
Best Regards,
Kunihiko Hayashi



Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-03-02 Thread Vivek Gautam
Hi Eric,

On Fri, Feb 12, 2021 at 11:44 PM Auger Eric  wrote:
>
> Hi Vivek,
>
> On 2/12/21 11:58 AM, Vivek Gautam wrote:
> > Update nested domain information required for stage1 page table.
>
> s/reuqired/required in the commit title

Oh! my bad.

> >
> > Signed-off-by: Vivek Gautam 
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index c11dd3940583..728018921fae 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> >   void *data)
> >  {
> >   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> > + struct arm_smmu_device *smmu = smmu_domain->smmu;
> >   unsigned int size;
> >
> >   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> >   return 0;
> >   }
> >
> > - /* report an empty iommu_nesting_info for now */
> > - memset(info, 0x0, size);
> > + /* Update the nesting info as required for stage1 page tables */
> > + info->addr_width = smmu->ias;
> > + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> > + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
> I understood IOMMU_NESTING_FEAT_BIND_PGTBL advertises the requirement to
> bind tables per PASID, i.e. passing iommu_gpasid_bind_data.
> In the ARM case I guess you plan to use the attach/detach_pasid_table API
> with the iommu_pasid_table_config struct. So I understood we should add a
> new feature here.

Right, the idea is to let vfio know that we support pasid table binding, and
I thought we could use the same flag. But clearly that's not the case.
Will add a new feature.

> > +  IOMMU_NESTING_FEAT_PAGE_RESP |
> > +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> > + info->pasid_bits = smmu->ssid_bits;
> > + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> > + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> > + memset(&info->padding, 0x0, 12);
> > + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> > +
> >   info->argsz = size;
> > +
> spurious new line

Sure, will remove it.

Best regards
Vivek

> >   return 0;
> >  }
> >
> >
>


Re: [PATCH v13 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

2021-03-02 Thread Lu Baolu

On 3/2/21 5:26 PM, Jean-Philippe Brucker wrote:

Some devices manage I/O Page Faults (IOPF) themselves instead of relying
on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without
mandating IOMMU-managed IOPF. Other device drivers now need to first
enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA. Enabling
IOMMU_DEV_FEAT_IOPF on its own doesn't have any effect visible to the
device driver; it is used in combination with other features.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
Cc: Arnd Bergmann 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: Will Deacon 
Cc: Zhangfei Gao 
Cc: Zhou Wang 
---
  include/linux/iommu.h | 20 +---
  1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 16ce75693d83..45c4eb372f56 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -156,10 +156,24 @@ struct iommu_resv_region {
	enum iommu_resv_type	type;
  };
  
-/* Per device IOMMU features */

+/**
+ * enum iommu_dev_features - Per device IOMMU features
+ * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
+ * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
+ * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally
+ *  enabling %IOMMU_DEV_FEAT_SVA requires
+ *  %IOMMU_DEV_FEAT_IOPF, but some devices manage I/O Page
+ *  Faults themselves instead of relying on the IOMMU. When
+ *  supported, this feature must be enabled before and
+ *  disabled after %IOMMU_DEV_FEAT_SVA.
+ *
+ * Device drivers query whether a feature is supported using
+ * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
+ */
  enum iommu_dev_features {
-   IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
-   IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
+   IOMMU_DEV_FEAT_AUX,
+   IOMMU_DEV_FEAT_SVA,
+   IOMMU_DEV_FEAT_IOPF,
  };
  
  #define IOMMU_PASID_INVALID	(-1U)




Reviewed-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH v13 06/10] iommu: Add a page fault handler

2021-03-02 Thread Lu Baolu

Hi Jean,

On 3/2/21 5:26 PM, Jean-Philippe Brucker wrote:

Some systems allow devices to handle I/O Page Faults in the core mm, for
example systems implementing the PCIe PRI extension or the Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
fault report API"). Add a page fault handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them
to IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, most commonly in an IRQ handler, the
IOMMU driver reports the fault using iommu_report_device_fault(), which
calls the registered handler. The page fault handler then calls the mm
fault handler, and reports either success or failure with
iommu_page_response(). After the handler succeeds, the hardware retries
the access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().

Reviewed-by: Eric Auger 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Jean-Philippe Brucker 


I have tested this framework with the Intel VT-d implementation. It
works as expected. Hence,

Reviewed-by: Lu Baolu 
Tested-by: Lu Baolu 

One possible future optimization is that we could allow system
administrators to choose between handling PRQs in a workqueue and handling
them synchronously. One research study found that most of the software
latency of handling a single page fault lies in the scheduling part.
Hence, synchronous processing would give shorter software latency when
PRQs are rare and limited.

Best regards,
baolu


---
  drivers/iommu/Makefile|   1 +
  drivers/iommu/iommu-sva-lib.h |  53 
  include/linux/iommu.h |   2 +
  drivers/iommu/io-pgfault.c| 461 ++
  4 files changed, 517 insertions(+)
  create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 61bd30cd8369..60fafc23dee6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
+obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index b40990aef3fd..031155010ca8 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
  void iommu_sva_free_pasid(struct mm_struct *mm);
  struct mm_struct *iommu_sva_find(ioasid_t pasid);
  
+/* I/O Page fault */

+struct device;
+struct iommu_fault;
+struct iopf_queue;
+
+#ifdef CONFIG_IOMMU_SVA_LIB
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_remove_device(struct iopf_queue *queue,
+struct device *dev);
+int iopf_queue_flush_dev(struct device *dev);
+struct iopf_queue *iopf_queue_alloc(const char *name);
+void iopf_queue_free(struct iopf_queue *queue);
+int iopf_queue_discard_partial(struct iopf_queue *queue);
+
+#else /* CONFIG_IOMMU_SVA_LIB */
+static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+   struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct iopf_queue *queue,
+  struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline struct iopf_queue *iopf_queue_alloc(const char *name)
+{
+   return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+   return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_SVA_LIB */
  #endif /* _IOMMU_SVA_LIB_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45c4eb372f56..86d688c4418f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -367,6 +367,7 @@ struct iommu_fault_param {
   * struct dev_iommu - Collection of per-device IOMMU data
   *
   * @fault_param: IOMMU detected device fault reporting data
+ * @iopf_param: I/O Page Fault queue and data
   * @fwspec:IOMMU fwspec data
   * @iommu_dev: IOMMU device this device is linked to
   * @priv:  IOMMU Driver private data
@@ -377,6 +378,7 @@ struct iommu_fault_param {
  struct dev_iommu {
 

Re: [PATCH v13 06/10] iommu: Add a page fault handler

2021-03-02 Thread Raj, Ashok
On Tue, Mar 02, 2021 at 10:26:42AM +0100, Jean-Philippe Brucker wrote:
[snip]

> +
> +static enum iommu_page_response_code
> +iopf_handle_single(struct iopf_fault *iopf)
> +{
> + vm_fault_t ret;
> + struct mm_struct *mm;
> + struct vm_area_struct *vma;
> + unsigned int access_flags = 0;
> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
> + struct iommu_fault_page_request *prm = &iopf->fault.prm;
> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> +
> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> + return status;
> +
> + mm = iommu_sva_find(prm->pasid);
> + if (IS_ERR_OR_NULL(mm))
> + return status;
> +
> + mmap_read_lock(mm);
> +
> + vma = find_extend_vma(mm, prm->addr);
> + if (!vma)
> + /* Unmapped area */
> + goto out_put_mm;
> +
> + if (prm->perm & IOMMU_FAULT_PERM_READ)
> + access_flags |= VM_READ;
> +
> + if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
> + access_flags |= VM_WRITE;
> + fault_flags |= FAULT_FLAG_WRITE;
> + }
> +
> + if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
> + access_flags |= VM_EXEC;
> + fault_flags |= FAULT_FLAG_INSTRUCTION;
> + }
> +
> + if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
> + fault_flags |= FAULT_FLAG_USER;
> +
> + if (access_flags & ~vma->vm_flags)
> + /* Access fault */
> + goto out_put_mm;
> +
> + ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);

Should we add a trace similar to trace_page_fault_user()/trace_page_fault_kernel()
in arch/x86/mm/fault.c?

Or maybe add a perf_sw_event() for device faults?

Cheers,
Ashok


Re: [PATCH v2 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-03-02 Thread Lu Baolu

On 3/2/21 6:13 PM, Jacob Pan wrote:

Write protect bit, when set, inhibits supervisor writes to read-only
pages. In guest supervisor shared virtual addressing (SVA), write protect
should be honored upon a guest bind supervisor PASID request.

This patch extends the VT-d portion of the IOMMU UAPI to include the WP
bit. The WPE bit of the supervisor PASID entry will be set to match the
CPU CR0.WP bit.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/pasid.c | 3 +++
  include/uapi/linux/iommu.h  | 3 ++-
  2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0b7e0e726ade..b7e39239f539 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -763,6 +763,9 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
return -EINVAL;
}
pasid_set_sre(pte);
+   /* Enable write protect WP if guest requested */
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE)
+   pasid_set_wpe(pte);
}
  
  	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 35d48843acd8..3a9164cc9937 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
  #define IOMMU_SVA_VTD_GPASID_PWT  (1 << 3) /* page-level write through */
  #define IOMMU_SVA_VTD_GPASID_EMTE (1 << 4) /* extended mem type enable */
  #define IOMMU_SVA_VTD_GPASID_CD   (1 << 5) /* PASID-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
+#define IOMMU_SVA_VTD_GPASID_WPE   (1 << 6) /* Write protect enable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 7)
__u64 flags;
__u32 pat;
__u32 emt;



Acked-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH v2 1/4] iommu/vt-d: Enable write protect for supervisor SVM

2021-03-02 Thread Lu Baolu

On 3/2/21 6:13 PM, Jacob Pan wrote:

Write protect bit, when set, inhibits supervisor writes to read-only
pages. In supervisor shared virtual addressing (SVA), where page tables
are shared between the CPU and DMA, the IOMMU PASID entry WPE bit should
match the CR0.WP bit of the CPU.
This patch sets the WPE bit for supervisor PASIDs if CR0.WP is set.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/pasid.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0cceaabc3ce6..0b7e0e726ade 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -410,6 +410,15 @@ static inline void pasid_set_sre(struct pasid_entry *pe)
	pasid_set_bits(&pe->val[2], 1 << 0, 1);
  }
  
+/*

+ * Setup the WPE(Write Protect Enable) field (Bit 132) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_wpe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 4, 1 << 4);
+}
+
  /*
   * Setup the P(Present) field (Bit 0) of a scalable mode PASID
   * entry.
@@ -553,6 +562,20 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
}
  }
  
+static inline int pasid_enable_wpe(struct pasid_entry *pte)

+{
+   unsigned long cr0 = read_cr0();
+
+   /* CR0.WP is normally set but just to be sure */
+   if (unlikely(!(cr0 & X86_CR0_WP))) {
+   pr_err_ratelimited("No CPU write protect!\n");
+   return -EINVAL;
+   }
+   pasid_set_wpe(pte);
+
+   return 0;
+};
+
  /*
   * Set up the scalable mode pasid table entry for first only
   * translation type.
@@ -584,6 +607,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EINVAL;
}
pasid_set_sre(pte);
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+
}
  
  	if (flags & PASID_FLAG_FL5LP) {




Acked-by: Lu Baolu 

Best regards,
baolu