Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jacob Pan
Hi Jason,

On Mon, 10 May 2021 13:39:56 -0300, Jason Gunthorpe  wrote:

> I still think it is smarter to push a group of RID's into a global
> allocation group and accept there are potential downsides with that
> than to try to force a global allocation group on every RID and then
> try to fix the mess that makes for non-ENQCMD devices.
The proposed ioasid_set change in this set has a token for each set of
IOASIDs.

/**
 * struct ioasid_set - Meta data about ioasid_set
 * @nh: List of notifiers private to that set
 * @xa: XArray to store ioasid_set private IDs, can be used for
 *  guest-host IOASID mapping, or just a private IOASID namespace.
 * @token:  Unique to identify an IOASID set
 * @type:   Token types
 * @quota:  Max number of IOASIDs can be allocated within the set
 * @nr_ioasids: Number of IOASIDs currently allocated in the set
 * @id: ID of the set
 */
struct ioasid_set {
struct atomic_notifier_head nh;
struct xarray xa;
void *token;
int type;
int quota;
atomic_t nr_ioasids;
int id;
struct rcu_head rcu;
struct misc_cg *misc_cg; /* For misc cgroup accounting */
};

To satisfy your "give me a PASID for this RID" proposal, can we just use
the RID's struct device as the token? Also add a type field to explicitly
indicate global vs per-set(per-RID). i.e.

ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
   int type, void *private)
Where type can be:
enum ioasid_hwid_type {
IOASID_HWID_GLOBAL,
IOASID_HWID_PER_SET,
};

We are really talking about the HW IOASID, just a reminder.
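Just to make the usage concrete, here is a rough sketch of how a caller could use such an allocator with the RID's struct device as the token. Note that ioasid_set_alloc(), IOASID_SET_TYPE_DEV and the quota value are made-up names for this sketch, not settled API:

/*
 * Hypothetical usage sketch only; ioasid_set_alloc(), IOASID_SET_TYPE_DEV
 * and the quota are assumed names, not an existing or agreed interface.
 */
static ioasid_t alloc_hw_pasid_for_rid(struct device *dev)
{
	struct ioasid_set *set;

	/* The RID's struct device is the unique token identifying the set */
	set = ioasid_set_alloc(dev, 1024, IOASID_SET_TYPE_DEV);
	if (IS_ERR(set))
		return INVALID_IOASID;

	/* Ask for a set-local (per-RID) hardware PASID */
	return ioasid_alloc(set, 1, 1024, IOASID_HWID_PER_SET, NULL);
}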

Thanks,

Jacob


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 03:28:54PM -0700, Jacob Pan wrote:

> To satisfy your "give me a PASID for this RID" proposal, can we just use
> the RID's struct device as the token? Also add a type field to explicitly
> indicate global vs per-set(per-RID). i.e.

You've got it backwards, the main behavior should be to allocate PASID
per RID.

The special behavior is to bundle a bunch of PASIDs into a grouping
and then say the PASID number space is shared between all the group
members. 

/dev/ioasid should create and own this grouping either implicitly or
explicitly. Jumping ahead to in-kernel APIs has missed the critical
step of defining the uAPI and all the behaviors together in a
completed RFC proposal.

Jason


[PATCH v4 0/2] Simplify and restrict IOMMU SVA APIs

2021-05-10 Thread Jacob Pan
A couple of small changes to simplify and restrict the SVA APIs. The motivation
is to make PASID allocation accountable to cgroups. The misc cgroup controller
is merged for v5.13 and can be extended to cover IOASIDs as another scalar
resource.

I have not tested on ARM platforms due to hardware availability. I would
appreciate it if someone could help with testing the uacce-based SVA usages.

Thanks,

Jacob

ChangeLog:
V4  - fixed a cross-compile error
- rebased to v5.13-rc1 resolved a conflict in intel-svm code

V3  - stop passing mm to sva_bind IOMMU ops, no need to take mm refcount
in the common SVA code.
- deleted flag variable in idxd driver

V2
- retained mm argument in iommu_sva_alloc_pasid()
- keep generic supervisor flag separated from vt-d's SRE
- move flag declaration out of CONFIG_IOMMU_API



Jacob Pan (2):
  iommu/sva: Tighten SVA bind API with explicit flags
  iommu/sva: Remove mm parameter from SVA bind API

 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/dma/idxd/init.c   |  7 ++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 12 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  5 ++--
 drivers/iommu/intel/svm.c | 19 ---
 drivers/iommu/iommu-sva-lib.c | 11 +
 drivers/iommu/iommu-sva-lib.h |  2 +-
 drivers/iommu/iommu.c | 13 +--
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/intel-iommu.h   |  3 +--
 include/linux/intel-svm.h | 12 --
 include/linux/iommu.h | 23 ++-
 12 files changed, 54 insertions(+), 57 deletions(-)

-- 
2.25.1



Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:

> +/*
> + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> + * for access to kernel addresses. No IOTLB flushes are automatically done
> + * for kernel mappings; it is valid only for access to the kernel's static
> + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> + * A future API addition may permit the use of such ranges, by means of an
> + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> + *
> + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> + * do such IOTLB flushes automatically.
> + */
> +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)

Huh? That isn't really SVA, can you call it something saner please?

Is it really a PASID that always has all of physical memory mapped
into it? Sounds dangerous. What is it for?

Jason

Re: Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)

2021-05-10 Thread Florian Fainelli


On 5/10/2021 11:15 AM, Julien Grall wrote:
> Hi Christoph,
> 
> On 10/05/2021 09:40, Christoph Hellwig wrote:
>> On Sat, May 08, 2021 at 12:32:37AM +0100, Julien Grall wrote:
>>> The pointer dereferenced seems to suggest that the swiotlb hasn't been
>>> allocated. From what I can tell, this may be because swiotlb_force is
>>> set
>>> to SWIOTLB_NO_FORCE, we will still enable the swiotlb when running on
>>> top
>>> of Xen.
>>>
>>> I am not entirely sure what would be the correct fix. Any opinions?
>>
>> Can you try something like the patch below (not even compile tested, but
>> the intent should be obvious?
>>
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 16a2b2b1c54d..7671bc153fb1 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -44,6 +44,8 @@
>>   #include 
>>   #include 
>>   +#include 
>> +
>>   /*
>>    * We need to be able to catch inadvertent references to memstart_addr
>>    * that occur (potentially in generic code) before
>> arm64_memblock_init()
>> @@ -482,7 +484,7 @@ void __init mem_init(void)
>>   if (swiotlb_force == SWIOTLB_FORCE ||
>>   max_pfn > PFN_DOWN(arm64_dma_phys_limit))
>>   swiotlb_init(1);
>> -    else
>> +    else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
>>   swiotlb_force = SWIOTLB_NO_FORCE;
>>     set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
> 
> I have applied the patch on top of 5.13-rc1 and can confirm I am able to
> boot dom0. Are you going to submit the patch?

Sorry about that Julien and thanks Christoph!
-- 
Florian

[PATCH v4 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-05-10 Thread Jacob Pan
The mm parameter in iommu_sva_bind_device() is intended to let a privileged
process perform bind() on behalf of other processes. This use case has yet to
materialize, and it carries the security implications of adding kernel hooks
without explicit user consent.
In addition, there is agreement that IOASID allocation shall be subject to the
cgroup limit. Restricting the IOASID allocation done as part of the SVA bind to
the current task is in line with the misc cgroup proposal.

Link: https://lore.kernel.org/linux-iommu/20210303160205.151d114e@jacob-builder/
Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c |  2 +-
 drivers/dma/idxd/init.c |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  9 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 ++---
 drivers/iommu/intel/svm.c   | 17 +++--
 drivers/iommu/iommu-sva-lib.c   | 11 ++-
 drivers/iommu/iommu-sva-lib.h   |  2 +-
 drivers/iommu/iommu.c   | 14 +-
 drivers/misc/uacce/uacce.c  |  2 +-
 include/linux/intel-iommu.h |  3 +--
 include/linux/iommu.h   |  8 +++-
 11 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 04e3813b613c..875864c832fc 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, 0);
+   sva = iommu_sva_bind_device(dev, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 6f75c29d3616..f7c43d03ad79 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -459,7 +459,7 @@ static int idxd_enable_system_pasid(struct idxd_device 
*idxd)
unsigned int pasid;
struct iommu_sva *sva;
 
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL,
+   sva = iommu_sva_bind_device(&idxd->pdev->dev,
IOMMU_SVA_BIND_SUPERVISOR);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 145ceb2fc5da..0c3014e64c77 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -305,10 +305,11 @@ static void arm_smmu_mmu_notifier_put(struct 
arm_smmu_mmu_notifier *smmu_mn)
 }
 
 static struct iommu_sva *
-__arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
+__arm_smmu_sva_bind(struct device *dev)
 {
int ret;
struct arm_smmu_bond *bond;
+   struct mm_struct *mm = current->mm;
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -329,7 +330,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
return ERR_PTR(-ENOMEM);
 
/* Allocate a PASID for this mm if necessary */
-   ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+   ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1);
if (ret)
goto err_free_bond;
 
@@ -354,7 +355,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
 }
 
 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
+arm_smmu_sva_bind(struct device *dev, unsigned int flags)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
@@ -367,7 +368,7 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, 
unsigned int flags)
return ERR_PTR(-EINVAL);
 
mutex_lock(&sva_lock);
-   handle = __arm_smmu_sva_bind(dev, mm);
+   handle = __arm_smmu_sva_bind(dev);
mutex_unlock(&sva_lock);
return handle;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3a4f24da59d9..485ead85a841 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -716,8 +716,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
*master);
 bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- 

[PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-10 Thread Jacob Pan
The void *drvdata parameter isn't really used in the iommu_sva_bind_device()
API; the current IDXD code "borrows" drvdata to pass a VT-d private flag for
supervisor SVA usage.

Supervisor/privileged mode request is a generic feature. It should be promoted
from the VT-d vendor driver to the generic code.

This patch replaces void *drvdata with an unsigned int flags parameter and
adjusts the callers accordingly.

Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/dma/idxd/init.c   |  7 ++-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  4 ++--
 drivers/iommu/intel/svm.c | 14 --
 drivers/iommu/iommu.c |  9 ++---
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/intel-iommu.h   |  2 +-
 include/linux/intel-svm.h | 12 
 include/linux/iommu.h | 19 ---
 10 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 302cba5ff779..04e3813b613c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, NULL);
+   sva = iommu_sva_bind_device(dev, current->mm, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 2a926bef87f2..6f75c29d3616 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -457,13 +456,11 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
unsigned int pasid;
struct iommu_sva *sva;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
+   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL,
+   IOMMU_SVA_BIND_SUPERVISOR);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..145ceb2fc5da 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
 }
 
 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
+   if (flags)
+   return ERR_PTR(-EINVAL);
+
if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
return ERR_PTR(-EINVAL);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 46e8c49214a8..3a4f24da59d9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -717,7 +717,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
+   unsigned int flags);
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
@@ -748,7 +748,7 @@ static inline int arm_smmu_master_disable_sva(struct 
arm_smmu_master *master)
 }
 
 static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
return ERR_PTR(-ENODEV);
 }
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea90421..531f03d13bd5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -485,12 +485,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
} else
pasid_max = 1 << 20;
 
-   

Re: Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)

2021-05-10 Thread Stefano Stabellini
On Mon, 10 May 2021, Christoph Hellwig wrote:
> On Sat, May 08, 2021 at 12:32:37AM +0100, Julien Grall wrote:
> > The pointer dereferenced seems to suggest that the swiotlb hasn't been 
> > allocated. From what I can tell, this may be because swiotlb_force is set 
> > to SWIOTLB_NO_FORCE, we will still enable the swiotlb when running on top 
> > of Xen.
> >
> > I am not entirely sure what would be the correct fix. Any opinions?
> 
> Can you try something like the patch below (not even compile tested, but
> the intent should be obvious?
> 
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 16a2b2b1c54d..7671bc153fb1 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -44,6 +44,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  /*
>   * We need to be able to catch inadvertent references to memstart_addr
>   * that occur (potentially in generic code) before arm64_memblock_init()
> @@ -482,7 +484,7 @@ void __init mem_init(void)
>   if (swiotlb_force == SWIOTLB_FORCE ||
>   max_pfn > PFN_DOWN(arm64_dma_phys_limit))
>   swiotlb_init(1);
> - else
> + else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
>   swiotlb_force = SWIOTLB_NO_FORCE;
>  
>   set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);

The "IS_ENABLED(CONFIG_XEN)" is not needed as the check is already part
of xen_swiotlb_detect().


But let me ask another question first. Do you think it makes sense to have:

if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;

at the beginning of swiotlb_late_init_with_tbl? I am asking because
swiotlb_late_init_with_tbl is meant for special late initializations,
right? The presence or absence of SWIOTLB_NO_FORCE shouldn't really matter
to swiotlb_late_init_with_tbl. Also the
commit message for "swiotlb: Make SWIOTLB_NO_FORCE perform no
allocation" says that "If a platform was somehow setting
swiotlb_no_force and a later call to swiotlb_init() was to be made we
would still be proceeding with allocating the default SWIOTLB size
(64MB)." Our case here is very similar, right? So the allocation should
proceed?


Which brings me to a separate unrelated issue, still affecting the path
xen_swiotlb_init -> swiotlb_late_init_with_tbl. If swiotlb_init(1) is
called by mem_init then swiotlb_late_init_with_tbl will fail due to the
check:

/* protect against double initialization */
if (WARN_ON_ONCE(io_tlb_default_mem))
return -ENOMEM;

xen_swiotlb_init is meant to ask Xen to make a bunch of pages physically
contiguous. Then, it initializes the swiotlb buffer based on those
pages. So it is a problem that swiotlb_late_init_with_tbl refuses to
continue. However, in practice it is not a problem today because on ARM
we don't actually make any special requests to Xen to make the pages
physically contiguous (yet). See the empty implementation of
arch/arm/xen/mm.c:xen_create_contiguous_region. I don't know about x86.

So maybe we should instead do something like the appended?


diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index f8f07469d259..f5a3638d1dee 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -152,6 +152,7 @@ static int __init xen_mm_init(void)
struct gnttab_cache_flush cflush;
if (!xen_swiotlb_detect())
return 0;
+   swiotlb_exit();
xen_swiotlb_init();
 
cflush.op = 0;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..f17be37298a7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -285,9 +285,6 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
 
-   if (swiotlb_force == SWIOTLB_NO_FORCE)
-   return 0;
-
/* protect against double initialization */
if (WARN_ON_ONCE(io_tlb_default_mem))
return -ENOMEM;


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-10 Thread Jacob Pan
Hi Jason,

On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe  wrote:

> On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> 
> > +/*
> > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > used only
> > + * for access to kernel addresses. No IOTLB flushes are automatically
> > done
> > + * for kernel mappings; it is valid only for access to the kernel's
> > static
> > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > mappings.
> > + * A future API addition may permit the use of such ranges, by means
> > of an
> > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > + *
> > + * It is unlikely that we will ever hook into flush_tlb_kernel_range()
> > to
> > + * do such IOTLB flushes automatically.
> > + */
> > +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)  
> 
> Huh? That isn't really SVA, can you call it something saner please?
> 
This is a shared kernel virtual address; I am following the SVA lib naming
since this is where the flag will be used. Why is this not SVA? A kernel
virtual address is still a virtual address. Is it because of the direct map?

> Is it really a PASID that always has all of physical memory mapped
> into it? Sounds dangerous. What is it for?
> 

Yes. It is to bind DMA requests carrying a PASID to init_mm/init_top_pgt. Per
the PCIe spec, the PASID TLP prefix has a "Privileged Mode Requested" bit. VT-d
supports this with a "Privileged-mode-Requested (PR) flag (to distinguish user
versus supervisor access)". Each PASID entry has an SRE (Supervisor Request
Enable) bit.

Perhaps we should limit that to trusted devices, e.g. RCiEP devices?
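
If we go the trusted-device route, a minimal sketch of such a check in the bind
path could look like the below (illustrative only; whether RCiEP is the right
criterion is exactly the open question):

/*
 * Sketch: only allow supervisor PASID binds for Root-Complex-integrated
 * endpoints, which cannot sit behind an external (untrusted) link.
 */
static bool dev_allows_supervisor_pasid(struct device *dev)
{
	if (!dev_is_pci(dev))
		return false;

	return pci_pcie_type(to_pci_dev(dev)) == PCI_EXP_TYPE_RC_END;
}

iommu_sva_bind_device() could then bail out with -EPERM for
IOMMU_SVA_BIND_SUPERVISOR when such a check fails.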

> Jason


Thanks,

Jacob

Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-10 Thread Lu Baolu

Hi Keqian,

On 5/10/21 7:07 PM, Keqian Zhu wrote:

I suppose this interface is to ask the vendor IOMMU driver to check
whether each device/iommu in the domain supports dirty bit tracking.
But what will happen if new devices with different tracking capability
are added afterward?

Yep, this is considered in the vfio part. We will query again after attaching or
detaching devices from the domain.  When the domain becomes capable, we enable
dirty log for it. When it becomes not capable, we disable dirty log for it.

If that's the case, why not put this logic in the iommu subsystem so
that it doesn't need to be duplicated in different upper layers?

For example, add something like dirty_page_trackable to struct
iommu_domain and ask the vendor iommu driver to update it once any
device is added to or removed from the domain. It's also better to disallow

If we do it, the upper layer still needs to query the capability from the
domain and switch dirty log tracking for it. Or do you mean the domain can
switch dirty log tracking automatically when its capability changes? If so,
I think we lack some flexibility. The upper layer may have its own policy,
such as only enabling dirty log tracking when all domains are capable, and
disabling dirty log tracking when just one domain is not capable.


I may not get you.

Assume that dirty_page_trackable is an attribute of an iommu_domain. This
attribute might change once a new device (with a different capability) is
added or removed, so it should be updated every time a device is attached
or detached. This work could be done by the vendor iommu driver on the
dev_attach/dev_detach callback path.

For upper layers, before starting page tracking, they check the
dirty_page_trackable attribute of the domain and start tracking only if the
domain is capable. Once page tracking is switched on, the vendor iommu driver
(or iommu core) should block further device attach/detach operations
until page tracking is stopped.




any domain attach/detach once the dirty page tracking is on.

Yep, this can greatly simplify our code logic, but I don't know whether our
maintainers agree with that, as they may think that IOMMU dirty logging should
not change the original domain behaviors.


The maintainer owns the last word, but we need to work out a generic and
self-contained API set.
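
To make that concrete, here is a rough sketch of such a domain attribute and
the check an upper layer would do before switching tracking on (all names
below are hypothetical, not existing API):

/*
 * Hypothetical sketch: a per-domain capability flag maintained by the
 * vendor driver in its dev_attach/dev_detach paths, plus the check an
 * upper layer does before starting dirty log tracking.
 */
struct dirty_log_state {
	bool dirty_page_trackable;	/* updated on every attach/detach */
	bool dirty_log_on;		/* attach/detach blocked while set */
};

static int dirty_log_start(struct dirty_log_state *s)
{
	if (!s->dirty_page_trackable)
		return -EOPNOTSUPP;

	s->dirty_log_on = true;		/* driver now rejects attach/detach */
	return 0;
}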

Best regards,
baolu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jacob Pan
Hi Jason,

On Mon, 10 May 2021 20:45:00 -0300, Jason Gunthorpe  wrote:

> On Mon, May 10, 2021 at 03:28:54PM -0700, Jacob Pan wrote:
> 
> > To satisfy your "give me a PASID for this RID" proposal, can we just use
> > the RID's struct device as the token? Also add a type field to
> > explicitly indicate global vs per-set(per-RID). i.e.  
> 
> You've got it backwards, the main behavior should be to allocate PASID
> per RID.
> 
Sure, we can make the per-RID PASID the default. My point was that the
ioasid_set infrastructure's opaque token can support a RID-local allocation
scheme. Anyway, this is a small detail compared to the uAPI.

> The special behavior is to bundle a bunch of PASIDs into a grouping
> and then say the PASID number space is shared between all the group
> members. 
> 
> /dev/ioasid should create and own this grouping either implicitly or
> explicitly. Jumping ahead to in-kernel APIs has missed the critical
> step of defining the uAPI and all the behaviors together in a
> completed RFC proposal.
> 
Agreed, the requirements for kernel API should come from uAPI.

> Jason


Thanks,

Jacob


Re: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread Christoph Hellwig
On Mon, May 10, 2021 at 05:30:57PM +0900, Chanho Park wrote:
> +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr);

Please just move swiotlb_align_offset up to avoid the forward declaration.

>  /*
>   * Bounce: copy the swiotlb buffer from or back to the original dma location
>   */
> @@ -346,10 +347,17 @@ static void swiotlb_bounce(struct device *dev, 
> phys_addr_t tlb_addr, size_t size
>   size_t alloc_size = mem->slots[index].alloc_size;
>   unsigned long pfn = PFN_DOWN(orig_addr);
>   unsigned char *vaddr = phys_to_virt(tlb_addr);
> + unsigned int tlb_offset;
>  
>   if (orig_addr == INVALID_PHYS_ADDR)
>   return;
>  
> + tlb_offset = (unsigned int)tlb_addr & (IO_TLB_SIZE - 1);
> + tlb_offset -= swiotlb_align_offset(dev, orig_addr);

Nit: I'd write this as:

tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
swiotlb_align_offset(dev, orig_addr);

as there is no need for the cast, and just having a single assignment
is easier to follow.


RE: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions associated with a dev

2021-05-10 Thread Shameerali Kolothum Thodi
Hi Robin,

> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 07 May 2021 11:02
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> j...@8bytes.org; wanghuiqiang ; Guohanjun
> (Hanjun Guo) ; steven.pr...@arm.com;
> sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com
> Subject: Re: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions
> associated with a dev
> 
> On 2021-04-20 09:27, Shameer Kolothum wrote:
> > Get RMR regions associated with a dev reserved so that there is
> > a unity mapping for them in SMMU.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29
> +
> >   1 file changed, 29 insertions(+)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 14e9c7034c04..8bacedf7bb34 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2531,6 +2531,34 @@ static int arm_smmu_of_xlate(struct device *dev,
> struct of_phandle_args *args)
> > return iommu_fwspec_add_ids(dev, args->args, 1);
> >   }
> >
> > +static bool arm_smmu_dev_has_rmr(struct arm_smmu_master *master,
> > +struct iommu_rmr *e)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < master->num_sids; i++) {
> > +   if (e->sid == master->sids[i])
> > +   return true;
> > +   }
> > +
> > +   return false;
> > +}
> > +
> > +static void arm_smmu_rmr_get_resv_regions(struct device *dev,
> > + struct list_head *head)
> > +{
> > +   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +   struct arm_smmu_device *smmu = master->smmu;
> > +   struct iommu_rmr *rmr;
> > +
> > > +   list_for_each_entry(rmr, &smmu->rmr_list, list) {
> > +   if (!arm_smmu_dev_has_rmr(master, rmr))
> > +   continue;
> > +
> > +   iommu_dma_get_rmr_resv_regions(dev, rmr, head);
> > +   }
> > +}
> > +
> 
> TBH I wouldn't have thought we need a driver-specific hook for this, or
> is it too painful to correlate fwspec->iommu_fwnode back to the relevant
> IORT node generically?

From a quick look, I think I could get rid of the above with something like 
below,

--8<
+static bool iommu_dma_dev_has_rmr(struct iommu_fwspec *fwspec,
+ struct iommu_rmr *e)
+{
+   int i;
+
+   for (i = 0; i < fwspec->num_ids; i++) {
+if (e->sid == fwspec->ids[i])
+return true;
+}
+
+return false;
+}
+
+
+void iommu_dma_get_rmr_resv_regions(struct device *dev, struct list_head *list)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct list_head rmr_list;
+   struct iommu_rmr *rmr;
+
+   INIT_LIST_HEAD(&rmr_list);
+   if (iommu_dma_get_rmrs(fwspec->iommu_fwnode, &rmr_list))
+   return;
...
+   list_for_each_entry(rmr, &rmr_list, list) {
+ 
+   if (!iommu_dma_dev_has_rmr(fwspec, rmr))
+   continue;
+  ... 
+   region = iommu_alloc_resv_region(rmr->base_address,
+rmr->length, prot,
+type);
 ...
+   }
+}
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -188,10 +242,11 @@ void iommu_dma_get_resv_regions(struct device *dev, 
struct list_head *list)
if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode))
iort_iommu_msi_get_resv_regions(dev, list);
 
+   iommu_dma_get_rmr_resv_regions(dev, list);
 }

8<

But looking at the SMMUv2 code, the fwspec->ids is MASK:SID, so I am not
sure the RMR sid can be compared directly to fwspec->ids above. Right? Or
is there a better way here?
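
For what it's worth, here is a sketch of a mask-aware variant of the helper
above, assuming the SMMUv2 MASK:SID packing of fwspec->ids (which admittedly
leaks SMMU-specific field encoding into the generic helper, which is part of
the question):

/*
 * Sketch only: match an RMR stream ID against MASK:SID entries, treating
 * bits set in the mask as "don't care".
 */
static bool iommu_dma_dev_has_rmr(struct iommu_fwspec *fwspec,
				  struct iommu_rmr *e)
{
	int i;

	for (i = 0; i < fwspec->num_ids; i++) {
		u32 sid  = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
		u32 mask = FIELD_GET(ARM_SMMU_SMR_MASK, fwspec->ids[i]);

		if ((e->sid & ~mask) == (sid & ~mask))
			return true;
	}

	return false;
}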

Thanks,
Shameer


> 
> >   static void arm_smmu_get_resv_regions(struct device *dev,
> >   struct list_head *head)
> >   {
> > @@ -2545,6 +2573,7 @@ static void arm_smmu_get_resv_regions(struct
> device *dev,
> > > list_add_tail(&region->list, head);
> >
> > iommu_dma_get_resv_regions(dev, head);
> > +   arm_smmu_rmr_get_resv_regions(dev, head);
> >   }
> >
> >   static bool arm_smmu_dev_has_feature(struct device *dev,
> >

Re: [PATCH v2] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


[PATCH 3/6] vfio: remove the unused mdev iommu hook

2021-05-10 Thread Christoph Hellwig
The iommu_device field in struct mdev_device has never been used
since it was added more than 2 years ago.

Signed-off-by: Christoph Hellwig 
---
 drivers/vfio/vfio_iommu_type1.c | 132 ++--
 include/linux/mdev.h|  20 -
 2 files changed, 25 insertions(+), 127 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a0747c35a7781e..5974a3fb2e2e12 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -113,7 +113,6 @@ struct vfio_batch {
 struct vfio_group {
struct iommu_group  *iommu_group;
struct list_headnext;
-   boolmdev_group; /* An mdev group */
boolpinned_page_dirty_scope;
 };
 
@@ -1932,61 +1931,6 @@ static bool vfio_iommu_has_sw_msi(struct list_head 
*group_resv_regions,
return ret;
 }
 
-static int vfio_mdev_attach_domain(struct device *dev, void *data)
-{
-   struct mdev_device *mdev = to_mdev_device(dev);
-   struct iommu_domain *domain = data;
-   struct device *iommu_device;
-
-   iommu_device = mdev_get_iommu_device(mdev);
-   if (iommu_device) {
-   if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
-   return iommu_aux_attach_device(domain, iommu_device);
-   else
-   return iommu_attach_device(domain, iommu_device);
-   }
-
-   return -EINVAL;
-}
-
-static int vfio_mdev_detach_domain(struct device *dev, void *data)
-{
-   struct mdev_device *mdev = to_mdev_device(dev);
-   struct iommu_domain *domain = data;
-   struct device *iommu_device;
-
-   iommu_device = mdev_get_iommu_device(mdev);
-   if (iommu_device) {
-   if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
-   iommu_aux_detach_device(domain, iommu_device);
-   else
-   iommu_detach_device(domain, iommu_device);
-   }
-
-   return 0;
-}
-
-static int vfio_iommu_attach_group(struct vfio_domain *domain,
-  struct vfio_group *group)
-{
-   if (group->mdev_group)
-   return iommu_group_for_each_dev(group->iommu_group,
-   domain->domain,
-   vfio_mdev_attach_domain);
-   else
-   return iommu_attach_group(domain->domain, group->iommu_group);
-}
-
-static void vfio_iommu_detach_group(struct vfio_domain *domain,
-   struct vfio_group *group)
-{
-   if (group->mdev_group)
-   iommu_group_for_each_dev(group->iommu_group, domain->domain,
-vfio_mdev_detach_domain);
-   else
-   iommu_detach_group(domain->domain, group->iommu_group);
-}
-
 static bool vfio_bus_is_mdev(struct bus_type *bus)
 {
struct bus_type *mdev_bus;
@@ -2001,20 +1945,6 @@ static bool vfio_bus_is_mdev(struct bus_type *bus)
return ret;
 }
 
-static int vfio_mdev_iommu_device(struct device *dev, void *data)
-{
-   struct mdev_device *mdev = to_mdev_device(dev);
-   struct device **old = data, *new;
-
-   new = mdev_get_iommu_device(mdev);
-   if (!new || (*old && *old != new))
-   return -EINVAL;
-
-   *old = new;
-
-   return 0;
-}
-
 /*
  * This is a helper function to insert an address range to iova list.
  * The list is initially created with a single entry corresponding to
@@ -2275,38 +2205,24 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
goto out_free;
 
if (vfio_bus_is_mdev(bus)) {
-   struct device *iommu_device = NULL;
-
-   group->mdev_group = true;
-
-   /* Determine the isolation type */
-   ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
-  vfio_mdev_iommu_device);
-   if (ret || !iommu_device) {
-   if (!iommu->external_domain) {
-   INIT_LIST_HEAD(>group_list);
-   iommu->external_domain = domain;
-   vfio_update_pgsize_bitmap(iommu);
-   } else {
-   kfree(domain);
-   }
-
-   list_add(&group->next,
-&iommu->external_domain->group_list);
-   /*
-* Non-iommu backed group cannot dirty memory directly,
-* it can only use interfaces that provide dirty
-* tracking.
-* The iommu scope can only be promoted with the
-* addition of a dirty tracking group.
-*/
-   group->pinned_page_dirty_scope = true;
- 

[PATCH 1/6] iommu: remove the unused dev_has_feat method

2021-05-10 Thread Christoph Hellwig
This method is never actually called.  Simplify the instance in
intel-iommu that is directly called by inlining the relevant code.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  1 -
 drivers/iommu/intel/iommu.c | 67 ++---
 include/linux/iommu.h   |  4 +-
 3 files changed, 7 insertions(+), 65 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d439..9689563eec0f30 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2666,7 +2666,6 @@ static struct iommu_ops arm_smmu_ops = {
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-   .dev_has_feat   = arm_smmu_dev_has_feature,
.dev_feat_enabled   = arm_smmu_dev_feature_enabled,
.dev_enable_feat= arm_smmu_dev_enable_feature,
.dev_disable_feat   = arm_smmu_dev_disable_feature,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 708f430af1c440..ba1060f6785119 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5326,72 +5326,18 @@ static int intel_iommu_disable_auxd(struct device *dev)
return 0;
 }
 
-/*
- * A PCI express designated vendor specific extended capability is defined
- * in the section 3.7 of Intel scalable I/O virtualization technical spec
- * for system software and tools to detect endpoint devices supporting the
- * Intel scalable IO virtualization without host driver dependency.
- *
- * Returns the address of the matching extended capability structure within
- * the device's PCI configuration space or 0 if the device does not support
- * it.
- */
-static int siov_find_pci_dvsec(struct pci_dev *pdev)
-{
-   int pos;
-   u16 vendor, id;
-
-   pos = pci_find_next_ext_capability(pdev, 0, 0x23);
-   while (pos) {
-   pci_read_config_word(pdev, pos + 4, );
-   pci_read_config_word(pdev, pos + 8, );
-   if (vendor == PCI_VENDOR_ID_INTEL && id == 5)
-   return pos;
-
-   pos = pci_find_next_ext_capability(pdev, pos, 0x23);
-   }
-
-   return 0;
-}
-
-static bool
-intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat)
-{
-   struct device_domain_info *info = get_domain_info(dev);
-
-   if (feat == IOMMU_DEV_FEAT_AUX) {
-   int ret;
-
-   if (!dev_is_pci(dev) || dmar_disabled ||
-   !scalable_mode_support() || !pasid_mode_support())
-   return false;
-
-   ret = pci_pasid_features(to_pci_dev(dev));
-   if (ret < 0)
-   return false;
-
-   return !!siov_find_pci_dvsec(to_pci_dev(dev));
-   }
-
-   if (feat == IOMMU_DEV_FEAT_IOPF)
-   return info && info->pri_supported;
-
-   if (feat == IOMMU_DEV_FEAT_SVA)
-   return info && (info->iommu->flags & VTD_FLAG_SVM_CAPABLE) &&
-   info->pasid_supported && info->pri_supported &&
-   info->ats_supported;
-
-   return false;
-}
-
 static int
 intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
 {
if (feat == IOMMU_DEV_FEAT_AUX)
return intel_iommu_enable_auxd(dev);
 
-   if (feat == IOMMU_DEV_FEAT_IOPF)
-   return intel_iommu_dev_has_feat(dev, feat) ? 0 : -ENODEV;
+   if (feat == IOMMU_DEV_FEAT_IOPF) {
+   struct device_domain_info *info = get_domain_info(dev);
+
+   if (info &&  info->pri_supported)
+   return 0;
+   }
 
if (feat == IOMMU_DEV_FEAT_SVA) {
struct device_domain_info *info = get_domain_info(dev);
@@ -5552,7 +5498,6 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
.device_group   = intel_iommu_device_group,
-   .dev_has_feat   = intel_iommu_dev_has_feat,
.dev_feat_enabled   = intel_iommu_dev_feat_enabled,
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf76e..1092a7f967a5e8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -141,8 +141,7 @@ struct iommu_resv_region {
  *  supported, this feature must be enabled before and
  *  disabled after %IOMMU_DEV_FEAT_SVA.
  *
- * Device drivers query whether a feature is supported using
- * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
+ * Device drivers enable 

[PATCH 4/6] iommu: remove iommu_aux_{attach,detach}_device

2021-05-10 Thread Christoph Hellwig
These are entirely unused now, and were only called by the previously
entirely unused vfio mdev iommu interaction.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/intel/iommu.c | 205 
 drivers/iommu/iommu.c   |  33 --
 include/linux/intel-iommu.h |  10 --
 include/linux/iommu.h   |  17 ---
 4 files changed, 265 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc07f316adcb18..6cae6e4d693226 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1497,18 +1497,6 @@ static void domain_update_iotlb(struct dmar_domain 
*domain)
break;
}
 
-   if (!has_iotlb_device) {
-   struct subdev_domain_info *sinfo;
-
-   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-   info = get_domain_info(sinfo->pdev);
-   if (info && info->ats_enabled) {
-   has_iotlb_device = true;
-   break;
-   }
-   }
-   }
-
domain->has_iotlb_device = has_iotlb_device;
 }
 
@@ -1606,7 +1594,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
 {
unsigned long flags;
struct device_domain_info *info;
-   struct subdev_domain_info *sinfo;
 
if (!domain->has_iotlb_device)
return;
@@ -1614,11 +1601,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
spin_lock_irqsave(&device_domain_lock, flags);
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);
-
-   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-   info = get_domain_info(sinfo->pdev);
-   __iommu_flush_dev_iotlb(info, addr, mask);
-   }
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
@@ -1903,7 +1885,6 @@ static struct dmar_domain *alloc_domain(int flags)
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
-   INIT_LIST_HEAD(&domain->subdevices);
 
return domain;
 }
@@ -2593,7 +2574,6 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
-   INIT_LIST_HEAD(&info->subdevices);
 
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -4579,168 +4559,6 @@ is_aux_domain(struct device *dev, struct iommu_domain 
*domain)
domain->type == IOMMU_DOMAIN_UNMANAGED;
 }
 
-static inline struct subdev_domain_info *
-lookup_subdev_info(struct dmar_domain *domain, struct device *dev)
-{
-   struct subdev_domain_info *sinfo;
-
-   if (!list_empty(&domain->subdevices)) {
-   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
-   if (sinfo->pdev == dev)
-   return sinfo;
-   }
-   }
-
-   return NULL;
-}
-
-static int auxiliary_link_device(struct dmar_domain *domain,
-struct device *dev)
-{
-   struct device_domain_info *info = get_domain_info(dev);
-   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
-
-   assert_spin_locked(&device_domain_lock);
-   if (WARN_ON(!info))
-   return -EINVAL;
-
-   if (!sinfo) {
-   sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
-   sinfo->domain = domain;
-   sinfo->pdev = dev;
-   list_add(&sinfo->link_phys, &info->subdevices);
-   list_add(&sinfo->link_domain, &domain->subdevices);
-   }
-
-   return ++sinfo->users;
-}
-
-static int auxiliary_unlink_device(struct dmar_domain *domain,
-  struct device *dev)
-{
-   struct device_domain_info *info = get_domain_info(dev);
-   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
-   int ret;
-
-   assert_spin_locked(&device_domain_lock);
-   if (WARN_ON(!info || !sinfo || sinfo->users <= 0))
-   return -EINVAL;
-
-   ret = --sinfo->users;
-   if (!ret) {
-   list_del(&sinfo->link_phys);
-   list_del(&sinfo->link_domain);
-   kfree(sinfo);
-   }
-
-   return ret;
-}
-
-static int aux_domain_add_dev(struct dmar_domain *domain,
- struct device *dev)
-{
-   int ret;
-   unsigned long flags;
-   struct intel_iommu *iommu;
-
-   iommu = device_to_iommu(dev, NULL, NULL);
-   if (!iommu)
-   return -ENODEV;
-
-   if (domain->default_pasid <= 0) {
-   u32 pasid;
-
-   /* No private data needed for the default pasid */
-   pasid = ioasid_alloc(NULL, PASID_MIN,
-pci_max_pasids(to_pci_dev(dev)) - 1,
-

[PATCH 5/6] iommu: remove IOMMU_DEV_FEAT_AUX

2021-05-10 Thread Christoph Hellwig
This feature is entirely unused, and was only ever called by code that was
itself unused.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/intel/iommu.c | 82 -
 drivers/iommu/intel/svm.c   |  6 ---
 include/linux/intel-iommu.h |  1 -
 include/linux/iommu.h   |  2 -
 4 files changed, 91 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6cae6e4d693226..2603d95ecb4c0e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2573,7 +2573,6 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->domain = domain;
info->iommu = iommu;
info->pasid_table = NULL;
-   info->auxd_enabled = 0;
 
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -4546,19 +4545,6 @@ static void intel_iommu_domain_free(struct iommu_domain 
*domain)
domain_exit(to_dmar_domain(domain));
 }
 
-/*
- * Check whether a @domain could be attached to the @dev through the
- * aux-domain attach/detach APIs.
- */
-static inline bool
-is_aux_domain(struct device *dev, struct iommu_domain *domain)
-{
-   struct device_domain_info *info = get_domain_info(dev);
-
-   return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
-}
-
 static int prepare_domain_attach_device(struct iommu_domain *domain,
struct device *dev)
 {
@@ -4612,9 +4598,6 @@ static int intel_iommu_attach_device(struct iommu_domain 
*domain,
return -EPERM;
}
 
-   if (is_aux_domain(dev, domain))
-   return -EPERM;
-
/* normally dev is not mapped */
if (unlikely(domain_context_mapped(dev))) {
struct dmar_domain *old_domain;
@@ -5083,52 +5066,9 @@ static struct iommu_group 
*intel_iommu_device_group(struct device *dev)
return generic_device_group(dev);
 }
 
-static int intel_iommu_enable_auxd(struct device *dev)
-{
-   struct device_domain_info *info;
-   struct intel_iommu *iommu;
-   unsigned long flags;
-   int ret;
-
-   iommu = device_to_iommu(dev, NULL, NULL);
-   if (!iommu || dmar_disabled)
-   return -EINVAL;
-
-   if (!sm_supported(iommu) || !pasid_supported(iommu))
-   return -EINVAL;
-
-   ret = intel_iommu_enable_pasid(iommu, dev);
-   if (ret)
-   return -ENODEV;
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   info = get_domain_info(dev);
-   info->auxd_enabled = 1;
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   return 0;
-}
-
-static int intel_iommu_disable_auxd(struct device *dev)
-{
-   struct device_domain_info *info;
-   unsigned long flags;
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   info = get_domain_info(dev);
-   if (!WARN_ON(!info))
-   info->auxd_enabled = 0;
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   return 0;
-}
-
 static int
 intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
 {
-   if (feat == IOMMU_DEV_FEAT_AUX)
-   return intel_iommu_enable_auxd(dev);
-
if (feat == IOMMU_DEV_FEAT_IOPF) {
struct device_domain_info *info = get_domain_info(dev);
 
@@ -5152,26 +5092,6 @@ intel_iommu_dev_enable_feat(struct device *dev, enum 
iommu_dev_features feat)
return -ENODEV;
 }
 
-static int
-intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
-{
-   if (feat == IOMMU_DEV_FEAT_AUX)
-   return intel_iommu_disable_auxd(dev);
-
-   return -ENODEV;
-}
-
-static bool
-intel_iommu_dev_feat_enabled(struct device *dev, enum iommu_dev_features feat)
-{
-   struct device_domain_info *info = get_domain_info(dev);
-
-   if (feat == IOMMU_DEV_FEAT_AUX)
-   return scalable_mode_support() && info && info->auxd_enabled;
-
-   return false;
-}
-
 static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
   struct device *dev)
 {
@@ -5283,9 +5203,7 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
.device_group   = intel_iommu_device_group,
-   .dev_feat_enabled   = intel_iommu_dev_feat_enabled,
.dev_enable_feat= intel_iommu_dev_enable_feat,
-   .dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
.def_domain_type= device_def_domain_type,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea904211a..54b55f867438c8 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -347,10 +347,6 @@ int 

[PATCH 6/6] iommu: remove iommu_dev_feature_enabled

2021-05-10 Thread Christoph Hellwig
Remove the unused iommu_dev_feature_enabled function.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  1 -
 drivers/iommu/iommu.c   | 13 -
 include/linux/iommu.h   |  9 -
 3 files changed, 23 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9689563eec0f30..77deabf2ab9c83 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2666,7 +2666,6 @@ static struct iommu_ops arm_smmu_ops = {
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-   .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
.dev_enable_feat= arm_smmu_dev_enable_feature,
.dev_disable_feat   = arm_smmu_dev_disable_feature,
.sva_bind   = arm_smmu_sva_bind,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c4b92270946c2f..3d0d67b918a98f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2888,19 +2888,6 @@ int iommu_dev_disable_feature(struct device *dev, enum 
iommu_dev_features feat)
 }
 EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
 
-bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features 
feat)
-{
-   if (dev->iommu && dev->iommu->iommu_dev) {
-   const struct iommu_ops *ops = dev->iommu->iommu_dev->ops;
-
-   if (ops->dev_feat_enabled)
-   return ops->dev_feat_enabled(dev, feat);
-   }
-
-   return false;
-}
-EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
-
 /**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ef9e7dcef76e90..c0685c88b3f8b5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -198,7 +198,6 @@ struct iommu_iotlb_gather {
  *  driver init to device driver init (default no)
  * @dev_has/enable/disable_feat: per device entries to check/enable/disable
  *   iommu specific features.
- * @dev_feat_enabled: check enabled feature
  * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
  * @sva_bind: Bind process address space to device
  * @sva_unbind: Unbind process address space from device
@@ -252,7 +251,6 @@ struct iommu_ops {
bool (*is_attach_deferred)(struct iommu_domain *domain, struct device 
*dev);
 
/* Per device IOMMU features */
-   bool (*dev_feat_enabled)(struct device *dev, enum iommu_dev_features f);
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
 
@@ -583,7 +581,6 @@ void iommu_release_device(struct device *dev);
 
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
-bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
 
 struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
@@ -905,12 +902,6 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct 
fwnode_handle *fwnode)
return NULL;
 }
 
-static inline bool
-iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
-{
-   return false;
-}
-
 static inline int
 iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
 {
-- 
2.30.2



[PATCH 2/6] iommu: remove the unused iommu_aux_get_pasid interface

2021-05-10 Thread Christoph Hellwig
This function was never used since it was added more than 2 years ago.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/intel/iommu.c | 10 --
 drivers/iommu/iommu.c   | 11 ---
 include/linux/iommu.h   |  9 -
 3 files changed, 30 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ba1060f6785119..cc07f316adcb18 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5375,15 +5375,6 @@ intel_iommu_dev_feat_enabled(struct device *dev, enum 
iommu_dev_features feat)
return false;
 }
 
-static int
-intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-
-   return dmar_domain->default_pasid > 0 ?
-   dmar_domain->default_pasid : -EINVAL;
-}
-
 static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
   struct device *dev)
 {
@@ -5485,7 +5476,6 @@ const struct iommu_ops intel_iommu_ops = {
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
.aux_detach_dev = intel_iommu_aux_detach_device,
-   .aux_get_pasid  = intel_iommu_aux_get_pasid,
.map= intel_iommu_map,
.iotlb_sync_map = intel_iommu_iotlb_sync_map,
.unmap  = intel_iommu_unmap,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df50f..6721ac17baf29b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2934,17 +2934,6 @@ void iommu_aux_detach_device(struct iommu_domain 
*domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_aux_detach_device);
 
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-   int ret = -ENODEV;
-
-   if (domain->ops->aux_get_pasid)
-   ret = domain->ops->aux_get_pasid(domain, dev);
-
-   return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
-
 /**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1092a7f967a5e8..d8aa5c8a5ba57a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -202,7 +202,6 @@ struct iommu_iotlb_gather {
  *   iommu specific features.
  * @dev_feat_enabled: check enabled feature
  * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
- * @aux_get_pasid: get the pasid given an aux-domain
  * @sva_bind: Bind process address space to device
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
@@ -262,7 +261,6 @@ struct iommu_ops {
/* Aux-domain specific attach/detach entries */
int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
-   int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
 
struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
  void *drvdata);
@@ -594,7 +592,6 @@ int iommu_dev_disable_feature(struct device *dev, enum 
iommu_dev_features f);
 bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
 int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev);
 void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
 
 struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
@@ -945,12 +942,6 @@ iommu_aux_detach_device(struct iommu_domain *domain, 
struct device *dev)
 {
 }
 
-static inline int
-iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
-   return -ENODEV;
-}
-
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
-- 
2.30.2



more iommu dead code removal

2021-05-10 Thread Christoph Hellwig
Hi all,

this is another series to remove dead code from the IOMMU subsystem,
this time mostly about the hacky code to pass an iommu device in
struct mdev_device and huge piles of support code.  All of this was
merged two years ago and (fortunately) never got used.

Note that the mdev.h changes might have minor contextual conflicts
with the pending work from Jason, but it is trivial to resolve.

Diffstat:
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |2 
 drivers/iommu/intel/iommu.c |  362 
 drivers/iommu/intel/svm.c   |6 
 drivers/iommu/iommu.c   |   57 
 drivers/vfio/vfio_iommu_type1.c |  132 +-
 include/linux/intel-iommu.h |   11 
 include/linux/iommu.h   |   41 ---
 include/linux/mdev.h|   20 -
 8 files changed, 31 insertions(+), 600 deletions(-)


RE: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info and install bypass SMR

2021-05-10 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Steven Price [mailto:steven.pr...@arm.com]
> Sent: 06 May 2021 16:17
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> j...@8bytes.org; robin.mur...@arm.com; wanghuiqiang
> ; Guohanjun (Hanjun Guo)
> ; sami.muja...@arm.com; j...@solid-run.com;
> eric.au...@redhat.com
> Subject: Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info
> and install bypass SMR
> 
> On 20/04/2021 09:27, Shameer Kolothum wrote:
> > From: Jon Nettleton 
> >
> > Check if there is any RMR info associated with the devices behind
> > the SMMU and if any, install bypass SMRs for them. This is to
> > keep any ongoing traffic associated with these devices alive
> > when we enable/reset SMMU during probe().
> >
> > Signed-off-by: Jon Nettleton 
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c | 42
> +++
> >   drivers/iommu/arm/arm-smmu/arm-smmu.h |  2 ++
> >   2 files changed, 44 insertions(+)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index d8c6bfde6a61..4d2f91626d87 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -2102,6 +2102,43 @@ err_reset_platform_ops: __maybe_unused;
> > return err;
> >   }
> >
> > +static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device
> *smmu)
> > +{
> > +   struct iommu_rmr *e;
> > +   int i, cnt = 0;
> > +   u32 smr;
> > +
> > +   for (i = 0; i < smmu->num_mapping_groups; i++) {
> > +   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
> > +   if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
> > +   continue;
> > +
> > +   list_for_each_entry(e, &smmu->rmr_list, list) {
> > +   if (FIELD_GET(ARM_SMMU_SMR_ID, smr) != e->sid)
> > +   continue;
> > +
> > +   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
> > +   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK,
> smr);
> > +   smmu->smrs[i].valid = true;
> > +
> > +   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
> > +   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
> > +   smmu->s2crs[i].cbndx = 0xff;
> > +
> > +   cnt++;
> > +   }
> > +   }
> 
> If I understand this correctly - this is looking at the current
> (hardware) configuration of the SMMU and attempting to preserve any
> bypass SMRs. However from what I can tell it suffers from the following
> two problems:
> 
>   (a) Only the ID of the SMR is being checked, not the MASK. So if the
> firmware has setup an SMR matching a number of streams this will break.
> 
>   (b) The SMMU might not be enabled at all (CLIENTPD==1) or bypass
> enabled for unmatched streams (USFCFG==0).
> 
> Certainly in my test setup case (b) applies and so this doesn't work.
> Perhaps something like the below would work better? (It works in the
> case of the SMMU not enabled - I've not tested case (a)).

Thanks Steve for taking a look and testing this on SMMUv2. My knowledge
of SMMUv2 is limited and I don't have a setup to verify this. After reading
the code, I agree that the current implementation addresses the hardware
configuration only and breaks all the scenarios explained above.

I will include the below snippet in the next respin if it works.

Hi Jon,

Could you please take a look and see if the below changes work for
you too?

Thanks,
Shameer

> 8<
> static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device
> *smmu)
> {
>   struct iommu_rmr *e;
>   int i, cnt = 0;
>   u32 smr;
>   u32 reg;
> 
>   reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);
> 
>   if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg &
> ARM_SMMU_sCR0_CLIENTPD)) {
>   /*
>* SMMU is already enabled and disallowing bypass, so preserve
>* the existing SMRs
>*/
>   for (i = 0; i < smmu->num_mapping_groups; i++) {
>   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
>   if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
>   continue;
>   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
>   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK,
> smr);
>   smmu->smrs[i].valid = true;
>   }
>   }
> 
>   list_for_each_entry(e, &smmu->rmr_list, list) {
>   u32 sid = e->sid;
> 
>   i = arm_smmu_find_sme(smmu, sid, ~0);
>   if (i < 0)
>   continue;
>   if (smmu->s2crs[i].count == 0) {
>   smmu->smrs[i].id = sid;
>   smmu->smrs[i].mask = ~0;
>   

Re: Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)

2021-05-10 Thread Christoph Hellwig
On Sat, May 08, 2021 at 12:32:37AM +0100, Julien Grall wrote:
> The pointer dereferenced seems to suggest that the swiotlb hasn't been
> allocated. From what I can tell, this may be because swiotlb_force is set
> to SWIOTLB_NO_FORCE, yet we still enable the swiotlb when running on top
> of Xen.
>
> I am not entirely sure what would be the correct fix. Any opinions?

Can you try something like the patch below (not even compile tested, but
the intent should be obvious)?


diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 16a2b2b1c54d..7671bc153fb1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 
+#include 
+
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -482,7 +484,7 @@ void __init mem_init(void)
if (swiotlb_force == SWIOTLB_FORCE ||
max_pfn > PFN_DOWN(arm64_dma_phys_limit))
swiotlb_init(1);
-   else
+   else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
swiotlb_force = SWIOTLB_NO_FORCE;
 
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jason Gunthorpe
On Sat, May 08, 2021 at 09:56:59AM +, Tian, Kevin wrote:
> > From: Raj, Ashok 
> > Sent: Friday, May 7, 2021 12:33 AM
> > 
> > > Basically it means when the guest's top level IOASID is created for
> > > nesting that IOASID claims all PASID's on the RID and excludes any
> > > PASID IOASIDs from existing on the RID now or in future.
> > 
> > The way to look at it this is as follows:
> > 
> > For platforms that do not have a need to support shared work queue model
> > support for ENQCMD or similar, PASID space is naturally per RID. There is no
> > complication with this. Every RID has the full range of PASID's and no need
> > for host to track which PASIDs are allocated now or in future in the guest.
> > 
> > For platforms that support ENQCMD, it is required to mandate PASIDs are
> > global across the entire system. Maybe its better to call them gPASID for
> > guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> > wide mapping for ENQCMD and not a per-RID based mapping. (We covered
> > that
> > in earlier responses)
> > 
> > In our current implementation we actually don't separate this space, and
> > gPASID == hPASID. The iommu driver enforces that by using the custom
> > allocator and the architected interface that allows all guest vIOMMU
> > allocations to be proxied to host. Nothing but a glorified hypercall like
> > interface. In fact some OS's do use hypercall to get a hPASID vs using
> > the vCMD style interface.
> > 
> 
> After more thinking about the new interface, I feel gPASID==hPASID
> actually causes some confusion in uAPI design. In concept an ioasid
> is not active until it's attached to a device, because it's just an ID
> without a device. So supposedly an ioasid should reject all user commands
> before attach. However a guest likely asks for a new gPASID before
> attaching it to devices and vIOMMU. If gPASID==hPASID then Qemu
> must request /dev/ioasid to allocate a hw_id for an ioasid which hasn't
> been attached to any device, with the assumption on kernel knowledge
> that this hw_id is from a global allocator w/o dependency on any
> device. This doesn't sound like a clean design, not to mention it also
> conflicts with live migration.

Everything must be explicit. The situation David pointed to of
qemu emulating a vIOMMU while running on a host with a different
platform/physical IOMMU must be considered.

If the vIOMMU needs specific behavior it must use /dev/iommu to ask
for it specifically and not just make wild assumptions about how the
platform works.

> gPASID!=hPASID has a problem when assigning a physical device which 
> supports both shared work queue (ENQCMD with PASID in MSR) 
> and dedicated work queue (PASID in device register) to a guest
> process which is associated to a gPASID. Say the host kernel has setup
> the hPASID entry with nested translation though /dev/ioasid. For 
> shared work queue the CPU is configured to translate gPASID in MSR 
> into **hPASID** before the payload goes out to the wire. However 
> for dedicated work queue the device MMIO register is directly mapped 
> to and programmed by the guest, thus containing a **gPASID** value
> implying DMA requests through this interface will hit IOMMU faults
> due to invalid gPASID entry. Having gPASID==hPASID is a simple 
> workaround here. mdev doesn't have this problem because the
> PASID register is in emulated control-path thus can be translated
> to hPASID manually by mdev driver.

This all must be explicit too.

If a PASID is allocated and it is going to be used with ENQCMD then
everything needs to know it is actually quite different than a PASID
that was allocated to be used with a normal SRIOV device, for
instance.

The former case can accept that the guest PASID is virtualized, while
the latter can not.

This is also why PASID per RID has to be an option. When I assign a
full SRIOV function to the guest then that entire RID space needs to
also be assigned to the guest. Upon migration I need to take all the
physical PASIDs and rebuild them in another hypervisor exactly as is.

If you force all RIDs into a global PASID pool then normal SRIOV
migration w/PASID becomes impossible. ie ENQCMD breaks everything else
that should work.

This is why you need to sort all this out and why it feels like some
of the specs here have been mis-designed.

I'm not sure carving out ranges is really workable for migration.

I think the real answer is to carve out entire RIDs as being in the
global pool or not. Then the ENQCMD HW can be bundled together and
everything else can live in the natural PASID per RID world.
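
To make that concrete, here is a purely illustrative sketch of how such a
grouping could be declared per RID up front; every name below is hypothetical,
nothing like it exists in the kernel or in the /dev/ioasid proposal yet:

#include <linux/types.h>

/* hypothetical uAPI argument: place one RID into a PASID allocation group */
struct ioasid_rid_group {
	__u32	argsz;
	__u32	flags;
#define IOASID_RID_PASIDS_PRIVATE	0	/* PASIDs are per-RID, migratable as-is */
#define IOASID_RID_PASIDS_GLOBAL	1	/* RID joins the shared ENQCMD-wide pool */
	__s32	device_fd;	/* the RID being placed */
};

Declaring the grouping per RID keeps ordinary SRIOV assignment and migration
in the per-RID world, while ENQCMD-capable devices are bundled together
explicitly.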

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 13/15] dma-direct: Allocate memory from restricted DMA pool if available

2021-05-10 Thread Claire Chang
The restricted DMA pool is preferred if available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
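
As a consumer-side sketch (not part of this patch): a driver whose device is
tied to a restricted-dma-pool region keeps using the ordinary DMA API, and
with this series both coherent allocations and streaming mappings end up
backed by the restricted region. "dev", "buf", "len" and "size" below are
placeholders, and error handling is trimmed for brevity:

	dma_addr_t handle, dma;
	/* sleepable context: served from the restricted pool when available */
	void *vaddr = dma_alloc_coherent(dev, size, &handle, GFP_KERNEL);

	/* streaming DMA is bounced through the same restricted region */
	dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	...
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	dma_free_coherent(dev, size, vaddr, handle);

Atomic coherent allocations are the exception noted above and must come from
an additional shared-dma-pool coherent region instead.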

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index eb4098323bbc..0d521f78c7b9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (swiotlb_free(dev, page, size))
+   return;
+#endif
dma_free_contiguous(dev, page, size);
 }
 
@@ -92,7 +96,17 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   page = NULL;
+   }
+#endif
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -148,7 +162,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -161,18 +175,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && 
!dev_is_dma_coherent(dev
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -253,15 +272,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev)) {
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -289,7 +308,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
void *ret;
 
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, 

[PATCH v6 12/15] swiotlb: Add restricted DMA alloc/free support.

2021-05-10 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 35 +--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0c5a18d9cf89..e8cf49bd90c5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -134,6 +134,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fa11787e4b95..d99d5f902259 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -457,8 +457,9 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
 
index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
do {
-   if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
-   (orig_addr & iotlb_align_mask)) {
+   if (orig_addr &&
+   (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+   (orig_addr & iotlb_align_mask)) {
index = wrap_index(mem, index + 1);
continue;
}
@@ -701,6 +702,36 @@ late_initcall(swiotlb_create_default_debugfs);
 #endif
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   phys_addr_t tlb_addr;
+   int index;
+
+   if (!mem)
+   return NULL;
+
+   index = find_slots(dev, 0, size);
+   if (index == -1)
+   return NULL;
+
+   tlb_addr = slot_addr(mem->start, index);
+
+   return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+   phys_addr_t tlb_addr = page_to_phys(page);
+
+   if (!is_swiotlb_buffer(dev, tlb_addr))
+   return false;
+
+   release_slots(dev, tlb_addr);
+
+   return true;
+}
+
 static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
 {
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 15/15] of: Add plumbing for restricted DMA pool

2021-05-10 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when a restricted-dma-pool region is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 25 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  5 +
 3 files changed, 33 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index aca94c348bd4..c562a9ff5f0b 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1112,6 +1113,30 @@ bool of_dma_is_coherent(struct device_node *np)
 }
 EXPORT_SYMBOL_GPL(of_dma_is_coherent);
 
+int of_dma_set_restricted_buffer(struct device *dev)
+{
+   struct device_node *node;
+   int count, i;
+
+   if (!dev->of_node)
+   return 0;
+
+   count = of_property_count_elems_of_size(dev->of_node, "memory-region",
+   sizeof(phandle));
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(dev->of_node, "memory-region", i);
+   /* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(
+   dev, dev->of_node, i);
+   }
+
+   return 0;
+}
+
 /**
  * of_mmio_is_nonposted - Check if device uses non-posted MMIO
  * @np:device node
diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..d8d865223e51 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d717efbd637d..e9237f5eff48 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,17 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-10 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 27 +++
 1 file changed, 27 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..284aea659015 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be 
used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to 
data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage 
and
+  system memory corruption, the system needs to provide way to lock 
down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for 
atomic
+  coherent allocation.
 - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -120,6 +137,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_mem_reserved: restricted_dma_mem_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x40>;
+   };
};
 
/* ... */
@@ -138,4 +160,9 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
memory-region = <_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   memory-region = <&restricted_dma_mem_reserved>;
+   /* ... */
+   };
 };
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 11/15] dma-direct: Add a new wrapper __dma_direct_free_pages()

2021-05-10 Thread Claire Chang
Add a new wrapper __dma_direct_free_pages() that will be useful later
for swiotlb_free().

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 078f7087e466..eb4098323bbc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,12 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -237,7 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -273,7 +279,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
 
-   dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+   __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
 struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -310,7 +316,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -329,7 +335,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
 
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
 }
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 10/15] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-05-10 Thread Claire Chang
Add a new function, release_slots, to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ef04d8f7708f..fa11787e4b95 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -550,27 +550,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -605,6 +593,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: more iommu dead code removal

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 08:53:59AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this is another series to remove dead code from the IOMMU subsystem,
> this time mostly about the hacky code to pass an iommu device in
> struct mdev_device and huge piles of support code.  All of this was
> merged two years ago and (fortunately) never got used.

Yes, I looked at this too. Intel has been merging dead code for a
while now. Ostensibly to prepare to get PASID support in. But the
whole PASID thing looks to be redesigned from what was originally
imagined.

At least from VFIO I think the PASID support should not use this hacky
stuff, /dev/ioasid should provide a clean solution

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-10 Thread chenxiang
From: Xiang Chen 

It is not necessary to call free_iova_mem() while holding the
iova_rbtree_lock spinlock; doing so only adds contention on the lock.
Moving the free outside the critical section gives a small performance
improvement. Also rename private_free_iova() to remove_iova(), because
the function no longer frees the iova after this change.

Signed-off-by: Xiang Chen 
Reviewed-by: John Garry 
---
 drivers/iommu/iova.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b..b6cf5f1 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned 
long pfn)
return NULL;
 }
 
-static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
+static void remove_iova(struct iova_domain *iovad, struct iova *iova)
 {
	assert_spin_locked(&iovad->iova_rbtree_lock);
	__cached_rbnode_delete_update(iovad, iova);
	rb_erase(&iova->node, &iovad->rbroot);
-   free_iova_mem(iova);
 }
 
 /**
@@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
unsigned long flags;
 
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(__free_iova);
 
@@ -472,10 +472,13 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
	iova = private_find_iova(iovad, pfn);
-   if (iova)
-   private_free_iova(iovad, iova);
+   if (!iova) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return;
+   }
+   remove_iova(iovad, iova);
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+   free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
@@ -825,7 +828,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct 
iova_domain *iovad)
if (WARN_ON(!iova))
continue;
 
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
+   free_iova_mem(iova);
}
 
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-- 
2.8.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread Chanho Park
From: Bumyong Lee 

In case a driver wants to sync only part of a mapping at an offset,
swiotlb_tbl_sync_single() copies from the orig_addr base to tlb_addr plus
the offset, which causes a data mismatch.

The offset handling was removed by "swiotlb: don't modify orig_addr in
swiotlb_tbl_sync_single", but it has to be restored.

1. Get dma_addr_t from dma_map_single()
dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

|<---vsize->|
+---+
|   | original buffer
+---+
  vaddr

 swiotlb_align_offset
 |<->|<---vsize->|
 +---+---+
 |   |   | swiotlb buffer
 +---+---+
  tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)
dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

  Error case.
copy data to original buffer.
but it is from base addr in original buffer

 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- size ->|
+---+
|##|| original buffer
+---+
  vaddr

  FIX. copy data to original buffer,
  starting from the same offset in the original buffer

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- offset ->|<- size ->|
+---+
||##|   | original buffer
+---+
  vaddr

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in 
swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee 
Signed-off-by: Chanho Park 
---
 kernel/dma/swiotlb.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..e8243725e298 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,7 @@ void __init swiotlb_exit(void)
io_tlb_default_mem = NULL;
 }
 
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr);
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -346,10 +347,17 @@ static void swiotlb_bounce(struct device *dev, 
phys_addr_t tlb_addr, size_t size
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned int tlb_offset;
 
if (orig_addr == INVALID_PHYS_ADDR)
return;
 
+   tlb_offset = (unsigned int)tlb_addr & (IO_TLB_SIZE - 1);
+   tlb_offset -= swiotlb_align_offset(dev, orig_addr);
+
+   orig_addr += tlb_offset;
+   alloc_size -= tlb_offset;
+
if (size > alloc_size) {
dev_WARN_ONCE(dev, 1,
"Buffer overflow detected. Allocation size: %zu. 
Mapping size: %zu.\n",
-- 
2.31.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread Chanho Park
(RESEND due to wrong encrypted message setting)

Hi,

> On Mon, May 10, 2021 at 05:30:57PM +0900, Chanho Park wrote:
> > +static unsigned int swiotlb_align_offset(struct device *dev, u64
> > +addr);
> 
> Please just move swiotlb_align_offset up to avoid the forward declaration.

Okay. I'll move the position of the function next patch.

> 
> >  /*
> >   * Bounce: copy the swiotlb buffer from or back to the original dma
> location
> >   */
> > @@ -346,10 +347,17 @@ static void swiotlb_bounce(struct device *dev,
> phys_addr_t tlb_addr, size_t size
> > size_t alloc_size = mem->slots[index].alloc_size;
> > unsigned long pfn = PFN_DOWN(orig_addr);
> > unsigned char *vaddr = phys_to_virt(tlb_addr);
> > +   unsigned int tlb_offset;
> >
> > if (orig_addr == INVALID_PHYS_ADDR)
> > return;
> >
> > +   tlb_offset = (unsigned int)tlb_addr & (IO_TLB_SIZE - 1);
> > +   tlb_offset -= swiotlb_align_offset(dev, orig_addr);
> 
> Nit: I'd write this as:
> 
>   tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
>   swiotlb_align_offset(dev, orig_addr);
> 
> as there is no need for the cast, and just having a single assignment is
> easier to follow.

Great. It can be a single assignment as you suggested.

Best Regards,
Chanho Park

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread Chanho Park
From: Bumyong Lee 

In case a driver wants to sync only part of a mapping at an offset,
swiotlb_tbl_sync_single() copies from the orig_addr base to tlb_addr plus
the offset, which causes a data mismatch.

The offset handling was removed by
- "swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but it has to be restored due to the case below.

1. Get dma_addr_t from dma_map_single()
dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

|<---vsize->|
+---+
|   | original buffer
+---+
  vaddr

 swiotlb_align_offset
 |<->|<---vsize->|
 +---+---+
 |   |   | swiotlb buffer
 +---+---+
  tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)
dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

  Error case.
copy data to original buffer.
but it is from base addr in original buffer

 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- size ->|
+---+
|##|| original buffer
+---+
  vaddr

  FIX. copy data to original buffer,
  starting from the same offset in the original buffer

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- offset ->|<- size ->|
+---+
||##|   | original buffer
+---+
  vaddr
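
A worked example of the new tlb_offset computation (the values are
illustrative only, not taken from the patch): assume IO_TLB_SIZE = 2048
(0x800), dma_get_min_align_mask(dev) = 0xfff and orig_addr = 0x81234634.

	swiotlb_align_offset(dev, orig_addr) = 0x81234634 & 0xfff & 0x7ff = 0x634

so the bounce buffer starts 0x634 bytes into its slot. If the driver then
syncs at a 0x100 byte offset, the tlb_addr it passes in ends in 0x734 and

	tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) - swiotlb_align_offset(dev, orig_addr)
		   = 0x734 - 0x634 = 0x100

so orig_addr is advanced by the same 0x100 bytes before copying, which is
what the hunk below implements.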

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in 
swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee 
Signed-off-by: Chanho Park 
---
Changes since v1:
- Move swiotlb_align_offset to avoid forward declaration
- Make tlb_offset calculation as single assignment

 kernel/dma/swiotlb.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..e50df8d8f87e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,14 @@ void __init swiotlb_exit(void)
io_tlb_default_mem = NULL;
 }
 
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+   return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
+}
+
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -346,10 +354,17 @@ static void swiotlb_bounce(struct device *dev, 
phys_addr_t tlb_addr, size_t size
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned int tlb_offset;
 
if (orig_addr == INVALID_PHYS_ADDR)
return;
 
+   tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
+swiotlb_align_offset(dev, orig_addr);
+
+   orig_addr += tlb_offset;
+   alloc_size -= tlb_offset;
+
if (size > alloc_size) {
dev_WARN_ONCE(dev, 1,
"Buffer overflow detected. Allocation size: %zu. 
Mapping size: %zu.\n",
@@ -390,14 +405,6 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t 
tlb_addr, size_t size
 
 #define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
 
-/*
- * Return the offset into a iotlb slot required to keep the device happy.
- */
-static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
-{
-   return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
-}
-
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == 
~0UL.
  */
-- 
2.31.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset

2021-05-10 Thread CHANHO PARK

	
		


body_20210510085719_1574384853.mht.pdf
Description: Binary data
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v6 03/15] swiotlb: Add DMA_RESTRICTED_POOL

2021-05-10 Thread Claire Chang
Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/Kconfig | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 

+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 01/15] swiotlb: Refactor swiotlb init functions

2021-05-10 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Note that we now also call set_memory_decrypted in swiotlb_init_with_tbl.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 51 ++--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..d3232fc19385 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,30 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+
+   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +207,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,7 +295,6 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -297,20 +309,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
-   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 00/15] Restricted DMA

2021-05-10 Thread Claire Chang
From: Claire Chang 

This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v6:
Address the comments in v5

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (15):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Add DMA_RESTRICTED_POOL
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add a new get_io_tlb_mem getter
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Bounce data from/to restricted DMA pool if available
  swiotlb: Move alloc_size to find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  dma-direct: Add a new wrapper __dma_direct_free_pages()
  swiotlb: Add restricted DMA alloc/free support.
  dma-direct: Allocate memory from restricted DMA pool if available
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  27 ++
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  25 ++
 drivers/of/device.c   |   3 +
 drivers/of/of_private.h   |   5 +
 drivers/pci/xen-pcifront.c|   2 +-
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/device.h|   4 +
 include/linux/swiotlb.h   |  41 ++-
 kernel/dma/Kconfig|  14 +
 kernel/dma/direct.c   |  63 +++--
 kernel/dma/direct.h   |   9 +-
 kernel/dma/swiotlb.c  | 242 +-
 15 files changed, 356 insertions(+), 97 deletions(-)

-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-10 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3232fc19385..858475bd6923 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -64,6 +64,7 @@
 enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem *io_tlb_default_mem;
+static struct dentry *debugfs_dir;
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -662,18 +663,27 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs(struct io_tlb_mem *mem, const char *name,
+  struct dentry *node)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
+   return;
+
+   mem->debugfs = debugfs_create_dir(name, node);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, >nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, >used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   swiotlb_create_debugfs(mem, "swiotlb", NULL);
+   debugfs_dir = mem->debugfs;
+
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 00/16] Restricted DMA

2021-05-10 Thread Claire Chang
v6: https://lore.kernel.org/patchwork/cover/1423201/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-10 Thread Keqian Zhu
Hi Baolu,

On 2021/5/10 9:08, Lu Baolu wrote:
> Hi Keqian,
> 
> On 5/8/21 3:35 PM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> On 2021/5/8 11:46, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 5/7/21 6:21 PM, Keqian Zhu wrote:
 Some types of IOMMU are capable of tracking DMA dirty log, such as
 ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
 dirty log tracking framework in the IOMMU base layer.

 Four new essential interfaces are added, and we maintain the status
 of dirty log tracking in iommu_domain.
 1. iommu_support_dirty_log: Check whether domain supports dirty log 
 tracking
 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

 Note: Don't concurrently call these interfaces with other ops that
 access underlying page table.

 Signed-off-by: Keqian Zhu
 Signed-off-by: Kunkun Jiang
 ---
drivers/iommu/iommu.c| 201 +++
include/linux/iommu.h|  63 +++
include/trace/events/iommu.h |  63 +++
3 files changed, 327 insertions(+)

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 808ab70d5df5..0d15620d1e90 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1940,6 +1940,7 @@ static struct iommu_domain 
 *__iommu_domain_alloc(struct bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later 
 */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 +mutex_init(&domain->switch_log_lock);
  return domain;
}
 @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
 *domain,
}
EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
+bool iommu_support_dirty_log(struct iommu_domain *domain)
 +{
 +const struct iommu_ops *ops = domain->ops;
 +
 +return ops->support_dirty_log && ops->support_dirty_log(domain);
 +}
 +EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
>>> I suppose this interface is to ask the vendor IOMMU driver to check
>>> whether each device/iommu in the domain supports dirty bit tracking.
>>> But what will happen if new devices with different tracking capability
>>> are added afterward?
>> Yep, this is considered in the vfio part. We will query again after 
>> attaching or
>> detaching devices from the domain.  When the domain becomes capable, we 
>> enable
>> dirty log for it. When it becomes not capable, we disable dirty log for it.
> 
> If that's the case, why not putting this logic in the iommu subsystem so
> that it doesn't need to be duplicate in different upper layers?
> 
> For example, add something like dirty_page_trackable in the struct of
> iommu_domain and ask the vendor iommu driver to update it once any
> device is added/removed to/from the domain. It's also better to disallow
If we do it, the upper layer still needs to query the capability from the
domain and switch dirty log tracking for it. Or do you mean the domain can
switch dirty log tracking automatically when its capability changes? If so,
I think we lose some flexibility. The upper layer may have its own policy,
such as only enabling dirty log tracking when all domains are capable, and
disabling dirty log tracking when just one domain is not capable.
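
A minimal sketch of such an upper-layer policy, using only
iommu_support_dirty_log() from the patch above (the domain array and count
bookkeeping are hypothetical and only illustrate the "all domains must be
capable" rule):

#include <linux/iommu.h>

static bool all_domains_trackable(struct iommu_domain **domains, int nr)
{
	int i;

	/* a single incapable domain vetoes dirty log tracking */
	for (i = 0; i < nr; i++)
		if (!iommu_support_dirty_log(domains[i]))
			return false;
	return true;
}

The upper layer would re-run this after every attach/detach and switch dirty
log tracking on or off accordingly, rather than the domain toggling it by
itself.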

> any domain attach/detach once the dirty page tracking is on.
Yep, this can greatly simplify our code logic, but I don't know whether our
maintainers agree with that, as they may think that IOMMU dirty logging
should not change the original domain behaviors.


Thanks,
Keqian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 06/15] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-05-10 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  6 +++---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..a5df35bfd150 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -832,7 +832,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b469f04cca26..2a6cca07540b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -113,9 +113,9 @@ static inline struct io_tlb_mem *get_io_tlb_mem(struct 
device *dev)
return io_tlb_default_mem;
 }
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -127,7 +127,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,
   dir);
 
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device 

[PATCH v6 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-10 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Signed-off-by: Claire Chang 
---
 include/linux/device.h  |  4 +++
 include/linux/swiotlb.h |  3 +-
 kernel/dma/swiotlb.c| 79 +
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 38a2071cf776..4987608ea4ff 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -521,6 +522,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..03ad6e3b4056 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 858475bd6923..4ea027b75013 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -687,3 +694,75 @@ static int __init swiotlb_create_default_debugfs(void)
 late_initcall(swiotlb_create_default_debugfs);
 
 #endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   if (dev->dma_io_tlb_mem)
+   return 0;
+
+   /* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+#ifdef CONFIG_ARM
+   if (!PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   kfree(mem);
+   return -EINVAL;
+   }
+#endif /* CONFIG_ARM */
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+
+   rmem->priv = mem;
+
+#ifdef CONFIG_DEBUG_FS
+   if (!debugfs_dir)
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+
+   swiotlb_create_debugfs(mem, rmem->name, debugfs_dir);
+#endif /* CONFIG_DEBUG_FS */
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   if (dev)
+   dev->dma_io_tlb_mem = NULL;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   rmem->ops = &rmem_swiotlb_ops;
+   pr_info("Reserved memory: created device swiotlb memory pool at %pa, 
size %ld MiB\n",
+   >base, (unsigned long)rmem->size / SZ_1M);
+   return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list

[PATCH v6 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-10 Thread Claire Chang
Add a new getter, get_io_tlb_mem, to help select the io_tlb_mem struct.
The restricted DMA pool is preferred if available.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 03ad6e3b4056..b469f04cca26 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -102,6 +103,16 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
+static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev && dev->dma_io_tlb_mem)
+   return dev->dma_io_tlb_mem;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
+   return io_tlb_default_mem;
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
struct io_tlb_mem *mem = io_tlb_default_mem;
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 08/15] swiotlb: Bounce data from/to restricted DMA pool if available

2021-05-10 Thread Claire Chang
Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that is_dev_swiotlb_force doesn't check if
swiotlb_force == SWIOTLB_FORCE. Otherwise the memory allocation behavior
with the default swiotlb would be changed by the following patch
("dma-direct: Allocate memory from restricted DMA pool if available").

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 13 +
 kernel/dma/direct.c |  3 ++-
 kernel/dma/direct.h |  3 ++-
 kernel/dma/swiotlb.c|  8 
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index c530c976d18b..0c5a18d9cf89 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -120,6 +120,15 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev->dma_io_tlb_mem)
+   return true;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+   return false;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -131,6 +140,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..078f7087e466 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,8 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
+is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..f94813674e23 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,8 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
+   is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 81bed3d2c771..3f1ad080a4bc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -347,7 +347,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -429,7 +429,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -506,7 +506,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -557,7 +557,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
unsigned long flags;
unsigned int offset = 

[PATCH v6 09/15] swiotlb: Move alloc_size to find_slots

2021-05-10 Thread Claire Chang
Move the maintenance of alloc_size to find_slots for better code
reusability later.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3f1ad080a4bc..ef04d8f7708f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -482,8 +482,11 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -538,11 +541,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 07/15] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-05-10 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..7d48c433446b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(NULL)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index e8b506a6685b..2a2ae6d6cf6d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(NULL);
 #endif
 
ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..6d548ce53ce7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(NULL)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2a6cca07540b..c530c976d18b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -123,7 +123,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -143,7 +143,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 4ea027b75013..81bed3d2c771 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -662,9 +662,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return get_io_tlb_mem(dev) != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.31.1.607.g51e8a6a459-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info and install bypass SMR

2021-05-10 Thread Jon Nettleton
On Mon, May 10, 2021 at 10:40 AM Shameerali Kolothum Thodi
 wrote:
>
>
>
> > -Original Message-
> > From: Steven Price [mailto:steven.pr...@arm.com]
> > Sent: 06 May 2021 16:17
> > To: Shameerali Kolothum Thodi ;
> > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> > iommu@lists.linux-foundation.org
> > Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> > j...@8bytes.org; robin.mur...@arm.com; wanghuiqiang
> > ; Guohanjun (Hanjun Guo)
> > ; sami.muja...@arm.com; j...@solid-run.com;
> > eric.au...@redhat.com
> > Subject: Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info
> > and install bypass SMR
> >
> > On 20/04/2021 09:27, Shameer Kolothum wrote:
> > > From: Jon Nettleton 
> > >
> > > Check if there is any RMR info associated with the devices behind
> > > the SMMU and if any, install bypass SMRs for them. This is to
> > > keep any ongoing traffic associated with these devices alive
> > > when we enable/reset SMMU during probe().
> > >
> > > Signed-off-by: Jon Nettleton 
> > > Signed-off-by: Shameer Kolothum
> > 
> > > ---
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.c | 42
> > +++
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.h |  2 ++
> > >   2 files changed, 44 insertions(+)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > index d8c6bfde6a61..4d2f91626d87 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > @@ -2102,6 +2102,43 @@ err_reset_platform_ops: __maybe_unused;
> > > return err;
> > >   }
> > >
> > > +static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device
> > *smmu)
> > > +{
> > > +   struct iommu_rmr *e;
> > > +   int i, cnt = 0;
> > > +   u32 smr;
> > > +
> > > +   for (i = 0; i < smmu->num_mapping_groups; i++) {
> > > +   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
> > > +   if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
> > > +   continue;
> > > +
> > > +   list_for_each_entry(e, &smmu->rmr_list, list) {
> > > +   if (FIELD_GET(ARM_SMMU_SMR_ID, smr) != e->sid)
> > > +   continue;
> > > +
> > > +   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
> > > +   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK,
> > smr);
> > > +   smmu->smrs[i].valid = true;
> > > +
> > > +   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
> > > +   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
> > > +   smmu->s2crs[i].cbndx = 0xff;
> > > +
> > > +   cnt++;
> > > +   }
> > > +   }
> >
> > If I understand this correctly - this is looking at the current
> > (hardware) configuration of the SMMU and attempting to preserve any
> > bypass SMRs. However from what I can tell it suffers from the following
> > two problems:
> >
> >   (a) Only the ID of the SMR is being checked, not the MASK. So if the
> > firmware has setup an SMR matching a number of streams this will break.
> >
> >   (b) The SMMU might not be enabled at all (CLIENTPD==1) or bypass
> > enabled for unmatched streams (USFCFG==0).
> >
> > Certainly in my test setup case (b) applies and so this doesn't work.
> > Perhaps something like the below would work better? (It works in the
> > case of the SMMU not enabled - I've not tested case (a)).
>
> Thanks Steve for taking a look and testing this on SMMUv2. My knowledge
> of SMMUv2 is limited and I don't have a setup to verify this. After reading
> the code, I agree that the current implementation addresses the hardware
> configuration only and breaks all the scenarios explained above.
>
> I will include the below snippet in next respin if that works.
>
> Hi Jon,
>
> Could you please take a look and see the below changes works for
> you too?

My original code was derived from a solution that was proposed for
device-tree booted systems.  I will review and test the changes and
then report back.

-Jon


>
> Thanks,
> Shameer
>
> > 8<
> > static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device
> > *smmu)
> > {
> >   struct iommu_rmr *e;
> >   int i, cnt = 0;
> >   u32 smr;
> >   u32 reg;
> >
> >   reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);
> >
> >   if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg &
> > ARM_SMMU_sCR0_CLIENTPD)) {
> >   /*
> >* SMMU is already enabled and disallowing bypass, so preserve
> >* the existing SMRs
> >*/
> >   for (i = 0; i < smmu->num_mapping_groups; i++) {
> >   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
> >   if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
> >   continue;
> >   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
> >   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK,
> > smr);
> > 

Re: [PATCH v6 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-10 Thread Christoph Hellwig
> +static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
> +{
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> + if (dev && dev->dma_io_tlb_mem)
> + return dev->dma_io_tlb_mem;
> +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> +
> + return io_tlb_default_mem;

Given that we're also looking into a non-addressing restricted pool
I'd rather always assign the active pool to dev->dma_io_tlb_mem and
do away with this helper.
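
As a rough sketch of that suggestion (assuming dev->dma_io_tlb_mem exists
unconditionally and is pointed at io_tlb_default_mem when no restricted pool
is attached -- both assumptions, not what this series currently does):

/* Sketch: with dev->dma_io_tlb_mem always populated, no helper is needed. */
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && paddr >= mem->start && paddr < mem->end;
}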
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Raj, Ashok
On Mon, May 10, 2021 at 09:37:29AM -0300, Jason Gunthorpe wrote:
> On Sat, May 08, 2021 at 09:56:59AM +, Tian, Kevin wrote:
> > > From: Raj, Ashok 
> > > Sent: Friday, May 7, 2021 12:33 AM
> > > 
> > > > Basically it means when the guest's top level IOASID is created for
> > > > nesting that IOASID claims all PASID's on the RID and excludes any
> > > > PASID IOASIDs from existing on the RID now or in future.
> > > 
> > > The way to look at it this is as follows:
> > > 
> > > For platforms that do not have a need to support shared work queue model
> > > support for ENQCMD or similar, PASID space is naturally per RID. There is 
> > > no
> > > complication with this. Every RID has the full range of PASID's and no 
> > > need
> > > for host to track which PASIDs are allocated now or in future in the 
> > > guest.
> > > 
> > > For platforms that support ENQCMD, it is required to mandate PASIDs are
> > > global across the entire system. Maybe its better to call them gPASID for
> > > guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> > > wide mapping for ENQCMD and not a per-RID based mapping. (We covered
> > > that
> > > in earlier responses)
> > > 
> > > In our current implementation we actually don't separate this space, and
> > > gPASID == hPASID. The iommu driver enforces that by using the custom
> > > allocator and the architected interface that allows all guest vIOMMU
> > > allocations to be proxied to host. Nothing but a glorified hypercall like
> > > interface. In fact some OS's do use hypercall to get a hPASID vs using
> > > the vCMD style interface.
> > > 
> > 
> > After more thinking about the new interface, I feel gPASID==hPASID
> > actually causes some confusion in the uAPI design. In concept an ioasid
> > is not active until it's attached to a device, because it's just an ID
> > without a device. So supposedly an ioasid should reject all user commands
> > before attach. However a guest likely asks for a new gPASID before
> > attaching it to devices and the vIOMMU. If gPASID==hPASID then Qemu
> > must request /dev/ioasid to allocate a hw_id for an ioasid which hasn't
> > been attached to any device, with the assumption on kernel knowledge
> > that this hw_id comes from a global allocator w/o dependency on any
> > device. This doesn't sound like a clean design, not to mention it also
> > conflicts with live migration.
> 
> Everything must be explicit. The situation David pointed to of
> qemu emulating a vIOMMU while running on a host with a different
> platform/physical IOMMU must be considered.
> 
> If the vIOMMU needs specific behavior it must use /dev/iommu to ask
> for it specifically and not just make wild assumptions about how the
> platform works.

I think the right way is for the pIOMMU to enforce the right behavior. The
vIOMMU can ask for a PASID and the physical IOMMU driver will give what is
optimal for the platform. If the vIOMMU says "give me a per-device PASID",
but that can lead to conflicts in the PASID name space, it's best to avoid it.

Global PASIDs don't break anything, but giving that control to the vIOMMU
doesn't seem right. When we have mixed use cases, like hardware that
supports shared WQs and SRIOV devices that need PASIDs, we need to
comprehend how they will work without having a backend to migrate PASIDs
to the new destination.

For ENQCMD we have the gPASID->hPASID translation in the VMCS control.
For devices that support SIOV, programming a PASID to a device is also
mediated, so it's possible for something like the mediated interface to
assist with that migration for the dedicated WQ.

When we have both SRIOV and shared WQs exposed to the same guest, we
do have an issue. The simplest way that I thought of was to have a guest
and host PASID separation, where the guest has its own PASID space and
the host has its own carved out. The guest can do whatever it wants within
that allocated space without fear of any collision with any other device.

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


i915 and swiotlb_max_segment

2021-05-10 Thread Christoph Hellwig
Hi all,

swiotlb_max_segment is a rather strange "API" exported by swiotlb.c,
and i915 is the only (remaining) user.

swiotlb_max_segment returns 0 if swiotlb is not in use, 1 if
SWIOTLB_FORCE is set or swiotlb-zen is set, and the swiotlb segment
size when swiotlb is otherwise enabled.

i915 then uses it to:

 a) decide on the max order in i915_gem_object_get_pages_internal
 b) decide on a max segment size in i915_sg_segment_size

for a) it really seems i915 should switch to dma_alloc_noncoherent
or dma_alloc_noncontigous ASAP instead of using alloc_page and
streaming DMA mappings.  Any chance I could trick one of the i915
maintainers into doing just that, given that the callchain is not
exactly trivial?

For b) I'm not sure swiotlb and i915 really agree on the meaning
of the value.  swiotlb_max_segment basically returns the entire
size of the swiotlb buffer, while i915 seems to use it to limit
the size of each scatterlist entry.  It seems like dma_max_mapping_size
might be the best value to use here.
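
For (b), a minimal sketch of what using dma_max_mapping_size() on the i915
side could look like (illustrative only; the current i915_sg_segment_size()
takes no device argument, so the call sites would need updating):

/* Sketch: cap each scatterlist segment by what the DMA layer can map. */
static inline unsigned int i915_sg_segment_size(struct device *dev)
{
	size_t max = min_t(size_t, UINT_MAX, dma_max_mapping_size(dev));

	return round_down(max, PAGE_SIZE);
}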

Once that is fixed I'd like to kill off swiotlb_max_segment as it is
a horribly confusing API.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 08:25:02AM -0700, Raj, Ashok wrote:

> Global PASID doesn't break anything, but giving that control to vIOMMU
> doesn't seem right. When we have mixed uses cases like hardware that
> supports shared wq and SRIOV devices that need PASIDs we need to 
> comprehend how they will work without having a backend to migrate PASIDs 
> to new destination.

Why wouldn't there be a backend? SRIOV live migration is a real thing
now (see Max's VFIO patches). The PASID space of the entire dedicated
RID needs to be migratable, which means the destination vIOMMU must be
able to program its local hardware with the same PASID numbers and any
kind of global PASID scheme at all will interfere with it.

> When we have both SRIOV and shared WQ exposed to the same guest, we 
> do have an issue. The simplest way that I thought was to have a guest 
> and host PASID separation.  Where the guest has its own PASID  space 
> and host has its own carved out.  Guest can do what ever it wants within 
> that allocated space without fear of any collition with any other device. 

And how do you reliably migrate if the target kernel has a PASID
already allocated in that range?

ENQCMD must not assume it is the only thing in the platform. It needs
to be compartmentalized to specific participating RIDs and made
explicit because it has a bad special requirement for cross-device
PASIDs

Jason 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-10 Thread Christoph Hellwig
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#endif

I don't think any of this belongs into swiotlb.c.  Marking
swiotlb_init_io_tlb_mem non-static and having all this code in a separate
file is probably a better idea.

> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
> +
> + if (dev->dma_io_tlb_mem)
> + return 0;
> +
> + /* Since multiple devices can share the same pool, the private data,
> +  * io_tlb_mem struct, will be initialized by the first device attached
> +  * to it.
> +  */

This is not the normal kernel comment style.

> +#ifdef CONFIG_ARM
> + if (!PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
> + kfree(mem);
> + return -EINVAL;
> + }
> +#endif /* CONFIG_ARM */

And this is weird.  Why would ARM have such a restriction?  And if we have
such restrictions it absolutely belongs into an arch helper.

> + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
> +
> + rmem->priv = mem;
> +
> +#ifdef CONFIG_DEBUG_FS
> + if (!debugfs_dir)
> + debugfs_dir = debugfs_create_dir("swiotlb", NULL);
> +
> + swiotlb_create_debugfs(mem, rmem->name, debugfs_dir);

Doesn't the debugfs_create_dir belong into swiotlb_create_debugfs?  Also
please use IS_ENABLED or a stub to avoid ifdefs like this.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions associated with a dev

2021-05-10 Thread Robin Murphy

On 2021-05-10 10:19, Shameerali Kolothum Thodi wrote:

Hi Robin,


-Original Message-
From: Robin Murphy [mailto:robin.mur...@arm.com]
Sent: 07 May 2021 11:02
To: Shameerali Kolothum Thodi ;
linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
iommu@lists.linux-foundation.org
Cc: Linuxarm ; lorenzo.pieral...@arm.com;
j...@8bytes.org; wanghuiqiang ; Guohanjun
(Hanjun Guo) ; steven.pr...@arm.com;
sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com
Subject: Re: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions
associated with a dev

On 2021-04-20 09:27, Shameer Kolothum wrote:

Get RMR regions associated with a dev reserved so that there is
a unity mapping for them in SMMU.

Signed-off-by: Shameer Kolothum



---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29

+

   1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

index 14e9c7034c04..8bacedf7bb34 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2531,6 +2531,34 @@ static int arm_smmu_of_xlate(struct device *dev,

struct of_phandle_args *args)

return iommu_fwspec_add_ids(dev, args->args, 1);
   }

+static bool arm_smmu_dev_has_rmr(struct arm_smmu_master *master,
+struct iommu_rmr *e)
+{
+   int i;
+
+   for (i = 0; i < master->num_sids; i++) {
+   if (e->sid == master->sids[i])
+   return true;
+   }
+
+   return false;
+}
+
+static void arm_smmu_rmr_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+   struct arm_smmu_device *smmu = master->smmu;
+   struct iommu_rmr *rmr;
+
+   list_for_each_entry(rmr, &smmu->rmr_list, list) {
+   if (!arm_smmu_dev_has_rmr(master, rmr))
+   continue;
+
+   iommu_dma_get_rmr_resv_regions(dev, rmr, head);
+   }
+}
+


TBH I wouldn't have thought we need a driver-specific hook for this, or
is it too painful to correlate fwspec->iommu_fwnode back to the relevant
IORT node generically?


 From a quick look, I think I could get rid of the above with something like 
below,

--8<
+static bool iommu_dma_dev_has_rmr(struct iommu_fwspec *fwspec,
+ struct iommu_rmr *e)
+{
+   int i;
+
+   for (i = 0; i < fwspec->num_ids; i++) {
+           if (e->sid == fwspec->ids[i])
+                   return true;
+   }
+
+   return false;
+}
+
+
+void iommu_dma_get_rmr_resv_regions(struct device *dev, struct list_head *list)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct list_head rmr_list;
+   struct iommu_rmr *rmr;
+
+   INIT_LIST_HEAD(&rmr_list);
+   if (iommu_dma_get_rmrs(fwspec->iommu_fwnode, &rmr_list))
+   return;
 ...
+   list_for_each_entry(rmr, &rmr_list, list) {
+
+   if (!iommu_dma_dev_has_rmr(fwspec, rmr))
+   continue;
+  ...
+   region = iommu_alloc_resv_region(rmr->base_address,
+                                    rmr->length, prot,
+                                    type);
  ...
+   }
+}
  /**
   * iommu_dma_get_resv_regions - Reserved region driver helper
   * @dev: Device from iommu_get_resv_regions()
@@ -188,10 +242,11 @@ void iommu_dma_get_resv_regions(struct device *dev, 
struct list_head *list)
 if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode))
 iort_iommu_msi_get_resv_regions(dev, list);
  
+   iommu_dma_get_rmr_resv_regions(dev, list);

  }

8<

But looking at the SMMUv2 code, the fwspec->ids is MASK:SID, so I am not
sure the RMR sid can be compared directly to fwspec->ids above. Right? Or
is there a better way here?


Ah, but consider how the IDs got there in the first place ;)

A mask will never be set on ACPI systems, since IORT (intentionally) 
only caters for straightforward mappings rather than arbitrary 
complexity, so the assumption of fwspec ID == SID is already baked in by 
virtue of arm_smmu_iort_xlate(). The IORT code is free to assume its own 
behaviour!


Robin.



Thanks,
Shameer





   static void arm_smmu_get_resv_regions(struct device *dev,
  struct list_head *head)
   {
@@ -2545,6 +2573,7 @@ static void arm_smmu_get_resv_regions(struct

device *dev,

list_add_tail(&region->list, head);

iommu_dma_get_resv_regions(dev, head);
+   arm_smmu_rmr_get_resv_regions(dev, head);
   }

   static bool arm_smmu_dev_has_feature(struct device *dev,


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 00/15] dma mapping/iommu: Allow IOMMU IOVA rcache range to be configured

2021-05-10 Thread John Garry
For streaming DMA mappings that involve an IOMMU and whose IOVA length
regularly exceeds the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.

This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry
from last rb tree node if iova search fails"), as discussed at [0].

IOVAs which cannot be cached are highly involved in the IOVA aging issue,
as discussed at [1].

This series allows the IOVA rcache range be configured, so that we may
cache all IOVAs per domain, thus improving performance.

A new IOMMU group sysfs file is added - max_opt_dma_size - which is used
indirectly to configure the IOVA rcache range:
/sys/kernel/iommu_groups/X/max_opt_dma_size

This file is updated in the same way as the IOMMU group default domain type,
i.e. the only device in the group must be unbound first. However, the IOMMU
default domain is reallocated in the device driver reprobe, and not
immediately.

In addition, we keep (from the v1 series) the DMA mapping API to allow the DMA
max optimised size to be set from an LLDD. How it works is a lot different. When
the LLDD calls this during probe, once the value is successfully recorded, we
return -EPROBE_DEFER. In the reprobe, the IOMMU group default domain is
reallocated, and the new IOVA domain rcache upper limit is set according
to that DMA max optimised size. As such, we don't operate on a live IOMMU
domain.

Note that the DMA mapping API frontend is not strictly required, but it saves
the LLDD from calling IOMMU APIs directly, which is not preferred.
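
As an illustration of the LLDD-facing flow described above (a sketch only; the
dma_set_max_opt_size() signature is assumed from patch 13 of this series, and
my_lldd_init()/SZ_256K are illustrative choices):

/*
 * Hypothetical LLDD probe flow for the frontend described above.
 */
static int my_lldd_probe(struct platform_device *pdev)
{
	int ret;

	/* First successful call records the value and returns -EPROBE_DEFER. */
	ret = dma_set_max_opt_size(&pdev->dev, SZ_256K);
	if (ret)
		return ret;

	/* Reached on the reprobe, after the IOMMU group has been re-allocated. */
	return my_lldd_init(pdev);
}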

Some figures for storage scenario:
v5.13-rc1 baseline: 1200K IOPS
With series:1800K IOPS

All above are for IOMMU strict mode. Non-strict mode gives ~1800K IOPS in
all scenarios.

Patch breakdown:
1-11: Add support for setting DMA max optimised size via sysfs
12-15: Add support for setting DMA max optimised size from LLDD

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.ga...@huawei.com/

Differences to v1:
- Many
- Change method to not operate on a 'live' IOMMU domain:
- rather, force device driver to be re-probed once
  dma_max_opt_size is set, and reconfig a new IOMMU group then
- Add iommu sysfs max_dma_opt_size file, and allow updating same as how
  group type is changed 

John Garry (15):
  iommu: Refactor iommu_group_store_type()
  iova: Allow rcache range upper limit to be flexible
  iommu: Allow max opt DMA len be set for a group via sysfs
  iommu: Add iommu_group_get_max_opt_dma_size()
  iova: Add iova_domain_len_is_cached()
  iommu: Allow iommu_change_dev_def_domain() realloc default domain for
same type
  iommu: Add iommu_realloc_dev_group()
  dma-iommu: Add iommu_reconfig_dev_group_dma()
  iova: Add init_iova_domain_ext()
  dma-iommu: Use init_iova_domain_ext() for IOVA domain init
  dma-iommu: Reconfig group domain
  iommu: Add iommu_set_dev_dma_opt_size()
  dma-mapping: Add dma_set_max_opt_size()
  dma-iommu: Add iommu_dma_set_opt_size()
  scsi: hisi_sas: Set max optimal DMA size for v3 hw

 drivers/iommu/dma-iommu.c  |  51 +-
 drivers/iommu/iommu.c  | 231 +++--
 drivers/iommu/iova.c   |  61 +--
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   5 +
 include/linux/dma-iommu.h  |   4 +
 include/linux/dma-map-ops.h|   1 +
 include/linux/dma-mapping.h|   8 +
 include/linux/iommu.h  |  19 ++
 include/linux/iova.h   |  21 ++-
 kernel/dma/mapping.c   |  11 ++
 10 files changed, 344 insertions(+), 68 deletions(-)

-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/15] iommu: Add iommu_realloc_dev_group()

2021-05-10 Thread John Garry
Add a function to re-allocate an IOMMU group's default domain.

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 12 
 include/linux/iommu.h |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f7253a973ab9..bdb9aa47dfca 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3191,6 +3191,18 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
return ret;
 }
 
+int iommu_realloc_dev_group(struct device *dev)
+{
+   struct iommu_group *group;
+   int ret;
+
+   group = iommu_group_get(dev);
+   ret = iommu_change_dev_def_domain(group, dev, 0, true);
+   iommu_group_put(group);
+
+   return ret;
+}
+
 /*
  * Changing the default domain or any other IOMMU group attribute through sysfs
  * requires the users to unbind the drivers from the devices in the IOMMU 
group.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e26abda94792..6e187746af0f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -482,6 +482,7 @@ bool iommu_get_dma_strict(struct iommu_domain *domain);
 
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
  unsigned long iova, int flags);
+extern int iommu_realloc_dev_group(struct device *dev);
 
 static inline void iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
@@ -699,6 +700,11 @@ static inline size_t iommu_map_sg_atomic(struct 
iommu_domain *domain,
return 0;
 }
 
+static inline int iommu_realloc_dev_group(struct device *dev)
+{
+   return -ENODEV;
+}
+
 static inline void iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
 }
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/15] iommu: Add iommu_group_get_max_opt_dma_size()

2021-05-10 Thread John Garry
Add a function to return the max optimised DMA size for an IOMMU group.

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 5 +
 include/linux/iommu.h | 6 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63cdfb11ebed..62e4491f32e0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2296,6 +2296,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device 
*dev)
return dev->iommu_group->default_domain;
 }
 
+size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
+{
+   return group->max_opt_dma_size;
+}
+
 /*
  * IOMMU groups are really the natural working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..e26abda94792 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -405,6 +405,7 @@ extern int iommu_sva_unbind_gpasid(struct iommu_domain 
*domain,
   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
+extern size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
 extern int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
@@ -653,6 +654,11 @@ static inline struct iommu_domain 
*iommu_get_domain_for_dev(struct device *dev)
return NULL;
 }
 
+static inline size_t iommu_group_get_max_opt_dma_size(struct iommu_group 
*group)
+{
+   return 0;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
 {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 03/15] iommu: Allow max opt DMA len be set for a group via sysfs

2021-05-10 Thread John Garry
Add support to allow the maximum optimised DMA len to be set for an IOMMU
group via sysfs.

This is much the same as the method used to change the default domain type
for a group.

However, unlike changing the default domain type, the new domains will be
allocated on a member device reprobe path.

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4d12b607918c..63cdfb11ebed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -45,6 +45,7 @@ struct iommu_group {
struct iommu_domain *default_domain;
struct iommu_domain *domain;
struct list_head entry;
+   size_t max_opt_dma_size;
 };
 
 struct group_device {
@@ -86,6 +87,9 @@ static int iommu_create_device_direct_mappings(struct 
iommu_group *group,
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+ const char *buf,
+ size_t count);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -554,6 +558,12 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
return strlen(type);
 }
 
+static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group,
+char *buf)
+{
+   return sprintf(buf, "%zu\n", group->max_opt_dma_size);
+}
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
 
 static IOMMU_GROUP_ATTR(reserved_regions, 0444,
@@ -562,6 +572,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444,
 static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
iommu_group_store_type);
 
+static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, 
iommu_group_show_max_opt_dma_size,
+   iommu_group_store_max_opt_dma_size);
+
 static void iommu_group_release(struct kobject *kobj)
 {
struct iommu_group *group = to_iommu_group(kobj);
@@ -648,6 +661,10 @@ struct iommu_group *iommu_group_alloc(void)
if (ret)
return ERR_PTR(ret);
 
+   ret = iommu_group_create_file(group, &iommu_group_attr_max_opt_dma_size);
+   if (ret)
+   return ERR_PTR(ret);
+
pr_debug("Allocated group %d\n", group->id);
 
return group;
@@ -3279,3 +3296,29 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
return iommu_group_store_common(group, buf, count,
iommu_group_store_type_cb);
 }
+
+static int iommu_group_store_max_opt_dma_size_cb(const char *buf,
+struct iommu_group *group,
+struct device *dev)
+{
+   unsigned long val;
+   char *endp;
+
+   val = simple_strtoul(buf, &endp, 0);
+   if (endp == buf)
+   return -EINVAL;
+
+   mutex_lock(&group->mutex);
+   group->max_opt_dma_size = val;
+   mutex_unlock(&group->mutex);
+
+   return 0;
+}
+
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+ const char *buf,
+ size_t count)
+{
+   return iommu_group_store_common(group, buf, count,
+   iommu_group_store_max_opt_dma_size_cb);
+}
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 01/15] iommu: Refactor iommu_group_store_type()

2021-05-10 Thread John Garry
Function iommu_group_store_type() supports changing the default domain of
an IOMMU group.

Many conditions need to be satisfied and steps taken for this action to be
successful.

Satisfying these conditions and taking these steps will be required for setting
other IOMMU group attributes, so factor the code into a common part and a part
specific to updating the IOMMU group attribute.

No functional change intended.

Some code comments are tidied up also.

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 73 +++
 1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..4d12b607918c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3169,20 +3169,23 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
 }
 
 /*
- * Changing the default domain through sysfs requires the users to ubind the
- * drivers from the devices in the iommu group. Return failure if this doesn't
- * meet.
+ * Changing the default domain or any other IOMMU group attribute through sysfs
+ * requires the users to unbind the drivers from the devices in the IOMMU 
group.
+ * Return failure if this precondition is not met.
  *
  * We need to consider the race between this and the device release path.
  * device_lock(dev) is used here to guarantee that the device release path
  * will not be entered at the same time.
  */
-static ssize_t iommu_group_store_type(struct iommu_group *group,
- const char *buf, size_t count)
+static ssize_t iommu_group_store_common(struct iommu_group *group,
+   const char *buf, size_t count,
+   int (*cb)(const char *buf,
+ struct iommu_group *group,
+ struct device *dev))
 {
struct group_device *grp_dev;
struct device *dev;
-   int ret, req_type;
+   int ret;
 
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
return -EACCES;
@@ -3190,25 +3193,16 @@ static ssize_t iommu_group_store_type(struct 
iommu_group *group,
if (WARN_ON(!group))
return -EINVAL;
 
-   if (sysfs_streq(buf, "identity"))
-   req_type = IOMMU_DOMAIN_IDENTITY;
-   else if (sysfs_streq(buf, "DMA"))
-   req_type = IOMMU_DOMAIN_DMA;
-   else if (sysfs_streq(buf, "auto"))
-   req_type = 0;
-   else
-   return -EINVAL;
-
/*
 * Lock/Unlock the group mutex here before device lock to
-* 1. Make sure that the iommu group has only one device (this is a
+* 1. Make sure that the IOMMU group has only one device (this is a
 *prerequisite for step 2)
 * 2. Get struct *dev which is needed to lock device
 */
mutex_lock(&group->mutex);
if (iommu_group_device_count(group) != 1) {
mutex_unlock(&group->mutex);
-   pr_err_ratelimited("Cannot change default domain: Group has 
more than one device\n");
+   pr_err_ratelimited("Cannot change IOMMU group default domain 
attribute: Group has more than one device\n");
return -EINVAL;
}
 
@@ -3220,16 +3214,16 @@ static ssize_t iommu_group_store_type(struct 
iommu_group *group,
/*
 * Don't hold the group mutex because taking group mutex first and then
 * the device lock could potentially cause a deadlock as below. Assume
-* two threads T1 and T2. T1 is trying to change default domain of an
-* iommu group and T2 is trying to hot unplug a device or release [1] VF
-* of a PCIe device which is in the same iommu group. T1 takes group
-* mutex and before it could take device lock assume T2 has taken device
-* lock and is yet to take group mutex. Now, both the threads will be
-* waiting for the other thread to release lock. Below, lock order was
-* suggested.
+* two threads, T1 and T2. T1 is trying to change default domain
+* attribute of an IOMMU group and T2 is trying to hot unplug a device
+* or release [1] VF of a PCIe device which is in the same IOMMU group.
+* T1 takes the group mutex and before it could take device lock T2 may
+* have taken device lock and is yet to take group mutex. Now, both the
+* threads will be waiting for the other thread to release lock. Below,
+* lock order was suggested.
 * device_lock(dev);
 *  mutex_lock(&group->mutex);
-*  iommu_change_dev_def_domain();
+*  cb->iommu_change_dev_def_domain(); [example cb]
 *  mutex_unlock(&group->mutex);
 * device_unlock(dev);
 *
@@ -3243,7 +3237,7 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
 */
mutex_unlock(&group->mutex);
 
-   /* 

[PATCH v2 08/15] dma-iommu: Add iommu_reconfig_dev_group_dma()

2021-05-10 Thread John Garry
Add a function to reconfigure the IOMMU group for a device, if necessary.

IOVAs are cached in power-of-2 granules, so there is no point in allocating
a new IOMMU domain if the current range is suitable.

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 25 +
 include/linux/dma-iommu.h |  4 
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f6d3302bb829..4fb82c554ede 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -315,6 +315,31 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+void iommu_reconfig_dev_group_dma(struct device *dev)
+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   unsigned long shift, iova_len;
+   struct iova_domain *iovad;
+   size_t max_opt_dma_size;
+
+   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+   return;
+
+   max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
+   if (!max_opt_dma_size)
+   return;
+
+   iovad = &cookie->iovad;
+   shift = iova_shift(iovad);
+   iova_len = max_opt_dma_size >> shift;
+
+   if (iova_domain_len_is_cached(iovad, iova_len))
+   return;
+
+   iommu_realloc_dev_group(dev);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..097398b76dcc 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_reconfig_dev_group_dma(struct device *dev);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -53,6 +54,9 @@ static inline void iommu_setup_dma_ops(struct device *dev, 
u64 dma_base,
u64 size)
 {
 }
+static inline void iommu_reconfig_dev_group_dma(struct device *dev)
+{
+}
 
 static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 09/15] iova: Add init_iova_domain_ext()

2021-05-10 Thread John Garry
Add an extended version of init_iova_domain() which accepts a max opt
IOVA length argument, and use it to set the rcache range.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 29 +++--
 include/linux/iova.h |  9 +
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 95892a0433cc..273a689006c3 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -23,7 +23,7 @@ static bool iova_rcache_insert(struct iova_domain *iovad,
 static unsigned long iova_rcache_get(struct iova_domain *iovad,
 unsigned long size,
 unsigned long limit_pfn);
-static void init_iova_rcaches(struct iova_domain *iovad);
+static void init_iova_rcaches(struct iova_domain *iovad, unsigned long 
iova_len);
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 static void fq_destroy_all_entries(struct iova_domain *iovad);
@@ -47,8 +47,8 @@ static struct iova *to_iova(struct rb_node *node)
 }
 
 void
-init_iova_domain(struct iova_domain *iovad, unsigned long granule,
-   unsigned long start_pfn)
+__init_iova_domain(struct iova_domain *iovad, unsigned long granule,
+   unsigned long start_pfn, unsigned long iova_len)
 {
/*
 * IOVA granularity will normally be equal to the smallest
@@ -71,7 +71,21 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, &iovad->cpuhp_dead);
-   init_iova_rcaches(iovad);
+   init_iova_rcaches(iovad, iova_len);
+}
+
+void
+init_iova_domain_ext(struct iova_domain *iovad, unsigned long granule,
+   unsigned long start_pfn, unsigned long iova_len)
+{
+   __init_iova_domain(iovad, granule, start_pfn, iova_len);
+}
+
+void
+init_iova_domain(struct iova_domain *iovad, unsigned long granule,
+   unsigned long start_pfn)
+{
+   __init_iova_domain(iovad, granule, start_pfn, 0);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
@@ -883,14 +897,17 @@ bool iova_domain_len_is_cached(struct iova_domain *iovad, 
unsigned long iova_len
return iova_len_to_rcache_max(iova_len) == iovad->rcache_max_size;
 }
 
-static void init_iova_rcaches(struct iova_domain *iovad)
+static void init_iova_rcaches(struct iova_domain *iovad, unsigned long 
iova_len)
 {
struct iova_cpu_rcache *cpu_rcache;
struct iova_rcache *rcache;
unsigned int cpu;
int i;
 
-   iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
+   if (iova_len)
+   iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
+   else
+   iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
 
iovad->rcaches = kcalloc(iovad->rcache_max_size,
 sizeof(*iovad->rcaches), GFP_KERNEL);
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 04cc8eb6de38..cfe416b6a8c7 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -154,6 +154,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, 
unsigned long pfn_lo,
unsigned long pfn_hi);
 void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn);
+void init_iova_domain_ext(struct iova_domain *iovad, unsigned long granule,
+   unsigned long start_pfn, unsigned long iova_len);
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
@@ -222,6 +224,13 @@ static inline void init_iova_domain(struct iova_domain 
*iovad,
 {
 }
 
+static inline void init_iova_domain_ext(struct iova_domain *iovad,
+   unsigned long granule,
+   unsigned long start_pfn,
+   unsigned long iova_len)
+{
+}
+
 static inline int init_iova_flush_queue(struct iova_domain *iovad,
iova_flush_cb flush_cb,
iova_entry_dtor entry_dtor)
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 02/15] iova: Allow rcache range upper limit to be flexible

2021-05-10 Thread John Garry
Some LLDs may request DMA mappings whose IOVA length exceeds that of the
current rcache upper limit.

This means that allocations for those IOVAs will never be cached, and must
always be allocated and freed from the RB tree for every DMA mapping cycle.
This has a significant effect on performance, more so since commit
4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
fails"), as discussed at [0].

As a first step towards allowing the rcache range upper limit to be
configured, hold this value in the IOVA rcache structure, and allocate the
rcaches separately.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
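
As a rough sketch of the idea (not code from this series, and the helper name
is made up), whether a given allocation can be served from the rcaches reduces
to comparing its order against the per-domain limit introduced here:

/*
 * Sketch only: with the default limit of 6, orders 0-5 (up to 32 pages,
 * i.e. 128K with 4K pages) are served from the rcaches; anything larger
 * always goes through the RB tree.
 */
static bool iova_size_is_cacheable(struct iova_domain *iovad,
				   unsigned long size)
{
	return order_base_2(size) < iovad->rcache_max_size;
}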

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c |  2 +-
 drivers/iommu/iova.c  | 23 +--
 include/linux/iova.h  |  4 ++--
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..f6d3302bb829 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -432,7 +432,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
 * rounding up anything cacheable to make sure that can't happen. The
 * order of the unadjusted size will still match upon freeing.
 */
-   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+   if (iova_len < (1 << (iovad->rcache_max_size - 1)))
iova_len = roundup_pow_of_two(iova_len);
 
dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b08039..0e4c0e55178a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,6 +15,8 @@
 /* The anchor node sits above the top of the usable address space */
 #define IOVA_ANCHOR~0UL
 
+#define IOVA_RANGE_CACHE_MAX_SIZE 6/* log of max cached IOVA range size 
(in pages) */
+
 static bool iova_rcache_insert(struct iova_domain *iovad,
   unsigned long pfn,
   unsigned long size);
@@ -877,7 +879,14 @@ static void init_iova_rcaches(struct iova_domain *iovad)
unsigned int cpu;
int i;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
+
+   iovad->rcaches = kcalloc(iovad->rcache_max_size,
+sizeof(*iovad->rcaches), GFP_KERNEL);
+   if (!iovad->rcaches)
+   return;
+
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
		rcache = &iovad->rcaches[i];
		spin_lock_init(&rcache->lock);
rcache->depot_size = 0;
@@ -952,7 +961,7 @@ static bool iova_rcache_insert(struct iova_domain *iovad, 
unsigned long pfn,
 {
unsigned int log_size = order_base_2(size);
 
-   if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
+   if (log_size >= iovad->rcache_max_size)
return false;
 
	return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
@@ -1008,7 +1017,7 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
 {
unsigned int log_size = order_base_2(size);
 
-   if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
+   if (log_size >= iovad->rcache_max_size)
return 0;
 
	return __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
@@ -1024,7 +1033,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
unsigned int cpu;
int i, j;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
rcache = >rcaches[i];
for_each_possible_cpu(cpu) {
cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
@@ -1035,6 +1044,8 @@ static void free_iova_rcaches(struct iova_domain *iovad)
for (j = 0; j < rcache->depot_size; ++j)
iova_magazine_free(rcache->depot[j]);
}
+
+   kfree(iovad->rcaches);
 }
 
 /*
@@ -1047,7 +1058,7 @@ static void free_cpu_cached_iovas(unsigned int cpu, 
struct iova_domain *iovad)
unsigned long flags;
int i;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
		rcache = &iovad->rcaches[i];
		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
		spin_lock_irqsave(&cpu_rcache->lock, flags);
@@ -1066,7 +1077,7 @@ static void free_global_cached_iovas(struct iova_domain 
*iovad)
unsigned long flags;
int i, j;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
		rcache = &iovad->rcaches[i];
		spin_lock_irqsave(&rcache->lock, flags);
for (j = 0; j < rcache->depot_size; ++j) {
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 71d8a2de6635..9974e1d3e2bc 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -25,7 +25,6 

[PATCH v2 12/15] iommu: Add iommu_set_dev_dma_opt_size()

2021-05-10 Thread John Garry
Add a function which allows the max optimised IOMMU DMA size to be set.

When set successfully, return -EPROBE_DEFER to inform the caller that the
device driver needs to be reprobed for the new size to take effect.
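
On the driver side the intended pattern is to set the hint first thing in
probe and honour the deferral; a minimal sketch, assuming a hypothetical
driver and size constant (patch 15/15 does the real thing for hisi_sas via
the dma_set_max_opt_size() frontend added later in the series):

/* Sketch only: MY_DRV_OPT_DMA_SIZE and my_drv_setup() are illustrative. */
static int my_drv_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int rc;

	rc = dma_set_max_opt_size(&pdev->dev, MY_DRV_OPT_DMA_SIZE);
	if (rc == -EPROBE_DEFER)
		return rc;	/* default domain gets rebuilt; we get reprobed */
	/* any other error just means we lose the optimisation */

	return my_drv_setup(pdev, id);
}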

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 47 +++
 include/linux/iommu.h |  7 +++
 2 files changed, 54 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bdb9aa47dfca..263d78e26c48 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3058,6 +3058,53 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
 
+int iommu_set_dev_dma_opt_size(struct device *dev, size_t size)
+{
+   struct iommu_group *group = iommu_group_get(dev);
+   struct group_device *grp_dev;
+   struct device *_dev;
+   int ret, group_count;
+
+   if (!group)
+   return 0;
+
+   mutex_lock(&group->mutex);
+
+   /*
+* If already set, then ignore. We may have been set via sysfs, so
+* honour that.
+*/
+   if (group->max_opt_dma_size) {
+   ret = 0;
+   goto out;
+   }
+
+   group_count = iommu_group_device_count(group);
+   if (group_count != 1) {
+   dev_err_ratelimited(dev, "Cannot change DMA opt size: Group has 
more than one device group_count=%d\n",
+   group_count);
+   ret = -EINVAL;
+   goto out;
+   }
+
+   /* Since group has only one device */
+   grp_dev = list_first_entry(&group->devices, struct group_device, list);
+   _dev = grp_dev->dev;
+
+   if (_dev != dev) {
+   dev_err_ratelimited(dev, "Cannot set DMA max opt size - device 
has changed\n");
+   ret = -EBUSY;
+   goto out;
+   }
+
+   group->max_opt_dma_size = size;
+   ret = -EPROBE_DEFER;
+out:
+   mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+   return ret;
+}
+
 /*
  * Changes the default domain of an iommu group that has *only* one device
  *
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6e187746af0f..36871e8ae636 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -428,6 +428,7 @@ extern void iommu_get_resv_regions(struct device *dev, 
struct list_head *list);
 extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern void generic_iommu_put_resv_regions(struct device *dev,
   struct list_head *list);
+extern int iommu_set_dev_dma_opt_size(struct device *dev, size_t size);
 extern void iommu_set_default_passthrough(bool cmd_line);
 extern void iommu_set_default_translated(bool cmd_line);
 extern bool iommu_default_passthrough(void);
@@ -740,6 +741,12 @@ static inline int iommu_get_group_resv_regions(struct 
iommu_group *group,
return -ENODEV;
 }
 
+static inline int iommu_set_dev_dma_opt_size(struct device *dev,
+size_t size)
+{
+   return -ENODEV;
+}
+
 static inline void iommu_set_default_passthrough(bool cmd_line)
 {
 }
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 14/15] dma-iommu: Add iommu_dma_set_opt_size()

2021-05-10 Thread John Garry
Add iommu_dma_set_opt_size(), which is a frontend for
iommu_set_dev_dma_opt_size().

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1d58c7a2d85d..fd62afe7c7d0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -490,6 +490,11 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
return (dma_addr_t)iova << shift;
 }
 
+static int iommu_dma_set_opt_size(struct device *dev, size_t size)
+{
+   return iommu_set_dev_dma_opt_size(dev, size);
+}
+
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
dma_addr_t iova, size_t size, struct page *freelist)
 {
@@ -1340,6 +1345,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
+   .set_max_opt_size   = iommu_dma_set_opt_size,
 };
 
 /*
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 11/15] dma-iommu: Reconfig group domain

2021-05-10 Thread John Garry
Call iommu_reconfig_dev_group_dma() from iommu_setup_dma_ops() to
reconfigure the group domain, if necessary.

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 574d7a901fd2..1d58c7a2d85d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1358,6 +1358,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size)
 * underlying IOMMU driver needs to support via the dma-iommu layer.
 */
if (domain->type == IOMMU_DOMAIN_DMA) {
+   iommu_reconfig_dev_group_dma(dev);
+   /* domain may be stale ... */
+   domain = iommu_get_domain_for_dev(dev);
if (iommu_dma_init_domain(domain, dma_base, size, dev))
goto out_err;
		dev->dma_ops = &iommu_dma_ops;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 13/15] dma-mapping: Add dma_set_max_opt_size()

2021-05-10 Thread John Garry
Add a function to allow setting the max size for which we want to optimise
DMA mappings.

Signed-off-by: John Garry 
---
 include/linux/dma-map-ops.h |  1 +
 include/linux/dma-mapping.h |  8 
 kernel/dma/mapping.c| 11 +++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..7f9857da87d8 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -69,6 +69,7 @@ struct dma_map_ops {
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
unsigned long (*get_merge_boundary)(struct device *dev);
+   int (*set_max_opt_size)(struct device *dev, size_t size);
 };
 
 #ifdef CONFIG_DMA_OPS
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 183e7103a66d..41681db93580 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -153,6 +153,7 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t 
size,
 void dma_vunmap_noncontiguous(struct device *dev, void *vaddr);
 int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
size_t size, struct sg_table *sgt);
+int dma_set_max_opt_size(struct device *dev, size_t size);
 #else /* CONFIG_HAS_DMA */
 static inline dma_addr_t dma_map_page_attrs(struct device *dev,
struct page *page, size_t offset, size_t size,
@@ -266,6 +267,7 @@ static inline unsigned long dma_get_merge_boundary(struct 
device *dev)
 {
return 0;
 }
+
 static inline struct sg_table *dma_alloc_noncontiguous(struct device *dev,
size_t size, enum dma_data_direction dir, gfp_t gfp,
unsigned long attrs)
@@ -289,6 +291,12 @@ static inline int dma_mmap_noncontiguous(struct device 
*dev,
 {
return -EINVAL;
 }
+
+static inline int dma_set_max_opt_size(struct device *dev, size_t size)
+{
+   return -EINVAL;
+}
+
 #endif /* CONFIG_HAS_DMA */
 
 struct page *dma_alloc_pages(struct device *dev, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..1e4820b08f7e 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -736,3 +736,14 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+int dma_set_max_opt_size(struct device *dev, size_t size)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   if (!ops || !ops->set_max_opt_size)
+   return 0;
+
+   return ops->set_max_opt_size(dev, size);
+}
+EXPORT_SYMBOL_GPL(dma_set_max_opt_size);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 06/15] iommu: Allow iommu_change_dev_def_domain() realloc default domain for same type

2021-05-10 Thread John Garry
Allow iommu_change_dev_def_domain() to create a new default domain, keeping
the same type as the current domain when no type argument is passed.

Also remove the comment about the function's purpose, which would become stale.

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 53 ++-
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 62e4491f32e0..f7253a973ab9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3065,16 +3065,13 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  * @prev_dev: The device in the group (this is used to make sure that the 
device
  *  hasn't changed after the caller has called this function)
  * @type: The type of the new default domain that gets associated with the 
group
+ * @new: Allocate new default domain, keeping same type when no type passed
  *
  * Returns 0 on success and error code on failure
  *
- * Note:
- * 1. Presently, this function is called only when user requests to change the
- *group's default domain type through 
/sys/kernel/iommu_groups//type
- *Please take a closer look if intended to use for other purposes.
  */
 static int iommu_change_dev_def_domain(struct iommu_group *group,
-  struct device *prev_dev, int type)
+  struct device *prev_dev, int type, bool new)
 {
struct iommu_domain *prev_dom;
struct group_device *grp_dev;
@@ -3127,28 +3124,32 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
goto out;
}
 
-   dev_def_dom = iommu_get_def_domain_type(dev);
-   if (!type) {
+   if (new && !type) {
+   type = prev_dom->type;
+   } else {
+   dev_def_dom = iommu_get_def_domain_type(dev);
+   if (!type) {
+   /*
+* If the user hasn't requested any specific type of 
domain and
+* if the device supports both the domains, then 
default to the
+* domain the device was booted with
+*/
+   type = dev_def_dom ? : iommu_def_domain_type;
+   } else if (dev_def_dom && type != dev_def_dom) {
+   dev_err_ratelimited(prev_dev, "Device cannot be in %s 
domain\n",
+   iommu_domain_type_str(type));
+   ret = -EINVAL;
+   goto out;
+   }
+
/*
-* If the user hasn't requested any specific type of domain and
-* if the device supports both the domains, then default to the
-* domain the device was booted with
+* Switch to a new domain only if the requested domain type is 
different
+* from the existing default domain type
 */
-   type = dev_def_dom ? : iommu_def_domain_type;
-   } else if (dev_def_dom && type != dev_def_dom) {
-   dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n",
-   iommu_domain_type_str(type));
-   ret = -EINVAL;
-   goto out;
-   }
-
-   /*
-* Switch to a new domain only if the requested domain type is different
-* from the existing default domain type
-*/
-   if (prev_dom->type == type) {
-   ret = 0;
-   goto out;
+   if (prev_dom->type == type) {
+   ret = 0;
+   goto out;
+   }
}
 
/* Sets group->default_domain to the newly allocated domain */
@@ -3292,7 +3293,7 @@ static int iommu_group_store_type_cb(const char *buf,
else
return -EINVAL;
 
-   return iommu_change_dev_def_domain(group, dev, type);
+   return iommu_change_dev_def_domain(group, dev, type, false);
 }
 
 static ssize_t iommu_group_store_type(struct iommu_group *group,
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/15] dma-iommu: Use init_iova_domain_ext() for IOVA domain init

2021-05-10 Thread John Garry
Pass the max opt iova len to init the IOVA domain, if set.

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4fb82c554ede..574d7a901fd2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -358,6 +358,8 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   size_t max_opt_dma_size;
+   unsigned long iova_len;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
return -EINVAL;
@@ -391,7 +393,18 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return 0;
}
 
-   init_iova_domain(iovad, 1UL << order, base_pfn);
+   max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
+
+   if (max_opt_dma_size) {
+   unsigned long shift = __ffs(1UL << order);
+
+   iova_len = max_opt_dma_size >> shift;
+   iova_len = roundup_pow_of_two(iova_len);
+   } else {
+   iova_len = 0;
+   }
+
+   init_iova_domain_ext(iovad, 1UL << order, base_pfn, iova_len);
 
if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 05/15] iova: Add iova_domain_len_is_cached()

2021-05-10 Thread John Garry
Add a function to check whether an IOVA domain currently caches a given
upper IOVA len exactly.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 11 +++
 include/linux/iova.h |  8 +++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e4c0e55178a..95892a0433cc 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -872,6 +872,17 @@ static void iova_magazine_push(struct iova_magazine *mag, 
unsigned long pfn)
mag->pfns[mag->size++] = pfn;
 }
 
+static unsigned long iova_len_to_rcache_max(unsigned long iova_len)
+{
+   return order_base_2(iova_len) + 1;
+}
+
+/* Test if iova_len range cached upper limit matches that of IOVA domain */
+bool iova_domain_len_is_cached(struct iova_domain *iovad, unsigned long 
iova_len)
+{
+   return iova_len_to_rcache_max(iova_len) == iovad->rcache_max_size;
+}
+
 static void init_iova_rcaches(struct iova_domain *iovad)
 {
struct iova_cpu_rcache *cpu_rcache;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 9974e1d3e2bc..04cc8eb6de38 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -136,7 +136,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
*iovad, dma_addr_t iova)
 #if IS_ENABLED(CONFIG_IOMMU_IOVA)
 int iova_cache_get(void);
 void iova_cache_put(void);
-
+bool iova_domain_len_is_cached(struct iova_domain *iovad,
+  unsigned long iova_len);
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
 void __free_iova(struct iova_domain *iovad, struct iova *iova);
 struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
@@ -158,6 +159,11 @@ int init_iova_flush_queue(struct iova_domain *iovad,
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
 #else
+static inline bool iova_domain_len_is_cached(struct iova_domain *iovad,
+unsigned long iova_len)
+{
+   return false;
+}
 static inline int iova_cache_get(void)
 {
return -ENOTSUPP;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 15/15] scsi: hisi_sas: Set max optimal DMA size for v3 hw

2021-05-10 Thread John Garry
For IOMMU strict mode, this gives a big performance boost in some scenarios.

Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 499c770d405c..365fc9de4be2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -4675,6 +4675,11 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
struct sas_ha_struct *sha;
int rc, phy_nr, port_nr, i;
 
+   rc = dma_set_max_opt_size(dev, PAGE_SIZE * HISI_SAS_SGE_PAGE_CNT);
+   /* We can live with other errors */
+   if (rc == -EPROBE_DEFER)
+   return rc;
+
rc = pci_enable_device(pdev);
if (rc)
goto err_out;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-10 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 01/15] swiotlb: Refactor swiotlb init functions

2021-05-10 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 08/15] swiotlb: Bounce data from/to restricted DMA pool if available

2021-05-10 Thread Christoph Hellwig
> +static inline bool is_dev_swiotlb_force(struct device *dev)
> +{
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> + if (dev->dma_io_tlb_mem)
> + return true;
> +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> + return false;
> +}
> +

>   /* If SWIOTLB is active, use its maximum mapping size */
>   if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> + (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
> +  is_dev_swiotlb_force(dev)))

This is a mess.  I think the right way is to have an always_bounce flag
in the io_tlb_mem structure instead.  Then the global swiotlb_force can
go away and be replaced with this, plus the fact that having no
io_tlb_mem structure at all means bouncing is forced off (after a little
refactoring).
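
Purely to illustrate that suggestion (nothing like this is in the posted
series; the field and helper names below are assumptions), the per-device
check could then look something like:

/* Hypothetical: io_tlb_mem grows an always_bounce flag, making the
 * global swiotlb_force redundant for restricted-pool devices. */
struct io_tlb_mem {
	/* ...existing fields... */
	bool always_bounce;		/* assumed new field */
};

static inline bool dev_swiotlb_force_bounce(struct device *dev)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->always_bounce;
}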
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH] iova: __init_iova_domain can be static

2021-05-10 Thread kernel test robot
drivers/iommu/iova.c:50:1: warning: symbol '__init_iova_domain' was not 
declared. Should it be static?

Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---
 iova.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 273a689006c36..ae4901073a98a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -46,7 +46,7 @@ static struct iova *to_iova(struct rb_node *node)
return rb_entry(node, struct iova, node);
 }
 
-void
+static void
 __init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn, unsigned long iova_len)
 {
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 09/15] iova: Add init_iova_domain_ext()

2021-05-10 Thread kernel test robot
Hi John,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.13-rc1 next-20210510]
[cannot apply to iommu/next mkp-scsi/for-next scsi/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/John-Garry/dma-mapping-iommu-Allow-IOMMU-IOVA-rcache-range-to-be-configured/20210510-222805
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
6efb943b8616ec53a5e444193dccf1af9ad627b5
config: i386-randconfig-s002-20210510 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# 
https://github.com/0day-ci/linux/commit/d9a8e3a4ac0070cd3094af3b41e9c6277faf3ea6
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
John-Garry/dma-mapping-iommu-Allow-IOMMU-IOVA-rcache-range-to-be-configured/20210510-222805
git checkout d9a8e3a4ac0070cd3094af3b41e9c6277faf3ea6
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/iommu/iova.c:50:1: warning: no previous prototype for 
>> '__init_iova_domain' [-Wmissing-prototypes]
  50 | __init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 | ^~


sparse warnings: (new ones prefixed by >>)
>> drivers/iommu/iova.c:50:1: sparse: sparse: symbol '__init_iova_domain' was 
>> not declared. Should it be static?

Please review and possibly fold the followup patch.

vim +/__init_iova_domain +50 drivers/iommu/iova.c

48  
49  void
  > 50  __init_iova_domain(struct iova_domain *iovad, unsigned long granule,
51  unsigned long start_pfn, unsigned long iova_len)
52  {
53  /*
54   * IOVA granularity will normally be equal to the smallest
55   * supported IOMMU page size; both *must* be capable of
56   * representing individual CPU pages exactly.
57   */
58  BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
59  
    60  spin_lock_init(&iovad->iova_rbtree_lock);
61  iovad->rbroot = RB_ROOT;
    62  iovad->cached_node = &iovad->anchor.node;
    63  iovad->cached32_node = &iovad->anchor.node;
64  iovad->granule = granule;
65  iovad->start_pfn = start_pfn;
66  iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
67  iovad->max32_alloc_size = iovad->dma_32bit_pfn;
68  iovad->flush_cb = NULL;
69  iovad->fq = NULL;
70  iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
    71  rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
    72  rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
    73  cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, &iovad->cpuhp_dead);
74  init_iova_rcaches(iovad, iova_len);
75  }
76  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/6] vfio: remove the unused mdev iommu hook

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 08:54:02AM +0200, Christoph Hellwig wrote:
> The iommu_device field in struct mdev_device has never been used
> since it was added more than 2 years ago.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 132 ++--
>  include/linux/mdev.h|  20 -
>  2 files changed, 25 insertions(+), 127 deletions(-)

I asked Intel folks to deal with this a month ago:

https://lore.kernel.org/kvm/20210406200030.ga425...@nvidia.com/

So let's just remove it, it is clearly a bad idea

Reviewed-by: Jason Gunthorpe 

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Raj, Ashok
On Mon, May 10, 2021 at 12:31:11PM -0300, Jason Gunthorpe wrote:
> On Mon, May 10, 2021 at 08:25:02AM -0700, Raj, Ashok wrote:
> 
> > Global PASID doesn't break anything, but giving that control to vIOMMU
> > doesn't seem right. When we have mixed uses cases like hardware that
> > supports shared wq and SRIOV devices that need PASIDs we need to 
> > comprehend how they will work without having a backend to migrate PASIDs 
> > to new destination.
> 
> Why wouldn't there be a backend? SRIOV live migration is a real thing
> now (see Max's VFIO patches). The PASID space of the entire dedicated
> RID needs to be migratable, which means the destination vIOMMU must be
> able to program its local hardware with the same PASID numbers and any
> kind of global PASID scheme at all will interfere with it.

The way I'm imagining it works is as follows. We have 2 types of platforms.
Let me know if I missed something.

- no shared wq, meaning RID local PASID allocation is all that's needed.
  Say platform type p1
- Shared WQ configurations that require global PASIDs.
  Say platform type p2

The vIOMMU might expose a capability to indicate whether it provides a PASID
allocation interface. If there is none, the guest is free to obtain its own
PASIDs per RID since they are fully isolated. For p1 type platforms this
should work, since no Qemu interface or /dev/iommu is even required for that
instance; the guest kernel can manage it fully per RID.

Platform type p2 has both SIOV (with ENQCMD) and SRIOV devices that require
PASIDs. My thought was to reserve some number of PASIDs for per-RID use.
When you request a RID-local PASID you get one in that range; when you ask
for a global one, you get one from the upper PASID range.

Say 0-4k is reserved for any RID-local allocation. This is also the guest
PASID range. 4k->2^19 is for shared WQ. (I'm not implying the size, just,
for discussion's sake, that we need a separation.)

Those vIOMMUs will have a capability indicating that they support a PASID
allocation interface. When you allocate you can say what type of PASID you
need (shared vs local) and Qemu will obtain one either from the local range
or from the shared range. When a PASID is allocated in the shared range, you
still end up using one in the guest PASID range, but mapped to a different
host PASID using the VMCS-like PASID redirection, which is per-guest (not
per-RID).

Only the shared allocation requires the /dev/iommu interface. All allocation
in the guest range is fully under Qemu's control.

To support migration, the target also needs to have this capability.
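
To make the split concrete, here is a rough sketch of how such an allocator
split could look; the 4k boundary, the scope enum and the helper are made up
for discussion and are not part of any posted patch:

/* Illustrative only: two disjoint ranges, local PASIDs stay guest/RID
 * managed, global ones go through /dev/iommu and gPASID->hPASID
 * redirection. Uses the existing 4-argument ioasid_alloc(). */
#define PASID_LOCAL_MAX		0xfff			/* 0-4k: per-RID */
#define PASID_GLOBAL_MIN	0x1000
#define PASID_GLOBAL_MAX	((1U << 19) - 1)	/* shared WQ range */

enum pasid_scope {
	PASID_SCOPE_LOCAL,
	PASID_SCOPE_GLOBAL,
};

static ioasid_t example_pasid_alloc(struct ioasid_set *set,
				    enum pasid_scope scope, void *private)
{
	if (scope == PASID_SCOPE_LOCAL)
		return ioasid_alloc(set, 1, PASID_LOCAL_MAX, private);

	return ioasid_alloc(set, PASID_GLOBAL_MIN, PASID_GLOBAL_MAX, private);
}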
> 
> > When we have both SRIOV and shared WQ exposed to the same guest, we 
> > do have an issue. The simplest way that I thought was to have a guest 
> > and host PASID separation.  Where the guest has its own PASID  space 
> > and host has its own carved out.  Guest can do what ever it wants within 
> > that allocated space without fear of any collition with any other device. 
> 
> And how do you reliably migrate if the target kernel has a PASID
> already allocated in that range?

For any shared range, remember there is a mapping table. And since those
ranges are always reserved on the new host, it isn't a problem. For a shared
WQ that requires remapping a PASID to a new host PASID, the per-guest VMCS
remapping that does gPASID->hPASID does this job, so the guest PASID
remains unchanged.

Does this make sense? 

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 09:22:12AM -0700, Raj, Ashok wrote:

> Those vIOMMU's will have a capability that it supports PASID allocation
> interface. When you allocate you can say what type of PASID you need
> (shared vs local) and Qemu will obtain either within the local range, or in
> the shared range.

Isn't this what I've been saying? This all has to be explicit, and it
is all local to the iommu driver. At worst we'd have some single
global API 'get me a global PASID' which co-ordinates with all the
iommu drivers to actually implement it.

> > > When we have both SRIOV and shared WQ exposed to the same guest, we 
> > > do have an issue. The simplest way that I thought was to have a guest 
> > > and host PASID separation.  Where the guest has its own PASID  space 
> > > and host has its own carved out.  Guest can do what ever it wants within 
> > > that allocated space without fear of any collition with any other device. 
> > 
> > And how do you reliably migrate if the target kernel has a PASID
> > already allocated in that range?
> 
> For any shared range, remember there is a mapping table. And since those
> ranges are always reserved in the new host it isn't a problem.

It is a smaller problem - all the ranges still need to match between
hosts and so forth. It is tractable but this all needs to be API'd
properly and nothing can be implicit, including the global/local range
split.

Basically all this needs to come through in your /dev/ioasid API RFC
proposal that I hope is being worked on.

I still think it is smarter to push a group of RID's into a global
allocation group and accept there are potential downsides with that
than to try to force a global allocation group on every RID and then
try to fix the mess that makes for non-ENQCMD devices.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)

2021-05-10 Thread Julien Grall

Hi Christoph,

On 10/05/2021 09:40, Christoph Hellwig wrote:

On Sat, May 08, 2021 at 12:32:37AM +0100, Julien Grall wrote:

The dereferenced pointer seems to suggest that the swiotlb hasn't been
allocated. From what I can tell, this may be because swiotlb_force is set
to SWIOTLB_NO_FORCE, yet we will still enable the swiotlb when running on
top of Xen.

I am not entirely sure what would be the correct fix. Any opinions?


Can you try something like the patch below (not even compile tested, but
the intent should be obvious)?


diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 16a2b2b1c54d..7671bc153fb1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -44,6 +44,8 @@
  #include 
  #include 
  
+#include 

+
  /*
   * We need to be able to catch inadvertent references to memstart_addr
   * that occur (potentially in generic code) before arm64_memblock_init()
@@ -482,7 +484,7 @@ void __init mem_init(void)
if (swiotlb_force == SWIOTLB_FORCE ||
max_pfn > PFN_DOWN(arm64_dma_phys_limit))
swiotlb_init(1);
-   else
+   else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
swiotlb_force = SWIOTLB_NO_FORCE;
  
  	set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);


I have applied the patch on top of 5.13-rc1 and can confirm I am able to 
boot dom0. Are you going to submit the patch?


Thank you for your help!

Best regards,

--
Julien Grall
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu