Re: [PATCH] iommu/arm-smmu-v3: add nr_ats_masters to avoid unnecessary operations

2019-08-14 Thread Leizhen (ThunderTown)



On 2019/8/14 19:14, Will Deacon wrote:
> Hi,
> 
> I've been struggling with the memory ordering requirements here. More below.
> 
> On Thu, Aug 01, 2019 at 08:20:40PM +0800, Zhen Lei wrote:
>> When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a
>> smmu domain does not contain any ATS master, the operations of
>> arm_smmu_atc_inv_to_cmd() and the lock protection in arm_smmu_atc_inv_domain()
>> are always executed. This hurts performance, especially in multi-core and
>> stress scenarios. In my FIO test scenario, performance dropped by about 8%.
>>
>> In fact, we can use an atomic member to record how many ATS masters the
>> smmu domain contains, and check it without traversing the list and checking
>> all masters one by one under the lock.
>>
>> Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS")
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index a9a9fabd396804a..1b370d9aca95f94 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -631,6 +631,7 @@ struct arm_smmu_domain {
>>  
>>  struct io_pgtable_ops   *pgtbl_ops;
>>  boolnon_strict;
>> +atomic_tnr_ats_masters;
>>  
>>  enum arm_smmu_domain_stage  stage;
>>  union {
>> @@ -1531,7 +1532,7 @@ static int arm_smmu_atc_inv_domain(struct 
>> arm_smmu_domain *smmu_domain,
>>  struct arm_smmu_cmdq_ent cmd;
>>  struct arm_smmu_master *master;
>>  
>> -if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
>> +if (!atomic_read(&smmu_domain->nr_ats_masters))
>>  return 0;
> 
> This feels wrong to me: the CPU can speculate ahead of time that
> 'nr_ats_masters' is 0, but we could have a concurrent call to '->attach()'
> for an ATS-enabled device. Wouldn't it then be possible for the new device
> to populate its ATC as a result of speculative accesses for the mapping that
> we're tearing down?
> 
> The devices lock solves this problem by serialising invalidation with
> '->attach()/->detach()' operations.
> 
> John's suggestion of RCU might work better, but I think you'll need to call
> synchronize_rcu() between adding yourself to the 'devices' list and enabling
> ATS.
> 
> What do you think?
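
(A rough timeline of the race being described above; this just restates the
concern, it is not code from either patch:)

	/*
	 * CPU0 (unmap)                      CPU1 (attach ATS-capable master)
	 * ------------                      --------------------------------
	 *                                   nr_ats_masters++ (not yet visible
	 *                                                     to CPU0)
	 *                                   make the STE live with ATS enabled
	 *                                   <device speculatively translates
	 *                                    the IOVA and caches it in its ATC>
	 * clear PTE, invalidate TLB
	 * read nr_ats_masters, sees stale 0
	 *   -> ATC invalidation skipped
	 *
	 * The device is left holding a stale ATC entry for the mapping that
	 * was just torn down, which is what the devices_lock (or RCU)
	 * serialisation prevents.
	 */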

I have updated my patch and just sent it; below are my thoughts.

-   if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+   /*
+* The protection of spinlock(&smmu_domain->devices_lock) is omitted,
+* because for a given master, its map/unmap operations can only happen
+* after it has been attached and before it has been detached. So if at
+* least one master needs to be ATC invalidated, the value of
+* smmu_domain->nr_ats_masters cannot be zero.
+*
+* This can alleviate performance loss in multi-core scenarios.
+*/
+   if (!smmu_domain->nr_ats_masters)

> 
>>  arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
>> @@ -1869,6 +1870,7 @@ static int arm_smmu_enable_ats(struct arm_smmu_master 
>> *master)
>>  size_t stu;
>>  struct pci_dev *pdev;
>>  struct arm_smmu_device *smmu = master->smmu;
>> +struct arm_smmu_domain *smmu_domain = master->domain;
>>  struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>>  
>>  if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
>> @@ -1887,12 +1889,15 @@ static int arm_smmu_enable_ats(struct 
>> arm_smmu_master *master)
>>  return ret;
>>  
>>  master->ats_enabled = true;
>> +atomic_inc(&smmu_domain->nr_ats_masters);
> 
> Here, we need to make sure that concurrent invalidation sees the updated
> 'nr_ats_masters' value before ATS is enabled for the device, otherwise we
> could miss an ATC invalidation.
> 
> I think the code above gets this guarantee because of the way that ATS is
> enabled in the STE, which ensures that we issue invalidation commands before
> making the STE 'live'; this has the side-effect of a write barrier before
> updating PROD, which I think we also rely on for installing the CD pointer.
> 
> Put another way: writes are ordered before a subsequent command insertion.
> 
> Do you agree? If so, I'll add a comment because this is subtle and easily
> overlooked.
> 
>>  static void arm_smmu_disable_ats(struct arm_smmu_master *master)
>>  {
>>  struct arm_smmu_cmdq_ent cmd;
>> +struct arm_smmu_domain *smmu_domain = master->domain;
>>  
>>  if (!master->ats_enabled || !dev_is_pci(master->dev))
>>  return;
>> @@ -1901,6 +1906,7 @@ static void arm_smmu_disable_ats(struct 
>> arm_smmu_master *master)
>>  arm_smmu_atc_inv_master(master, &cmd);
>>  pci_disable_ats(to_pci_dev(master->dev));
>>  master->ats_enabled = false;
>> +atomic_dec(&smmu_domain->nr_ats_masters);
> 
> This part is the other way around: 

[PATCH v2 0/2] add nr_ats_masters for quickly check

2019-08-14 Thread Zhen Lei
v1 --> v2:
1. Change the type of nr_ats_masters from atomic_t to int, move its
   inc/dec operations from arm_smmu_enable_ats()/arm_smmu_disable_ats() to
   arm_smmu_attach_dev()/arm_smmu_detach_dev(), and protect them with
   "spin_lock_irqsave(&smmu_domain->devices_lock, flags);"

Zhen Lei (2):
  iommu/arm-smmu-v3: don't add a master into smmu_domain before it's
ready
  iommu/arm-smmu-v3: add nr_ats_masters for quickly check

 drivers/iommu/arm-smmu-v3.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
1.8.3




[PATCH v2 2/2] iommu/arm-smmu-v3: add nr_ats_masters for quickly check

2019-08-14 Thread Zhen Lei
When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a
smmu domain does not contain any ATS master, the operations of
arm_smmu_atc_inv_to_cmd() and the lock protection in arm_smmu_atc_inv_domain()
are always executed. This hurts performance, especially in multi-core and
stress scenarios. In my FIO test scenario, performance dropped by about 8%.

In fact, we can use a struct member to record how many ATS masters the
smmu domain contains, and check it without traversing the list and checking
all masters one by one under the lock.

Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Signed-off-by: Zhen Lei 
---
 drivers/iommu/arm-smmu-v3.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 29056d9bb12aa01..154334d3310c9b8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -631,6 +631,7 @@ struct arm_smmu_domain {
 
struct io_pgtable_ops   *pgtbl_ops;
boolnon_strict;
+   int nr_ats_masters;
 
enum arm_smmu_domain_stage  stage;
union {
@@ -1531,7 +1532,16 @@ static int arm_smmu_atc_inv_domain(struct 
arm_smmu_domain *smmu_domain,
struct arm_smmu_cmdq_ent cmd;
struct arm_smmu_master *master;
 
-   if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+   /*
+* The protection of spinlock(&smmu_domain->devices_lock) is omitted,
+* because for a given master, its map/unmap operations can only happen
+* after it has been attached and before it has been detached. So if at
+* least one master needs to be ATC invalidated, the value of
+* smmu_domain->nr_ats_masters cannot be zero.
+*
+* This can alleviate performance loss in multi-core scenarios.
+*/
+   if (!smmu_domain->nr_ats_masters)
return 0;
 
 arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
@@ -1913,6 +1923,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master 
*master)
 
 spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 list_del(&master->domain_head);
+   smmu_domain->nr_ats_masters--;
 spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
master->domain = NULL;
@@ -1968,6 +1979,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 
 spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 list_add(&master->domain_head, &smmu_domain->devices);
+   smmu_domain->nr_ats_masters++;
 spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 out_unlock:
 mutex_unlock(&smmu_domain->init_mutex);
-- 
1.8.3
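
(Taken together with patch 1/2 below, the result is roughly the following
condensed sketch, not the literal diff:)

	static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, ...)
	{
		/* Lockless fast path: no master attached yet, nothing to invalidate. */
		if (!smmu_domain->nr_ats_masters)
			return 0;
		...
	}

	static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
	{
		...
		arm_smmu_enable_ats(master);
		arm_smmu_install_ste_for_dev(master);

		/* Only now does the master become visible to ATC invalidation. */
		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
		list_add(&master->domain_head, &smmu_domain->devices);
		smmu_domain->nr_ats_masters++;
		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
		...
	}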




[PATCH v2 1/2] iommu/arm-smmu-v3: don't add a master into smmu_domain before it's ready

2019-08-14 Thread Zhen Lei
Once a master has been added into smmu_domain->devices, it may immediately
be scanned in arm_smmu_unmap()-->arm_smmu_atc_inv_domain(). From a logical
point of view, the master should be added into smmu_domain after it has
been completely initialized.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/arm-smmu-v3.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a9a9fabd396804a..29056d9bb12aa01 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1958,10 +1958,6 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 
master->domain = smmu_domain;
 
-   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-   list_add(&master->domain_head, &smmu_domain->devices);
-   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
-
if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
arm_smmu_enable_ats(master);
 
@@ -1969,6 +1965,10 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
 
arm_smmu_install_ste_for_dev(master);
+
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_add(&master->domain_head, &smmu_domain->devices);
+   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 out_unlock:
 mutex_unlock(&smmu_domain->init_mutex);
return ret;
-- 
1.8.3




Re: [PATCH 07/10] iommu: Print default domain type on boot

2019-08-14 Thread Lu Baolu

Hi,

On 8/14/19 9:38 PM, Joerg Roedel wrote:

From: Joerg Roedel 

Introduce a subsys_initcall for IOMMU code and use it to
print the default domain type at boot.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/iommu.c | 30 +-
  1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e1feb4061b8b..233bc22b487e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -93,12 +93,40 @@ struct iommu_group_attribute iommu_group_attr_##_name = 
\
  static LIST_HEAD(iommu_device_list);
  static DEFINE_SPINLOCK(iommu_device_lock);
  
+/*

+ * Use a function instead of an array here because the domain-type is a
+ * bit-field, so an array would waste memory.
+ */
+static const char *iommu_domain_type_str(unsigned int t)
+{
+   switch (t) {
+   case IOMMU_DOMAIN_BLOCKED:
+   return "Blocked";
+   case IOMMU_DOMAIN_IDENTITY:
+   return "Passthrough";
+   case IOMMU_DOMAIN_UNMANAGED:
+   return "Unmanaged";
+   case IOMMU_DOMAIN_DMA:
+   return "Translated";
+   default:
+   return "Unknown";
+   }
+}


Run scripts/checkpatch.pl:

ERROR: switch and case should be at the same indent
#28: FILE: drivers/iommu/iommu.c:102:
+   switch (t) {
+   case IOMMU_DOMAIN_BLOCKED:
[...]
+   case IOMMU_DOMAIN_IDENTITY:
[...]
+   case IOMMU_DOMAIN_UNMANAGED:
[...]
+   case IOMMU_DOMAIN_DMA:
[...]
+   default:

Best regards,
Lu Baolu
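
For reference, the same helper with the case labels at the same indentation
level as the switch (what checkpatch expects) would be:

	static const char *iommu_domain_type_str(unsigned int t)
	{
		switch (t) {
		case IOMMU_DOMAIN_BLOCKED:
			return "Blocked";
		case IOMMU_DOMAIN_IDENTITY:
			return "Passthrough";
		case IOMMU_DOMAIN_UNMANAGED:
			return "Unmanaged";
		case IOMMU_DOMAIN_DMA:
			return "Translated";
		default:
			return "Unknown";
		}
	}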


+
+static int __init iommu_subsys_init(void)
+{
+   pr_info("Default domain type: %s\n",
+   iommu_domain_type_str(iommu_def_domain_type));
+
+   return 0;
+}
+subsys_initcall(iommu_subsys_init);
+
  int iommu_device_register(struct iommu_device *iommu)
  {
 spin_lock(&iommu_device_lock);
 list_add_tail(&iommu->list, &iommu_device_list);
 spin_unlock(&iommu_device_lock);
-
return 0;
  }
  



Re: [PATCH 06/10] iommu: Remember when default domain type was set on kernel command line

2019-08-14 Thread Lu Baolu

Hi,

On 8/14/19 9:38 PM, Joerg Roedel wrote:

From: Joerg Roedel 

Introduce an extensible concept to remember when certain
configuration settings for the IOMMU code have been set on
the kernel command line.

This will be used later to prevent overwriting these
settings with other defaults.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/iommu.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f187e85a074b..e1feb4061b8b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@ static unsigned int iommu_def_domain_type = 
IOMMU_DOMAIN_IDENTITY;
  static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
  #endif
  static bool iommu_dma_strict __read_mostly = true;
+static u32 iommu_cmd_line __read_mostly;
  
  struct iommu_group {

struct kobject kobj;
@@ -68,6 +69,18 @@ static const char * const iommu_group_resv_type_string[] = {
[IOMMU_RESV_SW_MSI] = "msi",
  };
  
+#define IOMMU_CMD_LINE_DMA_API		(1 << 0)


Prefer BIT() macro?
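i.e. something like:

	#define IOMMU_CMD_LINE_DMA_API		BIT(0)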


+
+static void iommu_set_cmd_line_dma_api(void)
+{
+   iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
+}
+
+static bool __maybe_unused iommu_cmd_line_dma_api(void)
+{
+   return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
+}
+
  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \
  struct iommu_group_attribute iommu_group_attr_##_name =   \
__ATTR(_name, _mode, _show, _store)
@@ -165,6 +178,8 @@ static int __init iommu_set_def_domain_type(char *str)
if (ret)
return ret;
  
+	iommu_set_cmd_line_dma_api();


IOMMU command line is also set in other places, for example,
iommu_setup() (arch/x86/kernel/pci-dma.c). Need to call this there as
well?

Best regards,
Lu Baolu
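
(Concretely, the suggestion would look something like the sketch below in
iommu_setup(); note that iommu_set_cmd_line_dma_api() is static to iommu.c
in this patch, so it would first have to be exposed to the arch code:)

	/* hypothetical change to arch/x86/kernel/pci-dma.c:iommu_setup() */
	if (!strncmp(p, "pt", 2)) {
		iommu_set_default_passthrough();
		iommu_set_cmd_line_dma_api();	/* remember it came from the command line */
	}
	if (!strncmp(p, "nopt", 4)) {
		iommu_set_default_translated();
		iommu_set_cmd_line_dma_api();
	}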


+
iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA;
return 0;
  }



Re: [PATCH 04/10] x86/dma: Get rid of iommu_pass_through

2019-08-14 Thread Lu Baolu

Hi,

On 8/14/19 9:38 PM, Joerg Roedel wrote:

From: Joerg Roedel 

This variable has no users anymore. Remove it and tell the
IOMMU code via its new functions about requested DMA modes.

Signed-off-by: Joerg Roedel 


This will also simplify the procedures in iommu_probe_device() on x86
platforms.

Reviewed-by: Lu Baolu 


---
  arch/x86/include/asm/iommu.h |  1 -
  arch/x86/kernel/pci-dma.c| 11 +++
  2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index baedab8ac538..b91623d521d9 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -4,7 +4,6 @@
  
  extern int force_iommu, no_iommu;

  extern int iommu_detected;
-extern int iommu_pass_through;
  
  /* 10 seconds */

  #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f62b498b18fb..a6fd479d4a71 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,6 +1,7 @@
  // SPDX-License-Identifier: GPL-2.0
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -43,12 +44,6 @@ int iommu_detected __read_mostly = 0;
   * It is also possible to disable by default in kernel config, and enable with
   * iommu=nopt at boot time.
   */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
-
  extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
  
  void __init pci_iommu_alloc(void)

@@ -120,9 +115,9 @@ static __init int iommu_setup(char *p)
swiotlb = 1;
  #endif
if (!strncmp(p, "pt", 2))
-   iommu_pass_through = 1;
+   iommu_set_default_passthrough();
if (!strncmp(p, "nopt", 4))
-   iommu_pass_through = 0;
+   iommu_set_default_translated();
  
  		gart_parse_options(p);
  




Re: [PATCH 03/10] iommu/vt-d: Request passthrough mode from IOMMU core

2019-08-14 Thread Lu Baolu

Hi Joerg,

On 8/14/19 9:38 PM, Joerg Roedel wrote:

From: Joerg Roedel 

Get rid of the iommu_pass_through variable and request
passthrough mode via the new iommu core function.

Signed-off-by: Joerg Roedel 


Looks good to me.

Reviewed-by: Lu Baolu 


---
  drivers/iommu/intel-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bdaed2da8a55..234bc2b55c59 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3267,7 +3267,7 @@ static int __init init_dmars(void)
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
}
  
-	if (iommu_pass_through)

+   if (iommu_default_passthrough())
iommu_identity_mapping |= IDENTMAP_ALL;
  
  #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA




Re: [PATCH v3 hmm 03/11] mm/mmu_notifiers: add a get/put scheme for the registration

2019-08-14 Thread Jason Gunthorpe
On Wed, Aug 14, 2019 at 02:20:31PM -0700, Ralph Campbell wrote:
> 
> On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe 
> > 
> > Many places in the kernel have a flow where userspace will create some
> > object and that object will need to connect to the subsystem's
> > mmu_notifier subscription for the duration of its lifetime.
> > 
> > In this case the subsystem is usually tracking multiple mm_structs and it
> > is difficult to keep track of what struct mmu_notifier's have been
> > allocated for what mm's.
> > 
> > Since this has been open coded in a variety of exciting ways, provide core
> > functionality to do this safely.
> > 
> > This approach uses the strct mmu_notifier_ops * as a key to determine if
> 
> s/strct/struct

Yes, thanks for all of this, I like having comments, but I'm a
terrible proofreader :(

Jason


Re: [PATCH v3 hmm 00/11] Add mmu_notifier_get/put for managing mmu notifier registrations

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This series introduces a new registration flow for mmu_notifiers based on
the idea that the user would like to get a single refcounted piece of
memory for a mm, keyed to its use.

For instance many users of mmu_notifiers use an interval tree or similar
to dispatch notifications to some object. There are many objects but only
one notifier subscription per mm holding the tree.

Of the 12 places that call mmu_notifier_register:
  - 7 are maintaining some kind of obvious mapping of mm_struct to
mmu_notifier registration, ie in some linked list or hash table. Of
the 7 this series converts 4 (gru, hmm, RDMA, radeon)

  - 3 (hfi1, gntdev, vhost) are registering multiple notifiers, but each
one immediately does some VA range filtering, ie with an interval tree.
These would be better with a global subsystem-wide range filter and
could convert to this API.

  - 2 (kvm, amd_iommu) are deliberately using a single mm at a time, and
really can't use this API. One of the intel-svm's modes is also in this
list

The 3/7 unconverted drivers are:
  - intel-svm
This driver tracks mm's in a global linked list 'global_svm_list'
and would benefit from this API.

Its flow is a bit complex, since it also wants a set of non-shared
notifiers.

  - i915_gem_usrptr
This driver tracks mm's in a per-device hash
table (dev_priv->mm_structs), but only has an optional use of
mmu_notifiers.  Since it still seems to need the hash table it is
difficult to convert.

  - amdkfd/kfd_process
This driver is using a global SRCU hash table to track mm's

The control flow here is very complicated and the driver is relying on
this hash table to be fast on the ioctl syscall path.

It would definitely benefit, but only if the ioctl path didn't need to
do the search so often.

This series is already entangled with patches in the hmm & RDMA tree and
will require some git topic branches for the RDMA ODP stuff. I intend for
it to go through the hmm tree.

There is a git version here:

https://github.com/jgunthorpe/linux/commits/mmu_notifier

Which has the required pre-patches for the RDMA ODP conversion that are
still being reviewed.

Jason Gunthorpe (11):
   mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the
 caller
   mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm
   mm/mmu_notifiers: add a get/put scheme for the registration
   misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
   hmm: use mmu_notifier_get/put for 'struct hmm'
   RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
   RDMA/odp: remove ib_ucontext from ib_umem
   drm/radeon: use mmu_notifier_get/put for struct radeon_mn
   drm/amdkfd: fix a use after free race with mmu_notifer unregister
   drm/amdkfd: use mmu_notifier_put
   mm/mmu_notifiers: remove unregister_no_release

  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   3 -
  drivers/gpu/drm/amd/amdkfd/kfd_process.c |  88 -
  drivers/gpu/drm/nouveau/nouveau_drm.c|   3 +
  drivers/gpu/drm/radeon/radeon.h  |   3 -
  drivers/gpu/drm/radeon/radeon_device.c   |   2 -
  drivers/gpu/drm/radeon/radeon_drv.c  |   2 +
  drivers/gpu/drm/radeon/radeon_mn.c   | 157 
  drivers/infiniband/core/umem.c   |   4 +-
  drivers/infiniband/core/umem_odp.c   | 183 ++
  drivers/infiniband/core/uverbs_cmd.c |   3 -
  drivers/infiniband/core/uverbs_main.c|   1 +
  drivers/infiniband/hw/mlx5/main.c|   5 -
  drivers/misc/sgi-gru/grufile.c   |   1 +
  drivers/misc/sgi-gru/grutables.h |   2 -
  drivers/misc/sgi-gru/grutlbpurge.c   |  84 +++--
  include/linux/hmm.h  |  12 +-
  include/linux/mm_types.h |   6 -
  include/linux/mmu_notifier.h |  40 +++-
  include/rdma/ib_umem.h   |   2 +-
  include/rdma/ib_umem_odp.h   |  10 +-
  include/rdma/ib_verbs.h  |   3 -
  kernel/fork.c|   1 -
  mm/hmm.c | 121 +++-
  mm/mmu_notifier.c| 230 +--
  25 files changed, 408 insertions(+), 559 deletions(-)


For the core MM, HMM, and nouveau changes you can add:
Tested-by: Ralph Campbell 
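
(For readers new to the proposed API, a hypothetical driver-side usage sketch
based on the ops and helpers described in this series; the my_* names are
made up:)

	struct my_notifier {
		struct mmu_notifier mn;	/* embedded, recovered via container_of() */
		/* subsystem state keyed to the mm ... */
	};

	static struct mmu_notifier *my_alloc_notifier(struct mm_struct *mm)
	{
		struct my_notifier *p = kzalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return ERR_PTR(-ENOMEM);
		return &p->mn;
	}

	static void my_free_notifier(struct mmu_notifier *mn)
	{
		kfree(container_of(mn, struct my_notifier, mn));
	}

	static const struct mmu_notifier_ops my_ops = {
		.alloc_notifier	= my_alloc_notifier,
		.free_notifier	= my_free_notifier,
		/* .invalidate_range_start/_end etc. as required */
	};

	/* object creation: one refcounted notifier per (ops, mm) pair */
	mn = mmu_notifier_get(&my_ops, current->mm);
	if (IS_ERR(mn))
		return PTR_ERR(mn);

	/* object teardown: drop the reference; free_notifier() runs from the
	 * SRCU callback once the last user has gone away */
	mmu_notifier_put(mn);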


Re: [PATCH v3 hmm 11/11] mm/mmu_notifiers: remove unregister_no_release

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no
longer have any users, they have all been converted to use
mmu_notifier_put().

So delete this difficult to use interface.

Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


Re: [PATCH v3 hmm 05/11] hmm: use mmu_notifier_get/put for 'struct hmm'

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.

mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.

It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.

The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.

Signed-off-by: Jason Gunthorpe 


Looks good.
Reviewed-by: Ralph Campbell 


Re: [PATCH v3 hmm 03/11] mm/mmu_notifiers: add a get/put scheme for the registration

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

Many places in the kernel have a flow where userspace will create some
object and that object will need to connect to the subsystem's
mmu_notifier subscription for the duration of its lifetime.

In this case the subsystem is usually tracking multiple mm_structs and it
is difficult to keep track of what struct mmu_notifier's have been
allocated for what mm's.

Since this has been open coded in a variety of exciting ways, provide core
functionality to do this safely.

This approach uses the strct mmu_notifier_ops * as a key to determine if


s/strct/struct


the subsystem has a notifier registered on the mm or not. If there is a
registration then the existing notifier struct is returned, otherwise the
ops->alloc_notifiers() is used to create a new per-subsystem notifier for
the mm.

The destroy side incorporates an async call_srcu based destruction which
will avoid bugs in the callers such as commit 6d7c3cde93c1 ("mm/hmm: fix
use after free with struct hmm in the mmu notifiers").

Since we are inside the mmu notifier core locking is fairly simple, the
allocation uses the same approach as for mmu_notifier_mm, the write side
of the mmap_sem makes everything deterministic and we only need to do
hlist_add_head_rcu() under the mm_take_all_locks(). The new users count
and the discoverability in the hlist is fully serialized by the
mmu_notifier_mm->lock.

Co-developed-by: Christoph Hellwig 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Jason Gunthorpe 


Reviewed-by: Ralph Campbell 


---
  include/linux/mmu_notifier.h |  35 
  mm/mmu_notifier.c| 156 +--
  2 files changed, 185 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b6c004bd9f6ad9..31aa971315a142 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -211,6 +211,19 @@ struct mmu_notifier_ops {
 */
void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm,
 unsigned long start, unsigned long end);
+
+   /*
+* These callbacks are used with the get/put interface to manage the
+* lifetime of the mmu_notifier memory. alloc_notifier() returns a new
+* notifier for use with the mm.
+*
+* free_notifier() is only called after the mmu_notifier has been
+* fully put, calls to any ops callback are prevented and no ops
+* callbacks are currently running. It is called from a SRCU callback
+* and cannot sleep.
+*/
+   struct mmu_notifier *(*alloc_notifier)(struct mm_struct *mm);
+   void (*free_notifier)(struct mmu_notifier *mn);
  };
  
  /*

@@ -227,6 +240,9 @@ struct mmu_notifier_ops {
  struct mmu_notifier {
struct hlist_node hlist;
const struct mmu_notifier_ops *ops;
+   struct mm_struct *mm;
+   struct rcu_head rcu;
+   unsigned int users;
  };
  
  static inline int mm_has_notifiers(struct mm_struct *mm)

@@ -234,6 +250,21 @@ static inline int mm_has_notifiers(struct mm_struct *mm)
return unlikely(mm->mmu_notifier_mm);
  }
  
+struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,

+struct mm_struct *mm);
+static inline struct mmu_notifier *
+mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm)
+{
+   struct mmu_notifier *ret;
+
+   down_write(&mm->mmap_sem);
+   ret = mmu_notifier_get_locked(ops, mm);
+   up_write(&mm->mmap_sem);
+   return ret;
+}
+void mmu_notifier_put(struct mmu_notifier *mn);
+void mmu_notifier_synchronize(void);
+
  extern int mmu_notifier_register(struct mmu_notifier *mn,
 struct mm_struct *mm);
  extern int __mmu_notifier_register(struct mmu_notifier *mn,
@@ -581,6 +612,10 @@ static inline void mmu_notifier_mm_destroy(struct 
mm_struct *mm)
  #define pudp_huge_clear_flush_notify pudp_huge_clear_flush
  #define set_pte_at_notify set_pte_at
  
+static inline void mmu_notifier_synchronize(void)

+{
+}
+
  #endif /* CONFIG_MMU_NOTIFIER */
  
  #endif /* _LINUX_MMU_NOTIFIER_H */

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 696810f632ade1..4a770b5211b71d 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -248,6 +248,9 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct 
mm_struct *mm)
 lockdep_assert_held_write(&mm->mmap_sem);
 BUG_ON(atomic_read(&mm->mm_users) <= 0);
  
+	mn->mm = mm;

+   mn->users = 1;
+
if (!mm->mmu_notifier_mm) {
/*
 * kmalloc cannot be called under mm_take_all_locks(), but we
@@ -295,18 +298,24 @@ int __mmu_notifier_register(struct mmu_notifier *mn, 
struct mm_struct *mm)
  }
  EXPORT_SYMBOL_GPL(__mmu_notifier_register);
  
-/*

+/**
+ * mmu_notifier_register - Register a notifier on a mm
+ * @mn: The notifier to attach
+ * 

Re: [PATCH v3 hmm 02/11] mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

A prior commit e0f3c3f78da2 ("mm/mmu_notifier: init notifier if necessary")
made an attempt at doing this, but had to be reverted as calling
the GFP_KERNEL allocator under the i_mmap_mutex causes deadlock, see
commit 35cfa2b0b491 ("mm/mmu_notifier: allocate mmu_notifier in advance").

However, we can avoid that problem by doing the allocation only under
the mmap_sem, which is already happening.

Since all writers to mm->mmu_notifier_mm hold the write side of the
mmap_sem reading it under that sem is deterministic and we can use that to
decide if the allocation path is required, without speculation.

The actual update to mmu_notifier_mm must still be done under the
mm_take_all_locks() to ensure read-side coherency.

Signed-off-by: Jason Gunthorpe 


Looks good to me.
Reviewed-by: Ralph Campbell 


Re: [PATCH v3 hmm 01/11] mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the caller

2019-08-14 Thread Ralph Campbell



On 8/6/19 4:15 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This simplifies the code to not have so many one line functions and extra
logic. __mmu_notifier_register() simply becomes the entry point to
register the notifier, and the other one calls it under lock.

Also add a lockdep_assert to check that the callers are holding the lock
as expected.

Suggested-by: Christoph Hellwig 
Signed-off-by: Jason Gunthorpe 


Nice clean up.
Reviewed-by: Ralph Campbell 


Re: [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types

2019-08-14 Thread Jacob Pan
On Thu, 18 Jul 2019 10:35:37 +0200
Auger Eric  wrote:

> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dmar.c| 50
> > +
> > include/linux/intel-iommu.h | 21 +++ 2 files
> > changed, 67 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 6d969a1..0cda6fb 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > +   unsigned int size_order, u64 granu, int ih)
> > +{
> > +   struct qi_desc desc;  
> nit: you could also init to {};
> > +
> > +   desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > +   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > +   desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > +   QI_EIOTLB_AM(size_order);
> > +   desc.qw2 = 0;
> > +   desc.qw3 = 0;
> > +   qi_submit_sync(&desc, iommu);
> > +}
> > +
> >  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> >  {
> > @@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > +   u32 pasid,  u16 qdep, u64 addr, unsigned size, u64
> > granu)  
> s/size/size_order
> > +{
> > +   struct qi_desc desc;
> > +
> > +   desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > +   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > +   QI_DEV_IOTLB_PFSID(pfsid);  
> maybe add a comment to remind MIP hint is sent to 0 as of now.
> > +   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > +   /* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > +* The least significant zero bit indicates the size. VT-d
> > spec
> > +* 6.5.2.6
> > +*/
> > +   if (!size)
> > +   desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;  
> this is qw1.
my mistake.
> > +   else {
> > +   unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size);  
> don't you miss a "- 1 " here?
you are right
> > +
> > +   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;  
> desc.qw1 |= addr & ~mask | QI_DEV_EIOTLB_SIZE;
> ie. I don't think QI_DEV_EIOTLB_ADDR is useful here?
> 
> 
> So won't the following lines do the job?
> 
> unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size) -1;
> desc.qw1 |= addr & ~mask;
> if (size)
> desc.qw1 |= QI_DEV_EIOTLB_SIZE
that would work too, and simpler. thanks for the suggestion. will
change.
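
One nit on the suggested first line: in C the additive operators bind tighter
than "<<", so "1UL << (VTD_PAGE_SHIFT + size) -1" parses as
"1UL << (VTD_PAGE_SHIFT + size - 1)". If the intent is a mask of the low
(VTD_PAGE_SHIFT + size) bits, the subtraction needs its own parentheses, e.g.:

	unsigned long mask = (1UL << (VTD_PAGE_SHIFT + size)) - 1;	/* low bits set */
	desc.qw1 |= addr & ~mask;					/* aligned address */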
> > +   }
> > +   qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid) +{
> > +   struct qi_desc desc;
> > +
> > +   desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu)
> > | QI_PC_PASID(pasid);  
> nit: reorder the fields according to the spec, easier to check if any
> is missing.
sounds good.

> > +   desc.qw1 = 0;
> > +   desc.qw2 = 0;
> > +   desc.qw3 = 0;
> > +   qi_submit_sync(&desc, iommu);
> > +}
> >  /*
> >   * Disable Queued Invalidation interface.
> >   */
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 94d3a9a..1cdb35b 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -339,7 +339,7 @@ enum {
> >  #define QI_IOTLB_GRAN(gran)(((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)
> > (((u64)addr) & VTD_PAGE_MASK) #define
> > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define
> > QI_IOTLB_AM(am) (((u8)am)) +#define
> > QI_IOTLB_AM(am) (((u8)am) & 0x3f) 
> >  #define QI_CC_FM(fm)   (((u64)fm) << 48)
> >  #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -357,17 +357,22 @@ enum {
> >  #define QI_PC_DID(did) (((u64)did) << 16)
> >  #define QI_PC_GRAN(gran)   (((u64)gran) << 4)
> >  
> > -#define QI_PC_ALL_PASIDS   (QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS   0
> > +#define QI_PC_PASID_SEL1
> >  
> >  #define QI_EIOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
> >  #define QI_EIOTLB_GL(gl)   (((u64)gl) << 7)
> >  #define QI_EIOTLB_IH(ih)   (((u64)ih) << 6)
> 

Re: [GIT PULL] dma mapping fixes for 5.3-rc

2019-08-14 Thread pr-tracker-bot
The pull request you sent on Wed, 14 Aug 2019 16:12:17 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e83b009c5c366b678c7986fa6c1d38fed06c954c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [git pull] IOMMU Fixes for Linux v5.3-rc4

2019-08-14 Thread pr-tracker-bot
The pull request you sent on Wed, 14 Aug 2019 16:09:09 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-fixes-v5.3-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b5e33e44d994bb03c75f1901d47b1cf971f752a0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH 04/15] iommu/arm-smmu: Rework cb_base handling

2019-08-14 Thread Will Deacon
On Fri, Aug 09, 2019 at 06:07:41PM +0100, Robin Murphy wrote:
> To keep register-access quirks manageable, we want to structure things
> to avoid needing too many individual overrides. It seems fairly clean to
> have a single interface which handles both global and context registers
> in terms of the architectural pages, so the first preparatory step is to
> rework cb_base into a page number rather than an absolute address.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu.c | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index d9a93e5f422f..463bc8d98adb 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -95,7 +95,7 @@
>  #endif
>  
>  /* Translation context bank */
> -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift))
> +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << 
> (smmu)->pgshift))
>  
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
> @@ -168,8 +168,8 @@ struct arm_smmu_device {
>   struct device   *dev;
>  
>   void __iomem*base;
> - void __iomem*cb_base;
> - unsigned long   pgshift;
> + unsigned intcb_base;

I think this is now a misnomer. Would you be able to rename it cb_pfn or
something, please?

Otherwise, this seems fine.

Will


[PATCH 13/13] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()

2019-08-14 Thread Will Deacon
With all the pieces in place, we can finally propagate the
iommu_iotlb_gather structure from the call to unmap() down to the IOMMU
drivers' implementation of ->tlb_add_page(). Currently everybody ignores
it, but the machinery is now there to defer invalidation.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c|  3 ++-
 drivers/iommu/arm-smmu.c   |  3 ++-
 drivers/iommu/io-pgtable-arm-v7s.c | 23 ++-
 drivers/iommu/io-pgtable-arm.c | 22 ++
 drivers/iommu/msm_iommu.c  |  3 ++-
 drivers/iommu/mtk_iommu.c  |  3 ++-
 drivers/iommu/qcom_iommu.c |  3 ++-
 include/linux/io-pgtable.h | 16 +---
 8 files changed, 47 insertions(+), 29 deletions(-)
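
(Illustrative only: once a driver stops ignoring the gather cookie, its hooks
could simply accumulate the range and issue a single invalidation at sync
time. The foo_* names are hypothetical, and the start/end/pgsize fields are
the ones added to struct iommu_iotlb_gather earlier in the series:)

	static void foo_tlb_add_page(struct iommu_iotlb_gather *gather,
				     unsigned long iova, size_t granule, void *cookie)
	{
		/* Just remember the range; don't touch the hardware yet. */
		gather->start = min(gather->start, iova);
		gather->end = max(gather->end, iova + granule - 1);
		gather->pgsize = granule;
	}

	static void foo_iotlb_sync(struct iommu_domain *domain,
				   struct iommu_iotlb_gather *gather)
	{
		/* One invalidation covering everything gathered by the unmap. */
		foo_inv_range(domain, gather->start,
			      gather->end - gather->start + 1, gather->pgsize);
	}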

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8e2e53079f48..d1ebc7103065 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1596,7 +1596,8 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long 
iova, size_t size,
} while (size -= granule);
 }
 
-static void arm_smmu_tlb_inv_page_nosync(unsigned long iova, size_t granule,
+static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
+unsigned long iova, size_t granule,
 void *cookie)
 {
arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f6689956ab6e..5598d0ff71a8 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -574,7 +574,8 @@ static void arm_smmu_tlb_inv_leaf(unsigned long iova, 
size_t size,
ops->tlb_sync(cookie);
 }
 
-static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule,
+static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather *gather,
+ unsigned long iova, size_t granule,
  void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index a7776e982b6c..18e7d212c7de 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -362,7 +362,8 @@ static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl)
return false;
 }
 
-static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *, unsigned long,
+static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *,
+ struct iommu_iotlb_gather *, unsigned long,
  size_t, int, arm_v7s_iopte *);
 
 static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
@@ -383,7 +384,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
size_t sz = ARM_V7S_BLOCK_SIZE(lvl);
 
tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl);
-   if (WARN_ON(__arm_v7s_unmap(data, iova + i * sz,
+   if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz,
sz, lvl, tblp) != sz))
return -EINVAL;
} else if (ptep[i]) {
@@ -545,6 +546,7 @@ static arm_v7s_iopte arm_v7s_split_cont(struct 
arm_v7s_io_pgtable *data,
 }
 
 static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
+ struct iommu_iotlb_gather *gather,
  unsigned long iova, size_t size,
  arm_v7s_iopte blk_pte,
  arm_v7s_iopte *ptep)
@@ -581,14 +583,15 @@ static size_t arm_v7s_split_blk_unmap(struct 
arm_v7s_io_pgtable *data,
return 0;
 
tablep = iopte_deref(pte, 1);
-   return __arm_v7s_unmap(data, iova, size, 2, tablep);
+   return __arm_v7s_unmap(data, gather, iova, size, 2, tablep);
}
 
-   io_pgtable_tlb_add_page(&data->iop, iova, size);
+   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
return size;
 }
 
 static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
+ struct iommu_iotlb_gather *gather,
  unsigned long iova, size_t size, int lvl,
  arm_v7s_iopte *ptep)
 {
@@ -647,7 +650,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
 */
smp_wmb();
} else {
-   io_pgtable_tlb_add_page(iop, iova, blk_size);
+   io_pgtable_tlb_add_page(iop, gather, iova, 
blk_size);
}
iova += blk_size;
}
@@ -657,12 +660,13 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
 * Insert a table at the next level to map the old region,

[PATCH 03/13] iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops

2019-08-14 Thread Will Deacon
In preparation for TLB flush gathering in the IOMMU API, rename the
iommu_gather_ops structure in io-pgtable to iommu_flush_ops, which
better describes its purpose and avoids the potential for confusion
between different levels of the API.

$ find linux/ -type f -name '*.[ch]' | xargs sed -i 's/gather_ops/flush_ops/g'

Signed-off-by: Will Deacon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +-
 drivers/iommu/arm-smmu-v3.c | 4 ++--
 drivers/iommu/arm-smmu.c| 8 
 drivers/iommu/io-pgtable-arm-v7s.c  | 2 +-
 drivers/iommu/io-pgtable-arm.c  | 2 +-
 drivers/iommu/ipmmu-vmsa.c  | 4 ++--
 drivers/iommu/msm_iommu.c   | 4 ++--
 drivers/iommu/mtk_iommu.c   | 4 ++--
 drivers/iommu/qcom_iommu.c  | 4 ++--
 include/linux/io-pgtable.h  | 6 +++---
 10 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 92ac995dd9c6..17bceb11e708 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -257,7 +257,7 @@ static void mmu_tlb_sync_context(void *cookie)
// TODO: Wait 1000 GPU cycles for HW_ISSUE_6367/T60X
 }
 
-static const struct iommu_gather_ops mmu_tlb_ops = {
+static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
.tlb_add_flush  = mmu_tlb_inv_range_nosync,
.tlb_sync   = mmu_tlb_sync_context,
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a9a9fabd3968..7e137e1e28f1 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1603,7 +1603,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long 
iova, size_t size,
} while (size -= granule);
 }
 
-static const struct iommu_gather_ops arm_smmu_gather_ops = {
+static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync,
@@ -1796,7 +1796,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain)
.ias= ias,
.oas= oas,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
-   .tlb= &arm_smmu_gather_ops,
+   .tlb= &arm_smmu_flush_ops,
.iommu_dev  = smmu->dev,
};
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 64977c131ee6..dc08db347ef3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -251,7 +251,7 @@ enum arm_smmu_domain_stage {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct iommu_gather_ops   *tlb_ops;
+   const struct iommu_flush_ops*tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
@@ -547,19 +547,19 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long 
iova, size_t size,
writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
 }
 
-static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = {
+static const struct iommu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = {
+static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v1 = {
+static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_add_flush  = arm_smmu_tlb_inv_vmid_nosync,
.tlb_sync   = arm_smmu_tlb_sync_vmid,
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index a62733c6a632..116f97ee991e 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -817,7 +817,7 @@ static void dummy_tlb_sync(void *cookie)
WARN_ON(cookie != cfg_cookie);
 }
 
-static const struct iommu_gather_ops dummy_tlb_ops = {
+static const struct iommu_flush_ops dummy_tlb_ops = {
.tlb_flush_all  = dummy_tlb_flush_all,
.tlb_add_flush  = dummy_tlb_add_flush,
.tlb_sync   = dummy_tlb_sync,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0d6633921c1e..402f913b6f6d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -1081,7 +1081,7 @@ static void dummy_tlb_sync(void *cookie)

[PATCH 06/13] iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()

2019-08-14 Thread Will Deacon
To allow IOMMU drivers to batch up TLB flushing operations and postpone
them until ->iotlb_sync() is called, extend the prototypes for the
->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
the current iommu_iotlb_gather structure.

All affected IOMMU drivers are updated, but there should be no
functional change since the extra parameter is ignored for now.

Signed-off-by: Will Deacon 
---
 drivers/iommu/amd_iommu.c  | 11 +--
 drivers/iommu/arm-smmu-v3.c|  7 ---
 drivers/iommu/arm-smmu.c   |  5 +++--
 drivers/iommu/exynos-iommu.c   |  3 ++-
 drivers/iommu/intel-iommu.c|  3 ++-
 drivers/iommu/iommu.c  |  2 +-
 drivers/iommu/ipmmu-vmsa.c | 12 +---
 drivers/iommu/msm_iommu.c  |  2 +-
 drivers/iommu/mtk_iommu.c  | 13 ++---
 drivers/iommu/mtk_iommu_v1.c   |  3 ++-
 drivers/iommu/omap-iommu.c |  2 +-
 drivers/iommu/qcom_iommu.c | 12 +---
 drivers/iommu/rockchip-iommu.c |  2 +-
 drivers/iommu/s390-iommu.c |  3 ++-
 drivers/iommu/tegra-gart.c | 12 +---
 drivers/iommu/tegra-smmu.c |  2 +-
 drivers/iommu/virtio-iommu.c   |  5 +++--
 include/linux/iommu.h  |  7 ---
 18 files changed, 73 insertions(+), 33 deletions(-)
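
(For context, the iommu_iotlb_gather structure being threaded through here is
introduced earlier in the series; reproduced from memory, so treat it as a
sketch:)

	struct iommu_iotlb_gather {
		unsigned long	start;
		unsigned long	end;
		size_t		pgsize;
	};

	static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
	{
		*gather = (struct iommu_iotlb_gather) {
			.start	= ULONG_MAX,
		};
	}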

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f93b148cf55e..29eeea914660 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3055,7 +3055,8 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
 }
 
 static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
-  size_t page_size)
+ size_t page_size,
+ struct iommu_iotlb_gather *gather)
 {
struct protection_domain *domain = to_pdomain(dom);
size_t unmap_size;
@@ -3196,6 +3197,12 @@ static void amd_iommu_flush_iotlb_all(struct 
iommu_domain *domain)
domain_flush_complete(dom);
 }
 
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
+struct iommu_iotlb_gather *gather)
+{
+   amd_iommu_flush_iotlb_all(domain);
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3214,7 +3221,7 @@ const struct iommu_ops amd_iommu_ops = {
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
-   .iotlb_sync = amd_iommu_flush_iotlb_all,
+   .iotlb_sync = amd_iommu_iotlb_sync,
 };
 
 /*
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7e137e1e28f1..80753b8ca054 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1985,8 +1985,8 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return ops->map(ops, iova, paddr, size, prot);
 }
 
-static size_t
-arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
+static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
+size_t size, struct iommu_iotlb_gather *gather)
 {
int ret;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2010,7 +2010,8 @@ static void arm_smmu_flush_iotlb_all(struct iommu_domain 
*domain)
arm_smmu_tlb_inv_context(smmu_domain);
 }
 
-static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
+static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
+   struct iommu_iotlb_gather *gather)
 {
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index dc08db347ef3..e535ae2a9e65 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1301,7 +1301,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
-size_t size)
+size_t size, struct iommu_iotlb_gather *gather)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1329,7 +1329,8 @@ static void arm_smmu_flush_iotlb_all(struct iommu_domain 
*domain)
}
 }
 
-static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
+static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
+   struct iommu_iotlb_gather *gather)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index b0c1e5f9daae..cf5af34cb681 100644
--- a/drivers/iommu/exynos-iommu.c
+++ 

[PATCH 10/13] iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()

2019-08-14 Thread Will Deacon
The ->tlb_add_flush() callback in the io-pgtable API now looks a bit
silly:

  - It takes a size and a granule, which are always the same
  - It takes a 'bool leaf', which is always true
  - It only ever flushes a single page

With that in mind, replace it with an optional ->tlb_add_page() callback
that drops the useless parameters.

Signed-off-by: Will Deacon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  5 --
 drivers/iommu/arm-smmu-v3.c |  8 ++-
 drivers/iommu/arm-smmu.c| 88 +
 drivers/iommu/io-pgtable-arm-v7s.c  | 12 ++---
 drivers/iommu/io-pgtable-arm.c  | 11 ++---
 drivers/iommu/ipmmu-vmsa.c  |  7 ---
 drivers/iommu/msm_iommu.c   |  7 ++-
 drivers/iommu/mtk_iommu.c   |  8 ++-
 drivers/iommu/qcom_iommu.c  |  8 ++-
 include/linux/io-pgtable.h  | 22 -
 10 files changed, 105 insertions(+), 71 deletions(-)
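
(In other words, the io-pgtable callback shrinks from the old form to the new
one; the gather cookie is only plumbed in later, in patch 13:)

	/* before */
	void (*tlb_add_flush)(unsigned long iova, size_t size, size_t granule,
			      bool leaf, void *cookie);

	/* after */
	void (*tlb_add_page)(unsigned long iova, size_t granule, void *cookie);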

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 651858147bd6..ff9af320cacc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -247,10 +247,6 @@ static void mmu_tlb_inv_context_s1(void *cookie)
mmu_hw_do_operation(pfdev, 0, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
 }
 
-static void mmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
-size_t granule, bool leaf, void *cookie)
-{}
-
 static void mmu_tlb_sync_context(void *cookie)
 {
//struct panfrost_device *pfdev = cookie;
@@ -273,7 +269,6 @@ static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
.tlb_flush_walk = mmu_tlb_flush_walk,
.tlb_flush_leaf = mmu_tlb_flush_leaf,
-   .tlb_add_flush  = mmu_tlb_inv_range_nosync,
.tlb_sync   = mmu_tlb_sync_context,
 };
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 79819b003b07..98c90a1b4b22 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1603,6 +1603,12 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long 
iova, size_t size,
} while (size -= granule);
 }
 
+static void arm_smmu_tlb_inv_page_nosync(unsigned long iova, size_t granule,
+void *cookie)
+{
+   arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie);
+}
+
 static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
  size_t granule, void *cookie)
 {
@@ -1627,7 +1633,7 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
-   .tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
+   .tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
.tlb_sync   = arm_smmu_tlb_sync,
 };
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e9f01b860ae3..f056164a94b0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -248,10 +248,16 @@ enum arm_smmu_domain_stage {
ARM_SMMU_DOMAIN_BYPASS,
 };
 
+struct arm_smmu_flush_ops {
+   struct iommu_flush_ops  tlb;
+   void (*tlb_inv_range)(unsigned long iova, size_t size, size_t granule,
+ bool leaf, void *cookie)
+};
+
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct iommu_flush_ops*tlb_ops;
+   const struct arm_smmu_flush_ops *flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
@@ -551,42 +557,62 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, 
size_t size,
  size_t granule, void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
-   smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, false, cookie);
-   smmu_domain->tlb_ops->tlb_sync(cookie);
+   ops->tlb_inv_range(iova, size, granule, false, cookie);
+   ops->tlb.tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
  size_t granule, void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
+
+   ops->tlb_inv_range(iova, size, granule, true, cookie);
+   ops->tlb.tlb_sync(cookie);
+}
+
+static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule,
+ void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
-   smmu_domain->tlb_ops->tlb_add_flush(iova, 

[PATCH 11/13] iommu/io-pgtable: Remove unused ->tlb_sync() callback

2019-08-14 Thread Will Deacon
The ->tlb_sync() callback is no longer used, so it can be removed.

Signed-off-by: Will Deacon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 -
 drivers/iommu/arm-smmu-v3.c |  8 
 drivers/iommu/arm-smmu.c| 17 +
 drivers/iommu/io-pgtable-arm-v7s.c  |  6 --
 drivers/iommu/io-pgtable-arm.c  |  6 --
 drivers/iommu/ipmmu-vmsa.c  |  1 -
 drivers/iommu/msm_iommu.c   | 20 +++-
 drivers/iommu/mtk_iommu.c   |  1 -
 drivers/iommu/qcom_iommu.c  |  1 -
 include/linux/io-pgtable.h  |  9 -
 10 files changed, 16 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index ff9af320cacc..de22a2276e00 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -269,7 +269,6 @@ static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
.tlb_flush_walk = mmu_tlb_flush_walk,
.tlb_flush_leaf = mmu_tlb_flush_leaf,
-   .tlb_sync   = mmu_tlb_sync_context,
 };
 
 static const char *access_type_name(struct panfrost_device *pfdev,
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 98c90a1b4b22..231093413ff9 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1545,13 +1545,6 @@ static int arm_smmu_atc_inv_domain(struct 
arm_smmu_domain *smmu_domain,
 }
 
 /* IO_PGTABLE API */
-static void arm_smmu_tlb_sync(void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-
-   arm_smmu_cmdq_issue_sync(smmu_domain->smmu);
-}
-
 static void arm_smmu_tlb_inv_context(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
@@ -1634,7 +1627,6 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
-   .tlb_sync   = arm_smmu_tlb_sync,
 };
 
 /* IOMMU API */
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f056164a94b0..07a267c437d6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -251,7 +251,8 @@ enum arm_smmu_domain_stage {
 struct arm_smmu_flush_ops {
struct iommu_flush_ops  tlb;
void (*tlb_inv_range)(unsigned long iova, size_t size, size_t granule,
- bool leaf, void *cookie)
+ bool leaf, void *cookie);
+   void (*tlb_sync)(void *cookie);
 };
 
 struct arm_smmu_domain {
@@ -539,7 +540,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
  * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears
  * almost negligible, but the benefit of getting the first one in as far ahead
  * of the sync as possible is significant, hence we don't just make this a
- * no-op and set .tlb_sync to arm_smmu_inv_context_s2() as you might think.
+ * no-op and set .tlb_sync to arm_smmu_tlb_inv_context_s2() as you might think.
  */
 static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
					 size_t granule, bool leaf, void *cookie)
@@ -560,7 +561,7 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
ops->tlb_inv_range(iova, size, granule, false, cookie);
-   ops->tlb.tlb_sync(cookie);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
@@ -570,7 +571,7 @@ static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
ops->tlb_inv_range(iova, size, granule, true, cookie);
-   ops->tlb.tlb_sync(cookie);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule,
@@ -588,9 +589,9 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_page   = arm_smmu_tlb_add_page,
-   .tlb_sync   = arm_smmu_tlb_sync_context,
},
.tlb_inv_range  = arm_smmu_tlb_inv_range_nosync,
+   .tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
 static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
@@ -599,9 +600,9 @@ static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_page   = arm_smmu_tlb_add_page,
-   .tlb_sync   = arm_smmu_tlb_sync_context,
},
.tlb_inv_range  = arm_smmu_tlb_inv_range_nosync,
+   

[PATCH 08/13] iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers

2019-08-14 Thread Will Deacon
Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers using the
io-pgtable API so that we can start making use of them in the page-table
code. For now, they can just wrap the implementations of ->tlb_add_flush
and ->tlb_sync pending future optimisation in each driver.
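The shape of those wrappers is the same in every driver touched below. As a rough
sketch (the "foo_" prefix is purely illustrative, not any particular driver), each
->tlb_flush_walk() simply reuses the existing deferred-invalidate plus sync pair,
and ->tlb_flush_leaf() does the same with leaf=true:

static void foo_tlb_flush_walk(unsigned long iova, size_t size,
			       size_t granule, void *cookie)
{
	/* For now: wrap the existing callbacks, optimise per-driver later. */
	foo_tlb_add_flush(iova, size, granule, false, cookie);
	foo_tlb_sync(cookie);
}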

Signed-off-by: Will Deacon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 14 ++
 drivers/iommu/arm-smmu-v3.c | 22 ++
 drivers/iommu/arm-smmu.c| 24 
 drivers/iommu/ipmmu-vmsa.c  |  8 
 drivers/iommu/msm_iommu.c   | 16 
 drivers/iommu/mtk_iommu.c   | 16 
 drivers/iommu/qcom_iommu.c  | 16 
 7 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 17bceb11e708..651858147bd6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -257,8 +257,22 @@ static void mmu_tlb_sync_context(void *cookie)
// TODO: Wait 1000 GPU cycles for HW_ISSUE_6367/T60X
 }
 
+static void mmu_tlb_flush_walk(unsigned long iova, size_t size, size_t granule,
+  void *cookie)
+{
+   mmu_tlb_sync_context(cookie);
+}
+
+static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule,
+  void *cookie)
+{
+   mmu_tlb_sync_context(cookie);
+}
+
 static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
+   .tlb_flush_walk = mmu_tlb_flush_walk,
+   .tlb_flush_leaf = mmu_tlb_flush_leaf,
.tlb_add_flush  = mmu_tlb_inv_range_nosync,
.tlb_sync   = mmu_tlb_sync_context,
 };
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 80753b8ca054..79819b003b07 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1603,8 +1603,30 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
} while (size -= granule);
 }
 
+static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   arm_smmu_tlb_inv_range_nosync(iova, size, granule, false, cookie);
+   arm_smmu_cmdq_issue_sync(smmu);
+}
+
+static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   arm_smmu_tlb_inv_range_nosync(iova, size, granule, true, cookie);
+   arm_smmu_cmdq_issue_sync(smmu);
+}
+
 static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync,
 };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e535ae2a9e65..e9f01b860ae3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -547,20 +547,44 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
 }
 
+static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, false, cookie);
+   smmu_domain->tlb_ops->tlb_sync(cookie);
+}
+
+static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, true, cookie);
+   smmu_domain->tlb_ops->tlb_sync(cookie);
+}
+
 static const struct iommu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
 static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_flush  = arm_smmu_tlb_inv_range_nosync,
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
 static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk,
+ 

[PATCH 05/13] iommu: Introduce iommu_iotlb_gather_add_page()

2019-08-14 Thread Will Deacon
Introduce a helper function for drivers to use when updating an
iommu_iotlb_gather structure in response to an ->unmap() call, rather
than having to open-code the logic in every page-table implementation.
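As a rough illustration of the intended calling pattern (the function and variable
names here are invented for the example, not taken from any driver), an ->unmap()
path feeds each freed page into the gather structure and leaves the decision of
when to flush to the core code:

static size_t example_unmap(struct iommu_domain *domain, unsigned long iova,
			    size_t pgsize, struct iommu_iotlb_gather *gather)
{
	/* ...clear the PTE covering [iova, iova + pgsize)... */

	/* Record the range; the helper syncs early only if it must. */
	iommu_iotlb_gather_add_page(domain, gather, iova, pgsize);
	return pgsize;
}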

Signed-off-by: Will Deacon 
---
 include/linux/iommu.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aaf073010a9a..ad41aee55bc6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -507,6 +507,31 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain,
iommu_iotlb_gather_init(iotlb_gather);
 }
 
+static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
+					       struct iommu_iotlb_gather *gather,
+  unsigned long iova, size_t size)
+{
+   unsigned long start = iova, end = start + size;
+
+   /*
+* If the new page is disjoint from the current range or is mapped at
+* a different granularity, then sync the TLB so that the gather
+* structure can be rewritten.
+*/
+   if (gather->pgsize != size ||
+   end < gather->start || start > gather->end) {
+   if (gather->pgsize)
+   iommu_tlb_sync(domain, gather);
+   gather->pgsize = size;
+   }
+
+   if (gather->end < end)
+   gather->end = end;
+
+   if (gather->start > start)
+   gather->start = start;
+}
+
 /* PCI device grouping function */
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
@@ -847,6 +872,12 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 {
 }
 
+static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
+  struct iommu_iotlb_gather 
*gather,
+  unsigned long iova, size_t size)
+{
+}
+
 static inline void iommu_device_unregister(struct iommu_device *iommu)
 {
 }
-- 
2.11.0



[PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()

2019-08-14 Thread Will Deacon
Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict
mode") added an unconditional call to io_pgtable_tlb_sync() immediately
after the case where we replace a block entry with a table entry during
an unmap() call. This is redundant, since the IOMMU API will call
iommu_tlb_sync() on this path and the patch in question mentions this:

 | To save having to reason about it too much, make sure the invalidation
 | in arm_lpae_split_blk_unmap() just performs its own unconditional sync
 | to minimise the window in which we're technically violating the break-
 | before-make requirement on a live mapping. This might work out redundant
 | with an outer-level sync for strict unmaps, but we'll never be splitting
 | blocks on a DMA fastpath anyway.

However, this sync gets in the way of deferred TLB invalidation for leaf
entries and is at best a questionable, unproven hack. Remove it.

Signed-off-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 1 -
 drivers/iommu/io-pgtable-arm.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 0fc8dfab2abf..a62733c6a632 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -587,7 +587,6 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
}
 
	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-	io_pgtable_tlb_sync(&data->iop);
return size;
 }
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 161a7d56264d..0d6633921c1e 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -583,7 +583,6 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
tablep = iopte_deref(pte, data);
} else if (unmap_idx >= 0) {
		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
-		io_pgtable_tlb_sync(&data->iop);
return size;
}
 
-- 
2.11.0



[PATCH 07/13] iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()

2019-08-14 Thread Will Deacon
In preparation for deferring TLB flushes to iommu_tlb_sync(), introduce
two new synchronous invalidation helpers to the io-pgtable API, which
allow the unmap() code to force invalidation in cases where it cannot be
deferred (e.g. when replacing a table with a block or when TLBI_ON_MAP
is set).
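For callers, the distinction is simply which TLB state gets dropped. As a sketch
(assuming the io_pgtable_cfg's flush-ops pointer is called "tlb", as elsewhere in
this series):

	/* Invalidate leaf entries and any cached intermediate walks: */
	cfg->tlb->tlb_flush_walk(iova, size, granule, cookie);

	/* Invalidate leaf entries only: */
	cfg->tlb->tlb_flush_leaf(iova, size, granule, cookie);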

Signed-off-by: Will Deacon 
---
 include/linux/io-pgtable.h | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 6292ea15d674..27275575b305 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -19,17 +19,31 @@ enum io_pgtable_fmt {
 /**
  * struct iommu_flush_ops - IOMMU callbacks for TLB and page table management.
  *
- * @tlb_flush_all: Synchronously invalidate the entire TLB context.
- * @tlb_add_flush: Queue up a TLB invalidation for a virtual address range.
- * @tlb_sync:  Ensure any queued TLB invalidation has taken effect, and
- * any corresponding page table updates are visible to the
- * IOMMU.
+ * @tlb_flush_all:  Synchronously invalidate the entire TLB context.
+ * @tlb_flush_walk: Synchronously invalidate all intermediate TLB state
+ *  (sometimes referred to as the "walk cache") for a virtual
+ *  address range.
+ * @tlb_flush_leaf: Synchronously invalidate all leaf TLB state for a virtual
+ *  address range.
+ * @tlb_add_flush:  Optional callback to queue up leaf TLB invalidation for a
+ *  virtual address range.  This function exists purely as an
+ *  optimisation for IOMMUs that cannot batch TLB invalidation
+ *  operations efficiently and are therefore better suited to
+ *  issuing them early rather than deferring them until
+ *  iommu_tlb_sync().
+ * @tlb_sync:   Ensure any queued TLB invalidation has taken effect, and
+ *  any corresponding page table updates are visible to the
+ *  IOMMU.
  *
  * Note that these can all be called in atomic context and must therefore
  * not block.
  */
 struct iommu_flush_ops {
void (*tlb_flush_all)(void *cookie);
+   void (*tlb_flush_walk)(unsigned long iova, size_t size, size_t granule,
+  void *cookie);
+   void (*tlb_flush_leaf)(unsigned long iova, size_t size, size_t granule,
+  void *cookie);
void (*tlb_add_flush)(unsigned long iova, size_t size, size_t granule,
  bool leaf, void *cookie);
void (*tlb_sync)(void *cookie);
-- 
2.11.0



[PATCH 12/13] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()

2019-08-14 Thread Will Deacon
Update the io-pgtable ->unmap() function to take an iommu_iotlb_gather
pointer as an argument, and update the callers as appropriate.

Signed-off-by: Will Deacon 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +-
 drivers/iommu/arm-smmu-v3.c | 2 +-
 drivers/iommu/arm-smmu.c| 2 +-
 drivers/iommu/io-pgtable-arm-v7s.c  | 6 +++---
 drivers/iommu/io-pgtable-arm.c  | 7 +++
 drivers/iommu/ipmmu-vmsa.c  | 2 +-
 drivers/iommu/msm_iommu.c   | 2 +-
 drivers/iommu/mtk_iommu.c   | 2 +-
 drivers/iommu/qcom_iommu.c  | 2 +-
 include/linux/io-pgtable.h  | 4 +++-
 10 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index de22a2276e00..6e8145c36e93 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -222,7 +222,7 @@ void panfrost_mmu_unmap(struct panfrost_gem_object *bo)
size_t unmapped_page;
size_t pgsize = get_pgsize(iova, len - unmapped_len);
 
-   unmapped_page = ops->unmap(ops, iova, pgsize);
+   unmapped_page = ops->unmap(ops, iova, pgsize, NULL);
if (!unmapped_page)
break;
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 231093413ff9..8e2e53079f48 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2015,7 +2015,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
if (!ops)
return 0;
 
-   ret = ops->unmap(ops, iova, size);
+   ret = ops->unmap(ops, iova, size, gather);
if (ret && arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size))
return 0;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 07a267c437d6..f6689956ab6e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1362,7 +1362,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
return 0;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->unmap(ops, iova, size);
+   ret = ops->unmap(ops, iova, size, gather);
arm_smmu_rpm_put(smmu);
 
return ret;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 203894fb6765..a7776e982b6c 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -666,7 +666,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 }
 
 static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size)
+   size_t size, struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
 
@@ -892,7 +892,7 @@ static int __init arm_v7s_do_selftests(void)
size = 1UL << __ffs(cfg.pgsize_bitmap);
while (i < loopnr) {
iova_start = i * SZ_16M;
-   if (ops->unmap(ops, iova_start + size, size) != size)
+   if (ops->unmap(ops, iova_start + size, size, NULL) != size)
return __FAIL(ops);
 
/* Remap of partial unmap */
@@ -910,7 +910,7 @@ static int __init arm_v7s_do_selftests(void)
for_each_set_bit(i, _bitmap, BITS_PER_LONG) {
size = 1UL << i;
 
-   if (ops->unmap(ops, iova, size) != size)
+   if (ops->unmap(ops, iova, size, NULL) != size)
return __FAIL(ops);
 
if (ops->iova_to_phys(ops, iova + 42))
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f35516744965..325430f8a0a1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -642,7 +641,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 }
 
 static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-size_t size)
+size_t size, struct iommu_iotlb_gather *gather)
 {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
arm_lpae_iopte *ptep = data->pgd;
@@ -1167,7 +1166,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 
/* Partial unmap */
size = 1UL << __ffs(cfg->pgsize_bitmap);
-   if (ops->unmap(ops, SZ_1G + size, size) != size)
+   if (ops->unmap(ops, SZ_1G + size, size, NULL) != size)
return __FAIL(ops, i);
 
/* Remap of partial unmap */
@@ -1182,7 +1181,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
for_each_set_bit(j, >pgsize_bitmap, BITS_PER_LONG) {
size = 1UL << j;
 
- 

[PATCH 00/13] Rework IOMMU API to allow for batching of invalidation

2019-08-14 Thread Will Deacon
Hi everybody,

These are the core IOMMU changes that I have posted previously as part
of my ongoing effort to reduce the lock contention of the SMMUv3 command
queue. I thought it would be better to split this out as a separate
series, since I think it's ready to go and all the driver conversions
mean that it's quite a pain for me to maintain out of tree!

The idea of the patch series is to allow TLB invalidation to be batched
up into a new 'struct iommu_iotlb_gather' structure, which tracks the
properties of the virtual address range being invalidated so that it
can be deferred until the driver's ->iotlb_sync() function is called.
This allows for more efficient invalidation on hardware that can submit
multiple invalidations in one go.
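Concretely, a caller that wants the batching behaviour ends up with a sequence
like the following (sketch only; the individual helpers are introduced by the
patches in this series, and 'domain', 'iova' and 'size' are whatever the caller
already has):

	struct iommu_iotlb_gather gather;

	iommu_iotlb_gather_init(&gather);
	unmapped = iommu_unmap_fast(domain, iova, size, &gather); /* gather ranges  */
	iommu_tlb_sync(domain, &gather);                           /* one deferred flush */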

The previous series was included in:

  https://lkml.kernel.org/r/20190711171927.28803-1-w...@kernel.org

The only real change since then is incorporating the newly merged
virtio-iommu driver.

If you'd like to play with the patches, then I've also pushed them here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap

but they should behave as a no-op on their own. Patches to convert the
Arm SMMUv3 driver to the new API are here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

Cheers,

Will

--->8

Cc: Jean-Philippe Brucker 
Cc: Robin Murphy 
Cc: Jayachandran Chandrasekharan Nair 
Cc: Jan Glauber 
Cc: Jon Masters 
Cc: Eric Auger 
Cc: Zhen Lei 
Cc: Jonathan Cameron 
Cc: Vijay Kilary 
Cc: Joerg Roedel 
Cc: John Garry 
Cc: Alex Williamson 
Cc: Marek Szyprowski 
Cc: David Woodhouse 

Will Deacon (13):
  iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
  iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
  iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
  iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
  iommu: Introduce iommu_iotlb_gather_add_page()
  iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
  iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
  iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in
drivers
  iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
  iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
  iommu/io-pgtable: Remove unused ->tlb_sync() callback
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()

 drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +---
 drivers/iommu/amd_iommu.c   |  11 ++--
 drivers/iommu/arm-smmu-v3.c |  52 +++-
 drivers/iommu/arm-smmu.c| 103 
 drivers/iommu/dma-iommu.c   |   9 ++-
 drivers/iommu/exynos-iommu.c|   3 +-
 drivers/iommu/intel-iommu.c |   3 +-
 drivers/iommu/io-pgtable-arm-v7s.c  |  57 +-
 drivers/iommu/io-pgtable-arm.c  |  48 ---
 drivers/iommu/iommu.c   |  24 
 drivers/iommu/ipmmu-vmsa.c  |  28 +
 drivers/iommu/msm_iommu.c   |  42 +
 drivers/iommu/mtk_iommu.c   |  45 +++---
 drivers/iommu/mtk_iommu_v1.c|   3 +-
 drivers/iommu/omap-iommu.c  |   2 +-
 drivers/iommu/qcom_iommu.c  |  44 +++---
 drivers/iommu/rockchip-iommu.c  |   2 +-
 drivers/iommu/s390-iommu.c  |   3 +-
 drivers/iommu/tegra-gart.c  |  12 +++-
 drivers/iommu/tegra-smmu.c  |   2 +-
 drivers/iommu/virtio-iommu.c|   5 +-
 drivers/vfio/vfio_iommu_type1.c |  27 +
 include/linux/io-pgtable.h  |  57 --
 include/linux/iommu.h   |  92 +---
 24 files changed, 483 insertions(+), 215 deletions(-)

-- 
2.11.0



[PATCH 04/13] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes

2019-08-14 Thread Will Deacon
To permit batching of TLB flushes across multiple calls to the IOMMU
driver's ->unmap() implementation, introduce a new structure for
tracking the address range to be flushed and the granularity at which
the flushing is required.

This is hooked into the IOMMU API and its callers are updated to make use
of the new structure. Subsequent patches will plumb this into the IOMMU
drivers as well, but for now the gathering information is ignored.
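The tracked state is deliberately small. Roughly (a sketch inferred from how
iommu_iotlb_gather_add_page() uses the structure later in the series, not a
verbatim copy of the header):

	struct iommu_iotlb_gather {
		unsigned long	start;	/* lowest IOVA gathered so far */
		unsigned long	end;	/* end of the highest range gathered so far */
		size_t		pgsize;	/* page size of the gathered mappings */
	};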

Signed-off-by: Will Deacon 
---
 drivers/iommu/dma-iommu.c   |  9 +++--
 drivers/iommu/iommu.c   | 19 +++---
 drivers/vfio/vfio_iommu_type1.c | 26 -
 include/linux/iommu.h   | 43 +
 4 files changed, 75 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a7f9c3edbcb2..80beb1f5994a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -444,13 +444,18 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
size_t iova_off = iova_offset(iovad, dma_addr);
+   struct iommu_iotlb_gather iotlb_gather;
+   size_t unmapped;
 
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
+	iommu_iotlb_gather_init(&iotlb_gather);
+
+	unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
+   WARN_ON(unmapped != size);
 
-   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
if (!cookie->fq_domain)
-   iommu_tlb_sync(domain);
+		iommu_tlb_sync(domain, &iotlb_gather);
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6d7b25fe2474..d67222fdfe44 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1862,7 +1862,7 @@ EXPORT_SYMBOL_GPL(iommu_map);
 
 static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
-   bool sync)
+   struct iommu_iotlb_gather *iotlb_gather)
 {
const struct iommu_ops *ops = domain->ops;
size_t unmapped_page, unmapped = 0;
@@ -1910,9 +1910,6 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unmapped += unmapped_page;
}
 
-   if (sync && ops->iotlb_sync)
-   ops->iotlb_sync(domain);
-
trace_unmap(orig_iova, size, unmapped);
return unmapped;
 }
@@ -1920,14 +1917,22 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 size_t iommu_unmap(struct iommu_domain *domain,
   unsigned long iova, size_t size)
 {
-   return __iommu_unmap(domain, iova, size, true);
+   struct iommu_iotlb_gather iotlb_gather;
+   size_t ret;
+
+	iommu_iotlb_gather_init(&iotlb_gather);
+	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
+	iommu_tlb_sync(domain, &iotlb_gather);
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
 size_t iommu_unmap_fast(struct iommu_domain *domain,
-   unsigned long iova, size_t size)
+   unsigned long iova, size_t size,
+   struct iommu_iotlb_gather *iotlb_gather)
 {
-   return __iommu_unmap(domain, iova, size, false);
+   return __iommu_unmap(domain, iova, size, iotlb_gather);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap_fast);
 
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index fad7fd8c167c..ad830abe1021 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -650,12 +650,13 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
 }
 
 static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
-   struct list_head *regions)
+   struct list_head *regions,
+   struct iommu_iotlb_gather *iotlb_gather)
 {
long unlocked = 0;
struct vfio_regions *entry, *next;
 
-   iommu_tlb_sync(domain->domain);
+   iommu_tlb_sync(domain->domain, iotlb_gather);
 
list_for_each_entry_safe(entry, next, regions, list) {
unlocked += vfio_unpin_pages_remote(dma,
@@ -685,13 +686,15 @@ static size_t unmap_unpin_fast(struct vfio_domain *domain,
   struct vfio_dma *dma, dma_addr_t *iova,
   size_t len, phys_addr_t phys, long *unlocked,
   struct list_head *unmapped_list,
-  int *unmapped_cnt)
+  int *unmapped_cnt,
+  struct iommu_iotlb_gather *iotlb_gather)
 {
size_t unmapped = 0;
struct vfio_regions *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
 
if (entry) {
-   unmapped = iommu_unmap_fast(domain->domain, *iova, len);
+   

[PATCH 01/13] iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops

2019-08-14 Thread Will Deacon
Commit add02cfdc9bc ("iommu: Introduce Interface for IOMMU TLB Flushing")
added three new TLB flushing operations to the IOMMU API so that the
underlying driver operations can be batched when unmapping large regions
of IO virtual address space.

However, the ->iotlb_range_add() callback has not been implemented by
any IOMMU drivers (amd_iommu.c implements it as an empty function, which
incurs the overhead of an indirect branch). Instead, drivers either flush
the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
invalidation during ->unmap().

Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
two major issues:

  1. The page size used to map the region in the page-table is not known,
 and so it is not generally possible to issue TLB flushes in the most
 efficient manner.

  2. The only mutable state passed to the callback is a pointer to the
 iommu_domain, which can be accessed concurrently and therefore
 requires expensive synchronisation to keep track of the outstanding
 flushes.

Remove the callback entirely in preparation for extending ->unmap() and
->iotlb_sync() to update a token on the caller's stack.

Signed-off-by: Will Deacon 
---
 drivers/iommu/amd_iommu.c   |  6 --
 drivers/iommu/iommu.c   |  3 ---
 drivers/vfio/vfio_iommu_type1.c |  1 -
 include/linux/iommu.h   | 15 ---
 4 files changed, 25 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..f93b148cf55e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3196,11 +3196,6 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
domain_flush_complete(dom);
 }
 
-static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
- unsigned long iova, size_t size)
-{
-}
-
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3219,7 +3214,6 @@ const struct iommu_ops amd_iommu_ops = {
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
-   .iotlb_range_add = amd_iommu_iotlb_range_add,
.iotlb_sync = amd_iommu_flush_iotlb_all,
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..6d7b25fe2474 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1903,9 +1903,6 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
if (!unmapped_page)
break;
 
-   if (sync && ops->iotlb_range_add)
-   ops->iotlb_range_add(domain, iova, pgsize);
-
pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
 iova, unmapped_page);
 
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f30fa8..fad7fd8c167c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -696,7 +696,6 @@ static size_t unmap_unpin_fast(struct vfio_domain *domain,
if (!unmapped) {
kfree(entry);
} else {
-   iommu_tlb_range_add(domain->domain, *iova, unmapped);
entry->iova = *iova;
entry->phys = phys;
entry->len  = unmapped;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fdc355ccc570..1e21431262d9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -201,7 +201,6 @@ struct iommu_sva_ops {
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
- * @iotlb_range_add: Add a given iova range to the flush queue for this domain
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *queue
@@ -244,8 +243,6 @@ struct iommu_ops {
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size);
void (*flush_iotlb_all)(struct iommu_domain *domain);
-   void (*iotlb_range_add)(struct iommu_domain *domain,
-   unsigned long iova, size_t size);
void (*iotlb_sync_map)(struct iommu_domain *domain);
void (*iotlb_sync)(struct iommu_domain *domain);
	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
@@ -476,13 +473,6 @@ static inline void iommu_flush_tlb_all(struct iommu_domain *domain)
domain->ops->flush_iotlb_all(domain);
 }
 
-static inline void iommu_tlb_range_add(struct iommu_domain *domain,
-  unsigned long iova, size_t size)
-{
-   if 

[PATCH 09/13] iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()

2019-08-14 Thread Will Deacon
Now that all IOMMU drivers using the io-pgtable API implement the
->tlb_flush_walk() and ->tlb_flush_leaf() callbacks, we can use them in
the io-pgtable code instead of ->tlb_add_flush() immediately followed by
->tlb_sync().
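The io_pgtable_tlb_flush_walk()/io_pgtable_tlb_flush_leaf() wrappers used in the
diff below presumably reduce to something like this (sketch; the exact form lives
in include/linux/io-pgtable.h, which is not shown here):

static inline void io_pgtable_tlb_flush_walk(struct io_pgtable *iop,
					     unsigned long iova, size_t size,
					     size_t granule)
{
	/* Dispatch straight to the driver's synchronous walk invalidation. */
	iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
}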

Signed-off-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 25 +++--
 drivers/iommu/io-pgtable-arm.c | 17 -
 include/linux/io-pgtable.h | 14 ++
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 116f97ee991e..8d4914fe73bc 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -493,9 +493,8 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
 * a chance for anything to kick off a table walk for the new iova.
 */
if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) {
-   io_pgtable_tlb_add_flush(iop, iova, size,
-ARM_V7S_BLOCK_SIZE(2), false);
-   io_pgtable_tlb_sync(iop);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_V7S_BLOCK_SIZE(2));
} else {
wmb();
}
@@ -541,8 +540,7 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, &iop->cfg);
 
size *= ARM_V7S_CONT_PAGES;
-   io_pgtable_tlb_add_flush(iop, iova, size, size, true);
-   io_pgtable_tlb_sync(iop);
+   io_pgtable_tlb_flush_leaf(iop, iova, size, size);
return pte;
 }
 
@@ -637,9 +635,8 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
for (i = 0; i < num_entries; i++) {
if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
/* Also flush any partial walks */
-   io_pgtable_tlb_add_flush(iop, iova, blk_size,
-   ARM_V7S_BLOCK_SIZE(lvl + 1), false);
-   io_pgtable_tlb_sync(iop);
+   io_pgtable_tlb_flush_walk(iop, iova, blk_size,
+   ARM_V7S_BLOCK_SIZE(lvl + 1));
ptep = iopte_deref(pte[i], lvl);
__arm_v7s_free_table(ptep, lvl + 1, data);
		} else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {
@@ -805,13 +802,19 @@ static void dummy_tlb_flush_all(void *cookie)
WARN_ON(cookie != cfg_cookie);
 }
 
-static void dummy_tlb_add_flush(unsigned long iova, size_t size,
-   size_t granule, bool leaf, void *cookie)
+static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule,
+   void *cookie)
 {
WARN_ON(cookie != cfg_cookie);
WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
 }
 
+static void dummy_tlb_add_flush(unsigned long iova, size_t size,
+   size_t granule, bool leaf, void *cookie)
+{
+   dummy_tlb_flush(iova, size, granule, cookie);
+}
+
 static void dummy_tlb_sync(void *cookie)
 {
WARN_ON(cookie != cfg_cookie);
@@ -819,6 +822,8 @@ static void dummy_tlb_sync(void *cookie)
 
 static const struct iommu_flush_ops dummy_tlb_ops = {
.tlb_flush_all  = dummy_tlb_flush_all,
+   .tlb_flush_walk = dummy_tlb_flush,
+   .tlb_flush_leaf = dummy_tlb_flush,
.tlb_add_flush  = dummy_tlb_add_flush,
.tlb_sync   = dummy_tlb_sync,
 };
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 402f913b6f6d..b58338c86323 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -611,9 +611,8 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
if (!iopte_leaf(pte, lvl, iop->fmt)) {
/* Also flush any partial walks */
-   io_pgtable_tlb_add_flush(iop, iova, size,
-   ARM_LPAE_GRANULE(data), false);
-   io_pgtable_tlb_sync(iop);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
} else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {
@@ -1069,13 +1068,19 @@ static void dummy_tlb_flush_all(void *cookie)
WARN_ON(cookie != cfg_cookie);
 }
 
-static void dummy_tlb_add_flush(unsigned long iova, size_t size,
-   size_t granule, bool leaf, void *cookie)
+static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule,
+   void *cookie)
 {
WARN_ON(cookie != cfg_cookie);
WARN_ON(!(size & 

Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?

2019-08-14 Thread Daniel Vetter
On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote:
> Hello
> 
> For many releases now (at least since 4.19), I have been hitting the following error message:
> DMA-API: cacheline tracking ENOMEM, dma-debug disabled
> 
> After hitting that, I tried to check who is creating so many DMA mappings and saw:
> cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c
>   6 ahci
> 257 e1000e
>   6 ehci-pci
>5891 nouveau
>  24 uhci_hcd
> 
> Is it normal for nouveau to have this many DMA mappings?

Yeah seems perfectly fine for a gpu.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 hmm 04/11] misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct

2019-08-14 Thread Dimitri Sivanich
On Wed, Aug 14, 2019 at 03:58:34PM +, Jason Gunthorpe wrote:
> On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote:
> > Looks good,
> > 
> > Reviewed-by: Christoph Hellwig 
> 
> Dimitri, are you OK with this patch?
>

I think this looks OK.

Reviewed-by: Dimitri Sivanich 


Re: [PATCH 01/15] iommu/arm-smmu: Convert GR0 registers to bitfields

2019-08-14 Thread Robin Murphy

On 14/08/2019 18:20, Will Deacon wrote:

On Fri, Aug 09, 2019 at 06:07:38PM +0100, Robin Murphy wrote:

FIELD_PREP remains a terrible name, but the overall simplification will
make further work on this stuff that much more manageable. This also
serves as an audit of the header, wherein we can impose a consistent
grouping and ordering of the offset and field definitions

Signed-off-by: Robin Murphy 
---
  drivers/iommu/arm-smmu-regs.h | 126 --
  drivers/iommu/arm-smmu.c  |  51 +++---
  2 files changed, 84 insertions(+), 93 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index 1c278f7ae888..d189f025537a 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -10,111 +10,101 @@
  #ifndef _ARM_SMMU_REGS_H
  #define _ARM_SMMU_REGS_H
  
+#include 

+
  /* Configuration registers */
  #define ARM_SMMU_GR0_sCR0 0x0
-#define sCR0_CLIENTPD  (1 << 0)
-#define sCR0_GFRE  (1 << 1)
-#define sCR0_GFIE  (1 << 2)
-#define sCR0_EXIDENABLE(1 << 3)
-#define sCR0_GCFGFRE   (1 << 4)
-#define sCR0_GCFGFIE   (1 << 5)
-#define sCR0_USFCFG(1 << 10)
-#define sCR0_VMIDPNE   (1 << 11)
-#define sCR0_PTM   (1 << 12)
-#define sCR0_FB(1 << 13)
-#define sCR0_VMID16EN  (1 << 31)
-#define sCR0_BSU_SHIFT 14
-#define sCR0_BSU_MASK  0x3
+#define sCR0_VMID16EN  BIT(31)
+#define sCR0_BSU   GENMASK(15, 14)
+#define sCR0_FBBIT(13)
+#define sCR0_PTM   BIT(12)
+#define sCR0_VMIDPNE   BIT(11)
+#define sCR0_USFCFGBIT(10)
+#define sCR0_GCFGFIE   BIT(5)
+#define sCR0_GCFGFRE   BIT(4)
+#define sCR0_EXIDENABLEBIT(3)
+#define sCR0_GFIE  BIT(2)
+#define sCR0_GFRE  BIT(1)
+#define sCR0_CLIENTPD  BIT(0)
  
  /* Auxiliary Configuration register */

  #define ARM_SMMU_GR0_sACR 0x10
  
  /* Identification registers */

  #define ARM_SMMU_GR0_ID0  0x20
+#define ID0_S1TS   BIT(30)
+#define ID0_S2TS   BIT(29)
+#define ID0_NTSBIT(28)
+#define ID0_SMSBIT(27)
+#define ID0_ATOSNS BIT(26)
+#define ID0_PTFS_NO_AARCH32BIT(25)
+#define ID0_PTFS_NO_AARCH32S   BIT(24)
+#define ID0_CTTW   BIT(14)
+#define ID0_NUMIRPTGENMASK(23, 16)


nit: assuming this should be above ID0_CTTW so things are in descending
bit order?


Bah, indeed it should. Fixed now.

Other than that, looks good to me.


Thanks!

Robin.


Re: [PATCH 01/15] iommu/arm-smmu: Convert GR0 registers to bitfields

2019-08-14 Thread Will Deacon
On Fri, Aug 09, 2019 at 06:07:38PM +0100, Robin Murphy wrote:
> FIELD_PREP remains a terrible name, but the overall simplification will
> make further work on this stuff that much more manageable. This also
> serves as an audit of the header, wherein we can impose a consistent
> grouping and ordering of the offset and field definitions
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu-regs.h | 126 --
>  drivers/iommu/arm-smmu.c  |  51 +++---
>  2 files changed, 84 insertions(+), 93 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> index 1c278f7ae888..d189f025537a 100644
> --- a/drivers/iommu/arm-smmu-regs.h
> +++ b/drivers/iommu/arm-smmu-regs.h
> @@ -10,111 +10,101 @@
>  #ifndef _ARM_SMMU_REGS_H
>  #define _ARM_SMMU_REGS_H
>  
> +#include 
> +
>  /* Configuration registers */
>  #define ARM_SMMU_GR0_sCR00x0
> -#define sCR0_CLIENTPD(1 << 0)
> -#define sCR0_GFRE(1 << 1)
> -#define sCR0_GFIE(1 << 2)
> -#define sCR0_EXIDENABLE  (1 << 3)
> -#define sCR0_GCFGFRE (1 << 4)
> -#define sCR0_GCFGFIE (1 << 5)
> -#define sCR0_USFCFG  (1 << 10)
> -#define sCR0_VMIDPNE (1 << 11)
> -#define sCR0_PTM (1 << 12)
> -#define sCR0_FB  (1 << 13)
> -#define sCR0_VMID16EN(1 << 31)
> -#define sCR0_BSU_SHIFT   14
> -#define sCR0_BSU_MASK0x3
> +#define sCR0_VMID16ENBIT(31)
> +#define sCR0_BSU GENMASK(15, 14)
> +#define sCR0_FB  BIT(13)
> +#define sCR0_PTM BIT(12)
> +#define sCR0_VMIDPNE BIT(11)
> +#define sCR0_USFCFG  BIT(10)
> +#define sCR0_GCFGFIE BIT(5)
> +#define sCR0_GCFGFRE BIT(4)
> +#define sCR0_EXIDENABLE  BIT(3)
> +#define sCR0_GFIEBIT(2)
> +#define sCR0_GFREBIT(1)
> +#define sCR0_CLIENTPDBIT(0)
>  
>  /* Auxiliary Configuration register */
>  #define ARM_SMMU_GR0_sACR0x10
>  
>  /* Identification registers */
>  #define ARM_SMMU_GR0_ID0 0x20
> +#define ID0_S1TS BIT(30)
> +#define ID0_S2TS BIT(29)
> +#define ID0_NTS  BIT(28)
> +#define ID0_SMS  BIT(27)
> +#define ID0_ATOSNS   BIT(26)
> +#define ID0_PTFS_NO_AARCH32  BIT(25)
> +#define ID0_PTFS_NO_AARCH32S BIT(24)
> +#define ID0_CTTW BIT(14)
> +#define ID0_NUMIRPT  GENMASK(23, 16)

nit: assuming this should be above ID0_CTTW so things are in descending
bit order?

Other than that, looks good to me.

Will
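For reference, the point of moving to BIT()/GENMASK() definitions is that field
access can then go through the <linux/bitfield.h> helpers. A small illustration
(the 'gr0_base' pointer is invented here, and this is not code from the patch):

	u32 reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_sCR0);
	unsigned int bsu = FIELD_GET(sCR0_BSU, reg);	/* extract bits [15:14]   */

	reg &= ~sCR0_BSU;
	reg |= FIELD_PREP(sCR0_BSU, bsu);		/* insert a value back in */
	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_sCR0);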


Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support

2019-08-14 Thread Jacob Pan
On Fri, 5 Jul 2019 10:21:27 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 6/28/19 4:22 AM, Jacob Pan wrote:
> >>> + }
> >>> + refcount_set(&svm->refs, 0);
> >>> + ioasid_set_data(data->hpasid, svm);
> >>> + INIT_LIST_HEAD_RCU(&svm->devs);
> >>> + INIT_LIST_HEAD(&svm->list);
> >>> +
> >>> + mmput(svm->mm);
> >>> + }
> >>> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> >>> + if (!sdev) {
> >>> + ret = -ENOMEM;
> >>> + goto out;  
> >> I think you need to clean up svm if its device list is empty here,
> >> as you said above:
> >>  
> > No, we come here only if the device list is not empty and the new
> > device to bind is different than any existing device in the list.
> > If we cannot allocate memory for the new device, should not free
> > the existing SVM, right?
> >   
> 
> I'm sorry, but the code doesn't show this. We come here even if an svm
> data structure was newly created with an empty device list. I am posting
> the code below to make sure we are reading the same piece of code.
> 
Sorry for the delay. You are right, I need to clean up svm if device
list is empty.

Thanks!
>  mutex_lock(_mutex);
>  svm = ioasid_find(NULL, data->hpasid, NULL);
>  if (IS_ERR(svm)) {
>  ret = PTR_ERR(svm);
>  goto out;
>  }
>  if (svm) {
>  /*
>   * If we found svm for the PASID, there must be at
>   * least one device bond, otherwise svm should be
> freed. */
>  BUG_ON(list_empty(&svm->devs));
> 
>  for_each_svm_dev() {
>  /* In case of multiple sub-devices of the
> same pdev assigned, we should
>   * allow multiple bind calls with the same 
> PASID and pdev.
>   */
>  sdev->users++;
>  goto out;
>  }
>  } else {
>  /* We come here when PASID has never been bond to a 
> device. */
>  svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>  if (!svm) {
>  ret = -ENOMEM;
>  goto out;
>  }
>  /* REVISIT: upper layer/VFIO can track host process 
> that bind the PASID.
>   * ioasid_set = mm might be sufficient for vfio to 
> check pasid VMM
>   * ownership.
>   */
>  svm->mm = get_task_mm(current);
>  svm->pasid = data->hpasid;
>  if (data->flags & IOMMU_SVA_GPASID_VAL) {
>  svm->gpasid = data->gpasid;
>  svm->flags &= SVM_FLAG_GUEST_PASID;
>  }
>  refcount_set(&svm->refs, 0);
>  ioasid_set_data(data->hpasid, svm);
>  INIT_LIST_HEAD_RCU(&svm->devs);
>  INIT_LIST_HEAD(&svm->list);
> 
>  mmput(svm->mm);
>  }
>  sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
>  if (!sdev) {
>  ret = -ENOMEM;
>  goto out;
>  }
>  sdev->dev = dev;
>  sdev->users = 1;
> 
> Best regards,
> Baolu

[Jacob Pan]


[PATCH 5.2 085/144] iommu/vt-d: Check if domain->pgd was allocated

2019-08-14 Thread Greg Kroah-Hartman
[ Upstream commit 3ee9eca760e7d0b68c55813243de66bbb499dc3b ]

There are a couple of places where domain_exit() is called on domain_init()
failure, although currently domain_init() can fail only if
alloc_pgtable_page() has failed.

Make domain_exit() check if domain->pgd present, before calling
domain_unmap(), as it theoretically should crash on clearing pte entries
in dma_pte_clear_level().

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov 
Reviewed-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel-iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2101601adf57d..1ad24367373f4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1900,7 +1900,6 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
 
 static void domain_exit(struct dmar_domain *domain)
 {
-   struct page *freelist;
 
/* Remove associated devices and clear attached or cached domains */
rcu_read_lock();
@@ -1910,9 +1909,12 @@ static void domain_exit(struct dmar_domain *domain)
/* destroy iovas */
	put_iova_domain(&domain->iovad);
 
-   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   if (domain->pgd) {
+   struct page *freelist;
 
-   dma_free_pagelist(freelist);
+   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   dma_free_pagelist(freelist);
+   }
 
free_domain_mem(domain);
 }
-- 
2.20.1





Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn

2019-08-14 Thread Jason Gunthorpe
On Tue, Aug 06, 2019 at 08:15:45PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> radeon is using a device global hash table to track what mmu_notifiers
> have been registered on struct mm. This is better served with the new
> get/put scheme instead.
> 
> radeon has a bug where it was not blocking notifier release() until all
> the BO's had been invalidated. This could result in a use after free of
> pages the BOs. This is tied into a second bug where radeon left the
> notifiers running endlessly even once the interval tree became
> empty. This could result in a use after free with module unload.
> 
> Both are fixed by changing the lifetime model, the BOs exist in the
> interval tree with their natural lifetimes independent of the mm_struct
> lifetime using the get/put scheme. The release runs synchronously and just
> does invalidate_start across the entire interval tree to create the
> required DMA fence.
> 
> Additions to the interval tree after release are already impossible as
> only current->mm is used during the add.
> 
> Signed-off-by: Jason Gunthorpe 
>  drivers/gpu/drm/radeon/radeon.h|   3 -
>  drivers/gpu/drm/radeon/radeon_device.c |   2 -
>  drivers/gpu/drm/radeon/radeon_drv.c|   2 +
>  drivers/gpu/drm/radeon/radeon_mn.c | 157 ++---
>  4 files changed, 38 insertions(+), 126 deletions(-)

AMD team: Are you OK with this patch?

Jason


Re: [PATCH v3 hmm 04/11] misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct

2019-08-14 Thread Jason Gunthorpe
On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote:
> Looks good,
> 
> Reviewed-by: Christoph Hellwig 

Dimitri, are you OK with this patch?

Thanks,
Jason


Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device

2019-08-14 Thread Robin Murphy

On 11/08/2019 09:05, Christoph Hellwig wrote:

We still treat devices without a DMA mask as defaulting to 32 bits for
both masks, but a few releases ago we started warning about such
cases, as they require special-casing to work around this sloppiness.
Add a dma_mask field to struct platform_object so that we can initialize


s/object/device/


the dma_mask pointer in struct device and initialize both masks to
32-bits by default.  Architectures can still override this in
arch_setup_pdev_archdata if needed.

Note that the code looks a little odd with the various conditionals
because we have to support platform_device structures that are
statically allocated.


This would be a good point to also get rid of the long-standing bodge in 
platform_device_register_full().



Signed-off-by: Christoph Hellwig 
---
  drivers/base/platform.c | 15 +--
  include/linux/platform_device.h |  1 +
  2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index ec974ba9c0c4..b216fcb0a8af 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -264,6 +264,17 @@ struct platform_object {
char name[];
  };
  
+static void setup_pdev_archdata(struct platform_device *pdev)


Bikeshed: painting the generic DMA API properties as "archdata" feels a 
bit off-target :/



+{
+   if (!pdev->dev.coherent_dma_mask)
+   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   if (!pdev->dma_mask)
+   pdev->dma_mask = DMA_BIT_MASK(32);
+   if (!pdev->dev.dma_mask)
+		pdev->dev.dma_mask = &pdev->dma_mask;
+   arch_setup_pdev_archdata(pdev);


AFAICS m68k's implementation of that arch hook becomes entirely 
redundant after this change, so may as well go. That would just leave 
powerpc's actual archdata, which at a glance looks like it could 
probably be cleaned up with not *too* much trouble.


Robin.


+};
+
  /**
   * platform_device_put - destroy a platform device
   * @pdev: platform device to free
@@ -310,7 +321,7 @@ struct platform_device *platform_device_alloc(const char *name, int id)
pa->pdev.id = id;
		device_initialize(&pa->pdev.dev);
		pa->pdev.dev.release = platform_device_release;
-		arch_setup_pdev_archdata(&pa->pdev);
+		setup_pdev_archdata(&pa->pdev);
}
  
	return pa ? &pa->pdev : NULL;

@@ -512,7 +523,7 @@ EXPORT_SYMBOL_GPL(platform_device_del);
  int platform_device_register(struct platform_device *pdev)
  {
	device_initialize(&pdev->dev);
-   arch_setup_pdev_archdata(pdev);
+   setup_pdev_archdata(pdev);
return platform_device_add(pdev);
  }
  EXPORT_SYMBOL_GPL(platform_device_register);
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 9bc36b589827..a2abde2aef25 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -24,6 +24,7 @@ struct platform_device {
int id;
boolid_auto;
struct device   dev;
+   u64 dma_mask;
u32 num_resources;
struct resource *resource;
  




DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?

2019-08-14 Thread Corentin Labbe
Hello

For many releases now (at least since 4.19), I have been hitting the following error message:
DMA-API: cacheline tracking ENOMEM, dma-debug disabled

After hitting that, I tried to check who is creating so many DMA mappings and saw:
cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c
  6 ahci
257 e1000e
  6 ehci-pci
   5891 nouveau
 24 uhci_hcd

Is it normal for nouveau to have this many DMA mappings?

Regards


Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode

2019-08-14 Thread Will Deacon
Hi Yong Wu,

Sorry, but I'm still deeply confused by this patch.

On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote:
> MediaTek extends the arm v7s descriptor to support DRAM over 4GB.
> 
> In the mt2712 and mt8173, it's called "4GB mode": the physical address
> is from 0x4000_0000 to 0x1_3fff_ffff, but from the EMI point of view, it
> is remapped to the high addresses from 0x1_0000_0000 to 0x1_ffff_ffff, and
> bit32 is always enabled. Thus, in the M4U, we always enable bit9
> for all PTEs, which means enabling bit32 of the physical address. Here is
> the detailed remap relationship in the "4GB mode":
> CPU PA               ->    HW PA
> 0x4000_0000                0x1_4000_0000 (Add bit32)
> 0x8000_0000                0x1_8000_0000 ...
> 0xc000_0000                0x1_c000_0000 ...
> 0x1_0000_0000              0x1_0000_0000 (No change)

So in this example, there are no PAs below 0x4000_0000, yet you later
add code to deal with that:

> +	/* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000.*/
> +	if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL)
> + paddr |= BIT_ULL(32);

Why? Mainline currently doesn't do anything like this for the "4gb mode"
support as far as I can tell. In fact, we currently unconditionally set
bit 32 in the physical address returned by iova_to_phys() which wouldn't
match your CPU PAs listed above, so I'm confused about how this is supposed
to work.

The way I would like this quirk to work is that the io-pgtable code
basically sets bit 9 in the pte when bit 32 is set in the physical address,
and sets bit 4 in the pte when bit 33 is set in the physical address. It
would then do the opposite when converting a pte to a physical address.
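In other words, the quirk handling would boil down to something like this in both
directions (sketch only, reusing the macro names that appear in the diff below):

	/* paddr -> pte */
	if (paddr & BIT_ULL(32))
		pte |= ARM_V7S_ATTR_MTK_PA_BIT32;	/* pte bit 9 */
	if (paddr & BIT_ULL(33))
		pte |= ARM_V7S_ATTR_MTK_PA_BIT33;	/* pte bit 4 */

	/* pte -> paddr */
	if (pte & ARM_V7S_ATTR_MTK_PA_BIT32)
		paddr |= BIT_ULL(32);
	if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
		paddr |= BIT_ULL(33);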

That way, your driver can call the page table code directly with the high
addresses and we don't have to do any manual offsetting or range checking
in the page table code.

Please can you explain to me why the diff below doesn't work on top of
this series? I'm happy to chat on IRC if you think it would be easier,
because I have a horrible feeling that we've been talking past each other
and I'd like to see this support merged for 5.4.

Will

--->8

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index ab12ef5f8b03..d8d84617c822 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl);
 
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) {
-   if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS)
+   if (paddr & BIT_ULL(32))
pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
if (paddr & BIT_ULL(33))
pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
@@ -206,17 +206,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl,
mask = ARM_V7S_LVL_MASK(lvl);
 
paddr = pte & mask;
-   if (cfg->oas == 32 || !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT))
-   return paddr;
 
-   if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
-   paddr |= BIT_ULL(33);
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) {
+   if (pte & ARM_V7S_ATTR_MTK_PA_BIT32)
+   paddr |= BIT_ULL(32);
+   if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
+   paddr |= BIT_ULL(33);
+   }
 
-	/* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000.*/
-	if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL)
-   paddr |= BIT_ULL(32);
-   else if (pte & ARM_V7S_ATTR_MTK_PA_BIT32)
-   paddr |= BIT_ULL(32);
return paddr;
 }
 
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d5b9454352fd..3ae54dedede0 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -286,7 +286,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
dom->cfg.oas = 32;
else if (data->enable_4GB)
-   dom->cfg.oas = ARM_V7S_MTK_4GB_OAS;
+   dom->cfg.oas = 33;
else
dom->cfg.oas = 34;
 
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 27337395bd42..a2a52c349fe4 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -113,8 +113,6 @@ struct io_pgtable_cfg {
};
 };
 
-#define ARM_V7S_MTK_4GB_OAS33
-
 /**
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *


[GIT PULL] dma mapping fixes for 5.3-rc

2019-08-14 Thread Christoph Hellwig
The following changes since commit 451577f3e3a9bf1861218641dbbf98e214e77851:

  Merge tag 'kbuild-fixes-v5.3-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild 
(2019-08-09 20:31:04 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-4

for you to fetch changes up to 33dcb37cef741294b481f4d889a465b8091f11bf:

  dma-mapping: fix page attributes for dma_mmap_* (2019-08-10 19:52:45 +0200)


dma-mapping fixes for 5.3-rc

 - fix the handling of the bus_dma_mask in dma_get_required_mask, which
   caused a regression in this merge window (Lucas Stach)
 - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me)
 - fix dma_mmap_coherent to not cause page attribute mismatches on
   coherent architectures like x86 (me)


Christoph Hellwig (2):
  dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING
  dma-mapping: fix page attributes for dma_mmap_*

Lucas Stach (1):
  dma-direct: don't truncate dma_required_mask to bus addressing 
capabilities

 arch/arm/mm/dma-mapping.c|  4 +---
 arch/arm64/mm/dma-mapping.c  |  4 +---
 arch/powerpc/Kconfig |  1 -
 arch/powerpc/kernel/Makefile |  3 +--
 arch/powerpc/kernel/dma-common.c | 17 -
 drivers/iommu/dma-iommu.c|  6 +++---
 include/linux/dma-noncoherent.h  | 13 +
 kernel/dma/direct.c  | 10 +-
 kernel/dma/mapping.c | 19 ++-
 kernel/dma/remap.c   |  2 +-
 10 files changed, 39 insertions(+), 40 deletions(-)
 delete mode 100644 arch/powerpc/kernel/dma-common.c


[git pull] IOMMU Fixes for Linux v5.3-rc4

2019-08-14 Thread Joerg Roedel
Hi Linus,

The following changes since commit e21a712a9685488f5ce80495b37b9fdbe96c230d:

  Linux 5.3-rc3 (2019-08-04 18:40:12 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v5.3-rc4

for you to fetch changes up to 3a18844dcf89e636b2d0cbf577e3963b0bcb6d23:

  iommu/vt-d: Fix possible use-after-free of private domain (2019-08-09 
17:35:25 +0200)


IOMMU Fixes for Linux v5.3-rc4

Including:

- A couple more fixes for the Intel VT-d driver for bugs
  introduced during the recent conversion of this driver to use
  IOMMU core default domains.

- Fix for common dma-iommu code to make sure MSI mappings happen
  in the correct domain for a device.

- Fix a corner case in the handling of sg-lists in dma-iommu
  code that might cause dma_length to be truncated.

- Mark a switch as fall-through in arm-smmu code.
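
The arm-smmu change is the usual idiom of annotating an intentional
fall-through so -Wimplicit-fallthrough stays quiet; roughly (hypothetical
case labels, not the actual driver code):

	switch (cap) {
	case CAP_FOO:
		setup_foo();		/* also wants the bar setup below */
		/* Fallthrough */
	case CAP_BAR:
		setup_bar();
		break;
	}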


Anders Roxell (1):
  iommu/arm-smmu: Mark expected switch fall-through

Lu Baolu (4):
  iommu/vt-d: Detach domain when move device out of group
  iommu/vt-d: Correctly check format of page table in debugfs
  iommu/vt-d: Detach domain before using a private one
  iommu/vt-d: Fix possible use-after-free of private domain

Robin Murphy (2):
  iommu/dma: Handle MSI mappings separately
  iommu/dma: Handle SG length overflow better

 drivers/iommu/arm-smmu-v3.c |  4 ++--
 drivers/iommu/dma-iommu.c   | 19 +++
 drivers/iommu/intel-iommu-debugfs.c |  2 +-
 drivers/iommu/intel-iommu.c | 11 +--
 4 files changed, 23 insertions(+), 13 deletions(-)

Please pull.

Thanks,

Joerg


[PATCH 2/2] microblaze: use the generic dma coherent remap allocator

2019-08-14 Thread Christoph Hellwig
This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.

Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features that drivers have started relying upon.  It
also makes sure we have one tested code base for all architectures
that require uncached pte bits for coherent DMA allocations.

Signed-off-by: Christoph Hellwig 
---
 arch/microblaze/Kconfig |   1 +
 arch/microblaze/mm/consistent.c | 152 +---
 2 files changed, 5 insertions(+), 148 deletions(-)

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index a0d749c309f3..e477896fbae6 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -17,6 +17,7 @@ config MICROBLAZE
select TIMER_OF
select CLONE_BACKWARDS3
select COMMON_CLK
+   select DMA_DIRECT_REMAP if MMU
select GENERIC_ATOMIC64
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_DEVICES
diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c
index 1a859e8b58c2..0e0f733eb846 100644
--- a/arch/microblaze/mm/consistent.c
+++ b/arch/microblaze/mm/consistent.c
@@ -4,43 +4,16 @@
  * Copyright (C) 2010 Michal Simek 
  * Copyright (C) 2010 PetaLogix
  * Copyright (C) 2005 John Williams 
- *
- * Based on PowerPC version derived from arch/arm/mm/consistent.c
- * Copyright (C) 2001 Dan Malek (dma...@jlc.net)
- * Copyright (C) 2000 Russell King
  */
 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
+#include 
 
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
@@ -84,126 +57,9 @@ void *cached_kernel_address(void *ptr)
return (void *)(addr & ~UNCACHED_SHADOW_MASK);
 }
 #else /* CONFIG_MMU */
-void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   gfp_t gfp, unsigned long attrs)
-{
-   unsigned long order, vaddr;
-   void *ret;
-   unsigned int i, err = 0;
-   struct page *page, *end;
-   phys_addr_t pa;
-   struct vm_struct *area;
-   unsigned long va;
-
-   if (in_interrupt())
-   BUG();
-
-   /* Only allocate page size areas. */
-   size = PAGE_ALIGN(size);
-   order = get_order(size);
-
-   vaddr = __get_free_pages(gfp | __GFP_ZERO, order);
-   if (!vaddr)
-   return NULL;
-
-   /*
-* we need to ensure that there are no cachelines in use,
-* or worse dirty in this area.
-*/
-   arch_dma_prep_coherent(virt_to_page((unsigned long)vaddr), size);
-
-   /* Allocate some common virtual space to map the new pages. */
-   area = get_vm_area(size, VM_ALLOC);
-   if (!area) {
-   free_pages(vaddr, order);
-   return NULL;
-   }
-   va = (unsigned long) area->addr;
-   ret = (void *)va;
-
-   /* This gives us the real physical address of the first page. */
-   *dma_handle = pa = __virt_to_phys(vaddr);
-
-   /*
-* free wasted pages.  We skip the first page since we know
-* that it will have count = 1 and won't require freeing.
-* We also mark the pages in use as reserved so that
-* remap_page_range works.
-*/
-   page = virt_to_page(vaddr);
-   end = page + (1 << order);
-
-   split_page(page, order);
-
-   for (i = 0; i < size && err == 0; i += PAGE_SIZE) {
-   /* MS: This is the whole magic - use cache inhibit pages */
-   err = map_page(va + i, pa + i, _PAGE_KERNEL | _PAGE_NO_CACHE);
-
-   SetPageReserved(page);
-   page++;
-   }
-
-   /* Free the otherwise unused pages. */
-   while (page < end) {
-   __free_page(page);
-   page++;
-   }
-
-   if (err) {
-   free_pages(vaddr, order);
-   return NULL;
-   }
-
-   return ret;
-}
-
-static pte_t *consistent_virt_to_pte(void *vaddr)
-{
-   unsigned long addr = (unsigned long)vaddr;
-
-   return pte_offset_kernel(pmd_offset(pgd_offset_k(addr), addr), addr);
-}
-
-long arch_dma_coherent_to_pfn(struct device *dev, void *vaddr,
-   dma_addr_t dma_addr)
+static int __init atomic_pool_init(void)
 {
-   pte_t *ptep = consistent_virt_to_pte(vaddr);
-
-   if (pte_none(*ptep) || !pte_present(*ptep))
-   return 0;
-
-   return pte_pfn(*ptep);
-}
-
-/*
- * free page(s) as defined by the above mapping.
- */
-void arch_dma_free(struct device *dev, size_t size, void *vaddr,
-   dma_addr_t dma_addr, unsigned long attrs)
-{
-   struct page *page;
-
-   if 

[PATCH 1/2] microblaze/nommu: use the generic uncached segment support

2019-08-14 Thread Christoph Hellwig
Stop providing our own arch alloc/free hooks for nommu platforms and
just expose the segment offset and use the generic dma-direct
allocator.

Signed-off-by: Christoph Hellwig 
---
 arch/microblaze/Kconfig |  2 +
 arch/microblaze/mm/consistent.c | 93 +++--
 2 files changed, 43 insertions(+), 52 deletions(-)

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index d411de05b628..a0d749c309f3 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -5,9 +5,11 @@ config MICROBLAZE
select ARCH_NO_SWAP
select ARCH_HAS_BINFMT_FLAT if !MMU
select ARCH_HAS_DMA_COHERENT_TO_PFN if MMU
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select ARCH_HAS_UNCACHED_SEGMENT if !MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c
index bc7042209c57..1a859e8b58c2 100644
--- a/arch/microblaze/mm/consistent.c
+++ b/arch/microblaze/mm/consistent.c
@@ -42,21 +42,48 @@
 #include 
 #include 
 
-#ifndef CONFIG_MMU
-/* I have to use dcache values because I can't relate on ram size */
-# define UNCACHED_SHADOW_MASK (cpuinfo.dcache_high - cpuinfo.dcache_base + 1)
-#endif
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+   phys_addr_t paddr = page_to_phys(page);
+
+   flush_dcache_range(paddr, paddr + size);
+}
 
+#ifndef CONFIG_MMU
 /*
- * Consistent memory allocators. Used for DMA devices that want to
- * share uncached memory with the processor core.
- * My crufty no-MMU approach is simple. In the HW platform we can optionally
- * mirror the DDR up above the processor cacheable region.  So, memory accessed
- * in this mirror region will not be cached.  It's alloced from the same
- * pool as normal memory, but the handle we return is shifted up into the
- * uncached region.  This will no doubt cause big problems if memory allocated
- * here is not also freed properly. -- JW
+ * Consistent memory allocators. Used for DMA devices that want to share
+ * uncached memory with the processor core.  My crufty no-MMU approach is
+ * simple.  In the HW platform we can optionally mirror the DDR up above the
+ * processor cacheable region.  So, memory accessed in this mirror region will
+ * not be cached.  It's alloced from the same pool as normal memory, but the
+ * handle we return is shifted up into the uncached region.  This will no doubt
+ * cause big problems if memory allocated here is not also freed properly. -- 
JW
+ *
+ * I have to use dcache values because I can't relate on ram size:
  */
+#ifdef CONFIG_XILINX_UNCACHED_SHADOW
+#define UNCACHED_SHADOW_MASK (cpuinfo.dcache_high - cpuinfo.dcache_base + 1)
+#else
+#define UNCACHED_SHADOW_MASK 0
+#endif /* CONFIG_XILINX_UNCACHED_SHADOW */
+
+void *uncached_kernel_address(void *ptr)
+{
+   unsigned long addr = (unsigned long)ptr;
+
+   addr |= UNCACHED_SHADOW_MASK;
+   if (addr > cpuinfo.dcache_base && addr < cpuinfo.dcache_high)
+   pr_warn("ERROR: Your cache coherent area is CACHED!!!\n");
+   return (void *)addr;
+}
+
+void *cached_kernel_address(void *ptr)
+{
+   unsigned long addr = (unsigned long)ptr;
+
+   return (void *)(addr & ~UNCACHED_SHADOW_MASK);
+}
+#else /* CONFIG_MMU */
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
 {
@@ -64,12 +91,9 @@ void *arch_dma_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
void *ret;
unsigned int i, err = 0;
struct page *page, *end;
-
-#ifdef CONFIG_MMU
phys_addr_t pa;
struct vm_struct *area;
unsigned long va;
-#endif
 
if (in_interrupt())
BUG();
@@ -86,26 +110,8 @@ void *arch_dma_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
 * we need to ensure that there are no cachelines in use,
 * or worse dirty in this area.
 */
-   flush_dcache_range(virt_to_phys((void *)vaddr),
-   virt_to_phys((void *)vaddr) + size);
-
-#ifndef CONFIG_MMU
-   ret = (void *)vaddr;
-   /*
-* Here's the magic!  Note if the uncached shadow is not implemented,
-* it's up to the calling code to also test that condition and make
-* other arranegments, such as manually flushing the cache and so on.
-*/
-# ifdef CONFIG_XILINX_UNCACHED_SHADOW
-   ret = (void *)((unsigned) ret | UNCACHED_SHADOW_MASK);
-# endif
-   if ((unsigned int)ret > cpuinfo.dcache_base &&
-   (unsigned int)ret < cpuinfo.dcache_high)
-   pr_warn("ERROR: Your cache coherent area is CACHED!!!\n");
+   arch_dma_prep_coherent(virt_to_page((unsigned 

convert microblaze to the generic dma remap allocator

2019-08-14 Thread Christoph Hellwig
Hi Michal,

can you take a look at this patch that moves microblaze over to the
generic DMA remap allocator?  I've been trying to slowly get all
architectures over to the generic code, and microblaze is one that
seems very straightforward to convert.


[PATCH] sh: use the generic dma coherent remap allocator

2019-08-14 Thread Christoph Hellwig
This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.

Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features that drivers have started relying upon.  It
also makes sure we have one tested code base for all architectures
that require uncached pte bits for coherent DMA allocations.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/Kconfig   |  2 ++
 arch/sh/kernel/dma-coherent.c | 57 +--
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 6b1b5941b618..21eefe7c4ba6 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -158,7 +158,9 @@ config DMA_COHERENT
 
 config DMA_NONCOHERENT
def_bool !DMA_COHERENT
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select DMA_DIRECT_REMAP
 
 config PGTABLE_LEVELS
default 3 if X2TLB
diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
index b17514619b7e..2f0e2f2d1f9c 100644
--- a/arch/sh/kernel/dma-coherent.c
+++ b/arch/sh/kernel/dma-coherent.c
@@ -3,60 +3,13 @@
  * Copyright (C) 2004 - 2007  Paul Mundt
  */
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
 
-void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   gfp_t gfp, unsigned long attrs)
+void arch_dma_prep_coherent(struct page *page, size_t size)
 {
-   void *ret, *ret_nocache;
-   int order = get_order(size);
-
-   gfp |= __GFP_ZERO;
-
-   ret = (void *)__get_free_pages(gfp, order);
-   if (!ret)
-   return NULL;
-
-   /*
-* Pages from the page allocator may have data present in
-* cache. So flush the cache before using uncached memory.
-*/
-   arch_sync_dma_for_device(dev, virt_to_phys(ret), size,
-   DMA_BIDIRECTIONAL);
-
-   ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size);
-   if (!ret_nocache) {
-   free_pages((unsigned long)ret, order);
-   return NULL;
-   }
-
-   split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order);
-
-   *dma_handle = virt_to_phys(ret);
-   if (!WARN_ON(!dev))
-   *dma_handle -= PFN_PHYS(dev->dma_pfn_offset);
-
-   return ret_nocache;
-}
-
-void arch_dma_free(struct device *dev, size_t size, void *vaddr,
-   dma_addr_t dma_handle, unsigned long attrs)
-{
-   int order = get_order(size);
-   unsigned long pfn = (dma_handle >> PAGE_SHIFT);
-   int k;
-
-   if (!WARN_ON(!dev))
-   pfn += dev->dma_pfn_offset;
-
-   for (k = 0; k < (1 << order); k++)
-   __free_pages(pfn_to_page(pfn + k), 0);
-
-   iounmap(vaddr);
+   __flush_purge_region(page_address(page), size);
 }
 
 void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
@@ -78,3 +31,9 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t 
paddr,
BUG();
}
 }
+
+static int __init atomic_pool_init(void)
+{
+   return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
+}
+postcore_initcall(atomic_pool_init);
-- 
2.20.1
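
As a usage note (illustrative sketch only, not part of the patch): with
DMA_DIRECT_REMAP selected, a coherent allocation from atomic context can now
be satisfied from the pool set up by atomic_pool_init() above, e.g.:

	/* hypothetical driver snippet; dev and dma_handle declared as usual */
	void *buf = dma_alloc_coherent(dev, SZ_4K, &dma_handle, GFP_ATOMIC);

	if (!buf)
		return -ENOMEM;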



convert sh to the generic dma remap allocator

2019-08-14 Thread Christoph Hellwig
Hi Rich and Yoshinori,

can you take a quick look at this patch that moves sh over to the
generic DMA remap allocator?  I've been trying to slowly get all
architectures over to the generic code, and I'd like to merge this one
for 5.4 now that it has been posted a handful of times without any
feedback.


[PATCH 05/10] ia64: Get rid of iommu_pass_through

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

This variable has no users anymore so it can be removed.

Signed-off-by: Joerg Roedel 
---
 arch/ia64/include/asm/iommu.h | 2 --
 arch/ia64/kernel/pci-dma.c| 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 7429a72f3f92..92aceef63710 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -8,10 +8,8 @@
 extern void no_iommu_init(void);
 #ifdef CONFIG_INTEL_IOMMU
 extern int force_iommu, no_iommu;
-extern int iommu_pass_through;
 extern int iommu_detected;
 #else
-#define iommu_pass_through (0)
 #define no_iommu   (1)
 #define iommu_detected (0)
 #endif
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index fe988c49f01c..f5d49cd3fbb0 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -22,8 +22,6 @@ int force_iommu __read_mostly = 1;
 int force_iommu __read_mostly;
 #endif
 
-int iommu_pass_through;
-
 static int __init pci_iommu_init(void)
 {
if (iommu_detected)
-- 
2.17.1



[PATCH 09/10] iommu: Disable passthrough mode when SME is active

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Using Passthrough mode when SME is active causes certain
devices to use the SWIOTLB bounce buffer. The bounce buffer
code has an upper limit of 256kb for the size of DMA
allocations, which is too small for certain devices and
causes them to fail.

With this patch we enable IOMMU by default when SME is
active in the system, making the default configuration work
for more systems than it does now.

Users that don't want IOMMUs to be enabled can still disable
them with kernel parameters.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 96cc7cc8ab21..4bc9025a9975 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -119,6 +119,11 @@ static int __init iommu_subsys_init(void)
iommu_set_default_passthrough();
else
iommu_set_default_translated();
+
+   if (iommu_default_passthrough() && sme_active()) {
+   pr_info("SME detected - Disabling default IOMMU 
Passthrough\n");
+   iommu_set_default_translated();
+   }
}
 
pr_info("Default domain type: %s %s\n",
-- 
2.17.1
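
As an aside (illustrative sketch only): the SWIOTLB limit mentioned above is
visible to drivers through dma_max_mapping_size(), so code that has to cope
with the bounce-buffer case can size its mappings accordingly:

	size_t max = dma_max_mapping_size(dev);	/* 256kb when SWIOTLB bouncing applies */

	if (len > max)
		len = max;	/* or fall back to multiple / scatter-gather mappings */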



[PATCH 04/10] x86/dma: Get rid of iommu_pass_through

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

This variable has no users anymore. Remove it and tell the
IOMMU code about the requested DMA mode via its new functions.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/iommu.h |  1 -
 arch/x86/kernel/pci-dma.c| 11 +++
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index baedab8ac538..b91623d521d9 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -4,7 +4,6 @@
 
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
-extern int iommu_pass_through;
 
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f62b498b18fb..a6fd479d4a71 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,12 +44,6 @@ int iommu_detected __read_mostly = 0;
  * It is also possible to disable by default in kernel config, and enable with
  * iommu=nopt at boot time.
  */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
-
 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
 
 void __init pci_iommu_alloc(void)
@@ -120,9 +115,9 @@ static __init int iommu_setup(char *p)
swiotlb = 1;
 #endif
if (!strncmp(p, "pt", 2))
-   iommu_pass_through = 1;
+   iommu_set_default_passthrough();
if (!strncmp(p, "nopt", 4))
-   iommu_pass_through = 0;
+   iommu_set_default_translated();
 
gart_parse_options(p);
 
-- 
2.17.1



[PATCH 02/10] iommu/amd: Request passthrough mode from IOMMU core

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Get rid of the iommu_pass_through variable and request
passthrough mode via the new iommu core function.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..7434b34d7a94 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -436,7 +436,7 @@ static int iommu_init_device(struct device *dev)
 * invalid address), we ignore the capability for the device so
 * it'll be forced to go into translation mode.
 */
-   if ((iommu_pass_through || !amd_iommu_force_isolation) &&
+   if ((iommu_default_passthrough() || !amd_iommu_force_isolation) &&
dev_is_pci(dev) && pci_iommuv2_capable(to_pci_dev(dev))) {
struct amd_iommu *iommu;
 
@@ -2226,7 +2226,7 @@ static int amd_iommu_add_device(struct device *dev)
 
BUG_ON(!dev_data);
 
-   if (iommu_pass_through || dev_data->iommu_v2)
+   if (dev_data->iommu_v2)
iommu_request_dm_for_dev(dev);
 
/* Domains are initialized for this device - have a look what we ended 
up with */
@@ -2805,7 +2805,7 @@ int __init amd_iommu_init_api(void)
 
 int __init amd_iommu_init_dma_ops(void)
 {
-   swiotlb= (iommu_pass_through || sme_me_mask) ? 1 : 0;
+   swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
iommu_detected = 1;
 
if (amd_iommu_unmap_flush)
-- 
2.17.1



[PATCH 00/10 v2] Cleanup IOMMU passthrough setting (and disable IOMMU Passthrough when SME is active)

2019-08-14 Thread Joerg Roedel
Hi,

This patch-set started out as a small change to override the default
passthrough setting (through CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y) when SME is active.

But on the way to that, Tom reminded me that the current ways to
configure passthrough/no-passthrough modes for the IOMMU on x86 are a mess.
So I added a few more patches to clean that up a bit, getting rid of the
iommu_pass_through variable on the way. This information is now kept only
in iommu code, with helpers to change that setting from architecture
code.

And of course this patch-set still disables IOMMU Passthrough mode when
SME is active even when CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set.

The reason for that change is that SME with passthrough mode turned out
to be fragile with devices requiring SWIOTLB, mainly because SWIOTLB has
a maximum allocation size of 256kb and a limited overall size of the
bounce buffer.

Therefore having IOMMU in translation mode by default is better when SME
is active on a system.

Please review.

Thanks,

Joerg

Changes since v1:

- Cleaned up the kernel command line parameters to
  configure passthrough/translated mode, getting rid
  of the global iommu_pass_through variable

Joerg Roedel (10):
  iommu: Add helpers to set/get default domain type
  iommu/amd: Request passthrough mode from IOMMU core
  iommu/vt-d: Request passthrough mode from IOMMU core
  x86/dma: Get rid of iommu_pass_through
  ia64: Get rid of iommu_pass_through
  iommu: Remember when default domain type was set on kernel command
line
  iommu: Print default domain type on boot
  iommu: Set default domain type at runtime
  iommu: Disable passthrough mode when SME is active
  Documentation: Update Documentation for iommu.passthrough

 .../admin-guide/kernel-parameters.txt |  2 +-
 arch/ia64/include/asm/iommu.h |  2 -
 arch/ia64/kernel/pci-dma.c|  2 -
 arch/x86/include/asm/iommu.h  |  1 -
 arch/x86/kernel/pci-dma.c | 11 +--
 drivers/iommu/amd_iommu.c |  6 +-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c | 83 +--
 include/linux/iommu.h | 16 
 9 files changed, 101 insertions(+), 24 deletions(-)

-- 
2.17.1



[PATCH 03/10] iommu/vt-d: Request passthrough mode from IOMMU core

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Get rid of the iommu_pass_through variable and request
passthrough mode via the new iommu core function.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bdaed2da8a55..234bc2b55c59 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3267,7 +3267,7 @@ static int __init init_dmars(void)
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
}
 
-   if (iommu_pass_through)
+   if (iommu_default_passthrough())
iommu_identity_mapping |= IDENTMAP_ALL;
 
 #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
-- 
2.17.1



[PATCH 07/10] iommu: Print default domain type on boot

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Introduce a subsys_initcall for IOMMU code and use it to
print the default domain type at boot.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e1feb4061b8b..233bc22b487e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -93,12 +93,40 @@ struct iommu_group_attribute iommu_group_attr_##_name = 
\
 static LIST_HEAD(iommu_device_list);
 static DEFINE_SPINLOCK(iommu_device_lock);
 
+/*
+ * Use a function instead of an array here because the domain-type is a
+ * bit-field, so an array would waste memory.
+ */
+static const char *iommu_domain_type_str(unsigned int t)
+{
+   switch (t) {
+   case IOMMU_DOMAIN_BLOCKED:
+   return "Blocked";
+   case IOMMU_DOMAIN_IDENTITY:
+   return "Passthrough";
+   case IOMMU_DOMAIN_UNMANAGED:
+   return "Unmanaged";
+   case IOMMU_DOMAIN_DMA:
+   return "Translated";
+   default:
+   return "Unknown";
+   }
+}
+
+static int __init iommu_subsys_init(void)
+{
+   pr_info("Default domain type: %s\n",
+   iommu_domain_type_str(iommu_def_domain_type));
+
+   return 0;
+}
+subsys_initcall(iommu_subsys_init);
+
 int iommu_device_register(struct iommu_device *iommu)
 {
	spin_lock(&iommu_device_lock);
	list_add_tail(&iommu->list, &iommu_device_list);
	spin_unlock(&iommu_device_lock);
-
return 0;
 }
 
-- 
2.17.1



[PATCH 01/10] iommu: Add helpers to set/get default domain type

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Add a couple of functions to allow changing the default
domain type from architecture code and a function for iommu
drivers to request whether the default domain is
passthrough.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 16 
 include/linux/iommu.h | 16 
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..f187e85a074b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2196,6 +2196,22 @@ int iommu_request_dma_domain_for_dev(struct device *dev)
return request_default_domain_for_dev(dev, IOMMU_DOMAIN_DMA);
 }
 
+void iommu_set_default_passthrough(void)
+{
+   iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+}
+
+void iommu_set_default_translated(void)
+{
+   iommu_def_domain_type = IOMMU_DOMAIN_DMA;
+}
+
+bool iommu_default_passthrough(void)
+{
+   return iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY;
+}
+EXPORT_SYMBOL_GPL(iommu_default_passthrough);
+
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 {
const struct iommu_ops *ops = NULL;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fdc355ccc570..58c3e3e5f157 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -413,6 +413,9 @@ extern void iommu_get_resv_regions(struct device *dev, 
struct list_head *list);
 extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern int iommu_request_dm_for_dev(struct device *dev);
 extern int iommu_request_dma_domain_for_dev(struct device *dev);
+extern void iommu_set_default_passthrough(void);
+extern void iommu_set_default_translated(void);
+extern bool iommu_default_passthrough(void);
 extern struct iommu_resv_region *
 iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot,
enum iommu_resv_type type);
@@ -694,6 +697,19 @@ static inline int iommu_request_dma_domain_for_dev(struct 
device *dev)
return -ENODEV;
 }
 
+static inline void iommu_set_default_passthrough(void)
+{
+}
+
+static inline void iommu_set_default_translated(void)
+{
+}
+
+static inline bool iommu_default_passthrough(void)
+{
+   return true;
+}
+
 static inline int iommu_attach_group(struct iommu_domain *domain,
 struct iommu_group *group)
 {
-- 
2.17.1
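
A minimal sketch of the intended call pattern (mirroring what the later
patches in this series do; helper names around it are hypothetical):

	/* architecture code, early during boot */
	if (want_passthrough)			/* hypothetical condition */
		iommu_set_default_passthrough();
	else
		iommu_set_default_translated();

	/* IOMMU driver code only queries the result */
	if (iommu_default_passthrough())
		setup_identity_mappings();	/* hypothetical helper */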



[PATCH 10/10] Documentation: Update Documentation for iommu.passthrough

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

This kernel parameter now also takes effect on X86.

Signed-off-by: Joerg Roedel 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..2d5dfa46e88a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1811,7 +1811,7 @@
  synchronously.
 
iommu.passthrough=
-   [ARM64] Configure DMA to bypass the IOMMU by default.
+   [ARM64, X86] Configure DMA to bypass the IOMMU by default.
Format: { "0" | "1" }
0 - Use IOMMU translation for DMA.
1 - Bypass the IOMMU for DMA.
-- 
2.17.1
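
With this change the same switch works on x86, e.g. booting with DMA
bypassing the IOMMU by appending the documented parameter to the kernel
command line:

	iommu.passthrough=1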



[PATCH 08/10] iommu: Set default domain type at runtime

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Set the default domain-type at runtime, not at compile-time.
This keeps the default domain type setting in one place when we
have to change it at runtime.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 233bc22b487e..96cc7cc8ab21 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,11 +26,8 @@
 
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
-#else
-static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
-#endif
+
+static unsigned int iommu_def_domain_type __read_mostly;
 static bool iommu_dma_strict __read_mostly = true;
 static u32 iommu_cmd_line __read_mostly;
 
@@ -76,7 +73,7 @@ static void iommu_set_cmd_line_dma_api(void)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
 }
 
-static bool __maybe_unused iommu_cmd_line_dma_api(void)
+static bool iommu_cmd_line_dma_api(void)
 {
return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
 }
@@ -115,8 +112,18 @@ static const char *iommu_domain_type_str(unsigned int t)
 
 static int __init iommu_subsys_init(void)
 {
-   pr_info("Default domain type: %s\n",
-   iommu_domain_type_str(iommu_def_domain_type));
+   bool cmd_line = iommu_cmd_line_dma_api();
+
+   if (!cmd_line) {
+   if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH))
+   iommu_set_default_passthrough();
+   else
+   iommu_set_default_translated();
+   }
+
+   pr_info("Default domain type: %s %s\n",
+   iommu_domain_type_str(iommu_def_domain_type),
+   cmd_line ? "(set via kernel command line)" : "");
 
return 0;
 }
-- 
2.17.1



[PATCH 06/10] iommu: Remember when default domain type was set on kernel command line

2019-08-14 Thread Joerg Roedel
From: Joerg Roedel 

Introduce an extensible concept to remember when certain
configuration settings for the IOMMU code have been set on
the kernel command line.

This will be used later to prevent overwriting these
settings with other defaults.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f187e85a074b..e1feb4061b8b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@ static unsigned int iommu_def_domain_type = 
IOMMU_DOMAIN_IDENTITY;
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
 static bool iommu_dma_strict __read_mostly = true;
+static u32 iommu_cmd_line __read_mostly;
 
 struct iommu_group {
struct kobject kobj;
@@ -68,6 +69,18 @@ static const char * const iommu_group_resv_type_string[] = {
[IOMMU_RESV_SW_MSI] = "msi",
 };
 
+#define IOMMU_CMD_LINE_DMA_API (1 << 0)
+
+static void iommu_set_cmd_line_dma_api(void)
+{
+   iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
+}
+
+static bool __maybe_unused iommu_cmd_line_dma_api(void)
+{
+   return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
+}
+
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
__ATTR(_name, _mode, _show, _store)
@@ -165,6 +178,8 @@ static int __init iommu_set_def_domain_type(char *str)
if (ret)
return ret;
 
+   iommu_set_cmd_line_dma_api();
+
iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA;
return 0;
 }
-- 
2.17.1



Re: [PATCH] iommu/arm-smmu-v3: add nr_ats_masters to avoid unnecessary operations

2019-08-14 Thread Will Deacon
Hi,

I've been struggling with the memory ordering requirements here. More below.

On Thu, Aug 01, 2019 at 08:20:40PM +0800, Zhen Lei wrote:
> When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a
> smmu domain does not contain any ats master, the operations of
> arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain()
> are always executed. This will impact performance, especially in
> multi-core and stress scenarios. For my FIO test scenario, about 8%
> performance reduced.
> 
> In fact, we can use a atomic member to record how many ats masters the
> smmu contains. And check that without traverse the list and check all
> masters one by one in the lock protection.
> 
> Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS")
> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/arm-smmu-v3.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index a9a9fabd396804a..1b370d9aca95f94 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -631,6 +631,7 @@ struct arm_smmu_domain {
>  
>   struct io_pgtable_ops   *pgtbl_ops;
>   boolnon_strict;
> + atomic_tnr_ats_masters;
>  
>   enum arm_smmu_domain_stage  stage;
>   union {
> @@ -1531,7 +1532,7 @@ static int arm_smmu_atc_inv_domain(struct 
> arm_smmu_domain *smmu_domain,
>   struct arm_smmu_cmdq_ent cmd;
>   struct arm_smmu_master *master;
>  
> - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
> + if (!atomic_read(_domain->nr_ats_masters))
>   return 0;

This feels wrong to me: the CPU can speculate ahead of time that
'nr_ats_masters' is 0, but we could have a concurrent call to '->attach()'
for an ATS-enabled device. Wouldn't it then be possible for the new device
to populate its ATC as a result of speculative accesses for the mapping that
we're tearing down?

The devices lock solves this problem by serialising invalidation with
'->attach()/->detach()' operations.

John's suggestion of RCU might work better, but I think you'll need to call
synchronize_rcu() between adding yourself to the 'devices' list and enabling
ATS.

What do you think?

>   arm_smmu_atc_inv_to_cmd(ssid, iova, size, );
> @@ -1869,6 +1870,7 @@ static int arm_smmu_enable_ats(struct arm_smmu_master 
> *master)
>   size_t stu;
>   struct pci_dev *pdev;
>   struct arm_smmu_device *smmu = master->smmu;
> + struct arm_smmu_domain *smmu_domain = master->domain;
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>  
>   if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
> @@ -1887,12 +1889,15 @@ static int arm_smmu_enable_ats(struct arm_smmu_master 
> *master)
>   return ret;
>  
>   master->ats_enabled = true;
> + atomic_inc(_domain->nr_ats_masters);

Here, we need to make sure that concurrent invalidation sees the updated
'nr_ats_masters' value before ATS is enabled for the device, otherwise we
could miss an ATC invalidation.

I think the code above gets this guarantee because of the way that ATS is
enabled in the STE, which ensures that we issue invalidation commands before
making the STE 'live'; this has the side-effect of a write barrier before
updating PROD, which I think we also rely on for installing the CD pointer.

Put another way: writes are ordered before a subsequent command insertion.

Do you agree? If so, I'll add a comment because this is subtle and easily
overlooked.

>  static void arm_smmu_disable_ats(struct arm_smmu_master *master)
>  {
>   struct arm_smmu_cmdq_ent cmd;
> + struct arm_smmu_domain *smmu_domain = master->domain;
>  
>   if (!master->ats_enabled || !dev_is_pci(master->dev))
>   return;
> @@ -1901,6 +1906,7 @@ static void arm_smmu_disable_ats(struct arm_smmu_master 
> *master)
>   arm_smmu_atc_inv_master(master, );
>   pci_disable_ats(to_pci_dev(master->dev));
>   master->ats_enabled = false;
> + atomic_dec(_domain->nr_ats_masters);

This part is the other way around: now we need to ensure that we don't
decrement 'nr_ats_masters' until we've disabled ATS. This works for a
number of reasons, none of which are obvious:

  - The control dependency from completing the prior CMD_SYNCs for tearing
down the STE and invalidating the ATC

  - The spinlock handover from the CMD_SYNCs above

  - The writel() when poking PCI configuration space in pci_disable_ats()
happens to be implemented with a write-write barrier

I suppose the control dependency is the most compelling one: we can't let
stores out whilst we're awaiting completion of a CMD_SYNC.

Put another way: writes are ordered after the completion of a prior CMD_SYNC.

But yeah, I need to write this down.
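
To summarise both constraints as a rough sketch (illustrative pseudo-code
only; the real code relies on the CMD_SYNC side effects described above
rather than explicit barriers):

	/* enable path: the counter update must be visible before ATS goes live */
	atomic_inc(&smmu_domain->nr_ats_masters);
	/* ... STE update + CMD_SYNC order the increment before ATS traffic ... */

	/* disable path: ATS must be torn down before the counter drops */
	/* ... ATC invalidation + CMD_SYNC complete first ... */
	atomic_dec(&smmu_domain->nr_ats_masters);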

>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> @@ -1915,10 +1921,10 @@ 

Re: [PATCH] iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems

2019-08-14 Thread Joerg Roedel
On Tue, Aug 13, 2019 at 11:58:48AM +0800, Kai-Heng Feng wrote:
> at 23:39, Joerg Roedel  wrote:
> 
> > On Thu, Aug 08, 2019 at 06:17:07PM +0800, Kai-Heng Feng wrote:
> > > Raven Ridge systems may have malfunction touchpad or hang at boot if
> > > incorrect IVRS IOAPIC is provided by BIOS.
> > > 
> > > Users already found correct "ivrs_ioapic=" values, let's put them inside
> > > kernel to workaround buggy BIOS.
> > 
> > Will that still work when a fixed BIOS for these laptops is released?
> 
> Do you mean that we should stop applying these quirks once a BIOS fix is
> confirmed?

My concern is just that these quirks break some systems that don't need
them.

> We can modify the quirk to compare the BIOS version, in case an unlikely BIOS
> update really fixes the issue.
> Before that happens, I think it’s OK to let the quirks stay this way.

A BIOS version check is not making things better here as it might lock
out systems that need the quirk. I think we can leave it as it is for now,
but can you create a new file amd_iommu_quirks.c and move the code
there. And in the struct and function names please make clear that it is
about ivrs-quirks.


Regards,

Joerg


Re: [PATCH] iommu: exynos: Remove __init annotation from exynos_sysmmu_probe()

2019-08-14 Thread Joerg Roedel
On Mon, Aug 12, 2019 at 12:32:46PM +0200, Marek Szyprowski wrote:
> Exynos SYSMMU driver supports deferred probe. It happens when clocks
> needed for this driver are not yet available. Typically the next calls to
> the driver's ->probe() happen before the init section is freed, but this is
> not really guaranteed. To make it safe, remove the __init annotation from
> the exynos_sysmmu_probe() function.
> 
> Signed-off-by: Marek Szyprowski 

Applied, thanks.


Re: [PATCH v6 5/8] iommu: Add bounce page APIs

2019-08-14 Thread Joerg Roedel
Hi Lu Baolu,

On Tue, Jul 30, 2019 at 12:52:26PM +0800, Lu Baolu wrote:
> * iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
>   - Map a buffer starting at DMA address @addr in bounce page
> manner. For buffer parts that don't cross a whole
> minimal IOMMU page, the bounce page policy is applied.
> A bounce page mapped by swiotlb will be used as the DMA
> target in the IOMMU page table. Otherwise, the physical
> address @paddr is mapped instead.
> 
> * iommu_bounce_unmap(dev, addr, size, dir, attrs)
>   - Unmap the buffer mapped with iommu_bounce_map(). The bounce
> page will be torn down after the bounced data get synced.
> 
> * iommu_bounce_sync(dev, addr, size, dir, target)
>   - Sync the bounced data in case the bounce mapped buffer is
> reused.

I don't really get why this API extension is needed for your use-case.
Can't this just be done using iommu_map/unmap operations? Can you please
elaborate a bit on why these functions are needed?


Regards,

Joerg


Re: [PATCH v9 00/21] MT8183 IOMMU SUPPORT

2019-08-14 Thread Will Deacon
On Wed, Aug 14, 2019 at 10:18:25AM +0200, Joerg Roedel wrote:
> On Sat, Aug 10, 2019 at 03:58:00PM +0800, Yong Wu wrote:
> > Change notes:
> > v9:
> >1) rebase on v5.3-rc1.
> >2) In v7s, Use oas to implement MTK 4GB mode. It nearly reconstructs the
> >   patch, so I don't keep the R-b.
> 
> Okay, this looks close to being ready, just the io-pgtable patches still
> need review.

On my list for today :) (Today is SMMU day for me. Send coffee.)

Will


Re: [PATCH v9 00/21] MT8183 IOMMU SUPPORT

2019-08-14 Thread Joerg Roedel
On Sat, Aug 10, 2019 at 03:58:00PM +0800, Yong Wu wrote:
> Change notes:
> v9:
>1) rebase on v5.3-rc1.
>2) In v7s, Use oas to implement MTK 4GB mode. It nearly reconstructs the
>   patch, so I don't keep the R-b.

Okay, this looks close to being ready, just the io-pgtable patches still
need review.


Regards,

Joerg