Re: [PATCH 2/4] iommu: Add I/O ASID allocator

2019-09-18 Thread kbuild test robot
Hi Jacob,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190918]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-Introduce-cache_invalidate-API/20190919-072517
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=ia64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from <command-line>:0:0:
   include/linux/ioasid.h: In function 'ioasid_set_data':
>> include/linux/ioasid.h:43:10: error: 'ENODEV' undeclared (first use in this function)
     return -ENODEV;
             ^~~~~~
   include/linux/ioasid.h:43:10: note: each undeclared identifier is reported only once for each function it appears in

vim +/ENODEV +43 include/linux/ioasid.h

    40  
    41  static inline int ioasid_set_data(ioasid_t ioasid, void *data)
    42  {
  > 43          return -ENODEV;
    44  }
    45  
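The likely fix: include/linux/ioasid.h uses -ENODEV in its static inline
stubs but never includes the errno definitions, so configs that don't
pull in errno.h transitively (here, ia64-allmodconfig) fail. A minimal
sketch, assuming the stub shown above:

#include <linux/errno.h>        /* provides ENODEV for the inline stubs */

static inline int ioasid_set_data(ioasid_t ioasid, void *data)
{
        return -ENODEV;
}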

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCHv6 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-18 Thread Sai Prakash Ranjan

On 2019-09-19 05:55, Bjorn Andersson wrote:

In the transition to this new design we lost the ability to
enable/disable the safe toggle per board, which according to Vivek
would result in some issue with Cheza.

Can you confirm that this is okay? (Or introduce the DT property for
enabling the safe_toggle logic?)



Hmm, I don't remember Vivek mentioning any issue on Cheza because of
this logic.

But I will test this on Cheza and let you know.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCHv6 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-18 Thread Sai Prakash Ranjan

On 2019-09-19 06:27, Stephen Boyd wrote:

Quoting Sai Prakash Ranjan (2019-09-17 02:45:04)
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c

index 3f88cd078dd5..d62da270f430 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -9,7 +9,6 @@

 #include "arm-smmu.h"

-
 static int arm_smmu_gr0_ns(int offset)
 {
switch(offset) {


Why is this hunk still around?


I remember correcting this in the previous version, but somehow it
slipped back in this version. Will correct it.




diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c

new file mode 100644
index 000000000000..24c071c1d8b0
--- /dev/null
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -0,0 +1,51 @@

[...]
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)

+{
+   struct qcom_smmu *qsmmu;
+
+   qsmmu = devm_kzalloc(smmu->dev, sizeof(*qsmmu), GFP_KERNEL);
+   if (!qsmmu)
+   return ERR_PTR(-ENOMEM);
+
+   qsmmu->smmu = *smmu;
+
+   qsmmu->smmu.impl = &qcom_smmu_impl;
+   devm_kfree(smmu->dev, smmu);


This copy is interesting but OK I guess cavium does it.



This is from the nvidia impl; Robin pointed me at its implementation.


+
+   return &qsmmu->smmu;
+}


Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v4 0/4] Retrieving zPCI specific info with VFIO

2019-09-18 Thread Matthew Rosato
ping

On 9/6/19 8:13 PM, Matthew Rosato wrote:
> Note: These patches by Pierre got lost in the ether a few months back
> as he has been unavailable to carry them forward.  I've made changes
> based upon comments received on his last version.
> 
> We define a new configuration entry for VFIO/PCI, VFIO_PCI_ZDEV
> to configure access to a zPCI region dedicated for retrieving
> zPCI features.
> 
> When the VFIO_PCI_ZDEV feature is configured we initialize
> a new device region, VFIO_REGION_SUBTYPE_ZDEV_CLP, to hold
> the information from the ZPCI device the userland needs to
> give to a guest driving the zPCI function.
> 
> 
> Note that in the current state we do not use the CLP instructions
> to access the firmware but get the information directly from
> the zdev device.
> 
> - This means that patch 1, "s390: pci: Exporting access to CLP PCI
>   function and PCI group", is not used and could be left out of this
>   series without affecting the operation of the other patches.
> - But we will need it later, eventually in the next iteration,
>   to retrieve values not saved inside the zdev structure,
>   like maxstbl and the supported PCI version.
> 
> To share the code with arch/s390/pci/pci_clp.c the original functions
> in pci_clp.c to query PCI functions and PCI functions group are
> modified so that they can be exported.
> 
> A new function clp_query_pci() replaces clp_query_pci_fn() and
> the previous calls to clp_query_pci_fn() and clp_query_pci_fngrp()
> are replaced with calls to zdev_query_pci_fn() and zdev_query_pci_fngrp()
> using a zdev pointer as argument.
> 
> Changes since v3:
> - New patch: define maxstbl
> - Remove CLP_UTIL_STR_LEN references from uapi header
> - Fix broken ifdef CONFIG_VFIO_PCI_ZDEV
> - Change Kconfig option from tristate to bool
> - Remove VFIO_REGION_TYPE_IBM_ZDEV, move VFIO_REGION_SUBTYPE_ZDEV_CLP to a 
> 1014 subtype
> - reject iswrite in .rw callback
> - Remove rw restriction on identical buffer sizes
> - Allow arbitrary sized read
> 
> Pierre Morel (4):
>   s390: pci: Exporting access to CLP PCI function and PCI group
>   s390: pci: Define the maxstbl CLP response entry
>   vfio: zpci: defining the VFIO headers
>   vfio: pci: Using a device region to retrieve zPCI information
> 
>  arch/s390/include/asm/pci.h |  3 ++
>  arch/s390/include/asm/pci_clp.h |  2 +-
>  arch/s390/pci/pci_clp.c | 71 ---
>  drivers/vfio/pci/Kconfig|  7 +++
>  drivers/vfio/pci/Makefile   |  1 +
>  drivers/vfio/pci/vfio_pci.c |  9 
>  drivers/vfio/pci/vfio_pci_private.h | 10 +
>  drivers/vfio/pci/vfio_pci_zdev.c| 85 
> +
>  include/uapi/linux/vfio.h   |  1 +
>  include/uapi/linux/vfio_zdev.h  | 35 +++
>  10 files changed, 189 insertions(+), 35 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_pci_zdev.c
>  create mode 100644 include/uapi/linux/vfio_zdev.h
> 



Re: [PATCHv6 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-18 Thread Stephen Boyd
Quoting Sai Prakash Ranjan (2019-09-17 02:45:04)
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..d62da270f430 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -9,7 +9,6 @@
>  
>  #include "arm-smmu.h"
>  
> -
>  static int arm_smmu_gr0_ns(int offset)
>  {
> switch(offset) {

Why is this hunk still around?

> diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> new file mode 100644
> index 000000000000..24c071c1d8b0
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-qcom.c
> @@ -0,0 +1,51 @@
[...]
> +struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> +   struct qcom_smmu *qsmmu;
> +
> +   qsmmu = devm_kzalloc(smmu->dev, sizeof(*qsmmu), GFP_KERNEL);
> +   if (!qsmmu)
> +   return ERR_PTR(-ENOMEM);
> +
> +   qsmmu->smmu = *smmu;
> +
> +   qsmmu->smmu.impl = &qcom_smmu_impl;
> +   devm_kfree(smmu->dev, smmu);

This copy is interesting but OK I guess cavium does it.

> +
> +   return &qsmmu->smmu;
> +}


Re: [PATCHv6 2/3] firmware/qcom_scm: Add scm call to handle smmu errata

2019-09-18 Thread Stephen Boyd
Quoting Sai Prakash Ranjan (2019-09-17 02:45:03)
> From: Vivek Gautam 
> 
> Qcom's smmu-500 needs to toggle the wait-for-safe sequence to
> handle TLB invalidation syncs.
> Some firmwares allow doing that through the SCM interface.
> Add an API to toggle wait-for-safe from firmware through an
> SCM call.
> 
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: Sai Prakash Ranjan 
> ---

Reviewed-by: Stephen Boyd 
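
For reference, a minimal sketch of how a caller is expected to use the
new API (patch 3/3 drives it this way from the sdm845 reset hook); on
firmware that lacks the QCOM_SCM_SVC_SMMU_PROGRAM handler the call
simply returns an error, which is harmless on boards without the
erratum:

#include <linux/device.h>
#include <linux/qcom_scm.h>

/* Sketch only: 0 disables wait-for-safe, 1 enables it */
static void example_disable_wait_for_safe(struct device *dev)
{
        int ret = qcom_scm_qsmmu500_wait_safe_toggle(0);

        if (ret)
                dev_notice(dev, "wait-for-safe toggle unsupported (%d)\n", ret);
}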



Re: [PATCHv6 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-18 Thread Stephen Boyd
Quoting Bjorn Andersson (2019-09-18 17:25:01)
> On Tue 17 Sep 02:45 PDT 2019, Sai Prakash Ranjan wrote:
> 
> > From: Vivek Gautam 
> > 
> > There are other boards such as cheza whose bootloaders don't enable this
> > logic. Such boards don't implement callbacks to handle the specific SCM
> > call so disabling this logic for such boards will be a no-op.
> > 
[...]
> > diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> > new file mode 100644
> > index 000000000000..24c071c1d8b0
> > --- /dev/null
> > +++ b/drivers/iommu/arm-smmu-qcom.c
> > @@ -0,0 +1,51 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2019, The Linux Foundation. All rights reserved.
> > + */
> > +
> > +#include 
> > +
> > +#include "arm-smmu.h"
> > +
> > +struct qcom_smmu {
> > + struct arm_smmu_device smmu;
> > +};
> > +
> > +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
> > +{
> > + int ret;
> > +
> > + arm_mmu500_reset(smmu);
> > +
> > + /*
> > +  * To address performance degradation in non-real time clients,
> > +  * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
> > +  * such as MTP and db845, whose firmwares implement secure monitor
> > +  * call handlers to turn on/off the wait-for-safe logic.
> > +  */
> > + ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
> 
> In the transition to this new design we lost the ability to
> enable/disable the safe toggle per board, which according to Vivek
> would result in some issue with Cheza.
> 
> Can you confirm that this is okay? (Or introduce the DT property for
> enabling the safe_toggle logic?)
> 

I can test this on Cheza. Not sure if anything will happen but it's
worth a shot.



Re: [PATCHv6 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call

2019-09-18 Thread Stephen Boyd
Quoting Sai Prakash Ranjan (2019-09-17 02:45:02)
> From: Vivek Gautam 
> 
> There are scenarios where drivers are required to make an
> SCM call in atomic context, such as in one of Qcom's
> arm-smmu-500 errata [1].
> 
> [1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/
>   tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n4842")
> 
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: Sai Prakash Ranjan 
> ---

Reviewed-by: Stephen Boyd 



Re: [PATCHv6 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-18 Thread Bjorn Andersson
On Tue 17 Sep 02:45 PDT 2019, Sai Prakash Ranjan wrote:

> From: Vivek Gautam 
> 
> Add reset hook for sdm845 based platforms to turn off
> the wait-for-safe sequence.
> 
> Understanding how wait-for-safe logic affects USB and UFS performance
> on MTP845 and DB845 boards:
> 
> Qcom's implementation of arm,mmu-500 adds WAIT-FOR-SAFE logic
> to address under-performance issues in real-time clients, such as
> Display and Camera.
> On receiving an invalidation request, the SMMU forwards a SAFE request
> to these clients and waits for a SAFE ack signal from the real-time
> clients. The SAFE signal from such clients is used to qualify the start
> of invalidation.
> This logic is controlled by chicken bits, one for each of MDP (display),
> IFE0, and IFE1 (camera), that can be accessed only from secure software
> on sdm845.
> 
> This configuration, however, degrades the performance of non-real-time
> clients, such as USB and UFS. This happens because, with wait-for-safe
> logic enabled, the hardware tries to throttle non-real-time clients
> while waiting for SAFE ack signals from the real-time clients.
> 
> On mtp845 and db845 devices, with wait-for-safe logic enabled by the
> bootloaders, we see degraded performance of USB and UFS when the kernel
> enables the smmu stage-1 translations for these clients.
> Turning off this wait-for-safe logic from the kernel gets us back the
> perf of USB and UFS devices, until we revisit this when we start seeing
> perf issues on display/camera on upstream-supported SDM845 platforms.
> The bootloaders on these boards implement secure monitor callbacks to
> handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
> logic can be toggled.
> 
> There are other boards such as cheza whose bootloaders don't enable this
> logic. Such boards don't implement callbacks to handle the specific SCM
> call so disabling this logic for such boards will be a no-op.
> 
> This change is inspired by the downstream change from Patrick Daly
> to address performance issues with display and camera by handling
> this wait-for-safe within separate io-pagetable ops to do TLB
> maintenance. So a big thanks to him for the change and for all the
> offline discussions.
> 
> Without this change the UFS reads are pretty slow:
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> real0m 22.39s
> user0m 0.00s
> sys 0m 0.01s
> 
> With this change they are back to rock!
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> 300+0 records in
> 300+0 records out
> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> real0m 1.03s
> user0m 0.00s
> sys 0m 0.54s
> 
> Signed-off-by: Vivek Gautam 
> Reviewed-by: Robin Murphy 
> Signed-off-by: Sai Prakash Ranjan 
> ---
>  drivers/iommu/Makefile|  2 +-
>  drivers/iommu/arm-smmu-impl.c |  6 +++--
>  drivers/iommu/arm-smmu-qcom.c | 51 +++
>  drivers/iommu/arm-smmu.h  |  3 +++
>  4 files changed, 59 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/iommu/arm-smmu-qcom.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index a2729aadd300..2816e49a8c46 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>  obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
>  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
>  obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
> -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
> +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..d62da270f430 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -9,7 +9,6 @@
>  
>  #include "arm-smmu.h"
>  
> -
>  static int arm_smmu_gr0_ns(int offset)
>  {
>   switch(offset) {
> @@ -109,7 +108,7 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu)
>  #define ARM_MMU500_ACR_S2CRB_TLBEN   (1 << 10)
>  #define ARM_MMU500_ACR_SMTNMB_TLBEN  (1 << 8)
>  
> -static int arm_mmu500_reset(struct arm_smmu_device *smmu)
> +int arm_mmu500_reset(struct arm_smmu_device *smmu)
>  {
>   u32 reg, major;
>   int i;
> @@ -170,5 +169,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
> "calxeda,smmu-secure-config-access"))
>   smmu->impl = &calxeda_impl;
>  
> + if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
> + return qcom_smmu_impl_init(smmu);
> +
>   return smmu;
>  }
> diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> new file mode 100644

[PATCH 4/4] iommu: Introduce guest PASID bind function

2019-09-18 Thread Jacob Pan
Guest shared virtual address (SVA) may require the host to shadow guest
PASID tables. Guest PASIDs can also be allocated from the host via
enlightened interfaces. In this case, the guest needs to bind the guest
mm, i.e. cr3 in guest physical address, to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
addresses can go through a two-level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for a usage
explanation.

 .-------------.  .---------------------------.
 |   vIOMMU    |  | Guest process mm, FL only |
 |             |  '---------------------------'
 .----------------/
 | PASID Entry |--- PASID cache flush ------.
 '-------------'                            |
 |             |                            V
 |             |                           GP
 '-------------'
Guest
------| Shadow |----------- GP->HP* ----------------
      v        v                            |
Host                                        v
 .-------------.  .----------------------.
 |   pIOMMU    |  | Bind FL for GVA-GPA  |
 |             |  '----------------------'
 .----------------/  |
 | PASID Entry |     V (Nested xlate)
 '----------------\  .-------------------.
 |             |     | Set SL to GPA-HPA |
 |             |     '-------------------'
 '-------------'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
 - GP = Guest PASID
 - HP = Host PASID
* Conversion needed if non-identity GP-HP mapping option is chosen.
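
To illustrate the intended flow, a hedged sketch of how a
virtualizer-facing caller (e.g. a VFIO backend) might drive the API.
The iommu_gpasid_bind_data field and constant names follow the uapi
header this patch adds, but since that hunk is truncated below, treat
them as assumptions:

#include <linux/iommu.h>

/* Sketch: bind a guest first-level page table (cr3, in GPA) to a PASID,
 * assuming the identity GP->HP mapping option. Unbind with
 * iommu_sva_unbind_gpasid(domain, dev, pasid) on teardown.
 */
static int example_bind_gpasid(struct iommu_domain *domain,
                               struct device *dev, u64 guest_cr3,
                               ioasid_t pasid)
{
        struct iommu_gpasid_bind_data data = {
                .version = IOMMU_GPASID_BIND_VERSION_1,
                .format  = IOMMU_PASID_FORMAT_INTEL_VTD,
                .gpgd    = guest_cr3,   /* guest page directory, GPA */
                .hpasid  = pasid,       /* host PASID */
                .gpasid  = pasid,       /* guest PASID, identity mapped */
        };

        return iommu_sva_bind_gpasid(domain, dev, &data);
}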

Signed-off-by: Jacob Pan 
Signed-off-by: Liu Yi L 
---
 drivers/iommu/iommu.c  | 20 
 include/linux/iommu.h  | 22 +
 include/uapi/linux/iommu.h | 59 ++
 3 files changed, 101 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e27dec2d39b8..5523c035abb9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1620,6 +1620,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+                         struct device *dev, struct iommu_gpasid_bind_data *data)
+{
+   if (unlikely(!domain->ops->sva_bind_gpasid))
+   return -ENODEV;
+
+   return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+ioasid_t pasid)
+{
+   if (unlikely(!domain->ops->sva_unbind_gpasid))
+   return -ENODEV;
+
+   return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cf8b504966b0..0440312db86a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define IOMMU_READ (1 << 0)
@@ -230,6 +231,8 @@ struct iommu_sva_ops {
  * @page_response: handle page request response
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -294,6 +297,10 @@ struct iommu_ops {
 struct iommu_page_response *msg);
int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
struct iommu_cache_invalidate_info *inv_info);
+       int (*sva_bind_gpasid)(struct iommu_domain *domain,
+                       struct device *dev, struct iommu_gpasid_bind_data *data);
+
+   int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
unsigned long pgsize_bitmap;
 };
@@ -401,6 +408,10 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
  struct device *dev,
  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev, struct iommu_gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -950,6 +961,17 @@ iom

[PATCH 1/4] iommu: Introduce cache_invalidate API

2019-09-18 Thread Jacob Pan
From: Yi L Liu 

In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.

Since the invalidation data can be obtained from user space and will be
written into the physical IOMMU, we must allow security checks at
various layers. Therefore, a generic invalidation data format is
proposed here; model-specific IOMMU drivers need to convert it into
their own format.
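
A hedged sketch of the resulting call path: the virtualizer traps a
guest invalidation and forwards it through the new API. The granularity
value comes from the enum added below; the version and cache-type
constants are assumptions, as that part of the header hunk is truncated:

#include <linux/iommu.h>

/* Sketch: forward a guest's PASID-selective IOTLB invalidation */
static int example_forward_guest_inv(struct iommu_domain *domain,
                                     struct device *dev)
{
        struct iommu_cache_invalidate_info inv_info = {
                .version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
                .cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
                .granularity = IOMMU_INV_GRANU_PASID,
        };

        return iommu_cache_invalidate(domain, dev, &inv_info);
}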

Signed-off-by: Yi L Liu 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c  |  10 +
 include/linux/iommu.h  |  14 ++
 include/uapi/linux/iommu.h | 110 +
 3 files changed, 134 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..e27dec2d39b8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1610,6 +1610,16 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+   if (unlikely(!domain->ops->cache_invalidate))
+   return -ENODEV;
+
+   return domain->ops->cache_invalidate(domain, dev, inv_info);
+}
+EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fdc355ccc570..cf8b504966b0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -228,6 +228,7 @@ struct iommu_sva_ops {
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
+ * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -291,6 +292,8 @@ struct iommu_ops {
int (*page_response)(struct device *dev,
 struct iommu_fault_event *evt,
 struct iommu_page_response *msg);
+   int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
+   struct iommu_cache_invalidate_info *inv_info);
 
unsigned long pgsize_bitmap;
 };
@@ -395,6 +398,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_cache_invalidate(struct iommu_domain *domain,
+ struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -937,6 +943,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
return IOMMU_PASID_INVALID;
 }
 
+static inline int
+iommu_cache_invalidate(struct iommu_domain *domain,
+  struct device *dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+   return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index fc00c5d4741b..f3e96214df8e 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -152,4 +152,114 @@ struct iommu_page_response {
__u32   code;
 };
 
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+   IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
+   IOMMU_INV_GRANU_PASID,  /* PASID-selective invalidation */
+   IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
+   IOMMU_INV_GRANU_NR, /* number of invalidation granularities */
+};
+
+/**
+ * struct iommu_inv_addr_info - Address Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the address-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific ID and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - If neither PASID or ARCHID is set, global addr invalidation applies.
+ * - The LEAF flag indicates whether only th

[PATCH 2/4] iommu: Add I/O ASID allocator

2019-09-18 Thread Jacob Pan
From: Jean-Philippe Brucker 

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows sharing process address spaces with devices (SVA), partitioning a
device into VM-assignable entities (VFIO mdev), or simply providing
multiple DMA address spaces to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O ASID
to avoid confusion with ASIDs allocated by arch code, which are usually
a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated to a single PASID.

The allocator is primarily used by the IOMMU subsystem, but on rare
occasions drivers would like to allocate PASIDs for devices that aren't
managed by an IOMMU, using the same ID space as the IOMMU.
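
A usage sketch, limited to the API introduced here (ioasid_alloc(),
ioasid_set_data(), ioasid_free() and the DECLARE_IOASID_SET helper
mentioned in the file comment); the set name and limits are
illustrative:

#include <linux/ioasid.h>

DECLARE_IOASID_SET(example_set);

static ioasid_t example_alloc_pasid(void *priv)
{
        /* PASID 0 is commonly reserved; PCIe PASIDs are 20 bits wide */
        ioasid_t pasid = ioasid_alloc(&example_set, 1, (1U << 20) - 1, priv);

        if (pasid == INVALID_IOASID)
                return pasid;

        /* The private pointer can be replaced later, e.g. once the
         * backing structure is fully initialised.
         */
        ioasid_set_data(pasid, priv);
        return pasid;
}

/* ... and ioasid_free(pasid) on teardown */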

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/Kconfig  |   4 ++
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 151 +
 include/linux/ioasid.h |  47 +++
 4 files changed, 203 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd8cb3c..0ade8a031c09 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,10 @@
 config IOMMU_IOVA
tristate
 
+# The IOASID library may also be used by non-IOMMU_API users
+config IOASID
+   tristate
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
bool
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36ae1af6..011429e00598 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 000000000000..6fbea76a47cf
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct ioasid_data {
+   ioasid_t id;
+   struct ioasid_set *set;
+   void *private;
+   struct rcu_head rcu;
+};
+
+static DEFINE_XARRAY_ALLOC(ioasid_xa);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data:   the private data
+ *
+ * For IOASID that is already allocated, private data can be set
+ * via this API. Future lookup can be done via ioasid_find.
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+   struct ioasid_data *ioasid_data;
+   int ret = 0;
+
+   xa_lock(&ioasid_xa);
+   ioasid_data = xa_load(&ioasid_xa, ioasid);
+   if (ioasid_data)
+   rcu_assign_pointer(ioasid_data->private, data);
+   else
+   ret = -ENOENT;
+   xa_unlock(&ioasid_xa);
+
+   /*
+* Wait for readers to stop accessing the old private data, so the
+* caller can free it.
+*/
+   if (!ret)
+   synchronize_rcu();
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max. The @private pointer is stored
+ * internally and can be retrieved with ioasid_find().
+ *
+ * Return: the allocated ID on success, or %INVALID_IOASID on failure.
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+ void *private)
+{
+   ioasid_t id;
+   struct ioasid_data *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return INVALID_IOASID;
+
+   data->set = set;
+   data->private = private;
+
+   if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+   pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+   goto exit_free;
+   }
+   data->id = id;
+
+   return id;
+exit_free:
+   kfree(data);
+   return INVALID_IOASID;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+   struct ioasid_data

[PATCH 3/4] iommu/ioasid: Add custom allocators

2019-09-18 Thread Jacob Pan
IOASID allocation may rely on platform-specific methods. One use case is
that when running in a guest, in order to obtain system-wide global
IOASIDs, an emulated allocation interface is needed to communicate with
the host. Here we call these platform-specific allocators custom
allocators.

Custom IOASID allocators can be registered at runtime and take precedence
over the default XArray allocator. They have these attributes (see the
registration sketch after the list):

- provide platform-specific alloc()/free() functions with private data.
- lookup of allocation results is not provided by the allocator; lookup
  requests must be done by the IOASID framework through its own XArray.
- allocators can be unregistered at runtime, falling back either to the
  next custom allocator or to the default allocator.
- custom allocators can share the same set of alloc()/free() helpers, in
  which case they also share the same IOASID space, thus the same XArray.
- switching between allocators requires all outstanding IOASIDs to be
  freed, unless the two allocators share the same alloc()/free() helpers.
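
A sketch of registering such an allocator, assuming the
ioasid_register_allocator() interface this patch adds to
include/linux/ioasid.h (that hunk is not shown here); the hypercall
helpers are hypothetical:

#include <linux/ioasid.h>

/* Hypothetical paravirt transport: the host performs the real
 * allocation so the guest obtains system-wide global IOASIDs.
 */
static ioasid_t guest_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
{
        return example_hypercall_alloc_pasid(min, max);
}

static void guest_ioasid_free(ioasid_t ioasid, void *data)
{
        example_hypercall_free_pasid(ioasid);
}

static struct ioasid_allocator_ops guest_allocator_ops = {
        .alloc = guest_ioasid_alloc,
        .free  = guest_ioasid_free,
};

static int __init guest_allocator_init(void)
{
        /* Takes precedence over the default XArray allocator */
        return ioasid_register_allocator(&guest_allocator_ops);
}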

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/ioasid.c | 301 +++--
 include/linux/ioasid.h |  28 +
 2 files changed, 319 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 6fbea76a47cf..5b6ead4b07d6 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -17,7 +17,254 @@ struct ioasid_data {
struct rcu_head rcu;
 };
 
-static DEFINE_XARRAY_ALLOC(ioasid_xa);
+/*
+ * struct ioasid_allocator_data - Internal data structure to hold information
+ * about an allocator. There are two types of allocators:
+ *
+ * - Default allocator always has its own XArray to track the IOASIDs allocated.
+ * - Custom allocators may share allocation helpers with different private data.
+ *   Custom allocators that share the same helper functions also share the same
+ *   XArray.
+ * Rules:
+ * 1. Default allocator is always available, not dynamically registered. This is
+ *    to prevent race conditions with early boot code that wants to register
+ *    custom allocators or allocate IOASIDs.
+ * 2. Custom allocators take precedence over the default allocator.
+ * 3. When all custom allocators sharing the same helper functions are
+ *    unregistered (e.g. due to hotplug), all outstanding IOASIDs must be
+ *    freed.
+ * 4. When switching between custom allocators sharing the same helper
+ *    functions, outstanding IOASIDs are preserved.
+ * 5. When switching between custom allocator and default allocator, all IOASIDs
+ *    must be freed to ensure unadulterated space for the new allocator.
+ *
+ * @ops:   allocator helper functions and its data
+ * @list:  registered custom allocators
+ * @slist: allocators sharing the same ops but different data
+ * @flags: attributes of the allocator
+ * @xa:    xarray holding the IOASID space
+ * @users: number of allocators sharing the same ops and XArray
+ */
+struct ioasid_allocator_data {
+   struct ioasid_allocator_ops *ops;
+   struct list_head list;
+   struct list_head slist;
+#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
+   unsigned long flags;
+   struct xarray xa;
+   refcount_t users;
+};
+
+static DEFINE_SPINLOCK(ioasid_allocator_lock);
+static LIST_HEAD(allocators_list);
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
+static void default_free(ioasid_t ioasid, void *opaque);
+
+static struct ioasid_allocator_ops default_ops = {
+   .alloc = default_alloc,
+   .free = default_free,
+};
+
+static struct ioasid_allocator_data default_allocator = {
+   .ops = &default_ops,
+   .flags = 0,
+   .xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC),
+};
+
+static struct ioasid_allocator_data *active_allocator = &default_allocator;
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
+{
+   ioasid_t id;
+
+       if (xa_alloc(&default_allocator.xa, &id, opaque, XA_LIMIT(min, max), GFP_ATOMIC)) {
+   pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+   return INVALID_IOASID;
+   }
+
+   return id;
+}
+
+static void default_free(ioasid_t ioasid, void *opaque)
+{
+   struct ioasid_data *ioasid_data;
+
+   ioasid_data = xa_erase(&default_allocator.xa, ioasid);
+   kfree_rcu(ioasid_data, rcu);
+}
+
+/* Allocate and initialize a new custom allocator with its helper functions */
+static struct ioasid_allocator_data *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)
+{
+   struct ioasid_allocator_data *ia_data;
+
+   ia_data = kzalloc(sizeof(*ia_data), GFP_ATOMIC);
+   if (!ia_data)
+   return NULL;
+
+   xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
+   INIT_LIST_HEAD(&ia_data->slist);
+   ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
+  

[PATCH 0/4] User API for nested shared virtual address (SVA)

2019-09-18 Thread Jacob Pan
This set consists of IOMMU APIs to support SVA in the guest, a.k.a. nested
SVA. As complete SVA support is complex, we break down the enabling
effort into three stages:
1. PCI device direct assignment
2. Fault handling, especially page request service support
3. Mediated device assignment

Each stage includes common API and vendor specific IOMMU driver changes. This
series is the common uAPI for stage #1. It is intended to build consensus on
the interface on which all vendors rely.

This series is extracted from the complete stage1 set which includes VT-d code.
https://lkml.org/lkml/2019/8/15/951

Changes:
 - Use spinlock instead of mutex to protect ioasid custom allocators. This is
   to support callers in atomic context
 - Added more padding to guest PASID bind data for future extensions, suggested
   by Joerg.
After much thinking, I did not rename PASID to IOASID in the uAPI,
considering we have been using PASID in the rest of the uAPIs. IOASID
will remain in use within the kernel.

For more of the discussion that led to this series, check out the LPC
2019 VFIO/IOMMU/PCI microconference materials.
https://linuxplumbersconf.org/event/4/sessions/66/#20190909

Jacob Pan (2):
  iommu/ioasid: Add custom allocators
  iommu: Introduce guest PASID bind function

Jean-Philippe Brucker (1):
  iommu: Add I/O ASID allocator

Yi L Liu (1):
  iommu: Introduce cache_invalidate API

 drivers/iommu/Kconfig  |   4 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 432 +
 drivers/iommu/iommu.c  |  30 
 include/linux/ioasid.h |  75 
 include/linux/iommu.h  |  36 
 include/uapi/linux/iommu.h | 169 ++
 7 files changed, 747 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

-- 
2.7.4



Re: [PATCH 2/2] MIPS: remove support for DMA_ATTR_WRITE_COMBINE

2019-09-18 Thread Maciej W. Rozycki
On Wed, 7 Aug 2019, Christoph Hellwig wrote:

> Mips uses the KSEG1 kernel memory segment to map dma coherent
> allocations for non-coherent devices as uncacheable, and does not have
> any kind of special support for DMA_ATTR_WRITE_COMBINE in the allocation
> path.  Thus supporting DMA_ATTR_WRITE_COMBINE in dma_mmap_attrs will
> lead to multiple mappings with different caching attributes.

 FYI, AFAIK _CACHE_UNCACHED_ACCELERATED (where supported) is effectively 
write-combine.  Though IIUC someone would have to wire it in first.

  Maciej


Re: DMA_ATTR_WRITE_COMBINE on mips

2019-09-18 Thread Maciej W. Rozycki
Hi Christoph,

> [I hope the imgtec address still works, but maybe the mips folks know
> if it moved to mips]

 Alex left Imagination long before the transition to the interim MIPS 
company.

> you added DMA_ATTR_WRITE_COMBINE support in dma_mmap_attrs to mips
> in commit 8c172467be36f7c9591e59b647e4cd342ce2ef41
> ("MIPS: Add implementation of dma_map_ops.mmap()"), but that commit
> only added the support in mmap, not in dma_alloc_attrs.  This means
> the memory is now used in kernel space through KSEG1, and thus uncached,
> while for userspace mappings through dma_mmap_* pgprot_writecombine
> is used, which creates a write combine mapping, which on some MIPS CPUs
> sets the _CACHE_UNCACHED_ACCELERATED pte bit instead of the
> _CACHE_UNCACHED one.  I know at least on arm, powerpc and x86 such
> mixed page cacheability attributes can cause pretty severe problems.
> Are they ok on mips?

 The uncached accelerated mode is implementation-specific, so you won't 
find its definition in the architecture, however the original R10000 
implementation explicitly documents[1] interactions between bus accesses 
using the two modes (essentially a _CACHE_UNCACHED store acts as a barrier 
for any outstanding _CACHE_UNCACHED_ACCELERATED stores; for loads the 
modes are equivalent), so that's clearly supported.

 I've glanced over the interAptiv manual[2] too and it seems to define the 
caching modes similarly.

>  Or was the DMA_ATTR_WRITE_COMBINE supported
> unintended and not correct and we should remove it?

 I don't know, so regrettably I can't comment on this.

References:

[1] "MIPS R1 Microprocessor", Version 2.0, MIPS Technologies, Inc., 
January 29, 1997



[2] "MIPS32 interAptiv Multiprocessing System Software User's Manual", 
Imagination Technologies Ltd., Document Number: MD00904, Revision 
02.01, June 15, 2016

  Maciej


[PATCH 0/4] iommu/arm-smmu: Remove arm_smmu_flush_ops

2019-09-18 Thread Robin Murphy
Hi all,

Off the back of Will's iommu_flush_ops work, here's an initial followup
to replace the temporary solution in arm-smmu with a full conversion.
Removing the extra layer of indirection should generally make things a
good bit more efficient, and rather more readable to boot.

Robin.


Robin Murphy (4):
  iommu/arm-smmu: Remove .tlb_inv_range indirection
  iommu/arm-smmu: Remove "leaf" indirection
  iommu/arm-smmu: Move .tlb_sync method to implementation
  iommu/arm-smmu: Remove arm_smmu_flush_ops

 drivers/iommu/arm-smmu.c | 178 +--
 drivers/iommu/arm-smmu.h |  11 +--
 2 files changed, 97 insertions(+), 92 deletions(-)

-- 
2.21.0.dirty



[PATCH 1/4] iommu/arm-smmu: Remove .tlb_inv_range indirection

2019-09-18 Thread Robin Murphy
Fill in 'native' iommu_flush_ops callbacks for all the
arm_smmu_flush_ops variants, and clear up the remains of the previous
.tlb_inv_range abstraction.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 110 ++-
 drivers/iommu/arm-smmu.h |   2 -
 2 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c3ef0cc8f764..f2b81b1ce224 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -312,7 +312,7 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie)
 }
 
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
- size_t granule, bool leaf, void *cookie)
+ size_t granule, void *cookie, bool leaf)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -342,7 +342,7 @@ static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
 }
 
 static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size,
- size_t granule, bool leaf, void *cookie)
+ size_t granule, void *cookie, bool leaf)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -362,14 +362,63 @@ static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size,
} while (size -= granule);
 }
 
+static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size,
+size_t granule, void *cookie)
+{
+   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, false);
+   arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size,
+size_t granule, void *cookie)
+{
+   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, true);
+   arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather,
+unsigned long iova, size_t granule,
+void *cookie)
+{
+   arm_smmu_tlb_inv_range_s1(iova, granule, granule, cookie, true);
+}
+
+static void arm_smmu_tlb_inv_walk_s2(unsigned long iova, size_t size,
+size_t granule, void *cookie)
+{
+   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie, false);
+   arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_inv_leaf_s2(unsigned long iova, size_t size,
+size_t granule, void *cookie)
+{
+   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie, true);
+   arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_add_page_s2(struct iommu_iotlb_gather *gather,
+unsigned long iova, size_t granule,
+void *cookie)
+{
+   arm_smmu_tlb_inv_range_s2(iova, granule, granule, cookie, true);
+}
+
+static void arm_smmu_tlb_inv_any_s2_v1(unsigned long iova, size_t size,
+  size_t granule, void *cookie)
+{
+   arm_smmu_tlb_inv_context_s2(cookie);
+}
 /*
  * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears
  * almost negligible, but the benefit of getting the first one in as far ahead
  * of the sync as possible is significant, hence we don't just make this a
- * no-op and set .tlb_sync to arm_smmu_tlb_inv_context_s2() as you might think.
+ * no-op and call arm_smmu_tlb_inv_context_s2() from .iotlb_sync as you might
+ * think.
  */
-static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
-                                        size_t granule, bool leaf, void *cookie)
+static void arm_smmu_tlb_add_page_s2_v1(struct iommu_iotlb_gather *gather,
+   unsigned long iova, size_t granule,
+   void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -380,66 +429,33 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
 }
 
-static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
-
-   ops->tlb_inv_range(iova, size, granule, false, cookie);
-   ops->tlb_sync(cookie);
-}
-
-static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
-
-  

[PATCH 3/4] iommu/arm-smmu: Move .tlb_sync method to implementation

2019-09-18 Thread Robin Murphy
With the .tlb_sync interface no longer exposed directly to io-pgtable,
strip away the remains of that abstraction layer. Retain the callback
in spirit, though, by transforming it into an implementation override
for the low-level sync routine itself, for which we will have at least
one user.
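
For illustration, a hedged sketch of what such an override might look
like for a hypothetical implementation, using the hook signature added
below; foo_secure_tlb_sync() stands in for whatever quirk the hardware
needs:

static void foo_tlb_sync(struct arm_smmu_device *smmu, int page,
                         int sync, int status)
{
        arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
        foo_secure_tlb_sync(smmu);      /* implementation-specific wait */
}

static const struct arm_smmu_impl foo_impl = {
        .tlb_sync = foo_tlb_sync,
};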

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 33 +++--
 drivers/iommu/arm-smmu.h |  3 ++-
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b5b4cd4cae19..cc3b7517458d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -244,6 +244,9 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
unsigned int spin_cnt, delay;
u32 reg;
 
+   if (smmu->impl && unlikely(smmu->impl->tlb_sync))
+   return smmu->impl->tlb_sync(smmu, page, sync, status);
+
arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
@@ -268,9 +271,8 @@ static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
spin_unlock_irqrestore(&smmu->global_sync_lock, flags);
 }
 
-static void arm_smmu_tlb_sync_context(void *cookie)
+static void arm_smmu_tlb_sync_context(struct arm_smmu_domain *smmu_domain)
 {
-   struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
unsigned long flags;
 
@@ -280,13 +282,6 @@ static void arm_smmu_tlb_sync_context(void *cookie)
spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
-static void arm_smmu_tlb_sync_vmid(void *cookie)
-{
-   struct arm_smmu_domain *smmu_domain = cookie;
-
-   arm_smmu_tlb_sync_global(smmu_domain->smmu);
-}
-
 static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
@@ -297,7 +292,7 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
wmb();
arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
  ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
-   arm_smmu_tlb_sync_context(cookie);
+   arm_smmu_tlb_sync_context(smmu_domain);
 }
 
 static void arm_smmu_tlb_inv_context_s2(void *cookie)
@@ -439,7 +434,6 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s1,
.tlb_add_page   = arm_smmu_tlb_add_page_s1,
},
-   .tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
 static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
@@ -449,7 +443,6 @@ static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s2,
.tlb_add_page   = arm_smmu_tlb_add_page_s2,
},
-   .tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
 static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
@@ -459,7 +452,6 @@ static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb_flush_leaf = arm_smmu_tlb_inv_any_s2_v1,
.tlb_add_page   = arm_smmu_tlb_add_page_s2_v1,
},
-   .tlb_sync   = arm_smmu_tlb_sync_vmid,
 };
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
@@ -1229,11 +1221,16 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-   if (smmu_domain->flush_ops) {
-   arm_smmu_rpm_get(smmu);
-   smmu_domain->flush_ops->tlb_sync(smmu_domain);
-   arm_smmu_rpm_put(smmu);
-   }
+   if (!smmu)
+   return;
+
+   arm_smmu_rpm_get(smmu);
+   if (smmu->version == ARM_SMMU_V2 ||
+   smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
+   arm_smmu_tlb_sync_context(smmu_domain);
+   else
+   arm_smmu_tlb_sync_global(smmu);
+   arm_smmu_rpm_put(smmu);
 }
 
 static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 6edd35ca983c..5032102f05b7 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -306,7 +306,6 @@ enum arm_smmu_domain_stage {
 
 struct arm_smmu_flush_ops {
struct iommu_flush_ops  tlb;
-   void (*tlb_sync)(void *cookie);
 };
 
 struct arm_smmu_domain {
@@ -333,6 +332,8 @@ struct arm_smmu_impl {
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
+int status);
 };
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, in

[PATCH 4/4] iommu/arm-smmu: Remove arm_smmu_flush_ops

2019-09-18 Thread Robin Murphy
Now it's just an empty wrapper.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 40 +---
 drivers/iommu/arm-smmu.h |  6 +-
 2 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index cc3b7517458d..68e8d9d1902b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -427,31 +427,25 @@ static void arm_smmu_tlb_add_page_s2_v1(struct iommu_iotlb_gather *gather,
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
 }
 
-static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
-   .tlb = {
-   .tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
-   .tlb_flush_walk = arm_smmu_tlb_inv_walk_s1,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s1,
-   .tlb_add_page   = arm_smmu_tlb_add_page_s1,
-   },
+static const struct iommu_flush_ops arm_smmu_s1_tlb_ops = {
+   .tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk_s1,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s1,
+   .tlb_add_page   = arm_smmu_tlb_add_page_s1,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
-   .tlb = {
-   .tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
-   .tlb_flush_walk = arm_smmu_tlb_inv_walk_s2,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s2,
-   .tlb_add_page   = arm_smmu_tlb_add_page_s2,
-   },
+static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
+   .tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
+   .tlb_flush_walk = arm_smmu_tlb_inv_walk_s2,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf_s2,
+   .tlb_add_page   = arm_smmu_tlb_add_page_s2,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
-   .tlb = {
-   .tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
-   .tlb_flush_walk = arm_smmu_tlb_inv_any_s2_v1,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_any_s2_v1,
-   .tlb_add_page   = arm_smmu_tlb_add_page_s2_v1,
-   },
+static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
+   .tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
+   .tlb_flush_walk = arm_smmu_tlb_inv_any_s2_v1,
+   .tlb_flush_leaf = arm_smmu_tlb_inv_any_s2_v1,
+   .tlb_add_page   = arm_smmu_tlb_add_page_s2_v1,
 };
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
@@ -781,7 +775,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.ias= ias,
.oas= oas,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
-   .tlb= &smmu_domain->flush_ops->tlb,
+   .tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
};
 
@@ -1210,7 +1204,7 @@ static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
 
if (smmu_domain->flush_ops) {
arm_smmu_rpm_get(smmu);
-   smmu_domain->flush_ops->tlb.tlb_flush_all(smmu_domain);
+   smmu_domain->flush_ops->tlb_flush_all(smmu_domain);
arm_smmu_rpm_put(smmu);
}
 }
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5032102f05b7..ba0f05952dd9 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -304,14 +304,10 @@ enum arm_smmu_domain_stage {
ARM_SMMU_DOMAIN_BYPASS,
 };
 
-struct arm_smmu_flush_ops {
-   struct iommu_flush_ops  tlb;
-};
-
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct arm_smmu_flush_ops *flush_ops;
+   const struct iommu_flush_ops*flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
-- 
2.21.0.dirty



[PATCH 2/4] iommu/arm-smmu: Remove "leaf" indirection

2019-09-18 Thread Robin Murphy
Now that the "leaf" flag is no longer part of an external interface,
there's no need to use it to infer a register offset at runtime when
we can just as easily encode the offset directly in its place.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f2b81b1ce224..b5b4cd4cae19 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -312,18 +312,16 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie)
 }
 
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
- size_t granule, void *cookie, bool leaf)
+ size_t granule, void *cookie, int reg)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-   int reg, idx = cfg->cbndx;
+   int idx = cfg->cbndx;
 
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
wmb();
 
-   reg = leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
-
if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
iova = (iova >> 12) << 12;
iova |= cfg->asid;
@@ -342,16 +340,15 @@ static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
 }
 
 static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size,
- size_t granule, void *cookie, bool leaf)
+ size_t granule, void *cookie, int reg)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
-   int reg, idx = smmu_domain->cfg.cbndx;
+   int idx = smmu_domain->cfg.cbndx;
 
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
wmb();
 
-   reg = leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : ARM_SMMU_CB_S2_TLBIIPAS2;
iova >>= 12;
do {
if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)
@@ -365,14 +362,16 @@ static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size,
 static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size,
 size_t granule, void *cookie)
 {
-   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, false);
+   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
+ ARM_SMMU_CB_S1_TLBIVA);
arm_smmu_tlb_sync_context(cookie);
 }
 
 static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size,
 size_t granule, void *cookie)
 {
-   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, true);
+   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
+ ARM_SMMU_CB_S1_TLBIVAL);
arm_smmu_tlb_sync_context(cookie);
 }
 
@@ -380,20 +379,23 @@ static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather,
 unsigned long iova, size_t granule,
 void *cookie)
 {
-   arm_smmu_tlb_inv_range_s1(iova, granule, granule, cookie, true);
+   arm_smmu_tlb_inv_range_s1(iova, granule, granule, cookie,
+ ARM_SMMU_CB_S1_TLBIVAL);
 }
 
 static void arm_smmu_tlb_inv_walk_s2(unsigned long iova, size_t size,
 size_t granule, void *cookie)
 {
-   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie, false);
+   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie,
+ ARM_SMMU_CB_S2_TLBIIPAS2);
arm_smmu_tlb_sync_context(cookie);
 }
 
 static void arm_smmu_tlb_inv_leaf_s2(unsigned long iova, size_t size,
 size_t granule, void *cookie)
 {
-   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie, true);
+   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie,
+ ARM_SMMU_CB_S2_TLBIIPAS2L);
arm_smmu_tlb_sync_context(cookie);
 }
 
@@ -401,7 +403,8 @@ static void arm_smmu_tlb_add_page_s2(struct iommu_iotlb_gather *gather,
 unsigned long iova, size_t granule,
 void *cookie)
 {
-   arm_smmu_tlb_inv_range_s2(iova, granule, granule, cookie, true);
+   arm_smmu_tlb_inv_range_s2(iova, granule, granule, cookie,
+ ARM_SMMU_CB_S2_TLBIIPAS2L);
 }
 
 static void arm_smmu_tlb_inv_any_s2_v1(unsigned long iova, size_t size,
-- 
2.21.0.dirty



Re: [PATCH 1/5] iommu: Implement iommu_put_resv_regions_simple()

2019-09-18 Thread Will Deacon
On Thu, Aug 29, 2019 at 01:17:48PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Implement a generic function for removing reserved regions. This can be
> used by drivers that don't do anything fancy with these regions other
> than allocating memory for them.
> 
> Signed-off-by: Thierry Reding 
> ---
>  drivers/iommu/iommu.c | 19 +++
>  include/linux/iommu.h |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0f585b614657..73a2a6b13507 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2170,6 +2170,25 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
>   ops->put_resv_regions(dev, list);
>  }
>  
> +/**
> + * iommu_put_resv_regions_simple - Reserved region driver helper
> + * @dev: device for which to free reserved regions
> + * @list: reserved region list for device
> + *
> + * IOMMU drivers can use this to implement their .put_resv_regions() callback
> + * for simple reservations. Memory allocated for each reserved region will be
> + * freed. If an IOMMU driver allocates additional resources per region, it is
> + * going to have to implement a custom callback.
> + */
> +void iommu_put_resv_regions_simple(struct device *dev, struct list_head *list)
> +{
> + struct iommu_resv_region *entry, *next;
> +
> + list_for_each_entry_safe(entry, next, list, list)
> + kfree(entry);
> +}
> +EXPORT_SYMBOL(iommu_put_resv_regions_simple);

Can you call this directly from iommu_put_resv_regions() if the function
pointer in ops is NULL? That would save having to plumb the default callback
into a bunch of drivers.
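
Something like the below, perhaps (an untested sketch, reusing the
helper from this patch as the default when no callback is provided):

	void iommu_put_resv_regions(struct device *dev, struct list_head *list)
	{
		const struct iommu_ops *ops = dev->bus->iommu_ops;

		if (ops && ops->put_resv_regions)
			ops->put_resv_regions(dev, list);
		else
			/* default: just free each entry */
			iommu_put_resv_regions_simple(dev, list);
	}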

Will


Re: [PATCH v2] iommu/arm-smmu: Report USF more clearly

2019-09-18 Thread Doug Anderson
Hi,

On Tue, Sep 17, 2019 at 7:45 AM Robin Murphy  wrote:
>
> Although CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT is a welcome tool
> for smoking out inadequate firmware, the failure mode is non-obvious
> and can be confusing for end users. Add some special-case reporting of
> Unidentified Stream Faults to help clarify this particular symptom.
> Since we're adding yet another print to the mix, also break out an
> explicit ratelimit state to make sure everything stays together (and
> reduce the static storage footprint a little).
>
> CC: Douglas Anderson 

nit: Cc, not CC.


> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu.c | 21 -
>  drivers/iommu/arm-smmu.h |  2 ++
>  2 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b7cf24402a94..b27020fd6c90 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> @@ -485,6 +486,8 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  {
> u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
> struct arm_smmu_device *smmu = dev;
> +   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
>
> gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR);
> gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0);
> @@ -494,11 +497,19 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> if (!gfsr)
> return IRQ_NONE;
>
> -   dev_err_ratelimited(smmu->dev,
> -   "Unexpected global fault, this could be serious\n");
> -   dev_err_ratelimited(smmu->dev,
> -   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
> -   gfsr, gfsynr0, gfsynr1, gfsynr2);
> +   if (__ratelimit(&rs)) {
> +   if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> +   (gfsr & sGFSR_USF))
> +   dev_err(smmu->dev,
> +   "Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",

optional nit: "%#hx" instead of "0x%hx"
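
For reference, the one behavioural difference is zero: the "#" flag
omits the "0x" prefix when the printed value is 0 (hypothetical
examples, not from this patch):

	printk("0x%hx\n", (u16)0x123);	/* prints "0x123" */
	printk("%#hx\n", (u16)0x123);	/* prints "0x123" */
	printk("0x%hx\n", (u16)0);	/* prints "0x0" */
	printk("%#hx\n", (u16)0);	/* prints "0" */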

Reviewed-by: Douglas Anderson 


[GIT PULL] dma-mapping updates for 5.4

2019-09-18 Thread Christoph Hellwig
Hi Linus,

please pull the dma-mapping updates for 5.4.

In addition to the usual Kconfig conflicts, where you just want to keep
both edits, there are a few more interesting merge issues this time:
 
 - most importantly, powerpc and microblaze add new callers of
   dma_atomic_pool_init, while this tree marks the function static
   and calls it from a common postcore_initcall() (see the sketch
   below this list).  The trivial wrapper functions that powerpc and
   microblaze add to make those calls need to be removed for the code
   to compile.  This will not show up as a merge conflict and needs
   to be dealt with manually!
 - the csky tree has edits close to edits from this tree in
   arch/csky/mm/dma-mapping.c, keep both edits as there are no
   functional conflicts.
 - for ia64 keep the removal of arch/ia64/sn/pci/pci_dma.c from the
   ia64 tree.  The ia64 tree moves sba_dma_ops around a bit.  Keep
   the move and the wiring up of ->mmap and ->get_sgtable from this
   tree.
 - the -mm tree adds a new compound_nr helper that is used by some of
   the Xen code removed in this pull request.  Keep the removal from
   this tree.
 - the block tree adds a new helper next to the new block helpers from
   this tree; keep both.
 - the arm64 tree removes a __KERNEL__ ifdef from
   arch/arm64/include/asm/dma-mapping.h, which is removed in this tree.
   Keep the removal.
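
For reference, the shape of the new arrangement (a sketch only; the
exact body lives in kernel/dma/remap.c in this tree):

	/*
	 * dma_atomic_pool_init() is now static and runs from a common
	 * initcall, so the arch-level wrappers that powerpc and
	 * microblaze add in their trees must be dropped.
	 */
	static int __init dma_atomic_pool_init(void)
	{
		/* set up the atomic DMA pool ... */
		return 0;
	}
	postcore_initcall(dma_atomic_pool_init);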

The following changes since commit a55aa89aab90fae7c815b0551b07be37db359d76:

  Linux 5.3-rc6 (2019-08-25 12:01:23 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.4

for you to fetch changes up to c7d9eccb3c1e802c5cbb2a764eb0eb9807d9f12e:

  mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE (2019-09-12 13:14:09 +0100)


dma-mapping updates for 5.4:

 - add dma-mapping and block layer helpers to take care of IOMMU
   merging for mmc plus subsequent fixups (Yoshihiro Shimoda)
 - rework handling of the pgprot bits for remapping (me)
 - take care of the dma direct infrastructure for swiotlb-xen (me)
 - improve the dma noncoherent remapping infrastructure (me)
 - better defaults for ->mmap, ->get_sgtable and ->get_required_mask (me)
 - cleanup mmaping of coherent DMA allocations (me)
 - various misc cleanups (Andy Shevchenko, me)


Andy Shevchenko (1):
  dma-mapping: fix filename references

Christoph Hellwig (34):
  unicore32: remove the unused pgprot_dmacoherent define
  arm-nommu: remove the unused pgprot_dmacoherent define
  dma-mapping: remove arch_dma_mmap_pgprot
  dma-mapping: make dma_atomic_pool_init self-contained
  arm64: document the choice of page attributes for pgprot_dmacoherent
  MIPS: document mixing "slightly different CCAs"
  dma-mapping: move the dma_get_sgtable API comments from arm to common code
  dma-mapping: explicitly wire up ->mmap and ->get_sgtable
  dma-mapping: add a dma_can_mmap helper
  ALSA: pcm: use dma_can_mmap() to check if a device supports dma_mmap_*
  arm-nommu: call dma_mmap_from_dev_coherent directly
  parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
  dma-mapping: remove CONFIG_ARCH_NO_COHERENT_DMA_MMAP
  dma-mapping: remove dma_{alloc,free,mmap}_writecombine
  dma-mapping: remove dma_release_declared_memory
  dma-mapping: remove the dma_mmap_from_dev_coherent export
  remoteproc: don't allow modular build
  dma-mapping: remove the dma_declare_coherent_memory export
  dma-mapping: provide a better default ->get_required_mask
  vmalloc: lift the arm flag for coherent mappings to common code
  dma-mapping: always use VM_DMA_COHERENT for generic DMA remap
  dma-mapping: introduce a dma_common_find_pages helper
  arm: remove wrappers for the generic dma remap helpers
  xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
  xen/arm: consolidate page-coherent.h
  xen/arm: use dev_is_dma_coherent
  xen/arm: simplify dma_cache_maint
  xen/arm: remove xen_dma_ops
  xen: remove the exports for xen_{create,destroy}_contiguous_region
  swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable
  swiotlb-xen: use the same foreign page check everywhere
  swiotlb-xen: simplify cache maintainance
  swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
  arm64: use asm-generic/dma-mapping.h

Yoshihiro Shimoda (6):
  block: add a helper function to merge the segments
  mmc: queue: use bigger segments if DMA MAP layer can merge the segments
  dma-mapping: introduce dma_get_merge_boundary()
  iommu/dma: add a new dma_map_ops of get_merge_boundary()
  mmc: queue: Fix bigger segments usage
  mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE

 Documentation/DMA-API.txt |  19 ++--
 Documentation/x86/x86_64/boot-options.rst |   2 +-
 arch/Kconfig  

Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-09-18 Thread Christoph Hellwig
On Tue, Sep 17, 2019 at 06:41:02PM +, Lendacky, Thomas wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
> > dev->admin_tagset.timeout = ADMIN_TIMEOUT;
> > dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> > dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> > -   dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> > +   dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> > + BLK_MQ_F_BLOCKING;
> 
> I think you want to only set the BLK_MQ_F_BLOCKING if the DMA is required
> to be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
> from a module. Is there a DMA API that could be called to get that info?

The DMA API must support non-blocking calls, and various drivers rely
on that.  So we need to provide that even for the SEV case.  If the
actual blocking can't be made to work, we'll need to wire up the DMA
pool in kernel/dma/remap.c for it (and probably move it to a separate
file).
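
As an illustration of that constraint (hypothetical driver code, not
from this thread): a caller in atomic context passes GFP_ATOMIC, and
dma_alloc_coherent() must then succeed or fail without sleeping, which
is exactly what the atomic pool backs:

	/* e.g. from an IRQ handler or under a spinlock: must not sleep */
	dma_addr_t handle;
	void *buf;

	buf = dma_alloc_coherent(dev, SZ_4K, &handle, GFP_ATOMIC);
	if (!buf)
		return -ENOMEM;	/* cannot block or retry here */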