RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature

2021-01-06 Thread Krishna Reddy
> I agree that we should let a device driver enable SVA if it supports some 
> form of IOPF. 

Right, The PCIe device has capability to handle  IO page faults on its own when 
ATS translation request response indicates that no valid mapping exist.
SMMU doesn't involve/handle page faults for ATS translation failures here.
Neither the PCIe device nor the SMMU have PRI capability. 

> Perhaps we could extract the IOPF capability from IOMMU_DEV_FEAT_SVA, into a 
> new IOMMU_DEV_FEAT_IOPF feature.
> Device drivers that rely on PRI or stall can first check FEAT_IOPF, then 
> FEAT_SVA, and enable both separately.
> Enabling FEAT_SVA would require FEAT_IOPF enabled if supported. Let me try to 
> write this up.

Looks good to me. Allowing device driver to enable FEAT_IOPF and subsequently 
SVA should work. Thanks!

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature

2020-12-14 Thread Krishna Reddy
Hi Jean,

> +bool arm_smmu_master_sva_supported(struct arm_smmu_master *master) {
> +   if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
> +   return false;
+
> +   /* SSID and IOPF support are mandatory for the moment */
> +   return master->ssid_bits && arm_smmu_iopf_supported(master); }
> +

Tegra Next Gen SOC has arm-smmu-v3 and It doesn't have support for PRI 
interface.
However, PCIe client device has capability to handle the page faults on its own 
when the ATS translation fails.
The PCIe device needs SVA feature enable without PRI interface supported at 
arm-smmu-v3.
At present, the SVA feature enable is allowed only if the smmu/client device 
has PRI support. 
There seem to be no functional reason to make pri_supported as a pre-requisite 
for SVA enable.
Can SVA enable be supported for pri_supported not set case as well? 
Also, It is noticed that SVA  enable on Intel doesn't need pri_supported set. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-14 Thread Krishna Reddy
>> The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support 
>> for BTM.
>> Do you have plan to get your earlier patch to handle invalidate 
>> notifications into upstream sometime soon?
>> Can the dependency on BTM be relaxed with the patch?
>>
>> PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops 
>> https://www.spinics.net/lists/arm-kernel/msg825099.html

>This patch (which should be in 5.11) only takes care of sending ATC 
>invalidations to PCIe endpoints. With this, BTM is still required to 
>invalidate SMMU TLBs.
> However we could enable command-queue invalidation when ARM_SMMU_FEAT_BTM 
> isn't set.
> Invalidations are still a relatively rare event so it may not be outrageously 
> slow. I can add a patch to my tree if you have hardware to test. 

Thanks! We can test the patch on emulation platform. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-09 Thread Krishna Reddy
> > The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support 
> > for BTM.
> > Do you have plan to get your earlier patch to handle invalidate 
> > notifications into upstream sometime soon?

>Is that a limitation of the SMMU implementation, the interconnect or the 
>integration?

It is the limitation of interconnect. The DVM messages don't reach SMMU. 
The BTM bit in SMMU IDR does indicate that it doesn't support BTM. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-09 Thread Krishna Reddy
Hi Jean,
> > Why is BTM mandated for SVA? I couldn't find this requirement in 
> > SMMU spec (Sorry if I missed it or this got discussed earlier). But 
> > if performance is the
> only concern here,
> > is it better just to allow it with a warning rather than limiting 
> > SMMUs without
> BTM?
>
> It's a performance concern and requires to support multiple 
> configurations, but the spec allows it. Are there SMMUs without BTM 
> that need it?

The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support for 
BTM.
Do you have plan to get your earlier patch to handle invalidate notifications 
into upstream sometime soon? 
Can the dependency on BTM be relaxed with the patch?

PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
https://www.spinics.net/lists/arm-kernel/msg825099.html

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-18 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Reviewed-by: Jon Hunter 
Reviewed-by: Rob Herring 
Reviewed-by: Robin Murphy 
Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml   | 25 ++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 93fb9fe068b9..503160a7b9a0 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -44,6 +44,11 @@ properties:
 items:
   - const: marvell,ap806-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that program two ARM MMU-500s identically
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: nvidia,smmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -61,7 +66,8 @@ properties:
   - cavium,smmu-v2
 
   reg:
-maxItems: 1
+minItems: 1
+maxItems: 2
 
   '#global-interrupts':
 description: The number of global interrupts exposed by the device.
@@ -144,6 +150,23 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 2
+else:
+  properties:
+reg:
+  maxItems: 1
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-18 Thread Krishna Reddy
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_sm

[PATCH v11 0/5] NVIDIA ARM SMMU Implementation

2020-07-18 Thread Krishna Reddy
Changes in v11:
Addressed Rob comment on DT binding patch to set min/maxItems of reg property 
in else part.
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates.

Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
https://lkml.org/lkml/2020/7/8/57

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  25 +-
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 334 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 49fbb25030265c660de732513f18275d88ff99d3
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-18 Thread Krishna Reddy
ioremap smmu mmio region before calling into implementation init.
This is necessary to allow mapped address available during vendor
specific implementation init.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index cdd15ead9bc4..de520115d3df 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2123,10 +2123,6 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2138,6 +2134,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-18 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them and must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by standard
arm-smmu.c driver itself.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ee2c0ba13a0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16810,8 +16810,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c87d825f651e..f4ff124a1967 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -213,6 +213,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxe

[PATCH v11 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-18 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same from vendor specific implementations.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 19f906de6420..cdd15ead9bc4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-17 Thread Krishna Reddy
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
>> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
> >> Changes in v10:
>> > Perform SMMU base ioremap before calling implementation init.
>> > Check for Global faults across both ARM MMU-500s during global interrupt.
>> > Check for context faults across all contexts of both ARM MMU-500s during 
>> > context fault interrupt.
> >> Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
>>
>> Please repost based on my SMMU queue, as this doesn't currently apply.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=
>> for-joerg/arm-smmu/updates

>Any update on this, please? It would be a shame to miss 5.9 now that the 
>patches look alright.

Thanks for pinging.
I have been out of the office this week. Just started working on reposting 
patches.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-10 Thread Krishna Reddy
Thanks Rob. One question on setting "minItems: ". Please see below.

>> +allOf:
>> +  - if:
>> +  properties:
>> +compatible:
>> +  contains:
>> +enum:
>> +  - nvidia,tegra194-smmu
>> +then:
>> +  properties:
>> +reg:
>> +  minItems: 2
>> +  maxItems: 2

>This doesn't work. The main part of the schema already said there's only
>1 reg region. This part is ANDed with that, not an override. You need to add 
>an else clause with 'maxItems: 1' and change the base schema to
>{minItems: 1, maxItems: 2}.

As the earlier version of base schema doesn't have "minItems: " set, should it 
be set to 0 for backward compatibility?  Or can it just be omitted setting in 
base schema as before?

"else" part to set "maxItems: 1" and setting "maxItems: 2" in base schema is 
clear to me.


-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-07 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..ac1f526c3424 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that program two ARM MMU-500s identically
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: nvidia,smmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 2
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-07 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same from vendor specific implementations.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..d2054178df35 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-07 Thread Krishna Reddy
ioremap smmu mmio region before calling into implementation init.
This is necessary to allow mapped address available during vendor
specific implementation init.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df35..e03e873d3bca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-07 Thread Krishna Reddy
Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 328 insertions(+), 10 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-07 Thread Krishna Reddy
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu

[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-07 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them and must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by standard
arm-smmu.c driver itself.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c23352059a6b..534cedaf8e55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..f15571d05474 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+   int page, int offset, u64 val)
+{
+   unsigned int i;
+
+  

RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>> +items:
>>>>>> +  - enum:
>>>>>> +  - nvdia,tegra194-smmu
>>>>>> +  - const: arm,mmu-500
>>>>
>>>>> Is the fallback compatible appropriate here? If software treats this as a 
>>>>> standard MMU-500 it will only program the first instance (because the 
>>>>> second isn't presented as a separate MMU-500) - is there any way that 
>>>>> isn't going to blow up?
>>>>
>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>>>> implementation override ensure that both instances are programmed. Isn't 
>>>> it? I am not sure I follow your comment fully.
>> 
>>> The problem is, if for some reason someone had a Tegra194, but only set the 
>>> compatible string to 'arm,mmu-500' it would assume that it was a normal 
>>> arm,mmu-500 and only one instance would be programmed. We always want at 
>>> least 2 of the 3 instances >>programmed and so we should only match 
>>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>>> _mmu500.
>> 
>> In that case, new binding "nvidia,smmu-v2" can be added with data set to 
>> _mmu500 and enumeration would have nvidia,tegra194-smmu and another 
>> variant for next generation SoC in future. 

>I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be 
>something different. I see others have a smmu-v2 but I am not sure if that is 
>legacy. We have an smmu-500 and so that would seem more appropriate.

I tried to use the binding synonymous to other vendors. 
V2 is the architecture version.  MMU-500 is the actual implementation from ARM 
based on V2 arch.  As we just use the MMU-500 IP as it is, It can be named as 
nvidia,smmu-500 or similar as well.
Others probably having their own implementation based on V2 arch.

KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>> With shared irq line, the context fault identification is not optimal 
>> already.  Reading all the context banks all the time can be additional mmio 
>> read overhead. But, it may not hurt the real use cases as these happen only 
>> when there are bugs.

>Right, I did ponder the idea of a whole programmatic "request_context_irq" 
>hook that would allow registering the handler for both interrupts with the 
>appropriate context bank and instance data, but since all interrupts are 
>currently unexpected it seems somewhat hard to justify the extra complexity. 
>Obviously we can revisit this in future if you want to start actually doing 
>something with faults like the qcom GPU folks do.

Thanks, I would just avoid making changes to interrupt handlers till it is 
really necessary in future. The current code would just be simple and 
functional with more interrupts when there are multiple faults.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>Yeah, I realised later last night that this probably originated from forking 
>the whole driver downstream. But even then you could have treated the other 
>one as a separate nsmmu with a single instance ;)

True, But the initial nvidia implementation had limitation that it can only 
handle one instance of usage. With your implementation hooks design, it should 
be able to handle multiple instances of usage now.

>Since it does add a bit of confusion to the code and comments, let's just keep 
>things simple. I do like Jon's suggestion of actually enforcing that the 
>number of "reg" regions exactly matches the number expected for the given 
>compatible - I guess for now that means just hard-coding 2 and hoping the 
>hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. No future plan to use more than two 
MMU-500s together as of now. 

-KR 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
 +items:
 +  - enum:
 +  - nvdia,tegra194-smmu
 +  - const: arm,mmu-500
> >
>>> Is the fallback compatible appropriate here? If software treats this as a 
>>> standard MMU-500 it will only program the first instance (because the 
>>> second isn't presented as a separate MMU-500) - is there any way that isn't 
>>> going to blow up?
> >
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>> implementation override ensure that both instances are programmed. Isn't it? 
>> I am not sure I follow your comment fully.

>The problem is, if for some reason someone had a Tegra194, but only set the 
>compatible string to 'arm,mmu-500' it would assume that it was a normal 
>arm,mmu-500 and only one instance would be programmed. We always want at least 
>2 of the 3 instances >programmed and so we should only match 
>'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>_mmu500.

In that case, new binding "nvidia,smmu-v2" can be added with data set to 
_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant 
for next generation SoC in future. 

-KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>>> +for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we 
>> service the interrupt?

>It certainly seems plausible if the interconnect is automatically 
>load-balancing requests across the SMMU instances - say a driver bug caused a 
>buffer to be unmapped too early, there could be many in-flight accesses to 
>parts of that buffer that aren't all taking the same path and thus could now 
>fault in parallel.
>[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
>buffer/tear down a domain/ ;) ]
>Either way I think it would be easier to reason about if we just handled these 
>like a typical shared interrupt and always checked all the instances.

It would be optimal to check at the same time across all instances. 

>>> +for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> + idx, 
>>> + inst);
>>> +
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any reason why we don't check all banks?

>As above, we certainly shouldn't bail out without checking the bank for the 
>offending domain across all of its instances, and I guess the way this works 
>means that we would have to iterate all the banks to achieve that.

With shared irq line, the context fault identification is not optimal already.  
Reading all the context banks all the time can be additional mmio read 
overhead. But, it may not hurt the real use cases as these happen only when 
there are bugs. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
>> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> Hmm, there must be a better way to word that to express that it only applies 
> to the sets of SMMUs that must be programmed identically, and not any other 
> independent MMU-500s that might also happen to be in the same SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
identically".

>> +items:
>> +  - enum:
>> +  - nvdia,tegra194-smmu
>> +  - const: arm,mmu-500

>Is the fallback compatible appropriate here? If software treats this as a 
>standard MMU-500 it will only program the first instance (because the second 
>isn't presented as a separate MMU-500) - is there any way that isn't going to 
>blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
implementation override ensure that both instances are programmed. Isn't it? I 
am not sure I follow your comment fully.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device 
>> +used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 
>> +device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple 
>independent SMMUs, and it's not something that could be configured out or 
>otherwise disabled. Plus I really don't see why you would ever want to force 
>unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it 
>would also waste precious context bank resources and might lead to weird 
>device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops were different between, support 
multiple SMMU drivers at the same is not possible as one of them(that gets 
probed last) overwrites the platform bus ops. 
On revisiting the original issue, This problem is no longer relevant. At this 
point, It makes more sense to just get rid of 3rd instance programming in 
arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical 
programming.

-KR



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros

2020-06-30 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same values from vendor specific implementations.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d2054178df357 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be618..c7d0122a7c6ca 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-06-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..662c46e16f07d 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 3
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation

2020-06-30 Thread Krishna Reddy
Changes in v9:
Move TLB Timeout and spin count macros to arm-smmu.h header to share with 
implementation.
Set minItems and maxItems for reg property when compatible contains 
nvidia,tegra194-smmu.
Update commit message for NVIDIA implementation patch.
Fail single SMMU instance usage through NVIDIA implementation to limit the 
usage to two or three instances.
Fix checkpatch warnings with --strict checking.

v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 304 ++
 drivers/iommu/arm-smmu.c  |  20 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 349 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of ARM MMU-500s together to interleave IOVA accesses
across them and must be programmed identically.
The third SMMU instance is used as a regular ARM MMU-500 and it
can either be programmed independently or identical to other
two ARM MMU-500s.

This implementation supports programming two or three ARM MMU-500s
identically as per DT config.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 206 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..f15571d05474e 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..5c874912e1c1a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+   struc

[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks

2020-06-30 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5c874912e1c1a..d279788eab954 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054

RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> The driver intend to support up to 3 instances. It doesn't really mandate 
>> that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> , , ; The reg can have 
>> all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have 
>up to 3 (for Tegra194). So the question is do we have a use-case where we only 
>use 2 and not 3? If not, then it still seems that we should require that all 3 
>are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso).  Let 
me fail the one instance case as it can use regular arm smmu implementation and 
don't  need nvidia implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines 
>the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will 
>get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>OK, well I see what you are saying, but if we intended to support all 3 for 
>Tegra194, then we should ensure all 3 are initialised correctly.

The driver intend to support up to 3 instances. It doesn't really mandate that 
all three instances be present in same DT node.
Each mmio aperture in "reg" property is an instance here. reg = , , ;
The reg can have all three or less and driver just configures based on reg and 
it works fine.

>It would be better to query the number of SMMUs populated in device-tree and 
>then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM is the way to count the instances driver need to 
support.  
In a way, It is already querying through IORESOURCE_MEM here. 


-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for dual SMMU config.  So, It is avoided in the commit 
message.
However, Implementation supports option to configure 3 instances identically 
with one SMMU DT node
and is documented in the implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +   unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+ unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +int page, int offset, u32 val) {
>> +unsigned int i;
>> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make 
>the code cleaner and easier to read.

It was like that before and is updated to its current form to limit the scope 
of variables as per Thierry's comments in v6. 
We can just leave it as it is as there is no technical issue here.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +unsigned int i;

>> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +struct resource *res;
>> +
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +if (!res)
>> +break;

>Currently this driver is only supported for Tegra194 which I understand has 3 
>SMMUs. Therefore, I don't feel that we should fail silently here, I think it 
>is better to return an error if all 3 cannot be initialised.

Initialization of all the three SMMU instances is not necessary here.
The driver can work with all the possible number of instances 1, 2 and 3 based 
on the DT config though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from driver point of view. If there is 
misconfig in DT, SMMU faults would catch issues.

>> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +if (IS_ERR(nvidia_smmu->bases[i]))
>> +return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns long integer. 
This function returns a pointer. ERR_CAST is the right one to use here. 


--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 1124f0ac1823a..c9423b4199c65 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..3bb0aba15a356 100644
--- a/drivers/iommu/arm-s

[PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-29 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-29 Thread Krishna Reddy
Changes in v8:
Fixed incorrect CB_FSR read issue during context bank fault.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 196 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

RE: [PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,

>> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, 
>> + smmu->numpage + idx);
[...]
>> + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
[...]
>> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);

>It reads FSR of the default inst (1st), but clears the FSR of corresponding 
>inst -- just want to make sure that this is okay and intended.

FSR should be read from corresponding inst. Not from instance 0. 
Let me post updated patch.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
>> + if (!nvidia_smmu->bases[0])
>> + nvidia_smmu->bases[0] = smmu->base;
>> +
>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); }

>Not critical -- just a nit: why not put the bases[0] in init()?

smmu->base is not available during nvidia_smmu_impl_init() call. It is set 
afterwards in arm-smmu.c.
It can't be avoided without changing the devm_ioremap() and impl_init() call 
order in arm-smmu.c. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-28 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-28 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 101 +++-
 drivers/iommu/arm-smmu.c|  17 +-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index b73c483fa3376..7276bb203ae79 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
@@ -185,7 +283,8 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
}
 
nvidia_smmu->smmu.i

[PATCH v7 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-28 Thread Krishna Reddy
Changes in v7:
Incorporated the review feedback from Nicolin Chen, Robin Murphy and Thierry 
Reding.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v6- https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-28 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 195 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..b73c483fa3376
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-25 Thread Krishna Reddy
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That 
>way we avoid this somewhat odd check here.

NVIDIA haven't made any changes to arm,mmu-500. It is only used in different 
topology.  New model would be mis-leading here.
As suggested by Robin, It can just be moved to end of function.

>> diff --git a/drivers/iommu/arm-smmu-nvidia.c 
>> b/drivers/iommu/arm-smmu-nvidia.c
>I wonder if it would be better to name this arm-smmu-tegra.c to make it 
>clearer that this is for a Tegra chip. We do have regular expressions in 
>MAINTAINERS that catch anything with "tegra" in it to make this easier.
>Also, the nsmmu_ prefix looks somewhat odd here. You already use struct 
>nvidia_smmu as the name of the structure, so why not be consistent and 
>continue to use nvidia_smmu_ as the prefix for function names?
>Or perhaps even use tegra_smmu_ as the prefix to match the filename change I 
>suggested earlier.

Prefix can be updated to nvidia_smmu as we seem to be okay for now to keep file 
name as arm-smmu-nvidia.c after the vendor name.  

>> +#define TLB_LOOP_TIMEOUT100 /* 1s! */
>USEC_PER_SEC?

It is not meant for a conversion. Reused Timeout variable from arm-smmu.c for 
tlb_sync implementation.  Can rename it to TLB_LOOP_TIMEOUT_IN_US.


>> +}
>> +dev_err_ratelimited(smmu->dev,
>> +"TLB sync timed out -- SMMU may be deadlocked\n");
>Same here.
>Also, is there anything we can do when this happens?

This is never expected to happen on Silicon. This code and message is reused 
from arm-smmu.c.


>> +#define nsmmu_page(smmu, inst, page) \
>> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
>> +((page) << smmu->pgshift))

>Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in 
>nvidia_smmu_impl_init()? Then this would become just:
>   to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift)
> +
>Maybe add this here to simplify the nsmmu_page() macro above:
>   nsmmu->bases[0] = smmu->base;

This preferred to avoid the check in nsmmu_page(). But, smmu->base is not yet 
populated when nvidia_smmu_impl_init() is called.  
Let me look at the alternative place to set it.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-25 Thread Krishna Reddy
>> +  - nvdia,tegra194-smmu-500
>The -500 suffix here seems a bit redundant since there's no other type of SMMU 
>in Tegra194, correct?

Yeah, there is only one type of SMMU supported in T194. It was added to be 
synonymous with mmu-500.  Can be removed.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-04 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a45217..5999b6a770992 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d7

[PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

2020-06-04 Thread Krishna Reddy
>> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes
>> address space '' of expression

Reported-by: kbuild test robot 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5999b6a770992..6348d8dc17fc2 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -248,7 +248,7 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
nsmmu->bases[i] = devm_ioremap_resource(dev, res);
if (IS_ERR(nsmmu->bases[i]))
-   return (struct arm_smmu_device *)nsmmu->bases[i];
+   return ERR_CAST(nsmmu->bases[i]);
nsmmu->num_inst++;
}
 
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 0/4] Nvidia Arm SMMUv2 Implementation

2020-06-04 Thread Krishna Reddy
Changes in v6:
Restricted the patch set to driver specific patches.
Fixed the cast warning reported by kbuild test robot.
Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 285 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 431275afdc7155415254aef4bd3816a1b8a2ead0
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-04 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 50659d76976b7..118da0893c964 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27cb..35542df00da72 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..52c84c30f83e4 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..dafc293a45217
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-04 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index e3ef1c69d1326..8f7ffd248f303 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu-500
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-22 Thread Krishna Reddy
>For the record: I don't think we should apply these because we don't have a 
>good way of testing them. We currently have three problems that prevent us 
>from enabling SMMU on Tegra194:

Out of three issues pointed here, I see that only issue 2) is a real blocker 
for enabling SMMU HW by default in upstream.

>That said, I have tested earlier versions of this patchset on top of my local 
>branch with fixes for the above and they do seem to work as expected.
>So I'll leave it up to the IOMMU maintainers whether they're willing to merge 
>the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until we've 
> solved all of the above issues and therefore it should be clear that these 
> won't be runtime tested until then.

SMMU driver patches as such are complete and can be used by nvidia with a local 
config change(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n) to disable_bypass and
Protects the driver patches against kernel changes. This config disable option 
is tested already by Nicolin Chen and me.

Robin/Will, Can you comment if smmu driver patches alone(1,2,3 out of 5 
patches) can be merged without DT enable patches? Is it reasonable to merge the 
driver patches alone?

>1) If we enable SMMU support, then the DMA API will automatically try
> to use SMMU domains for allocations. This means that translations
> will happen as soon as a device's IOMMU operations are initialized
> and that is typically a long time (in kernel time at least) before
> a driver is bound and has a chance of configuring the device.

> This causes problems for non-quiesced devices like display
> controllers that the bootloader might have set up to scan out a
> boot splash.

> What we're missing here is a way to:

> a) advertise reserved memory regions for boot splash framebuffers
> b) map reserved memory regions early during SMMU setup

> Patches have been floating on the public mailing lists for b) but
> a) requires changes to the bootloader (both proprietary ones and
> U-Boot for SoCs prior to Tegra194).

This happens if SMMU translations is enabled for display before reserved
 Memory regions issue is fixed. This issue is not a real blocker for SMMU 
enable.


>  2) Even if we don't enable SMMU for a given device (by not hooking up
> the iommus property), with a default kernel configuration we get a
> bunch of faults during boot because the ARM SMMU driver faults by
> default (rather than bypass) for masters which aren't hooked up to
> the SMMU.

> We could work around that by changing the default configuration or
> overriding it on the command-line, but that's not really an option
> because it decreases security and means that Tegra194 won't work
> out-of-the-box.

This is the real issue that blocks enabling SMMU.  The USF faults for devices
that don't have SMMU translations enabled should be fixed or WAR'ed before
SMMU can be enabled. We should look at keeping SID as 0x7F for the devices
that can't have SMMU enabled yet. SID 0x7f bypasses SMMU externally.

>  3) We don't properly describe the DMA hierarchy, which causes the DMA
> masks to be improperly set. As a bit of background: Tegra194 has a
> special address bit (bit 39) that causes some swizzling to happen
> within the memory controller. As a result, any I/O virtual address
> that has bit 39 set will cause this swizzling to happen on access.
> The DMA/IOMMU allocator always starts allocating from the top of
> the IOVA space, which means that the first couple of gigabytes of
> allocations will cause most devices to fail because of the
> undesired swizzling that occurs.

> We had an initial patch for SDHCI merged that hard-codes the DMA
> mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
> the devices all do support addressing 40 bits and the restriction
> on bit 39 is really a property of the bus rather than a capability
> of the device. This means that we would have to work around this
> for every device driver by adding similar hacks. A better option is
> to properly describe the DMA hierarchy (using dma-ranges) because
> that will then automatically be applied as a constraint on each
> device's DMA mask.

> I have been working on patches to address this, but they are fairly
> involved because they require device tree bindings changes and so
> on.

Dma_mask issue is again outside SMMU driver and as long as the clients with
Dma_mask issue don't have SMMU enabled, it would be fine.
SDHCI can have SMMU enabled in upstream as soon as issue 2 is taken care.

>So before we solve all of the above issues we can't really enable SMMU on 
>Tegra194 and hence won't be able to test it. As such we don't know if these 
>patches even work, nor can we validate that they continue to work.
>As such, I don't think there's any use in applying these patches upstream 
>since they 

[PATCH v5 1/5] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-05-21 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ecc0749810b0..0d8c966ecf17 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16560,9 +16560,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27c..35542df00da7 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 74d97a886e93..dcdd513323aa 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..dafc293a4521
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v5 3/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-05-21 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a4521..5999b6a77099 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e622f4e33379..9

[PATCH v5 5/5] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2020-05-21 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f7c4399afb55..706bbb439dcd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -59,6 +59,7 @@ ethernet@249 {
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA194_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -457,6 +458,7 @@ sdmmc1: sdhci@340 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -479,6 +481,7 @@ sdmmc3: sdhci@344 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -506,6 +509,7 @@ sdmmc4: sdhci@346 {
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-21 Thread Krishna Reddy
Changes in v5:
Rebased on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
next

v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  81 ++
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 8 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 365f8d504da50feaebf826d180113529c9383670
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/5] arm64: tegra: Add DT node for T194 SMMU

2020-05-21 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f4ede86e32b4..f7c4399afb55 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1620,6 +1620,83 @@ pcie@141a {
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu-500";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
pcie_ep@1416 {
compatible = "nvidia,tegra194-pcie-ep", "snps,dw-pcie-ep";
power-domains = < TEGRA194_POWER_DOMAIN_PCIEX4A>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/5] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-05-21 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe47508..78aba7dd5a61 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu-500
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 3/6] iommu/arm-smmu: Add global/context fault implementation hooks

2019-10-30 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 -
 drivers/iommu/arm-smmu.h|   3 ++
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 05d66ac..0ba7abc 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 080af03..829fb4c 100644
--- a

[PATCH v4 5/6] arm64: tegra: Add DT node for T194 SMMU

2019-10-30 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 1e0b54b..6f81e90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1436,6 +1436,83 @@
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
sysram@4000 {
compatible = "nvidia,tegra194-sysram", "mmio-sram";
reg = <0x0 0x4000 0x0 0x5>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 6/6] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-10-30 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6f81e90..bf8ed7a 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +52,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -413,6 +415,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -435,6 +438,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -462,6 +466,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/6] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-10-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/6] Nvidia Arm SMMUv2 Implementation

2019-10-30 Thread Krishna Reddy
Changes in v4:
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/ 
for-joerg/arm-smmu/updates.
Updated arm-smmu-nvidia.c to use the tlb_sync implementation hook.
Dropped patch that updates arm_smmu_flush_ops for override as it is no longer 
necessary.

v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (6):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 261 +
 drivers/iommu/arm-smmu.c   |  11 +-
 drivers/iommu/arm-smmu.h   |   4 +
 9 files changed, 376 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/6] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-10-30 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 296de2b..0519024 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15972,9 +15972,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f9..7baeade 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..05d66ac
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+

[PATCH v4 4/6] arm64: tegra: Add Memory controller DT node on T194

2019-10-30 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 4c38426..82a02490 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 3c0cf54..1e0b54b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -163,6 +163,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-23 Thread Krishna Reddy
>>https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Thanks Will! Let me rebase my patches on top of this branch and send it out.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-22 Thread Krishna Reddy
Hi Robin,
>>Apologies for crossed wires, but I had a series getting rid of 
>>arm_smmu_flush_ops which was also meant to end up making things a bit easier 
>>for you:

I was looking to rebase on top of your changes first.  Then I read Will's reply 
that said your work is queued for 5.5. 
Let me know if these patches need to rebased on top of iommu/devel or a 
different branch. I can resend the patch set on top of necessary branch.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-10-18 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-18 Thread Krishna Reddy
Changes in v3:
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/ next.
Resolved compile error seen with tegra194.dtsi changes after rebase.

v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 287 +
 drivers/iommu/arm-smmu.c   |  27 +-
 drivers/iommu/arm-smmu.h   |   8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.7.4



[PATCH v3 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-10-18 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 187 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a69e6db..b61dbda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15974,9 +15974,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 35d1709..4ef8b74 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..ca871dc
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static void nsmmu_tlb_sync(struct

[PATCH v3 5/7] arm64: tegra: Add Memory controller DT node on T194

2019-10-18 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 4c38426..82a02490 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 3c0cf54..1e0b54b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -163,6 +163,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.7.4



[PATCH v3 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-10-18 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 -
 drivers/iommu/arm-smmu.h|   3 ++
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index ca871dc..2a19d41 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -143,6 +143,104 @@ static int nsmmu_init_context(struct arm_smmu_domain 
*smmu_domain)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -150,6 +248,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.init_context = nsmmu_init_context,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fc0b2

[PATCH v3 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-10-18 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6f81e90..bf8ed7a 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +52,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -413,6 +415,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -435,6 +438,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -462,6 +466,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.7.4



[PATCH v3 1/7] iommu/arm-smmu: prepare arm_smmu_flush_ops for override

2019-10-18 Thread Krishna Reddy
Remove const keyword for arm_smmu_flush_ops in arm_smmu_domain
and replace direct references to arm_smmu_tlb_sync* functions with
arm_smmu_flush_ops->tlb_sync().
This is necessary for vendor specific implementations that
need to override arm_smmu_flush_ops in part or full.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 16 
 drivers/iommu/arm-smmu.h |  4 +++-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 91af695..fc0b27d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
@@ -290,6 +287,8 @@ static void arm_smmu_tlb_sync_vmid(void *cookie)
 static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
+
/*
 * The TLBI write may be relaxed, so ensure that PTEs cleared by the
 * current CPU are visible beforehand.
@@ -297,18 +296,19 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
wmb();
arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
  ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
-   arm_smmu_tlb_sync_context(cookie);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_context_s2(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
/* See above */
wmb();
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
-   arm_smmu_tlb_sync_global(smmu);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
@@ -410,7 +410,7 @@ static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather 
*gather,
ops->tlb_inv_range(iova, granule, granule, true, cookie);
 }
 
-static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
+static struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -421,7 +421,7 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops 
= {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -432,7 +432,7 @@ static const struct arm_smmu_flush_ops 
arm_smmu_s2_tlb_ops_v2 = {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b19b6ca..b2d6c7f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -207,6 +207,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
@@ -314,7 +316,7 @@ struct arm_smmu_flush_ops {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct arm_smmu_flush_ops *flush_ops;
+   struct arm_smmu_flush_ops   *flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
-- 
2.7.4



[PATCH v3 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-10-18 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 1e0b54b..6f81e90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1436,6 +1436,83 @@
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
sysram@4000 {
compatible = "nvidia,tegra194-sysram", "mmio-sram";
reg = <0x0 0x4000 0x0 0x5>;
-- 
2.7.4



[PATCH v2 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-09-02 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 5ae3bbf..cac3462 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -51,6 +51,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -381,6 +382,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -403,6 +405,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -430,6 +433,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.1.4



[PATCH v2 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-09-02 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/7] Nvidia Arm SMMUv2 Implementation

2019-09-02 Thread Krishna Reddy
Changes in v2:
- Prepare arm_smu_flush_ops for override.
- Remove NVIDIA_SMMUv2 and use ARM_SMMUv2 model as T194 SMMU hasn't modified 
ARM MMU-500.
- Add T194 specific compatible string - "nvidia,tegra194-smmu"
- Remove tlb_sync hook added in v1 and Override arm_smmu_flush_ops->tlb_sync() 
from implementation.
- Register implementation specific context/global fault hooks directly for irq 
handling.
- Update global/context interrupt list in DT and releant fault handling code in 
arm-smmu-nvidia.c.
- Implement reset hook in arm-smmu-nvidia.c to clear irq status and sync tlb.

v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 287 +
 drivers/iommu/arm-smmu.c   |  27 +-
 drivers/iommu/arm-smmu.h   |   8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-09-02 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 187 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 74e9d9c..c9b802a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15807,9 +15807,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 7caad48..556b94c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..ca871dc
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static void nsmmu_tlb_sync(struct

[PATCH v2 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-09-02 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index d906958..5ae3bbf 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1401,6 +1401,83 @@
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
sysram@4000 {
compatible = "nvidia,tegra194-sysram", "mmio-sram";
reg = <0x0 0x4000 0x0 0x5>;
-- 
2.1.4



[PATCH v2 1/7] iommu/arm-smmu: prepare arm_smmu_flush_ops for override

2019-09-02 Thread Krishna Reddy
Remove const keyword for arm_smmu_flush_ops in arm_smmu_domain
and replace direct references to arm_smmu_tlb_sync* functions with
arm_smmu_flush_ops->tlb_sync().
This is necessary for vendor specific implementations that
need to override arm_smmu_flush_ops in part or full.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 16 
 drivers/iommu/arm-smmu.h |  4 +++-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5b93c79..16b5c54 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
@@ -290,6 +287,8 @@ static void arm_smmu_tlb_sync_vmid(void *cookie)
 static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
+
/*
 * The TLBI write may be relaxed, so ensure that PTEs cleared by the
 * current CPU are visible beforehand.
@@ -297,18 +296,19 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
wmb();
arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
  ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
-   arm_smmu_tlb_sync_context(cookie);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_context_s2(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
/* See above */
wmb();
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
-   arm_smmu_tlb_sync_global(smmu);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
@@ -410,7 +410,7 @@ static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather 
*gather,
ops->tlb_inv_range(iova, granule, granule, true, cookie);
 }
 
-static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
+static struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -421,7 +421,7 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops 
= {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -432,7 +432,7 @@ static const struct arm_smmu_flush_ops 
arm_smmu_s2_tlb_ops_v2 = {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b19b6ca..b2d6c7f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -207,6 +207,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
@@ -314,7 +316,7 @@ struct arm_smmu_flush_ops {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct arm_smmu_flush_ops *flush_ops;
+   struct arm_smmu_flush_ops   *flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
-- 
2.1.4



[PATCH v2 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-09-02 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 -
 drivers/iommu/arm-smmu.h|   3 ++
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index ca871dc..2a19d41 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -143,6 +143,104 @@ static int nsmmu_init_context(struct arm_smmu_domain 
*smmu_domain)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -150,6 +248,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.init_context = nsmmu_init_context,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16b5c

[PATCH v2 5/7] arm64: tegra: Add Memory controller DT node on T194

2019-09-02 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 62e07e11..4b3441b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index adebbbf..d906958 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
compatible = "nvidia,tegra194";
@@ -130,6 +131,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation

2019-09-02 Thread Krishna Reddy
>>> +ARM_SMMU_MATCH_DATA(nvidia_smmuv2, ARM_SMMU_V2, NVIDIA_SMMUV2);
 
>> The ARM MMU-500 implementation is unmodified.  It is the way the are 
>> integrated and used together(for interleaved accesses) is different from 
>> regular ARM MMU-500.
>> I have added it to get the model number and to be able differentiate the 
>> SMMU implementation in arm-smmu-impl.c.

>In that case, I would rather keep smmu->model representing the MMU-500 
>microarchitecture - 
>since you'll still want to pick up errata workarounds etc. for that - and 
>detect the Tegra integration via an explicit of_device_is_compatible()
> check in arm_smmu_impl_init().

Looks good to me. 

>For comparison, under ACPI we'd probably have to detect integration details by 
>looking at table headers, separately
> from the IORT "Model" field, so I'd prefer if the DT vs. ACPI handling didn't 
> diverge more than necessary.

ACPI support for T194 can be added based on need in subsequent patches. For 
now, I am updating it for DT support.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 3/7] iommu/arm-smmu: Add tlb_sync implementation hook

2019-08-30 Thread Krishna Reddy
>> +if (smmu->impl->tlb_sync) {
>> +smmu->impl->tlb_sync(smmu, page, sync, status);

>What I'd hoped is that rather than needing a hook for this, you could just 
>override smmu_domain->tlb_ops from .init_context to wire up the alternate 
>.sync method directly. That would save this extra level of indirection.

Hi Robin,  overriding tlb_ops->tlb_sync function is not enough here.
There are direct references to arm_smmu_tlb_sync_context(),  
arm_smmu_tlb_sync_global() functions.
In arm-smmu.c.  we can replace these direct references with tlb_ops->tlb_sync() 
function except in one function arm_smmu_device_reset().  
When arm_smmu_device_reset() gets called, domains are not initialized and 
tlb_ops is not available. 
Should we add a new hook for arm_smmu_tlb_sync_global() or make this as a 
responsibility of impl->reset() hook as it gets
called at the same place?

-KR 


RE: [PATCH 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-08-30 Thread Krishna Reddy
>> +static irqreturn_t nsmmu_context_fault_inst(int irq,
>> +struct arm_smmu_device *smmu,
>> +int idx, int inst);

>More of these signed integers that could be unsigned. Also why the need to 
>predeclare this? Can you not just put the definition of the function up here?

The singed integers are based on original function prototype from arm-smmu.c.  
inst can be updated to unsigned.
This is because I was checking context faults from global fault handler as 
well.  This can avoided by using interrupt lines of all the instances across 
global and context irq entries.  Let me update.
 
> + if (smmu->impl->global_fault)
> + return smmu->impl->global_fault(irq, smmu);
>>... and here about the extra level of indirection.

Fixing in next version.

-KR



RE: [PATCH 3/7] iommu/arm-smmu: Add tlb_sync implementation hook

2019-08-30 Thread Krishna Reddy
>Wouldn't it work if you replaced all calls of __arm_smmu_tlb_sync() by
>smmu->impl->tlb_sync() and assign __arm_smmu_tlb_sync() as default for
>devices that don't need to override it? That makes this patch slightly larger, 
>but it saves us one level of indirection.

The tlb_ops->tlb_sync can be overridden directly in arm-smmu-nvidia.c specific 
implementation as pointed by Robin. Will be updating in next patch.


>> +void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
>> + int status);

>Can't page, sync and status all be unsigned?

This is to be uniform with original tlb_sync definition is arm-smmu.c.  Anyway, 
this hook is not necessary as tlb_ops->tlb_sync can be overridden directly.

-KR


RE: [PATCH 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-08-30 Thread Krishna Reddy
>> +smmu: iommu@1200 {
>> +compatible = "nvidia,smmu-v2";

>Should we have a compatibility string like "nvidia,tegra194-smmu" so that we 
>can have other chips with SMMUv2 that could be different?

As pointed by Robin as well, as Nvidia hasn't modified ARM MMU-500 
implementation, we can update it to specific chip based. 
Let me update.

-KR

 


RE: [PATCH 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-08-30 Thread Krishna Reddy
>The number of global interrupts has never been related to the number of 
>context interrupts :/

Yeah,  They are not related. I was trying to use minimum irq entries in the DT 
node as they both would achieve the same functionality.

>Clearly you have one combined interrupt output per SMMU - describing those as 
>one global interrupt and the first two context bank interrupts respectively 
>makes far less sense than calling them 3 global interrupts, not least because 
>the latter is strictly true.

Will update to 3 in next patch to make it better for readability.

-KR


RE: [PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation

2019-08-30 Thread Krishna Reddy
>> +ARM_SMMU_MATCH_DATA(nvidia_smmuv2, ARM_SMMU_V2, NVIDIA_SMMUV2);

> From the previous discussions, I got the impression that other than the 
> 'novel' way they're integrated, the actual SMMU implementations were 
> unmodified Arm MMU-500s. Is that the case, or have I misread something?

The ARM MMU-500 implementation is unmodified.  It is the way the are integrated 
and used together(for interleaved accesses) is different from regular ARM 
MMU-500.
I have added it to get the model number and to be able differentiate the SMMU 
implementation in arm-smmu-impl.c.

-KR

 


RE: [PATCH 2/7] dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2

2019-08-30 Thread Krishna Reddy
>> +"nidia,smmu-v2"
>>   "qcom,smmu-v2"

>I agree with Mikko that the compatible must be at least SoC-specific, but 
>potentially even instance-specific (e.g. "nvidia,tegra194-gpu-smmu")
> depending on how many of these parallel-SMMU configurations might be hiding 
> in current and future SoCs.

I am correcting the spelling mistake pointed by Mikko.  The NVIDIA SMMUv2 
implementation is getting used beyond  Tegra194 SOC.  
To be able to use the smmu compatible string across multiple SOC's, 
"nvidia,smmu-v2" compatible string is chosen.
Are you suggesting to make it soc specific and add another one in future?

-KR
 


RE: [PATCH 3/7] iommu/arm-smmu: Add tlb_sync implementation hook

2019-08-30 Thread Krishna Reddy
>> +for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)

>It might make more sense to make this the innermost loop, i.e.:
for (i = 0; i < nsmmu->num_inst; i++)
reg &= readl_relaxed(nsmmu_page(smmu, i, page)...
>since polling the instances in parallel rather than in series seems like it 
>might be a bit more efficient.

Sync register is programmed at the same time for both instances. The status 
check is serialized.
I can update it to check status of both at the same time.

>> +if (smmu->impl->tlb_sync) {
>> +smmu->impl->tlb_sync(smmu, page, sync, status);

>What I'd hoped is that rather than needing a hook for this, you could just 
>override smmu_domain->tlb_ops from .init_context to wire up the alternate 
>.sync method directly. That would save this extra level of indirection.

With arm_smmu_domain now available in arm-smmu.h,  arm-smmu-nvidia.c can 
directly update the tlb_ops->tlb_sync and avoid indirection.
Will update in next version.

-KR 


RE: [PATCH 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-08-30 Thread Krishna Reddy
>> +if (smmu->impl->global_fault)
>> +return smmu->impl->global_fault(irq, smmu);

>Can't we just register impl->global_fault (if set) instead of 
>arm_smmu_global_fault as the handler when we first set up the IRQs in 
>arm_smmu_device_probe()?
>Ideally we'd do the same for the context banks as well, although we might need 
>an additional hook from which to request the secondary IRQs that the main flow 
>can't accommodate.

When first implemented theis patch, I think there were compile issues in 
accessing struct arm_smmu_domain from arm-smmu-nvidia.c as it was part of 
arm-smmu.c. 
To avoid issues accessing arm_smmu_domain. It did this for context fault and 
did same for global fault for uniformity.
Now, I see that it is part of arm-smmu.h.  Let me update code to register 
implementation hooks directly.  Thanks

-KR

 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-08-30 Thread Krishna Reddy
>> +#global-interrupts = <1>;

>Shouldn't that be 3?

Interrupt line is shared between global and all context faults for each SMMU 
instance.
Nvidia implementation checks for both Global and context faults on each 
interrupt to an SMMU instance. 
It can be either 1 or 3.  If we make it 3, we need to add two more irq entries 
in node for context faults. 
In the future, we can update arm-smmu.c to support shared interrupt line 
between global and all context faults.


 -KR


[PATCH 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-08-29 Thread Krishna Reddy
Add global/context fault hooks to allow Nvidia SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 127 
 drivers/iommu/arm-smmu.c|   6 ++
 drivers/iommu/arm-smmu.h|   4 ++
 3 files changed, 137 insertions(+)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index a429b2c..b2a3c49 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -14,6 +14,10 @@
 
 #define NUM_SMMU_INSTANCES 3
 
+static irqreturn_t nsmmu_context_fault_inst(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst);
+
 struct nvidia_smmu {
struct arm_smmu_device  smmu;
int num_inst;
@@ -87,12 +91,135 @@ static void nsmmu_tlb_sync(struct arm_smmu_device *smmu, 
int page,
nsmmu_tlb_sync_wait(smmu, page, sync, status, i);
 }
 
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, struct arm_smmu_device *smmu)
+{
+   int i;
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line is shared between global and context faults.
+* Check for both type of interrupts on either fault handlers.
+*/
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) {
+   irq_ret = nsmmu_context_fault_inst(irq, smmu, 0, i);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, i);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault_inst(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line shared between global and all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu, idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   break;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq,
+  struct arm_smmu_device *smmu,
+  int cbndx)
+{
+   int i;
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line is shared between g

[PATCH 5/7] arm64: tegra: Add Memory controller DT node on T194

2019-08-29 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 62e07e11..4b3441b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index adebbbf..d906958 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
compatible = "nvidia,tegra194";
@@ -130,6 +131,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS

2019-08-29 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index ad509bb..0496a87 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -51,6 +51,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -381,6 +382,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -403,6 +405,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -430,6 +433,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation

2019-08-29 Thread Krishna Reddy
Add Nvidia SMMUv2 implementation and model info.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |  2 +
 drivers/iommu/Makefile  |  2 +-
 drivers/iommu/arm-smmu-impl.c   |  2 +
 drivers/iommu/arm-smmu-nvidia.c | 97 +
 drivers/iommu/arm-smmu.c|  2 +
 drivers/iommu/arm-smmu.h|  2 +
 6 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 289fb06..b9d59e51 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15785,9 +15785,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a2729aa..7f5489e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..e5e595f 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -162,6 +162,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
case CAVIUM_SMMUV2:
return cavium_smmu_impl_init(smmu);
+   case NVIDIA_SMMUV2:
+   return nvidia_smmu_impl_init(smmu);
default:
break;
}
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..d93ceda
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+#define NUM_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   int num_inst;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+#define to_nsmmu(s)container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nsmmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   int i;
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   int i;
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static const struct arm_smmu_impl nsmmu_impl = {
+   .read_reg = nsmmu_read_reg,
+   .write_reg = nsmmu_write_reg,
+   .read_reg64 = nsmmu_read_reg64,
+   .write_reg64 = nsmmu_write_reg64,
+};
+
+struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   int i;
+   struct nvidia_smmu *nsmmu;
+   struct resource *res;
+   struct device *dev = smmu->dev;
+   struct platform_device *pdev = to_platform_device(smmu->dev);
+
+   nsmmu = devm_kzalloc(smmu->dev, sizeof(*nsmmu), GFP_KERNEL);
+   if (!nsmmu)
+   return ERR_PTR(-ENOMEM);
+
+   nsmmu->smmu = *smmu;
+   /* Instance 0 is ioremapped by arm-smmu.c */
+   nsmmu->num_inst = 1;
+
+   for (i = 1; i < NUM_SMMU_INSTANCES; i++) {
+   res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+   if (!res)
+   break;
+   nsmmu->bases[i] = devm_ioremap_resource(dev, res);
+   if (IS_ERR(nsmmu->bases[i]))
+   return (struct arm_smmu_device *)nsmmu->bases[i];
+

[PATCH 2/7] dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2

2019-08-29 Thread Krishna Reddy
Add binding doc for Nvidia's smmu-v2 implementation.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..0de3759 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,6 +17,7 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
+"nidia,smmu-v2"
 "qcom,smmu-v2"
 
   depending on the particular implementation and/or the
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


  1   2   >