RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>Yeah, I realised later last night that this probably originated from forking 
>the whole driver downstream. But even then you could have treated the other 
>one as a separate nsmmu with a single instance ;)

True, But the initial nvidia implementation had limitation that it can only 
handle one instance of usage. With your implementation hooks design, it should 
be able to handle multiple instances of usage now.

>Since it does add a bit of confusion to the code and comments, let's just keep 
>things simple. I do like Jon's suggestion of actually enforcing that the 
>number of "reg" regions exactly matches the number expected for the given 
>compatible - I guess for now that means just hard-coding 2 and hoping the 
>hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. No future plan to use more than two 
MMU-500s together as of now. 

-KR 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Robin Murphy

On 2020-07-01 19:18, Krishna Reddy wrote:

+ * When Linux kernel supports multiple SMMU devices, the SMMU
device +used for + * isochornous HW devices should be added as a
separate ARM MMU-500 +device + * in DT and be programmed
independently for efficient TLB invalidates.



I don't understand the "When" there - the driver has always
supported multiple independent SMMUs, and it's not something that
could be configured out or otherwise disabled. Plus I really don't
see why you would ever want to force unrelated SMMUs to be
>programmed together - beyond the TLB thing mentioned it would also
waste precious context bank resources and might lead to weird
device grouping via false stream ID aliasing, with no obvious
upside at all.


Sorry, I missed this comment. During the initial patches, when the
iommu_ops were different between, support multiple SMMU drivers at
the same is not possible as one of them(that gets probed last)
overwrites the platform bus ops. On revisiting the original issue,
This problem is no longer relevant. At this point, It makes more
sense to just get rid of 3rd instance programming in
arm-smmu-nvidia.c and just limit it to the SMMU instances that need
identical programming.


Yeah, I realised later last night that this probably originated from 
forking the whole driver downstream. But even then you could have 
treated the other one as a separate nsmmu with a single instance ;)


Since it does add a bit of confusion to the code and comments, let's 
just keep things simple. I do like Jon's suggestion of actually 
enforcing that the number of "reg" regions exactly matches the number 
expected for the given compatible - I guess for now that means just 
hard-coding 2 and hoping the hardware folks don't cook up any more of 
these...


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device 
>> +used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 
>> +device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple 
>independent SMMUs, and it's not something that could be configured out or 
>otherwise disabled. Plus I really don't see why you would ever want to force 
>unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it 
>would also waste precious context bank resources and might lead to weird 
>device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops were different between, support 
multiple SMMU drivers at the same is not possible as one of them(that gets 
probed last) overwrites the platform bus ops. 
On revisiting the original issue, This problem is no longer relevant. At this 
point, It makes more sense to just get rid of 3rd instance programming in 
arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical 
programming.

-KR



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> The driver intend to support up to 3 instances. It doesn't really mandate 
>> that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> , , ; The reg can have 
>> all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have 
>up to 3 (for Tegra194). So the question is do we have a use-case where we only 
>use 2 and not 3? If not, then it still seems that we should require that all 3 
>are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso).  Let 
me fail the one instance case as it can use regular arm smmu implementation and 
don't  need nvidia implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines 
>the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will 
>get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 18:16, Krishna Reddy wrote:
>> OK, well I see what you are saying, but if we intended to support all 3 for 
>> Tegra194, then we should ensure all 3 are initialised correctly.
> 
> The driver intend to support up to 3 instances. It doesn't really mandate 
> that all three instances be present in same DT node.
> Each mmio aperture in "reg" property is an instance here. reg =  size>, , ;
> The reg can have all three or less and driver just configures based on reg 
> and it works fine.

So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we
have up to 3 (for Tegra194). So the question is do we have a use-case
where we only use 2 and not 3? If not, then it still seems that we
should require that all 3 are present.

The other problem I see here is that currently the arm-smmu binding
defines the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe
that this will get caught by the 'dt_binding_check' when we try to
populate the binding.

>> It would be better to query the number of SMMUs populated in device-tree and 
>> then ensure that all are initialised correctly.
> 
> Getting the IORESOURCE_MEM is the way to count the instances driver need to 
> support.  
> In a way, It is already querying through IORESOURCE_MEM here. 

Yes I was wondering that. I think we just need to decide if the 3rd SMMU
is optional or not. The DT binding should detail and min and max supported. 

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>OK, well I see what you are saying, but if we intended to support all 3 for 
>Tegra194, then we should ensure all 3 are initialised correctly.

The driver intend to support up to 3 instances. It doesn't really mandate that 
all three instances be present in same DT node.
Each mmio aperture in "reg" property is an instance here. reg = , , ;
The reg can have all three or less and driver just configures based on reg and 
it works fine.

>It would be better to query the number of SMMUs populated in device-tree and 
>then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM is the way to count the instances driver need to 
support.  
In a way, It is already querying through IORESOURCE_MEM here. 


-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for dual SMMU config.  So, It is avoided in the commit 
message.
However, Implementation supports option to configure 3 instances identically 
with one SMMU DT node
and is documented in the implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +   unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+ unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +int page, int offset, u32 val) {
>> +unsigned int i;
>> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make 
>the code cleaner and easier to read.

It was like that before and is updated to its current form to limit the scope 
of variables as per Thierry's comments in v6. 
We can just leave it as it is as there is no technical issue here.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 17:32, Jon Hunter wrote:
> On 30/06/2020 17:23, Krishna Reddy wrote:
 +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
 +*smmu) {
 +  unsigned int i;
>> 
 +  for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
 +  struct resource *res;
 +
 +  res = platform_get_resource(pdev, IORESOURCE_MEM, i);
 +  if (!res)
 +  break;
>>
>>> Currently this driver is only supported for Tegra194 which I understand has 
>>> 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think 
>>> it is better to return an error if all 3 cannot be initialised.
>>
>> Initialization of all the three SMMU instances is not necessary here.
> 
> That is not what I am saying.
> 
>> The driver can work with all the possible number of instances 1, 2 and 3 
>> based on the DT config though it doesn't make much sense to use it with 1 
>> instance.
>> There is no silent failure here from driver point of view. If there is 
>> misconfig in DT, SMMU faults would catch issues.
> 
> I disagree and you should return a proper error here.

OK, well I see what you are saying, but if we intended to support all 3
for Tegra194, then we should ensure all 3 are initialised correctly.

My concern here is testing, because when things break in upstream I am
usually the one that tracks it down. Not having clear warning/error
messages when something is not initialised as expected makes it harder.

It would be better to query the number of SMMUs populated in device-tree
and then ensure that all are initialised correctly.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter



On 30/06/2020 17:23, Krishna Reddy wrote:
>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>>> +*smmu) {
>>> +   unsigned int i;
> 
>>> +   for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +   struct resource *res;
>>> +
>>> +   res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +   if (!res)
>>> +   break;
> 
>> Currently this driver is only supported for Tegra194 which I understand has 
>> 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think 
>> it is better to return an error if all 3 cannot be initialised.
> 
> Initialization of all the three SMMU instances is not necessary here.

That is not what I am saying.

> The driver can work with all the possible number of instances 1, 2 and 3 
> based on the DT config though it doesn't make much sense to use it with 1 
> instance.
> There is no silent failure here from driver point of view. If there is 
> misconfig in DT, SMMU faults would catch issues.

I disagree and you should return a proper error here.

>>> +   nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +   if (IS_ERR(nvidia_smmu->bases[i]))
>>> +   return ERR_CAST(nvidia_smmu->bases[i]);
> 
>> You want to use PTR_ERR() here.
> 
> PTR_ERR() returns long integer. 
> This function returns a pointer. ERR_CAST is the right one to use here. 

Ah yes, indeed. OK that's fine.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +unsigned int i;

>> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +struct resource *res;
>> +
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +if (!res)
>> +break;

>Currently this driver is only supported for Tegra194 which I understand has 3 
>SMMUs. Therefore, I don't feel that we should fail silently here, I think it 
>is better to return an error if all 3 cannot be initialised.

Initialization of all the three SMMU instances is not necessary here.
The driver can work with all the possible number of instances 1, 2 and 3 based 
on the DT config though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from driver point of view. If there is 
misconfig in DT, SMMU faults would catch issues.

>> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +if (IS_ERR(nvidia_smmu->bases[i]))
>> +return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns long integer. 
This function returns a pointer. ERR_CAST is the right one to use here. 


--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter

On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>  
>>> Signed-off-by: Krishna Reddy 

...

>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int
>>> page,
>>> +   int sync, int status)
>>> +{
>>> +    unsigned int delay;
>>> +
>>> +    arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> +    for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
> 
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hamering on the interconnect.

Ah OK. Then maybe we should move the definitions for TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into the arm-smmu.h so that we can use them directly
in this file instead of redefining them. Then it maybe clear that these
are part of the main driver.

 >>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device
>>> *smmu)
>>> +{
>>> +    unsigned int i;
>>> +    struct nvidia_smmu *nvidia_smmu;
>>> +    struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> +    nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu),
>>> GFP_KERNEL);
>>> +    if (!nvidia_smmu)
>>> +    return ERR_PTR(-ENOMEM);
>>> +
>>> +    nvidia_smmu->smmu = *smmu;
>>> +    /* Instance 0 is ioremapped by arm-smmu.c after this function
>>> returns */
>>> +    nvidia_smmu->num_inst = 1;
>>> +
>>> +    for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +    struct resource *res;
>>> +
>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +    if (!res)
>>> +    break;
>>> +
>>> +    nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +    if (IS_ERR(nvidia_smmu->bases[i]))
>>> +    return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> +    nvidia_smmu->num_inst++;
>>> +    }
>>> +
>>> +    nvidia_smmu->smmu.impl = _smmu_impl;
>>> +    /*
>>> + * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> + * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> + * allocated as part of nvidia_smmu struct.
>>> + */
>>> +    devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer of the smmu struct passed to this
>> function
>> in the nvidia_smmu struct and then we do not need to free this here.
>> In other
>> words make ...
>>
>>   struct nvidia_smmu {
>> struct arm_smmu_device    *smmu;
>> unsigned int    num_inst;
>> void __iomem    *bases[MAX_SMMU_INSTANCES];
>>   };
>>
>> This seems more appropriate, than copying the struct and freeing memory
>> allocated else-where.
> 
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?

Ah yes of course that is what I was missing. I wondered what was going
on here. So I think we should add a nice comment in the above function
of why we are copying this and cannot simply store the pointer.

Cheers
Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Robin Murphy

On 2020-06-30 09:19, Jon Hunter wrote:


On 30/06/2020 01:10, Krishna Reddy wrote:

NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.


There is no description here of the 3rd SMMU that you mention below.
I think that we should describe the full picture here.
  

Signed-off-by: Krishna Reddy 
---
  MAINTAINERS |   2 +
  drivers/iommu/Makefile  |   2 +-
  drivers/iommu/arm-smmu-impl.c   |   3 +
  drivers/iommu/arm-smmu-nvidia.c | 196 
  drivers/iommu/arm-smmu.h|   1 +
  5 files changed, 203 insertions(+), 1 deletion(-)
  create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
  
  TEGRA IOMMU DRIVERS

  M:Thierry Reding 
+R: Krishna Reddy 
  L:linux-te...@vger.kernel.org
  S:Supported
+F: drivers/iommu/arm-smmu-nvidia.c
  F:drivers/iommu/tegra*
  
  TEGRA KBC DRIVER

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
  
+	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))


Nit: please use "np" like all the surrounding code does.


+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.


I don't understand the "When" there - the driver has always supported 
multiple independent SMMUs, and it's not something that could be 
configured out or otherwise disabled. Plus I really don't see why you 
would ever want to force unrelated SMMUs to be programmed together - 
beyond the TLB thing mentioned it would also waste precious context bank 
resources and might lead to weird device grouping via false stream ID 
aliasing, with no obvious upside at all.



+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)


If you run checkpatch --strict on these you will get a lot 

Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 01:10, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  MAINTAINERS |   2 +
>  drivers/iommu/Makefile  |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 196 
>  drivers/iommu/arm-smmu.h|   1 +
>  5 files changed, 203 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c

...

> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> + unsigned int i;
> + struct nvidia_smmu *nvidia_smmu;
> + struct platform_device *pdev = to_platform_device(smmu->dev);
> +
> + nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
> + if (!nvidia_smmu)
> + return ERR_PTR(-ENOMEM);
> +
> + nvidia_smmu->smmu = *smmu;
> + /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
> + nvidia_smmu->num_inst = 1;
> +
> + for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
> + struct resource *res;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> + if (!res)
> + break;

Currently this driver is only supported for Tegra194 which I understand
has 3 SMMUs. Therefore, I don't feel that we should fail silently here,
I think it is better to return an error if all 3 cannot be initialised.
In the future if there is an SoC that has less (hopefully not more) than
Tegra194 then we should handle this via the DT compatible string. In
other words, we should always know how many SMMUs there are for a given
SoC and how many we should initialise.

> +
> + nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
> + if (IS_ERR(nvidia_smmu->bases[i]))
> + return ERR_CAST(nvidia_smmu->bases[i]);

You want to use PTR_ERR() here.

Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 01:10, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.

There is no description here of the 3rd SMMU that you mention below.
I think that we should describe the full picture here.
 
> Signed-off-by: Krishna Reddy 
> ---
>  MAINTAINERS |   2 +
>  drivers/iommu/Makefile  |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 196 
>  drivers/iommu/arm-smmu.h|   1 +
>  5 files changed, 203 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7b5ffd646c6b9..64c37dbdd4426 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16808,8 +16808,10 @@ F:   drivers/i2c/busses/i2c-tegra.c
>  
>  TEGRA IOMMU DRIVERS
>  M:   Thierry Reding 
> +R:   Krishna Reddy 
>  L:   linux-te...@vger.kernel.org
>  S:   Supported
> +F:   drivers/iommu/arm-smmu-nvidia.c
>  F:   drivers/iommu/tegra*
>  
>  TEGRA KBC DRIVER
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 342190196dfb0..2b8203db73ec3 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o 
> amd/quirks.o
>  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
>  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
>  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
> -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
> +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index c75b9d957b702..70f7318017617 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
> arm_smmu_device *smmu)
>   if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
>   smmu->impl = _impl;
>  
> + if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
> + return nvidia_smmu_impl_init(smmu);
> +
>   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
>   of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
>   return qcom_smmu_impl_init(smmu);
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> new file mode 100644
> index 0..1124f0ac1823a
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// NVIDIA ARM SMMU v2 implementation quirks
> +// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "arm-smmu.h"
> +
> +/*
> + * Tegra194 has three ARM MMU-500 Instances.
> + * Two of them are used together for interleaved IOVA accesses and
> + * used by non-isochronous HW devices for SMMU translations.
> + * Third one is used for SMMU translations from isochronous HW devices.
> + * It is possible to use this implementation to program either
> + * all three or two of the instances identically as desired through
> + * DT node.
> + *
> + * Programming all the three instances identically comes with redundant TLB
> + * invalidations as all three never need to be TLB invalidated for a HW 
> device.
> + *
> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
> + * isochornous HW devices should be added as a separate ARM MMU-500 device
> + * in DT and be programmed independently for efficient TLB invalidates.
> + */
> +#define MAX_SMMU_INSTANCES 3
> +
> +#define TLB_LOOP_TIMEOUT_IN_US   100 /* 1s! */
> +#define TLB_SPIN_COUNT   10
> +
> +struct nvidia_smmu {
> + struct arm_smmu_device  smmu;
> + unsigned intnum_inst;
> + void __iomem*bases[MAX_SMMU_INSTANCES];
> +};
> +
> +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device 
> *smmu)
> +{
> + return container_of(smmu, struct nvidia_smmu, smmu);
> +}
> +
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +unsigned int inst, int page)

If you run checkpatch --strict on these you will get a lot of ...

CHECK: Alignment should match open parenthesis
#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)

We should fix these.

> +{
> + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> + if (!nvidia_smmu->bases[0])
> + 

RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Pritesh Raithatha
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA
> accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Pritesh Raithatha 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Nicolin Chen
On Mon, Jun 29, 2020 at 05:10:49PM -0700, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 196 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_inst; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+