Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Guillaume Tucker
On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
> 
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> and the commit-reverted version (passing), each applying this patch.

 Sure, I've run 3 jobs:

 * v5.11-rc6 as a reference, to see the original issue:
   https://lava.collabora.co.uk/scheduler/job/3187848

 * + your debug patch:
   https://lava.collabora.co.uk/scheduler/job/3187849

 * + the "breaking" commit reverted, passing the tests:
   https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3210192
> 
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
> 
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
> 
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

  https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

  https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently.  We'll keep
that in mind for the future.

Best wishes,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Nicolin Chen
Hi Guillaume,

Thank you for the test results! And sorry for my belated reply.

On Thu, Feb 11, 2021 at 03:50:05PM +, Guillaume Tucker wrote:
> > On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
> >>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> >>> and the commit-reverted version (passing), each applying this patch.
> >>
> >> Sure, I've run 3 jobs:
> >>
> >> * v5.11-rc6 as a reference, to see the original issue:
> >>   https://lava.collabora.co.uk/scheduler/job/3187848
> >>
> >> * + your debug patch:
> >>   https://lava.collabora.co.uk/scheduler/job/3187849
> >>
> >> * + the "breaking" commit reverted, passing the tests:
> >>   https://lava.collabora.co.uk/scheduler/job/3187851
> > 
> > Thanks for the help!
> > 
> > I am able to figure out what's probably wrong, yet not so sure
> > about the best solution at this point.
> > 
> > Would it be possible for you to run one more time with another
> > debugging patch? I'd like to see the same logs as previous:
> > 1. Vanilla kernel + debug patch
> > 2. Vanilla kernel + Reverted + debug patch
> 
> As it turns out, next-20210210 is passing all the tests again so
> it looks like this got fixed in the meantime:
> 
>   https://lava.collabora.co.uk/scheduler/job/3210192

I checked this passing log, however, found that the regression is
still there though test passed, as the prints below aren't normal:
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
 Page fault (SMMU translation error [--S])

I was trying to think of a simpler solution than a revert. However,
given the fact that the callback sequence could change -- guessing
likely a recent change in iommu core, I feel it safer to revert my
previous change, not necessarily being a complete revert though.

I attached my partial reverting change in this email. Would it be
possible for you to run one more test for me to confirm it? It'd
keep the tests passing while eliminating all error prints above.

If the fix works, I'll re-send it to mail list by adding a commit
message.

Thanks!
Nicolin
>From a9950b6e76e279f19d2c9d06aef1e222b020a9e2 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Thu, 18 Feb 2021 01:11:59 -0800
Subject: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 EMEM address decode error (SMMU translation error [--S])
  tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
	 Page fault (SMMU translation error [--S])

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 72 +-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4a3f095a1c26..97eb62f667d2 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -798,10 +798,70 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
 	return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
+static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
+{
+	struct platform_device *pdev;
+	struct tegra_mc *mc;
+
+	pdev = of_find_device_by_node(np);
+	if (!pdev)
+		return NULL;
+
+	mc = platform_get_drvdata(pdev);
+	if (!mc)
+		return NULL;
+
+	return mc->smmu;
+}
+
+static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
+struct of_phandle_args *args)
+{
+	const struct iommu_ops *ops = smmu->iommu.ops;
+	int err;
+
+	err = iommu_fwspec_init(dev, >of_node->fwnode, ops);
+	if (err < 0) {
+		dev_err(dev, "failed to initialize fwspec: %d\n", err);
+		return err;
+	}
+
+	err = ops->of_xlate(dev, args);
+	if (err < 0) {
+		dev_err(dev, "failed to parse SW group ID: %d\n", err);
+		iommu_fwspec_free(dev);
+		return err;
+	}
+
+	return 0;
+}
+
 static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
-	struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
+	struct device_node *np = dev->of_node;
+	struct tegra_smmu *smmu = NULL;
+	struct of_phandle_args args;
+	unsigned int index = 0;
+	int err;
+
+	while (of_parse_phandle_with_args(np, "iommus", "#iommu-cells", index,
+	  ) == 0) {
+		smmu = tegra_smmu_find(args.np);
+		if (smmu) {
+			err = tegra_smmu_configure(smmu, dev, );
+			of_node_put(args.np);
 
+			if (err < 0)
+return ERR_PTR(err);
+
+			break;
+		}
+
+		of_node_put(args.np);
+		index++;
+	}
+
+	smmu = dev_iommu_priv_get(dev);
 	if (!smmu)
 		return ERR_PTR(-ENODEV);
 
@@ -1028,6 +1088,16 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 	if (!smmu)
 		return ERR_PTR(-ENOMEM);
 
+	/*
+	 * This is a bit of a hack. Ideally we'd want to simply return this
+	 * value. However the IOMMU registration process will attempt to add
+	 * all devices to the IOMMU when bus_set_iommu() is called. In order
+	 * not to rely on global variables to track the IOMMU 

Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-11 Thread Guillaume Tucker
On 10/02/2021 08:20, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>> and the commit-reverted version (passing), each applying this patch.
>>
>> Sure, I've run 3 jobs:
>>
>> * v5.11-rc6 as a reference, to see the original issue:
>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>
>> * + your debug patch:
>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>
>> * + the "breaking" commit reverted, passing the tests:
>>   https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Thanks for the help!
> 
> I am able to figure out what's probably wrong, yet not so sure
> about the best solution at this point.
> 
> Would it be possible for you to run one more time with another
> debugging patch? I'd like to see the same logs as previous:
> 1. Vanilla kernel + debug patch
> 2. Vanilla kernel + Reverted + debug patch

As it turns out, next-20210210 is passing all the tests again so
it looks like this got fixed in the meantime:

  https://lava.collabora.co.uk/scheduler/job/3210192
  https://lava.collabora.co.uk/results/3210192/0_igt-kms-tegra

And here's a more extensive list of IGT tests on next-20210211,
all the regressions have been fixed:

  https://kernelci.org/test/plan/id/60254c42f51df36be53abe62/


I haven't run a reversed bisection to find the fix, but I guess
it wouldn't be too hard to find out what happened by hand anyway.
I see the drm/tegra/for-5.12-rc1 tag has been merged into
linux-next, maybe that solved the issue?

FYI I've also run some jobs with your debug patch and with the
breaking patch reverted:

  https://lava.collabora.co.uk/scheduler/job/3210245
  https://lava.collabora.co.uk/scheduler/job/3210596

Meanwhile I'll see what can be done to improve the automated
bisection so if there are new IGT regressions they would get
reported earlier.  I guess it would have saved us all some time
if it had been bisected in December.

Thanks,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-10 Thread Nicolin Chen
Hi Guillaume,

On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
> > It'd be nicer if I can get both logs of the vanilla kernel (failing)
> > and the commit-reverted version (passing), each applying this patch.
> 
> Sure, I've run 3 jobs:
> 
> * v5.11-rc6 as a reference, to see the original issue:
>   https://lava.collabora.co.uk/scheduler/job/3187848
> 
> * + your debug patch:
>   https://lava.collabora.co.uk/scheduler/job/3187849
> 
> * + the "breaking" commit reverted, passing the tests:
>   https://lava.collabora.co.uk/scheduler/job/3187851

Thanks for the help!

I am able to figure out what's probably wrong, yet not so sure
about the best solution at this point.

Would it be possible for you to run one more time with another
debugging patch? I'd like to see the same logs as previous:
1. Vanilla kernel + debug patch
2. Vanilla kernel + Reverted + debug patch

Thank you
Nicolin
>From dce84731e12900b0e98dffc0ff981638a0bb3405 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Fri, 5 Feb 2021 01:41:07 -0800
Subject: [PATCH] iommu: debug tegra-smmu v2

Signed-off-by: Nicolin Chen 
---
 arch/arm/mm/dma-mapping.c   |  6 +++--
 drivers/gpu/drm/tegra/dc.c  |  1 +
 drivers/gpu/drm/tegra/drm.c |  4 +++
 drivers/iommu/iommu.c   | 47 +++--
 drivers/iommu/tegra-smmu.c  | 31 +-
 drivers/memory/tegra/tegra124.c |  2 ++
 6 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c4b8df2ad328..6a6715817707 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2136,13 +2136,15 @@ static int __arm_iommu_attach_device(struct device *dev,
 	int err;
 
 	err = iommu_attach_device(mapping->domain, dev);
-	if (err)
+	if (err) {
+		dev_alert(dev, "--%s failed with err: %d", __func__, err);
 		return err;
+	}
 
 	kref_get(>kref);
 	to_dma_iommu_mapping(dev) = mapping;
 
-	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
+	pr_alert("Attached IOMMU controller to %s device.\n", dev_name(dev));
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 0ae3a025efe9..96af9186e81a 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2033,6 +2033,7 @@ static int tegra_dc_init(struct host1x_client *client)
 	struct drm_plane *cursor = NULL;
 	int err;
 
+	dev_alert(dc->dev, "---%s, %d\n", __func__, __LINE__);
 	/*
 	 * XXX do not register DCs with no window groups because we cannot
 	 * assign a primary plane to them, which in turn will cause KMS to
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index e9ce7d6992d2..ea6bf26f4df3 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -906,6 +906,7 @@ int host1x_client_iommu_attach(struct host1x_client *client)
 	struct iommu_group *group = NULL;
 	int err;
 
+	dev_alert(client->dev, "---%s, %d\n", __func__, __LINE__);
 	/*
 	 * If the host1x client is already attached to an IOMMU domain that is
 	 * not the shared IOMMU domain, don't try to attach it to a different
@@ -914,7 +915,9 @@ int host1x_client_iommu_attach(struct host1x_client *client)
 	if (domain && domain != tegra->domain)
 		return 0;
 
+	dev_alert(client->dev, "---%s, %d\n", __func__, __LINE__);
 	if (tegra->domain) {
+	dev_alert(client->dev, "---%s, %d\n", __func__, __LINE__);
 		group = iommu_group_get(client->dev);
 		if (!group)
 			return -ENODEV;
@@ -932,6 +935,7 @@ int host1x_client_iommu_attach(struct host1x_client *client)
 
 	client->group = group;
 
+	dev_alert(client->dev, "---%s, %d\n", __func__, __LINE__);
 	return 0;
 }
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..890c7c7ecf94 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -220,6 +220,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 
 	group = iommu_group_get_for_dev(dev);
 	if (IS_ERR(group)) {
+		dev_alert(dev, "-%s, %d\n", __func__, __LINE__);
 		ret = PTR_ERR(group);
 		goto out_release;
 	}
@@ -268,7 +269,9 @@ int iommu_probe_device(struct device *dev)
 	 */
 	iommu_alloc_default_domain(group, dev);
 
+	dev_alert(dev, "-%s, %d\n", __func__, __LINE__);
 	if (group->default_domain) {
+		dev_alert(dev, "-%s, %d\n", __func__, __LINE__);
 		ret = __iommu_attach_device(group->default_domain, dev);
 		if (ret) {
 			iommu_group_put(group);
@@ -852,8 +855,11 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 
 	mutex_lock(>mutex);
 	list_add_tail(>list, >devices);
-	if (group->domain  && !iommu_is_attach_deferred(group->domain, dev))
+	dev_alert(dev, "-%s, %d\n", __func__, __LINE__);
+	if (group->domain  && !iommu_is_attach_deferred(group->domain, dev)) {
+		dev_alert(dev, "-%s, %d\n", __func__, __LINE__);
 		ret = __iommu_attach_device(group->domain, dev);
+	}
 	

Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-06 Thread Guillaume Tucker
On 05/02/2021 09:45, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
>>> Please let us know if you need any help debugging this issue or
>>> to try a fix on this platform.
>>
>> Yes, I don't have any Tegra124 platform to run. It'd be very nice
>> if you can run some debugging patch (I can provide you) and a fix
>> after I root cause the issue.
> 
> Would it be possible for you to run with the given debugging patch?
> 
> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> and the commit-reverted version (passing), each applying this patch.

Sure, I've run 3 jobs:

* v5.11-rc6 as a reference, to see the original issue:
  https://lava.collabora.co.uk/scheduler/job/3187848

* + your debug patch:
  https://lava.collabora.co.uk/scheduler/job/3187849

* + the "breaking" commit reverted, passing the tests:
  https://lava.collabora.co.uk/scheduler/job/3187851


You can see the history of the test branch I'm using here, with
the 3 revisions mentioned above:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/linux-5.11-rc6-nyan-big-drm-read/


Hope that helps,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-05 Thread Nicolin Chen
Hi Guillaume,

On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
> > Please let us know if you need any help debugging this issue or
> > to try a fix on this platform.
> 
> Yes, I don't have any Tegra124 platform to run. It'd be very nice
> if you can run some debugging patch (I can provide you) and a fix
> after I root cause the issue.

Would it be possible for you to run with the given debugging patch?

It'd be nicer if I can get both logs of the vanilla kernel (failing)
and the commit-reverted version (passing), each applying this patch.

Thanks in advance!
Nicolin
>From 80f288d7101101fca0412c5c200cea7e203a675d Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Fri, 5 Feb 2021 01:41:07 -0800
Subject: [PATCH] iommu: debug tegra-smmu

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4a3f095a1c26..796b7df54b8f 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -363,6 +363,7 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 		value |= SMMU_ASID_VALUE(asid);
 		value |= SMMU_ASID_ENABLE;
 		smmu_writel(smmu, value, group->reg);
+		pr_alert("%s, swgroup %d: writing %x to reg1 %x\n", __func__, swgroup, value, group->reg);
 	} else {
 		pr_warn("%s group from swgroup %u not found\n", __func__,
 swgroup);
@@ -379,6 +380,7 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 		value = smmu_readl(smmu, client->smmu.reg);
 		value |= BIT(client->smmu.bit);
 		smmu_writel(smmu, value, client->smmu.reg);
+		pr_alert("%s, swgroup %d: writing %x to reg2 %x\n", __func__, swgroup, value, client->smmu.reg);
 	}
 }
 
@@ -491,13 +493,19 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 	unsigned int index;
 	int err;
 
+	dev_alert(dev, "---%s: smmu %s\n", __func__, smmu ? "valid" : "NULL");
+	dump_stack();
 	if (!fwspec)
 		return -ENOENT;
 
+	dev_alert(dev, "---%s: fwspec->num_ids %d\n", __func__, fwspec->num_ids);
 	for (index = 0; index < fwspec->num_ids; index++) {
 		err = tegra_smmu_as_prepare(smmu, as);
-		if (err)
+		if (err) {
+			dev_err(dev, "failed to prepare as(%d) for fwspec %d",
+as->id, fwspec->ids[index]);
 			goto disable;
+		}
 
 		tegra_smmu_enable(smmu, fwspec->ids[index], as->id);
 	}
@@ -805,6 +813,8 @@ static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 	if (!smmu)
 		return ERR_PTR(-ENODEV);
 
+	dev_alert(dev, "%s, %d\n", __func__, __LINE__);
+	dump_stack();
 	return >iommu;
 }
 
@@ -904,6 +914,8 @@ static int tegra_smmu_of_xlate(struct device *dev,
 
 	dev_iommu_priv_set(dev, mc->smmu);
 
+	dev_alert(dev, "---%s: id %d", __func__, id);
+	dump_stack();
 	return iommu_fwspec_add_ids(dev, , 1);
 }
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Nicolin Chen
Hi Guillaume,

On Thu, Feb 04, 2021 at 11:10:15AM +, Guillaume Tucker wrote:
> Hi Nicolin,
> 
> A regression was detected by kernelci.org in IGT's drm_read tests
> on mainline, it was first seen on 17th December 2020.  You can
> find some details here:
> 
>   https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Thanks for reporting the issue. We did test on Tegra210 and Tegra30
yet not on Tegra124. I am wondering what could go wrong...

> Please let us know if you need any help debugging this issue or
> to try a fix on this platform.

Yes, I don't have any Tegra124 platform to run. It'd be very nice
if you can run some debugging patch (I can provide you) and a fix
after I root cause the issue.

Thanks
Nicolin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Guillaume Tucker
Hi Nicolin,

A regression was detected by kernelci.org in IGT's drm_read tests
on mainline, it was first seen on 17th December 2020.  You can
find some details here:

  https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Then an automated bisection was run and it landed on this
patch (v5.10-rc3-4-g25938c73cd79 on mainline).  Normally, an
email is generated automatically but I had to start this one by
hand as there were issues getting it to complete.

You can see the failing test cases with this patch:

  https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra

Some errors are seen around this point in the log:

  https://lava.collabora.co.uk/scheduler/job/3126405#L1005

[3.029729] tegra-mc 70019000.memory-controller: display0a: read 
@0xfe00: EMEM address decode error (SMMU translation error [--S])
[3.042058] tegra-mc 70019000.memory-controller: display0a: read 
@0xfe00: Page fault (SMMU translation error [--S])


Here's the same test passing with this patch reverted:

  https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra
  

For completeness, you can see all the test jobs run by the
automated bisection here:

  
https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25_search=bisection-gtucker-12#dt_


Please let us know if you need any help debugging this issue or
to try a fix on this platform.

Best wishes,
Guillaume

On 25/11/2020 10:10, Nicolin Chen wrote:
> The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
> to call in tegra_smmu_probe_device() where each client searches
> its DT node for smmu pointer and swgroup ID, so as to configure
> an fwspec. But this requires a valid smmu pointer even before mc
> and smmu drivers are probed. So in tegra_smmu_probe() we added a
> line of code to fill mc->smmu, marking "a bit of a hack".
> 
> This works for most of clients in the DTB, however, doesn't work
> for a client that doesn't exist in DTB, a PCI device for example.
> 
> Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
> it's called from bus_set_iommu(), iommu core will let everything
> carry on. Then when a client gets probed, of_iommu_configure() in
> iommu core will search DTB for swgroup ID and call ->of_xlate()
> to prepare an fwspec, similar to tegra_smmu_probe_device() and
> tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
> again, and this time we shall return smmu->iommu pointer properly.
> 
> So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
> along with DT polling code by letting the iommu core handle every
> thing, except a problem that we search iommus property in DTB not
> only for swgroup ID but also for mc node to get mc->smmu pointer
> to call dev_iommu_priv_set() and return the smmu->iommu pointer.
> So we'll need to find another way to get smmu pointer.
> 
> Referencing the implementation of sun50i-iommu driver, of_xlate()
> has client's dev pointer, mc node and swgroup ID. This means that
> we can call dev_iommu_priv_set() in of_xlate() instead, so we can
> simply get smmu pointer in ->probe_device().
> 
> This patch reworks tegra_smmu_probe_device() by:
> 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
>ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
>tegra_smmu_probe/tegra_mc_probe().
> 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
>pointer in tegra_smmu_probe_device() to replace DTB polling.
> 3) Removing tegra_smmu_configure() accordingly since iommu core
>takes care of it.
> 
> This also fixes a problem that previously we could add clients to
> iommu groups before iommu core initializes its default domain:
> ubuntu@jetson:~$ dmesg | grep iommu
> platform 5000.host1x: Adding to iommu group 1
> platform 5700.gpu: Adding to iommu group 2
> iommu: Default domain type: Translated
> platform 5420.dc: Adding to iommu group 3
> platform 5424.dc: Adding to iommu group 3
> platform 5434.vic: Adding to iommu group 4
> 
> Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
> warnings if switching to IOMMU_DOMAIN_DMA:
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> 
> Now, bypassing the first probe_device() call from bus_set_iommu()
> fixes the sequence:
> ubuntu@jetson:~$ dmesg | grep iommu
> iommu: Default domain type: Translated
> tegra-host1x 5000.host1x: Adding to iommu group 0
> tegra-dc 5420.dc: Adding to iommu group 1
> tegra-dc 5424.dc: Adding to iommu group 1
> tegra-vic 5434.vic: Adding to iommu group 2
> nouveau 5700.gpu: Adding to iommu group 3
> 
> Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.
> 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
> 

[PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2020-11-25 Thread Nicolin Chen
The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
to call in tegra_smmu_probe_device() where each client searches
its DT node for smmu pointer and swgroup ID, so as to configure
an fwspec. But this requires a valid smmu pointer even before mc
and smmu drivers are probed. So in tegra_smmu_probe() we added a
line of code to fill mc->smmu, marking "a bit of a hack".

This works for most of clients in the DTB, however, doesn't work
for a client that doesn't exist in DTB, a PCI device for example.

Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
it's called from bus_set_iommu(), iommu core will let everything
carry on. Then when a client gets probed, of_iommu_configure() in
iommu core will search DTB for swgroup ID and call ->of_xlate()
to prepare an fwspec, similar to tegra_smmu_probe_device() and
tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
again, and this time we shall return smmu->iommu pointer properly.

So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
along with DT polling code by letting the iommu core handle every
thing, except a problem that we search iommus property in DTB not
only for swgroup ID but also for mc node to get mc->smmu pointer
to call dev_iommu_priv_set() and return the smmu->iommu pointer.
So we'll need to find another way to get smmu pointer.

Referencing the implementation of sun50i-iommu driver, of_xlate()
has client's dev pointer, mc node and swgroup ID. This means that
we can call dev_iommu_priv_set() in of_xlate() instead, so we can
simply get smmu pointer in ->probe_device().

This patch reworks tegra_smmu_probe_device() by:
1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
   ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
   tegra_smmu_probe/tegra_mc_probe().
2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
   pointer in tegra_smmu_probe_device() to replace DTB polling.
3) Removing tegra_smmu_configure() accordingly since iommu core
   takes care of it.

This also fixes a problem that previously we could add clients to
iommu groups before iommu core initializes its default domain:
ubuntu@jetson:~$ dmesg | grep iommu
platform 5000.host1x: Adding to iommu group 1
platform 5700.gpu: Adding to iommu group 2
iommu: Default domain type: Translated
platform 5420.dc: Adding to iommu group 3
platform 5424.dc: Adding to iommu group 3
platform 5434.vic: Adding to iommu group 4

Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
warnings if switching to IOMMU_DOMAIN_DMA:
iommu: Failed to allocate default IOMMU domain of type 0 for
   group (null) - Falling back to IOMMU_DOMAIN_DMA
iommu: Failed to allocate default IOMMU domain of type 0 for
   group (null) - Falling back to IOMMU_DOMAIN_DMA

Now, bypassing the first probe_device() call from bus_set_iommu()
fixes the sequence:
ubuntu@jetson:~$ dmesg | grep iommu
iommu: Default domain type: Translated
tegra-host1x 5000.host1x: Adding to iommu group 0
tegra-dc 5420.dc: Adding to iommu group 1
tegra-dc 5424.dc: Adding to iommu group 1
tegra-vic 5434.vic: Adding to iommu group 2
nouveau 5700.gpu: Adding to iommu group 3

Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Acked-by: Thierry Reding 
Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 96 ++
 1 file changed, 15 insertions(+), 81 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 297d49f3f80e..f45ed43cf8db 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -797,75 +797,9 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct 
iommu_domain *domain,
return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
 }
 
-static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
-{
-   struct platform_device *pdev;
-   struct tegra_mc *mc;
-
-   pdev = of_find_device_by_node(np);
-   if (!pdev)
-   return NULL;
-
-   mc = platform_get_drvdata(pdev);
-   if (!mc)
-   return NULL;
-
-   return mc->smmu;
-}
-
-static int tegra_smmu_configure(struct tegra_smmu *smmu, struct device *dev,
-   struct of_phandle_args *args)
-{
-   const struct iommu_ops *ops = smmu->iommu.ops;
-   int err;
-
-   err = iommu_fwspec_init(dev, >of_node->fwnode, ops);
-   if (err < 0) {
-   dev_err(dev, "failed to initialize fwspec: %d\n", err);
-   return err;
-   }
-
-   err = ops->of_xlate(dev, args);
-   if (err < 0) {
-   dev_err(dev, "failed to parse SW group ID: %d\n", err);
-   iommu_fwspec_free(dev);
-   return err;
-   }
-
-   return 0;
-}
-
 static struct iommu_device *tegra_smmu_probe_device(struct device