RE: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter

2020-02-11 Thread Leo Li



> -----Original Message-----
> From: Russell King - ARM Linux admin 
> Sent: Tuesday, February 11, 2020 5:46 PM
> To: Leo Li 
> Cc: Joerg Roedel ; Will Deacon ;
> Robin Murphy ; iommu@lists.linux-
> foundation.org; linux-ker...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH] iommu/arm-smmu: fix the module name for
> disable_bypass parameter
> 
> On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> > Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a
> > module"), there is a side effect that the module name is changed from
> > arm-smmu to arm-smmu-mod.  So the kernel parameter for disable_bypass
> > need to be changed too.  Fix the Kconfig help and error message to the
> > correct parameter name.
> 
> Hmm, this seems to be a user-visible change - so those of us who have been
> booting with "arm-smmu.disable_bypass=0" now need to change that
> depending on which kernel is being booted - which is not nice, and makes
> the support side on platforms that need this kernel parameter harder.

I agree.  Probably a better fix is to update the Makefile to change the module 
name back to the original one.
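For illustration, such a Makefile change could look like the sketch below, following the usual Kbuild composite-object convention (rename the source file so the composite object, and hence the module, can keep the old arm-smmu name; the exact object list is taken from the current tree and may differ):

```make
# Sketch: keep the module named arm-smmu.ko by renaming the source file
# (arm-smmu.c -> arm-smmu-common.c) and building a composite object whose
# name matches the old module name.
obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
arm-smmu-objs := arm-smmu-common.o arm-smmu-impl.o arm-smmu-qcom.o
```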

Regards,
Leo

> 
> >
> > Signed-off-by: Li Yang 
> > ---
> >  drivers/iommu/Kconfig| 2 +-
> >  drivers/iommu/arm-smmu.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index d2fade984999..fb54be903c60 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
> >   hardcode the bypass disable in the code.
> >
> >   NOTE: the kernel command line parameter
> > - 'arm-smmu.disable_bypass' will continue to override this
> > + 'arm-smmu-mod.disable_bypass' will continue to override this
> >   config.
> >
> >  config ARM_SMMU_V3
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 16c4b87af42b..2ffe8ff04393 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> >     if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> >         (gfsr & ARM_SMMU_sGFSR_USF))
> >             dev_err(smmu->dev,
> > -                   "Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",
> > +                   "Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security implications\n",
> >                     (u16)gfsynr1);
> > else
> > dev_err(smmu->dev,
> > --
> > 2.17.1
> >
> >
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
> 
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps
> up According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-11 Thread Daniel Drake
On Wed, Feb 12, 2020 at 12:03 AM Derrick, Jonathan wrote:
> On Tue, 2020-02-11 at 17:13 +0800, Daniel Drake wrote:
> > The PCI devices handled by intel-iommu may have a DMA requester on
> > another bus, such as VMD subdevices needing to use the VMD endpoint.
> >
> > The real DMA device is now used for the DMA mapping, but one case was
> > missed earlier, when allocating memory through (e.g.) intel_map_page().
> > Confusion ensues if the iommu domain type for the subdevice does not match
> > the iommu domain type for the real DMA device.
> Is there a way to force this situation for my testing?

I think you could hack device_def_domain_type() to return
IOMMU_DOMAIN_IDENTITY for the real device, and IOMMU_DOMAIN_DMA for
the subdevice.

But I got curious as to why my subdevice might be IOMMU_DOMAIN_DMA, so
I checked, and found out that my assumptions weren't quite correct.
The subdevice has no iommu domain recorded at all. Before applying any
patches here, what's actually happening is:

1. Real DMA device gets registered with the iommu as
IOMMU_DOMAIN_IDENTITY using si_domain.
2. When the subdevice gets registered, the relevant code flow is
inside dmar_insert_one_dev_info():
 - it creates a new device_domain_info and domain->domain.type == IDENTITY, but
 - it then calls find_domain(dev) which successfully defers to the
real DMA device and returns the real DMA device's dmar_domain
 - since found != NULL (dmar_domain was found for this device) the
function bails out before setting dev->archdata.iommu

The results at this point are that the real DMA device is fully
registered as IOMMU_DOMAIN_IDENTITY using si_domain, but all of the
subdevices will always have dev->archdata.iommu == NULL.

Then when intel_map_page() is reached for the subdevice, it calls
iommu_need_mapping() for the subdevice.
This calls identity_mapping() on the subdevice, but that will always
return 0 because dev->archdata.iommu == NULL.
Following on from there, iommu_need_mapping() will then *always*
return true (mapping needed) for subdevices.

That will then lead to the situation described in my last mail, where
later down the allocation chain the request for creating a mapping
will be handed towards the real DMA dev, but that will then fail
because the real DMA dev is using IOMMU_DOMAIN_IDENTITY where no
mapping is needed.

Unless I missed something, that seems pretty clear to me now, and I
guess the only reason you may not have already faced this in the
vmd case is if the real DMA device is not using IOMMU_DOMAIN_IDENTITY.
(To check this, you could log the value of the real dev
domain->domain.type in dmar_insert_one_dev_info(), and/or observe the
return value of identity_mapping() in iommu_need_mapping for the real
dev).

In any case it seems increasingly clear to me that
iommu_need_mapping() should be consulting the real DMA device in the
identity_mapping check, and your patch likewise solves the problem
faced here.

Thanks
Daniel


[PATCH] iommu/arm-smmu: fix the module name to be consistent with older kernel

2020-02-11 Thread Li Yang
Commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module")
introduced a side effect that changed the module name from arm-smmu to
arm-smmu-mod.  This breaks the users of kernel parameters for the driver
(e.g. arm-smmu.disable_bypass).  This patch changes the module name back
to be consistent.

Signed-off-by: Li Yang 
---
 drivers/iommu/Makefile  | 4 ++--
 drivers/iommu/{arm-smmu.c => arm-smmu-common.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/iommu/{arm-smmu.c => arm-smmu-common.c} (100%)

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 2104fb8afc06..f39e82e68640 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -14,8 +14,8 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu-mod.o
-arm-smmu-mod-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
+arm-smmu-objs := arm-smmu-common.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu-common.c
similarity index 100%
rename from drivers/iommu/arm-smmu.c
rename to drivers/iommu/arm-smmu-common.c
-- 
2.17.1



Re: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter

2020-02-11 Thread Li Yang
On Tue, Feb 11, 2020 at 5:47 PM Russell King - ARM Linux admin wrote:
>
> On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> > Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
> > there is a side effect that the module name is changed from arm-smmu to
> > arm-smmu-mod.  So the kernel parameter for disable_bypass need to be
> > changed too.  Fix the Kconfig help and error message to the correct
> > parameter name.
>
> Hmm, this seems to be a user-visible change - so those of us who have
> been booting with "arm-smmu.disable_bypass=0" now need to change that
> depending on which kernel is being booted - which is not nice, and
> makes the support side on platforms that need this kernel parameter
> harder.

I have sent a new patch replacing this patch.  That patch will keep
the command line unchanged.

>
> >
> > Signed-off-by: Li Yang 
> > ---
> >  drivers/iommu/Kconfig| 2 +-
> >  drivers/iommu/arm-smmu.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index d2fade984999..fb54be903c60 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
> > hardcode the bypass disable in the code.
> >
> > NOTE: the kernel command line parameter
> > -   'arm-smmu.disable_bypass' will continue to override this
> > +   'arm-smmu-mod.disable_bypass' will continue to override this
> > config.
> >
> >  config ARM_SMMU_V3
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 16c4b87af42b..2ffe8ff04393 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void 
> > *dev)
> >   if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> >   (gfsr & ARM_SMMU_sGFSR_USF))
> >   dev_err(smmu->dev,
> > - "Blocked unknown Stream ID 0x%hx; boot with 
> > \"arm-smmu.disable_bypass=0\" to allow, but this may have security 
> > implications\n",
> > + "Blocked unknown Stream ID 0x%hx; boot with 
> > \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security 
> > implications\n",
> >   (u16)gfsynr1);
> >   else
> >   dev_err(smmu->dev,
> > --
> > 2.17.1
> >
> >
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

2020-02-11 Thread Andre Tomt

On 11.02.2020 23:00, Chuck Lever wrote:
> Hi Andre, thanks for trying this out.
> 
>> On Feb 11, 2020, at 3:50 PM, Andre Tomt  wrote:
>> 
>> On 11.02.2020 20:58, Chuck Lever wrote:
>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
>>> concatenate sg entries, it will return a different nents value than
>>> it was passed.
>>> The bug was exposed by recent changes to the AMD IOMMU driver.
>> 
>> This seems to fail differently on my system; mount fails with:
>> mount.nfs: mount system call failed
>> 
>> and the kernel log reports:
>> [   38.890344] NFS: Registering the id_resolver key type
>> [   38.890351] Key type id_resolver registered
>> [   38.890352] Key type id_legacy registered
>> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
>> [   38.901817] NFS4: Couldn't follow remote path
>> 
>> amd_iommu=off still works
>> 
>> One detail I accidentally left out of the original report is that the server (intel system) is running Ubuntu 20.04 ("beta") userspace, and AMD clients are Ubuntu 19.10 userspace. Although I dont believe this to matter at this point.
> 
> Next thing to try:
> 
> # trace-cmd record -e sunrpc -e rpcrdma
> 
> then issue the mount command. Once it completes, ^C the trace-cmd and send me trace.dat.
> 
> Try this with both the v5.4 kernel and the v5.5 kernel (and note that trace-cmd overwrites trace.dat, so copy it out between tests).

I've uploaded them to https://tomt.net/temp/rdmaiommubug/
I'll probably do a 5.5.3 with the v1 fix as well, should show up momentarily.

>>> Reported-by: Andre Tomt 
>>> Suggested-by: Robin Murphy 
>>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>>> Signed-off-by: Chuck Lever 
>>> ---
>>>  net/sunrpc/xprtrdma/frwr_ops.c |5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>> 
>>> Hey Andre, please try this out. It just reverts the bit of brokenness that
>>> Robin observed this morning. I've done basic testing here with Intel
>>> IOMMU systems, no change in behavior (ie, all good to go).
>>> 
>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>>> index 095be887753e..449bb51e4fe8 100644
>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>>> 		break;
>>> 	}
>>> 	mr->mr_dir = rpcrdma_data_dir(writing);
>>> +	mr->mr_nents = i;
>>>  
>>> -	mr->mr_nents =
>>> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>> -	if (!mr->mr_nents)
>>> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>> 		goto out_dmamap_err;
>>>  
>>> 	ibmr = mr->frwr.fr_mr;
> 
> --
> Chuck Lever







Re: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter

2020-02-11 Thread Russell King - ARM Linux admin
On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
> there is a side effect that the module name is changed from arm-smmu to
> arm-smmu-mod.  So the kernel parameter for disable_bypass need to be
> changed too.  Fix the Kconfig help and error message to the correct
> parameter name.

Hmm, this seems to be a user-visible change - so those of us who have
been booting with "arm-smmu.disable_bypass=0" now need to change that
depending on which kernel is being booted - which is not nice, and
makes the support side on platforms that need this kernel parameter
harder.

> 
> Signed-off-by: Li Yang 
> ---
>  drivers/iommu/Kconfig| 2 +-
>  drivers/iommu/arm-smmu.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d2fade984999..fb54be903c60 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
> hardcode the bypass disable in the code.
>  
> NOTE: the kernel command line parameter
> -   'arm-smmu.disable_bypass' will continue to override this
> +   'arm-smmu-mod.disable_bypass' will continue to override this
> config.
>  
>  config ARM_SMMU_V3
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 16c4b87af42b..2ffe8ff04393 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void 
> *dev)
>   if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
>   (gfsr & ARM_SMMU_sGFSR_USF))
>   dev_err(smmu->dev,
> - "Blocked unknown Stream ID 0x%hx; boot with 
> \"arm-smmu.disable_bypass=0\" to allow, but this may have security 
> implications\n",
> + "Blocked unknown Stream ID 0x%hx; boot with 
> \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security 
> implications\n",
>   (u16)gfsynr1);
>   else
>   dev_err(smmu->dev,
> -- 
> 2.17.1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


[PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter

2020-02-11 Thread Li Yang
Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
there is a side effect that the module name is changed from arm-smmu to
arm-smmu-mod.  So the kernel parameter for disable_bypass needs to be
changed too.  Fix the Kconfig help and error message to use the correct
parameter name.

Signed-off-by: Li Yang 
---
 drivers/iommu/Kconfig| 2 +-
 drivers/iommu/arm-smmu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..fb54be903c60 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
  hardcode the bypass disable in the code.
 
  NOTE: the kernel command line parameter
- 'arm-smmu.disable_bypass' will continue to override this
+ 'arm-smmu-mod.disable_bypass' will continue to override this
  config.
 
 config ARM_SMMU_V3
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..2ffe8ff04393 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
(gfsr & ARM_SMMU_sGFSR_USF))
dev_err(smmu->dev,
-   "Blocked unknown Stream ID 0x%hx; boot with 
\"arm-smmu.disable_bypass=0\" to allow, but this may have security 
implications\n",
+   "Blocked unknown Stream ID 0x%hx; boot with 
\"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security 
implications\n",
(u16)gfsynr1);
else
dev_err(smmu->dev,
-- 
2.17.1



Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

2020-02-11 Thread Chuck Lever
Hi Andre, thanks for trying this out.

> On Feb 11, 2020, at 3:50 PM, Andre Tomt  wrote:
> 
> On 11.02.2020 20:58, Chuck Lever wrote:
>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
>> concatenate sg entries, it will return a different nents value than
>> it was passed.
>> The bug was exposed by recent changes to the AMD IOMMU driver.
> 
> This seems to fail differently on my system; mount fails with:
> mount.nfs: mount system call failed
> 
> and the kernel log reports:
> [   38.890344] NFS: Registering the id_resolver key type
> [   38.890351] Key type id_resolver registered
> [   38.890352] Key type id_legacy registered
> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting 
> with error EIO
> [   38.901817] NFS4: Couldn't follow remote path
> 
> amd_iommu=off still works
> 
> One detail I accidentally left out of the original report is that the server 
> (intel system) is running Ubuntu 20.04 ("beta") userspace, and AMD clients 
> are Ubuntu 19.10 userspace. Although I dont believe this to matter at this 
> point.

Next thing to try:

# trace-cmd record -e sunrpc -e rpcrdma

then issue the mount command. Once it completes, ^C the trace-cmd and send me 
trace.dat.

Try this with both the v5.4 kernel and the v5.5 kernel (and note that trace-cmd 
overwrites trace.dat, so copy it out between tests).


>> Reported-by: Andre Tomt 
>> Suggested-by: Robin Murphy 
>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>> Signed-off-by: Chuck Lever 
>> ---
>>  net/sunrpc/xprtrdma/frwr_ops.c |5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> Hey Andre, please try this out. It just reverts the bit of brokenness that
>> Robin observed this morning. I've done basic testing here with Intel
>> IOMMU systems, no change in behavior (ie, all good to go).
>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>> index 095be887753e..449bb51e4fe8 100644
>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt 
>> *r_xprt,
>>  break;
>>  }
>>  mr->mr_dir = rpcrdma_data_dir(writing);
>> +mr->mr_nents = i;
>>  -   mr->mr_nents =
>> -ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>> -if (!mr->mr_nents)
>> +if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>  goto out_dmamap_err;
>>  ibmr = mr->frwr.fr_mr;
> 

--
Chuck Lever





Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

2020-02-11 Thread Andre Tomt

On 11.02.2020 20:58, Chuck Lever wrote:
> The @nents value that was passed to ib_dma_map_sg() has to be passed
> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
> concatenate sg entries, it will return a different nents value than
> it was passed.
> 
> The bug was exposed by recent changes to the AMD IOMMU driver.

This seems to fail differently on my system; mount fails with:
mount.nfs: mount system call failed

and the kernel log reports:
[   38.890344] NFS: Registering the id_resolver key type
[   38.890351] Key type id_resolver registered
[   38.890352] Key type id_legacy registered
[   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
[   38.901817] NFS4: Couldn't follow remote path

amd_iommu=off still works

One detail I accidentally left out of the original report is that the server (intel system) is running Ubuntu 20.04 ("beta") userspace, and AMD clients are Ubuntu 19.10 userspace. Although I dont believe this to matter at this point.

> Reported-by: Andre Tomt 
> Suggested-by: Robin Murphy 
> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/frwr_ops.c |5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> Hey Andre, please try this out. It just reverts the bit of brokenness that
> Robin observed this morning. I've done basic testing here with Intel
> IOMMU systems, no change in behavior (ie, all good to go).
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 095be887753e..449bb51e4fe8 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
> 		break;
> 	}
> 	mr->mr_dir = rpcrdma_data_dir(writing);
> +	mr->mr_nents = i;
>  
> -	mr->mr_nents =
> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
> -	if (!mr->mr_nents)
> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
> 		goto out_dmamap_err;
>  
> 	ibmr = mr->frwr.fr_mr;






[PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance

2020-02-11 Thread Chuck Lever
The @nents value that was passed to ib_dma_map_sg() has to be passed
to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() choses to
concatenate sg entries, it will return a different nents value than
it was passed.

The bug was exposed by recent changes to the AMD IOMMU driver.

Reported-by: Andre Tomt 
Suggested-by: Robin Murphy 
Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
Signed-off-by: Chuck Lever 
---
 net/sunrpc/xprtrdma/frwr_ops.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Hey Andre, please try this out. It just reverts the bit of brokenness that
Robin observed this morning. I've done basic testing here with Intel
IOMMU systems, no change in behavior (ie, all good to go).

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 095be887753e..449bb51e4fe8 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt 
*r_xprt,
break;
}
mr->mr_dir = rpcrdma_data_dir(writing);
+   mr->mr_nents = i;
 
-   mr->mr_nents =
-   ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
-   if (!mr->mr_nents)
+   if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
goto out_dmamap_err;
 
ibmr = mr->frwr.fr_mr;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Andre Tomt

On 11.02.2020 17:36, Robin Murphy wrote:
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>> Robin, your explanation makes sense to me. I can post a fix for this
>> imbalance later today for Andre to try.
> 
> FWIW here's a quick hack which *should* suppress the concatenation
> behaviour - if it makes Andre's system any happier then that would
> indeed point towards dma_map_sg() handling being the culprit.
> 
> Robin.

This hack does indeed make things work again.

> ->8-
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>  	 * - but doesn't fall at a segment boundary
>  	 * - and wouldn't make the resulting output segment too long
>  	 */
> -	if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +	if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> 	    (max_len - cur_len >= s_length)) {
> 		/* ...then concatenate it with the previous one */
> 		cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
> 		if (s_length + s_iova_off < s_iova_len)
> 			cur_len = 0;
> 	}
> +	WARN_ON(count < nents);
> 	return count;
> }



Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Chuck Lever



> On Feb 11, 2020, at 11:36 AM, Robin Murphy  wrote:
> 
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:32 AM, Robin Murphy  wrote:
>>> 
>>> On 11/02/2020 3:24 pm, Chuck Lever wrote:
> On Feb 11, 2020, at 10:12 AM, Robin Murphy  wrote:
> 
> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>> Andre-
>> Thank you for the detailed report!
>> Tom-
>> There is a rich set of trace points available in the RPC/RDMA 
>> implementation in 5.4/5.5, fwiw.
>> Please keep me in the loop, let me know if there is anything I can do to 
>> help.
> 
> One aspect that may be worth checking is whether there's anywhere that 
> assumes a successful return value from dma_map_sg() is always the same as 
> the number of entries passed in - that's the most obvious way the 
> iommu-dma code differs (legitimately) from the previous amd-iommu 
> implementation.
 net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
 317 mr->mr_nents =
 318 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, 
 mr->mr_dir);
 319 if (!mr->mr_nents)
 320 goto out_dmamap_err;
 Should that rather be "if (mr->mr_nents != i)" ?
>>> 
>>> No, that much is OK - the point is that dma_map_sg() may pack the DMA 
>>> addresses such that sg_dma_len(sg) > sg->length - however, subsequently 
>>> passing that mr->nents to dma_unmap_sg() in frwr_mr_recycle() (rather than 
>>> the original value of i) looks at a glance like an example of how things 
>>> may start to get out-of-whack.
>> Robin, your explanation makes sense to me. I can post a fix for this 
>> imbalance later today for Andre to try.
> 
> FWIW here's a quick hack which *should* suppress the concatenation behaviour 
> - if it makes Andre's system any happier then that would indeed point towards 
> dma_map_sg() handling being the culprit.

Even so, 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
looks like it introduced this problem.


> Robin.
> 
> ->8-
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct 
> scatterlist *sg, int nents,
>* - but doesn't fall at a segment boundary
>* - and wouldn't make the resulting output segment too long
>*/
> - if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> + if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
>   (max_len - cur_len >= s_length)) {
>   /* ...then concatenate it with the previous one */
>   cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct 
> scatterlist *sg, int nents,
>   if (s_length + s_iova_off < s_iova_len)
>   cur_len = 0;
>   }
> + WARN_ON(count < nents);
>   return count;
> }

--
Chuck Lever





Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Robin Murphy

On 11/02/2020 4:03 pm, Chuck Lever wrote:




On Feb 11, 2020, at 10:32 AM, Robin Murphy  wrote:

On 11/02/2020 3:24 pm, Chuck Lever wrote:

On Feb 11, 2020, at 10:12 AM, Robin Murphy  wrote:

On 11/02/2020 1:48 pm, Chuck Lever wrote:

Andre-
Thank you for the detailed report!
Tom-
There is a rich set of trace points available in the RPC/RDMA implementation in 
5.4/5.5, fwiw.
Please keep me in the loop, let me know if there is anything I can do to help.


One aspect that may be worth checking is whether there's anywhere that assumes 
a successful return value from dma_map_sg() is always the same as the number of 
entries passed in - that's the most obvious way the iommu-dma code differs 
(legitimately) from the previous amd-iommu implementation.

net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
317 mr->mr_nents =
318 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
319 if (!mr->mr_nents)
320 goto out_dmamap_err;
Should that rather be "if (mr->mr_nents != i)" ?


No, that much is OK - the point is that dma_map_sg() may pack the DMA addresses such 
that sg_dma_len(sg) > sg->length - however, subsequently passing that mr->nents 
to dma_unmap_sg() in frwr_mr_recycle() (rather than the original value of i) looks at a 
glance like an example of how things may start to get out-of-whack.


Robin, your explanation makes sense to me. I can post a fix for this imbalance 
later today for Andre to try.


FWIW here's a quick hack which *should* suppress the concatenation 
behaviour - if it makes Andre's system any happier then that would 
indeed point towards dma_map_sg() handling being the culprit.


Robin.

->8-
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..a6b71bad518e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,

 * - but doesn't fall at a segment boundary
 * - and wouldn't make the resulting output segment too long
 */
-   if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
+   if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
(max_len - cur_len >= s_length)) {
/* ...then concatenate it with the previous one */
cur_len += s_length;
@@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,

if (s_length + s_iova_off < s_iova_len)
cur_len = 0;
}
+   WARN_ON(count < nents);
return count;
 }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Chuck Lever



> On Feb 11, 2020, at 10:32 AM, Robin Murphy  wrote:
> 
> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy  wrote:
>>> 
>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
 Andre-
 Thank you for the detailed report!
 Tom-
 There is a rich set of trace points available in the RPC/RDMA 
 implementation in 5.4/5.5, fwiw.
 Please keep me in the loop, let me know if there is anything I can do to 
 help.
>>> 
>>> One aspect that may be worth checking is whether there's anywhere that 
>>> assumes a successful return value from dma_map_sg() is always the same as 
>>> the number of entries passed in - that's the most obvious way the iommu-dma 
>>> code differs (legitimately) from the previous amd-iommu implementation.
>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>> 317 mr->mr_nents =
>> 318 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, 
>> mr->mr_dir);
>> 319 if (!mr->mr_nents)
>> 320 goto out_dmamap_err;
>> Should that rather be "if (mr->mr_nents != i)" ?
> 
> No, that much is OK - the point is that dma_map_sg() may pack the DMA 
> addresses such that sg_dma_len(sg) > sg->length - however, subsequently 
> passing that mr->nents to dma_unmap_sg() in frwr_mr_recycle() (rather than 
> the original value of i) looks at a glance like an example of how things may 
> start to get out-of-whack.

Robin, your explanation makes sense to me. I can post a fix for this imbalance 
later today for Andre to try.


--
Chuck Lever





Re: [PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-11 Thread Derrick, Jonathan
Hi Daniel

On Tue, 2020-02-11 at 17:13 +0800, Daniel Drake wrote:
> The PCI devices handled by intel-iommu may have a DMA requester on
> another bus, such as VMD subdevices needing to use the VMD endpoint.
> 
> The real DMA device is now used for the DMA mapping, but one case was
> missed earlier, when allocating memory through (e.g.) intel_map_page().
> Confusion ensues if the iommu domain type for the subdevice does not match
> the iommu domain type for the real DMA device.
Is there a way to force this situation for my testing? 

> 
> For example, take the case of the subdevice handled by intel_map_page()
> in a IOMMU_DOMAIN_DMA, with the real DMA device in a
> IOMMU_DOMAIN_IDENTITY:
> 
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>iommu_need_mapping() on the subdevice. Result: mapping is needed.
> 2. __intel_map_single() is called to create the mapping:
>   - __intel_map_single() calls find_domain(). This function now returns
> the IDENTITY domain corresponding to the real DMA device.
>   - __intel_map_single() then calls domain_get_iommu() on this "real"
> domain. A failure is hit and the entire operation is aborted, because
> this codepath is not intended to handle IDENTITY mappings:
> if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
>return NULL;
> 
> Fix this by using the real DMA device when checking if a mapping is
> needed. The above case will then directly fall back on
> dma_direct_map_page().
> 
> Fixes: 2b0140c69637 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
> Signed-off-by: Daniel Drake 
> ---
> 
> Notes:
> This problem was detected with a non-upstream patch
> "PCI: Add Intel remapped NVMe device support"
(https://marc.info/?l=linux-ide&m=156015271021615&w=2)
> 
> This patch creates PCI devices in the same way as VMD, and hence
> I believe VMD would hit this class of problem for any cases where
> iommu domain type may mismatch between subdevice and real device,
> which we have run into here.
> 
> However this hasn't actually been tested on VMD (don't have the hardware)
> so if I've missed anything and/or it's not a real issue then feel free to
> drop this patch.
> 
>  drivers/iommu/intel-iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9dc37672bf89..713810f8350c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3587,6 +3587,9 @@ static bool iommu_need_mapping(struct device *dev)
>   if (iommu_dummy(dev))
>   return false;
>  
> + if (dev_is_pci(dev))
> + dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
> +
>   ret = identity_mapping(dev);
>   if (ret) {
>   u64 dma_mask = *dev->dma_mask;
This will be a problem. We really want to use the subdevice's dma mask
in case there's a situation where the subdevice only supports 32-bit
dma (with the real dma requester having a 64-bit dma mask).



Would this work?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc3767..8f35e6b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,19 +3582,24 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
+   u64 dma_mask, required_dma_mask;
int ret;
 
if (iommu_dummy(dev))
return false;
 
-   ret = identity_mapping(dev);
-   if (ret) {
-   u64 dma_mask = *dev->dma_mask;
+   dma_mask = *dev->dma_mask;
+   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
+   dma_mask = dev->coherent_dma_mask;
 
-   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
-   dma_mask = dev->coherent_dma_mask;
+   required_dma_mask = dma_direct_get_required_mask(dev);
 
-   if (dma_mask >= dma_direct_get_required_mask(dev))
+   if (dev_is_pci(dev))
+   dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+   ret = identity_mapping(dev);
+   if (ret) {
+   if (dma_mask >= required_dma_mask)
return false;
 
/*


Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Robin Murphy

On 11/02/2020 3:24 pm, Chuck Lever wrote:




On Feb 11, 2020, at 10:12 AM, Robin Murphy  wrote:

On 11/02/2020 1:48 pm, Chuck Lever wrote:

Andre-
Thank you for the detailed report!
Tom-
There is a rich set of trace points available in the RPC/RDMA implementation in 
5.4/5.5, fwiw.
Please keep me in the loop, let me know if there is anything I can do to help.


One aspect that may be worth checking is whether there's anywhere that assumes 
a successful return value from dma_map_sg() is always the same as the number of 
entries passed in - that's the most obvious way the iommu-dma code differs 
(legitimately) from the previous amd-iommu implementation.


net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()

317 mr->mr_nents =
318 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
319 if (!mr->mr_nents)
320 goto out_dmamap_err;

Should that rather be "if (mr->mr_nents != i)" ?


No, that much is OK - the point is that dma_map_sg() may pack the DMA 
addresses such that sg_dma_len(sg) > sg->length - however, subsequently 
passing that mr->nents to dma_unmap_sg() in frwr_mr_recycle() (rather 
than the original value of i) looks at a glance like an example of how 
things may start to get out-of-whack.


Robin.


On Feb 11, 2020, at 2:25 AM, Joerg Roedel  wrote:

Adding Tom's new email address.

Tom, can you have a look, please?
https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
issue.

On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:

Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
stopped working. But only on my AMD Ryzen systems. And so far only NFS,
curiously other RDMA diagnostic tools (like qperf  -cm1 rc_bw) work
fine.

A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7 iommu/amd:
Convert AMD iommu driver to the dma-iommu api

5.5.3-rc1, 5.6-rc1 are also not working.

I verified it by booting with amd_iommu=off on the kernel cmdline - it makes
everything work again.

The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running over
RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000). Nothing
fancy besides the RoCEv1 and related network bits like PFC and storage
VLAN. Bare metal, no virtualization.

The impacted systems are:
ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201

pcaps off a mirror port can be provided. They show that on 5.5.x, CM
succeeds, and then a couple of NFS NULL calls comes through (over RoCE),
both acked, and then the rest just never goes out from the client until the
mount times out and CM is torn down.

No messages show up in the kernel log on either side. I was at least
expecting some scary IOMMU warnings.

More serious hardware is not available for RDMA testing currently, so I don't
know if an EPYC system or newer mlx5 cards would have similar issues. Intel
I've only tested as server so far, that worked fine, as expected given the
bisect result.



git bisect start
# bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
# good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
# good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
# bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
# good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag 
'5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
# good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
# bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of 
git://git.lwn.net/linux
git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
# bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 
'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
# good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate 
refault stalls from swap_readpage
git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
# good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 
'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
# good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove 
duplicate error message
git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
# bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple 
PCI DMA aliases in IRQ Remapping

Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Chuck Lever



> On Feb 11, 2020, at 10:12 AM, Robin Murphy  wrote:
> 
> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>> Andre-
>> Thank you for the detailed report!
>> Tom-
>> There is a rich set of trace points available in the RPC/RDMA implementation 
>> in 5.4/5.5, fwiw.
>> Please keep me in the loop, let me know if there is anything I can do to 
>> help.
> 
> One aspect that may be worth checking is whether there's anywhere that 
> assumes a successful return value from dma_map_sg() is always the same as the 
> number of entries passed in - that's the most obvious way the iommu-dma code 
> differs (legitimately) from the previous amd-iommu implementation.

net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()

317 mr->mr_nents =
318 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
319 if (!mr->mr_nents)
320 goto out_dmamap_err;

Should that rather be "if (mr->mr_nents != i)" ?


> Robin.
> 
>>> On Feb 11, 2020, at 2:25 AM, Joerg Roedel  wrote:
>>> 
>>> Adding Tom's new email address.
>>> 
>>> Tom, can you have a look, please?
>>> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
>>> issue.
>>> 
>>> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
 Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
 stopped working. But only on my AMD Ryzen systems. And so far only NFS,
 curiously other RDMA diagnostic tools (like qperf  -cm1 rc_bw) work
 fine.
 
 A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7 iommu/amd:
 Convert AMD iommu driver to the dma-iommu api
 
 5.5.3-rc1, 5.6-rc1 are also not working.
 
 I verified it by booting with amd_iommu=off on the kernel cmdline - it 
 makes
 everything work again.
 
 The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running over
 RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000). Nothing
 fancy besides the RoCEv1 and related network bits like PFC and storage
 VLAN. Bare metal, no virtualization.
 
 The impacted systems are:
 ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
 ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
 
 pcaps off a mirror port can be provided. They show that on 5.5.x, CM
 succeeds, and then a couple of NFS NULL calls come through (over RoCE),
 both acked, and then the rest just never goes out from the client until the
 mount times out and CM is torn down.
 
 No messages show up in the kernel log on either side. I was at least
 expecting some scary IOMMU warnings.
 
 More serious hardware is not available for RDMA testing currently, so I don't
 know if an EPYC system or newer mlx5 cards would have similar issues. Intel
 I've only tested as server so far, that worked fine, as expected given the
 bisect result.
 
 
> git bisect start
> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag 
> '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 
> 'x86-urgent-for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' 
> of git://git.lwn.net/linux
> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 
> 'iommu-updates-v5.5' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate 
> refault stalls from swap_readpage
> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 
> 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove 
> duplicate error message
> git bisect good 

Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Robin Murphy

On 11/02/2020 1:48 pm, Chuck Lever wrote:

Andre-

Thank you for the detailed report!

Tom-

There is a rich set of trace points available in the RPC/RDMA implementation in 
5.4/5.5, fwiw.
Please keep me in the loop, let me know if there is anything I can do to help.


One aspect that may be worth checking is whether there's anywhere that 
assumes a successful return value from dma_map_sg() is always the same 
as the number of entries passed in - that's the most obvious way the 
iommu-dma code differs (legitimately) from the previous amd-iommu 
implementation.


Robin.


On Feb 11, 2020, at 2:25 AM, Joerg Roedel  wrote:

Adding Tom's new email address.

Tom, can you have a look, please?
https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
issue.

On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:

Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
stopped working. But only on my AMD Ryzen systems. And so far only NFS,
curiously other RDMA diagnostic tools (like qperf  -cm1 rc_bw) work
fine.

A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7 iommu/amd:
Convert AMD iommu driver to the dma-iommu api

5.5.3-rc1, 5.6-rc1 are also not working.

I verified it by booting with amd_iommu=off on the kernel cmdline - it makes
everything work again.

The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running over
RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000). Nothing
fancy besides the RoCEv1 and related network bits like PFC and storage
VLAN. Bare metal, no virtualization.

The impacted systems are:
ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201

pcaps off a mirror port can be provided. They show that on 5.5.x, CM
succeeds, and then a couple of NFS NULL calls come through (over RoCE),
both acked, and then the rest just never goes out from the client until the
mount times out and CM is torn down.

No messages show up in the kernel log on either side. I was at least
expecting some scary IOMMU warnings.

More serious hardware is not available for RDMA testing currently, so I don't
know if an EPYC system or newer mlx5 cards would have similar issues. Intel
I've only tested as server so far, that worked fine, as expected given the
bisect result.



git bisect start
# bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
# good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
# good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
# bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
# good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag 
'5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
# good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
# bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of 
git://git.lwn.net/linux
git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
# bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 
'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
# good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate 
refault stalls from swap_readpage
git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
# good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 
'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
# good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove 
duplicate error message
git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
# bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple 
PCI DMA aliases in IRQ Remapping
git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
# bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu 
driver to the dma-iommu api
git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
# good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter to 
iommu_ops::map
git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
# good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the 
dev->coherent_dma_mask
git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
# first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: 
Convert AMD iommu driver to the dma-iommu api


--
Chuck Lever




Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)

2020-02-11 Thread Chuck Lever
Andre-

Thank you for the detailed report!

Tom-

There is a rich set of trace points available in the RPC/RDMA implementation in 
5.4/5.5, fwiw.
Please keep me in the loop, let me know if there is anything I can do to help.


> On Feb 11, 2020, at 2:25 AM, Joerg Roedel  wrote:
> 
> Adding Tom's new email address.
> 
> Tom, can you have a look, please? 
> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
> issue.
> 
> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
>> stopped working. But only on my AMD Ryzen systems. And so far only NFS,
>> curiously other RDMA diagnostic tools (like qperf  -cm1 rc_bw) work
>> fine.
>> 
>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7 iommu/amd:
>> Convert AMD iommu driver to the dma-iommu api
>> 
>> 5.5.3-rc1, 5.6-rc1 are also not working.
>> 
>> I verified it by booting with amd_iommu=off on the kernel cmdline - it makes
>> everything work again.
>> 
>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running over
>> RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000). Nothing
>> fancy besides the RoCEv1 and related network bits like PFC and storage
>> VLAN. Bare metal, no virtualization.
>> 
>> The impacted systems are:
>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>> 
>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>> succeeds, and then a couple of NFS NULL calls come through (over RoCE),
>> both acked, and then the rest just never goes out from the client until the
>> mount times out and CM is torn down.
>> 
>> No messages show up in the kernel log on either side. I was at least
>> expecting some scary IOMMU warnings.
>> 
>> More serious hardware is not available for RDMA testing currently, so I don't
>> know if an EPYC system or newer mlx5 cards would have similar issues. Intel
>> I've only tested as server so far, that worked fine, as expected given the
>> bisect result.
>> 
>> 
>>> git bisect start
>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' 
>>> of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag 
>>> '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 
>>> 'x86-urgent-for-linus' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of 
>>> git://git.lwn.net/linux
>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 
>>> 'iommu-updates-v5.5' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate 
>>> refault stalls from swap_readpage
>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 
>>> 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove 
>>> duplicate error message
>>> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
>>> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support 
>>> multiple PCI DMA aliases in IRQ Remapping
>>> git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
>>> # bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD 
>>> iommu driver to the dma-iommu api
>>> git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
>>> # good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter 
>>> to iommu_ops::map
>>> git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
>>> # good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the 
>>> dev->coherent_dma_mask
>>> git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
>>> # first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: 
>>> Convert AMD iommu driver to the dma-iommu api

--
Chuck Lever




Re: [PATCH 2/3] iommu: Add Allwinner H6 IOMMU driver

2020-02-11 Thread Maxime Ripard
Hi Robin,

On Mon, Jan 27, 2020 at 07:01:02PM +, Robin Murphy wrote:
> > > > +static void sun50i_iommu_domain_free(struct iommu_domain *domain)
> > > > +{
> > > > +   struct sun50i_iommu_domain *sun50i_domain = 
> > > > to_sun50i_domain(domain);
> > > > +   struct sun50i_iommu *iommu = sun50i_domain->iommu;
> > > > +   unsigned long flags;
> > > > +   int i;
> > > > +
> > > > +   spin_lock_irqsave(&sun50i_domain->dt_lock, flags);
> > > > +
> > > > +   for (i = 0; i < NUM_DT_ENTRIES; i++) {
> > > > +   phys_addr_t pt_phys;
> > > > +   u32 *page_table;
> > > > +   u32 *dte_addr;
> > > > +   u32 dte;
> > > > +
> > > > +   dte_addr = &sun50i_domain->dt[i];
> > > > +   dte = *dte_addr;
> > > > +   if (!sun50i_dte_is_pt_valid(dte))
> > > > +   continue;
> > > > +
> > > > +   memset(dte_addr, 0, sizeof(*dte_addr));
> > > > +   sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);
> > >
> > > This shouldn't be necessary - freeing a domain while it's still live is an
> > > incredibly very wrong thing to do, so the hardware should have already 
> > > been
> > > programmed to no longer walk this table by this point.
> >
> > We never "garbage collect" and remove the dte for the page table we
> > don't use anymore elsewhere though, so couldn't we end up in a
> > situation where we don't have a page table (because it has been freed)
> > at the other end of our dte, but the IOMMU doesn't know about it since
> > we never flushed?
>
> Let me reiterate: at the point of freeing, the assumption is that this
> domain should be long dissociated from the hardware. Any actual invalidation
> should have already happened at the point the TTB was disabled or pointed to
> some other domain, therefore fiddling with pagetable pages just before you
> free them back to the kernel is just pointless busywork.
>
> If the TTB *was* still live here, then as soon as you call free_pages()
> below the DT memory could get reallocated by someone else and filled with
> data that happens to look like valid pagetables, so even if you invalidate
> the TLBs the hardware could just immediately go walk that data and refill
> them with nonsense, thus any pretence at invalidation is in vain anyway.

Thanks, that makes a lot of sense.

> The fly in the soup, however, is that default domains appear to be lacking
> proper detach notifications (I hadn't consciously noticed that before), so
> if you've been looking at the iommu_group_release() path it might have given
> the wrong impression. So what might be justifiable here is to check if the
> domain being freed is the one currently active in hardware, and if so
> perform a detach (i.e. disable the TTB and invalidate everything) first,
> then free everything as normal. Or just handwave that we essentially never
> free a default domain anyway so it's OK to just assume that we're not
> freeing anything live.

I'm still a bit unsure as of what to do exactly here. I haven't found
a hook that would be called when a given domain doesn't have any
devices attached to it. Or did you mean that I should just create a
separate function, not part of the IOMMU ops?
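A rough sketch of the "check and detach first" option Robin describes might look like the following — purely illustrative, with sun50i_iommu_disable() standing in for a hypothetical helper that disables the TTB and invalidates the TLBs, and DT_SIZE likewise assumed:

```c
/* Illustrative sketch only: if the domain being freed is still the one
 * programmed into the hardware, detach it first, then free as normal. */
static void sun50i_iommu_domain_free(struct iommu_domain *domain)
{
	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
	struct sun50i_iommu *iommu = sun50i_domain->iommu;

	if (iommu && iommu->domain == domain) {
		/* Disable translation and invalidate everything before
		 * the pagetable memory can be reused by anyone else. */
		sun50i_iommu_disable(iommu);
		iommu->domain = NULL;
	}

	/* Hardware can no longer walk these tables, so plain frees
	 * suffice - no per-entry flushing needed. */
	free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE));
	kfree(sun50i_domain);
}
```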

> > > > +
> > > > +   if (iommu->domain == domain)
> > > > +   return 0;
> > > > +
> > > > +   if (iommu->domain)
> > > > +   sun50i_iommu_detach_device(iommu->domain, dev);
> > > > +
> > > > +   iommu->domain = domain;
> > > > +   sun50i_domain->iommu = iommu;
> > > > +
> > > > +   return pm_runtime_get_sync(iommu->dev);
> > >
> > > Deferring all the actual hardware programming to the suspend/resume hooks 
> > > is
> > > a fiendishly clever idea that took me more than a moment to make sense of,
> > > but how well does it work when RPM is compiled out or runtime-inhibited?
> >
> > We have a bunch of other controllers that require runtime_pm already,
> > so it's going to be enabled. But that should be expressed in Kconfig.
>
> But it can still be inhibited from sysfs, right? We don't want driver
> behaviour to be *unnecessarily* fragile to user actions, however silly they
> may be.

That's a good point :)

> > > Furthermore, basing RPM on having a domain attached means that
> > > you'll effectively never turn the IOMMU off, even when all the
> > > clients are idle. It would make more sense to use device links like
> > > most other drivers do to properly model the producer/consumer
> > > relationship.
> >
> > I'm not familiar with device links for runtime_pm, I thought this was
> > only useful for system-wide resume and suspend?
>
> See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking full
> advantage of this.

I'll look into it, thanks!
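For reference, the device-link pattern Robin mentions usually amounts to one call at attach time. A hedged sketch, assuming iommu->dev is the IOMMU's struct device and dev the client being attached:

```c
/* Sketch: tie the client's runtime PM to the IOMMU's, so the IOMMU is
 * powered whenever a consumer is active and may runtime-suspend once
 * all consumers are idle. DL_FLAG_AUTOREMOVE_CONSUMER drops the link
 * automatically when the consumer driver unbinds. */
struct device_link *link;

link = device_link_add(dev, iommu->dev,
		       DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER);
if (!link)
	dev_warn(dev, "failed to create device link to IOMMU\n");
```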

> > > > +static int __maybe_unused sun50i_iommu_resume(struct device *dev)
> > > > +{
> > > > +   struct sun50i_iommu_domain *sun50i_domain;
> > > > +   struct sun50i_iommu *iommu;
> > > > +   unsigned 

[PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-11 Thread Daniel Drake
The PCI devices handled by intel-iommu may have a DMA requester on
another bus, such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was
missed earlier, when allocating memory through (e.g.) intel_map_page().
Confusion ensues if the iommu domain type for the subdevice does not match
the iommu domain type for the real DMA device.

For example, take the case of the subdevice handled by intel_map_page()
in a IOMMU_DOMAIN_DMA, with the real DMA device in a
IOMMU_DOMAIN_IDENTITY:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. Result: mapping is needed.
2. __intel_map_single() is called to create the mapping:
  - __intel_map_single() calls find_domain(). This function now returns
the IDENTITY domain corresponding to the real DMA device.
  - __intel_map_single() then calls domain_get_iommu() on this "real"
domain. A failure is hit and the entire operation is aborted, because
this codepath is not intended to handle IDENTITY mappings:
if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
 return NULL;

Fix this by using the real DMA device when checking if a mapping is
needed. The above case will then directly fall back on
dma_direct_map_page().

Fixes: 2b0140c69637 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Daniel Drake 
---

Notes:
This problem was detected with a non-upstream patch
"PCI: Add Intel remapped NVMe device support"
(https://marc.info/?l=linux-ide&m=156015271021615&w=2)

This patch creates PCI devices in the same way as VMD, and hence
I believe VMD would hit this class of problem for any cases where
iommu domain type may mismatch between subdevice and real device,
which we have run into here.

However this hasn't actually been tested on VMD (don't have the hardware)
so if I've missed anything and/or it's not a real issue then feel free to
drop this patch.

 drivers/iommu/intel-iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..713810f8350c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3587,6 +3587,9 @@ static bool iommu_need_mapping(struct device *dev)
if (iommu_dummy(dev))
return false;
 
+   if (dev_is_pci(dev))
+   dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
ret = identity_mapping(dev);
if (ret) {
u64 dma_mask = *dev->dma_mask;
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/intel-iommu: set as DUMMY_DEVICE_DOMAIN_INFO if no IOMMU

2020-02-11 Thread Daniel Drake
On Sat, Feb 8, 2020 at 2:30 PM Lu Baolu  wrote:
> > The devices under segment 1 are fake devices produced by
> > intel-nvme-remap mentioned here https://lkml.org/lkml/2020/2/5/139
>
> Has this series been accepted?

Sadly not - we didn't find any consensus on the right approach, and
further conversation is hindered by the questionable hardware design
and lack of further communication from Intel in explaining it. It's
one of the exceptional cases where we carry a significant non-upstream
kernel change, because unfortunately most of the current day consumer
PCs we work with have this BIOS option on by default and hence
unmodified Linux can't access the storage devices. On the off chance
that you have some energy to bubble this up inside Intel, please let me
know and we will talk more... :)

That said, this thread was indeed only opened since we thought we had
found a more general issue that would potentially affect other cases.
The issue described does seem to highlight a possible imperfection in
code flow, although it may also be reasonable to say that (without
crazy downstream patches at play) if intel_iommu_add_device() fails
then we have bigger problems to face anyway...

> Will this help here? https://www.spinics.net/lists/iommu/msg41300.html

Yes! Very useful info and a real improvement. We'll follow the same
approach here. That does solve the problem we were facing, although we
needed one more fixup which I've sent separately for your
consideration: iommu/vt-d: consider real PCI device when checking if
mapping is needed

Thanks!
Daniel


[PATCH] iommu/dmar: ignore devices with out-of-spec domain number

2020-02-11 Thread Daniel Drake
VMD subdevices are created with a PCI domain ID of 0x10000 or
higher.

These subdevices are also handled like all other PCI devices by
dmar_pci_bus_notifier().

However, when dmar_alloc_pci_notify_info() takes a record of such a
device, it truncates the domain ID to a u16 value (in info->seg).
The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if
it were 0000:00:02.0.

In the unlucky event that a real device also exists at 0000:00:02.0 and
also has a device-specific entry in the DMAR table,
dmar_insert_dev_scope() will crash on:
   BUG_ON(i >= devices_cnt);

That's basically a sanity check that only one PCI device matches a
single DMAR entry; in this case we seem to have two matching devices.

Fix this by ignoring devices that have a domain number higher than
what can be looked up in the DMAR table.

This problem was carefully diagnosed by Jian-Hong Pan.

Signed-off-by: Daniel Drake 
---

Notes:
This problem was detected with a non-upstream patch
"PCI: Add Intel remapped NVMe device support"
(https://marc.info/?l=linux-ide&m=156015271021615&w=2)

This patch creates PCI devices in the same way as VMD, and hence
I believe VMD would hit the same problem that we encountered here, when
a VMD-using product comes along that meets the mentioned conditions.

However this hasn't actually been tested on VMD (don't have the hardware)
so if I've missed anything and/or it's not a real issue then feel free to
drop this patch.

 drivers/iommu/dmar.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 071bb42bbbc5..8f94c817a7b5 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <linux/limits.h>
 #include 
 #include 
 
@@ -128,6 +129,13 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
 
BUG_ON(dev->is_virtfn);
 
+   /*
+* Ignore devices that have a domain number higher than what can
+* be looked up in DMAR, e.g. VMD subdevices with domain 0x10000
+*/
+   if (pci_domain_nr(dev->bus) > U16_MAX)
+   return NULL;
+
/* Only generate path[] for device addition event */
if (event == BUS_NOTIFY_ADD_DEVICE)
for (tmp = dev; tmp; tmp = tmp->bus->self)
-- 
2.20.1


Re: [PATCH v12 2/4] uacce: add uacce driver

2020-02-11 Thread zhangfei



On 2020/2/11 7:37 AM, Greg Kroah-Hartman wrote:

On Wed, Jan 15, 2020 at 10:12:46PM +0800, Zhangfei Gao wrote:

From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an I/O device, which
shares only data content rather than addresses.
With the unified address space, the hardware and the user-space process
can share the same virtual addresses for communication.

Uacce creates a chrdev for each registered device, and a queue is
allocated to the process when the chrdev is opened. The process can then
access the hardware resource by interacting with the queue file. By
mmapping the queue file space into user space, the process can put
requests directly to the hardware without syscalls into kernel space.
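As a hedged user-space sketch of that usage model (the device node name and mapping size are hypothetical placeholders, not names from the patchset):

```c
/*
 * Hypothetical sketch of the uacce usage model described above;
 * "/dev/uacce-example-0" and the 4096-byte mapping are placeholders.
 * Opening the chrdev allocates a queue; mmap exposes it to the process.
 */
int fd = open("/dev/uacce-example-0", O_RDWR);
void *q = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

/*
 * Requests written into the shared queue region can carry plain process
 * virtual addresses, since SVA lets the accelerator resolve them.
 */

munmap(q, 4096);
close(fd);          /* releases the queue back to the device */
```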

The IOMMU core only tracks mm<->device bonds at the moment, because it
only needs to handle IOTLB invalidation and PASID table entries. However
uacce needs a finer granularity since multiple queues from the same
device can be bound to an mm. When the mm exits, all bound queues must
be stopped so that the IOMMU can safely clear the PASID table entry and
reallocate the PASID.

An intermediate struct uacce_mm links uacce devices and queues.
Note that an mm may be bound to multiple devices but an uacce_mm
structure only ever belongs to a single device, because we don't need
anything more complex (if multiple devices are bound to one mm, then
we'll create one uacce_mm for each bond).

 uacce_device --+-- uacce_mm --+-- uacce_queue
                |              '-- uacce_queue
                |
                '-- uacce_mm --+-- uacce_queue
                               +-- uacce_queue
                               '-- uacce_queue

Reviewed-by: Jonathan Cameron 
Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhangfei Gao 

Looks much saner now, thanks for all of the work on this:

Reviewed-by: Greg Kroah-Hartman 

Or am I supposed to take this in my tree?  If so, I can, but I need an
ack for the crypto parts.



That's great, thanks Greg.

For convenience, I have rebased the patchset on 5.6-rc1.
I am not sure whether there are any conflicts with the crypto tree.
How about just picking the uacce part, patches 1 and 2?
We can resend the crypto part to the crypto tree.

Thanks
