RE: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter
> -----Original Message-----
> From: Russell King - ARM Linux admin
> Sent: Tuesday, February 11, 2020 5:46 PM
> To: Leo Li
> Cc: Joerg Roedel; Will Deacon; Robin Murphy;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH] iommu/arm-smmu: fix the module name for
> disable_bypass parameter
>
> On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> > Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a
> > module"), there is a side effect that the module name is changed from
> > arm-smmu to arm-smmu-mod. So the kernel parameter for disable_bypass
> > need to be changed too. Fix the Kconfig help and error message to the
> > correct parameter name.
>
> Hmm, this seems to be a user-visible change - so those of us who have been
> booting with "arm-smmu.disable_bypass=0" now need to change that
> depending on which kernel is being booted - which is not nice, and makes
> the support side on platforms that need this kernel parameter harder.

I agree. Probably a better fix is to update the Makefile to change the
module name back to the original one.

Regards,
Leo

> > Signed-off-by: Li Yang
> > ---
> >  drivers/iommu/Kconfig    | 2 +-
> >  drivers/iommu/arm-smmu.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index d2fade984999..fb54be903c60 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
> >  	  hardcode the bypass disable in the code.
> >
> >  	  NOTE: the kernel command line parameter
> > -	  'arm-smmu.disable_bypass' will continue to override this
> > +	  'arm-smmu-mod.disable_bypass' will continue to override this
> >  	  config.
> >
> >  config ARM_SMMU_V3
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 16c4b87af42b..2ffe8ff04393 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> >  	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> >  	    (gfsr & ARM_SMMU_sGFSR_USF))
> >  		dev_err(smmu->dev,
> > -			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",
> > +			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security implications\n",
> >  			(u16)gfsynr1);
> >  	else
> >  		dev_err(smmu->dev,
> > --
> > 2.17.1
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed
On Wed, Feb 12, 2020 at 12:03 AM Derrick, Jonathan wrote:
> On Tue, 2020-02-11 at 17:13 +0800, Daniel Drake wrote:
> > The PCI devices handled by intel-iommu may have a DMA requester on
> > another bus, such as VMD subdevices needing to use the VMD endpoint.
> >
> > The real DMA device is now used for the DMA mapping, but one case was
> > missed earlier, when allocating memory through (e.g.) intel_map_page().
> > Confusion ensues if the iommu domain type for the subdevice does not match
> > the iommu domain type for the real DMA device.
>
> Is there a way to force this situation for my testing?

I think you could hack device_def_domain_type() to return
IOMMU_DOMAIN_IDENTITY for the real device, and IOMMU_DOMAIN_DMA for the
subdevice.

But I got curious as to why my subdevice might be IOMMU_DOMAIN_DMA, so I
checked, and found out that my assumptions weren't quite correct. The
subdevice has no iommu domain recorded at all. Before applying any patches
here, what's actually happening is:

1. The real DMA device gets registered with the iommu as
   IOMMU_DOMAIN_IDENTITY using si_domain.

2. When the subdevice gets registered, the relevant code flow is inside
   dmar_insert_one_dev_info():
   - it creates a new device_domain_info and domain->domain.type == IDENTITY, but
   - it then calls find_domain(dev), which successfully defers to the real
     DMA device and returns the real DMA device's dmar_domain
   - since found != NULL (a dmar_domain was found for this device), the
     function bails out before setting dev->archdata.iommu

The result at this point is that the real DMA device is fully registered
as IOMMU_DOMAIN_IDENTITY using si_domain, but all of the subdevices will
always have dev->archdata.iommu == NULL.

Then when intel_map_page() is reached for the subdevice, it calls
iommu_need_mapping() for the subdevice. This calls identity_mapping() on
the subdevice, but that will always return 0 because
dev->archdata.iommu == NULL.

Following on from there, iommu_need_mapping() will then *always* return
true (mapping needed) for subdevices. That will then lead to the situation
described in my last mail, where later down the allocation chain the
request for creating a mapping is handed towards the real DMA dev, but
that then fails because the real DMA dev is using IOMMU_DOMAIN_IDENTITY,
where no mapping is needed.

Unless I missed anything, that seems pretty clear to me now, and I guess
the only reason why you may not have already faced this in the vmd case is
if the real DMA device is not using IOMMU_DOMAIN_IDENTITY. (To check this,
you could log the value of the real dev's domain->domain.type in
dmar_insert_one_dev_info(), and/or observe the return value of
identity_mapping() in iommu_need_mapping() for the real dev.)

In any case it seems increasingly clear to me that iommu_need_mapping()
should be consulting the real DMA device in the identity_mapping() check,
and your patch likewise solves the problem faced here.

Thanks
Daniel
[PATCH] iommu/arm-smmu: fix the module name to be consistent with older kernel
Commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module")
introduced a side effect that changed the module name from arm-smmu to
arm-smmu-mod. This breaks the users of kernel parameters for the driver
(e.g. arm-smmu.disable_bypass). This patch changes the module name back
to be consistent.

Signed-off-by: Li Yang
---
 drivers/iommu/Makefile                          | 4 ++--
 drivers/iommu/{arm-smmu.c => arm-smmu-common.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/iommu/{arm-smmu.c => arm-smmu-common.c} (100%)

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 2104fb8afc06..f39e82e68640 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -14,8 +14,8 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu-mod.o
-arm-smmu-mod-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
+arm-smmu-objs := arm-smmu-common.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu-common.c
similarity index 100%
rename from drivers/iommu/arm-smmu.c
rename to drivers/iommu/arm-smmu-common.c
--
2.17.1
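For context, a hypothetical kbuild fragment (generic names, not the actual tree) illustrating why the rename works: the module is named after the composite object target, and module parameters are set on the kernel command line as "<module-name>.<param>=<value>", so keeping the old target name preserves the parameter namespace.

```makefile
# Hypothetical kbuild sketch. The module name comes from the object
# target, so "foo.o" builds foo.ko and its parameters are addressed as
# "foo.<param>=..." on the command line. A source file may not share a
# name with the composite target, hence the rename of the main source.
obj-$(CONFIG_FOO) += foo.o            # builds foo.ko -> "foo.disable_bypass=0"
foo-objs := foo-common.o foo-impl.o   # foo.c renamed to foo-common.c
```

This is the same shape as the patch above: renaming arm-smmu.c to arm-smmu-common.c lets the composite target reclaim the original arm-smmu name.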
Re: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter
On Tue, Feb 11, 2020 at 5:47 PM Russell King - ARM Linux admin wrote:
>
> On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> > Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
> > there is a side effect that the module name is changed from arm-smmu to
> > arm-smmu-mod. So the kernel parameter for disable_bypass need to be
> > changed too. Fix the Kconfig help and error message to the correct
> > parameter name.
>
> Hmm, this seems to be a user-visible change - so those of us who have
> been booting with "arm-smmu.disable_bypass=0" now need to change that
> depending on which kernel is being booted - which is not nice, and
> makes the support side on platforms that need this kernel parameter
> harder.

I have sent a new patch replacing this one. That patch keeps the
command line unchanged.

> >
> > Signed-off-by: Li Yang
> > ---
> >  drivers/iommu/Kconfig    | 2 +-
> >  drivers/iommu/arm-smmu.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index d2fade984999..fb54be903c60 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
> >  	  hardcode the bypass disable in the code.
> >
> >  	  NOTE: the kernel command line parameter
> > -	  'arm-smmu.disable_bypass' will continue to override this
> > +	  'arm-smmu-mod.disable_bypass' will continue to override this
> >  	  config.
> >
> >  config ARM_SMMU_V3
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 16c4b87af42b..2ffe8ff04393 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> >  	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> >  	    (gfsr & ARM_SMMU_sGFSR_USF))
> >  		dev_err(smmu->dev,
> > -			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",
> > +			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security implications\n",
> >  			(u16)gfsynr1);
> >  	else
> >  		dev_err(smmu->dev,
> > --
> > 2.17.1
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance
On 11.02.2020 23:00, Chuck Lever wrote:
> Hi Andre, thanks for trying this out.
>
>> On Feb 11, 2020, at 3:50 PM, Andre Tomt wrote:
>>
>> On 11.02.2020 20:58, Chuck Lever wrote:
>>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
>>> concatenate sg entries, it will return a different nents value than
>>> it was passed.
>>>
>>> The bug was exposed by recent changes to the AMD IOMMU driver.
>>
>> This seems to fail differently on my system; mount fails with:
>> mount.nfs: mount system call failed
>>
>> and the kernel log reports:
>> [   38.890344] NFS: Registering the id_resolver key type
>> [   38.890351] Key type id_resolver registered
>> [   38.890352] Key type id_legacy registered
>> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
>> [   38.901817] NFS4: Couldn't follow remote path
>>
>> amd_iommu=off still works.
>>
>> One detail I accidentally left out of the original report is that the
>> server (intel system) is running Ubuntu 20.04 ("beta") userspace, and
>> the AMD clients are Ubuntu 19.10 userspace. Although I don't believe
>> this to matter at this point.
>
> Next thing to try:
>
> # trace-cmd record -e sunrpc -e rpcrdma
>
> then issue the mount command. Once it completes, ^C the trace-cmd and
> send me trace.dat. Try this with both the v5.4 kernel and the v5.5
> kernel (and note that trace-cmd overwrites trace.dat, so copy it out
> between tests).

I've uploaded them to https://tomt.net/temp/rdmaiommubug/

I'll probably do a 5.5.3 with the v1 fix as well, should show up momentarily.

>>> Reported-by: Andre Tomt
>>> Suggested-by: Robin Murphy
>>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>>> Signed-off-by: Chuck Lever
>>> ---
>>>  net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> Hey Andre, please try this out. It just reverts the bit of brokenness
>>> that Robin observed this morning. I've done basic testing here with
>>> Intel IOMMU systems, no change in behavior (ie, all good to go).
>>>
>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>>> index 095be887753e..449bb51e4fe8 100644
>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>>>  			break;
>>>  	}
>>>  	mr->mr_dir = rpcrdma_data_dir(writing);
>>> +	mr->mr_nents = i;
>>>
>>> -	mr->mr_nents =
>>> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>> -	if (!mr->mr_nents)
>>> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>>  		goto out_dmamap_err;
>>>
>>>  	ibmr = mr->frwr.fr_mr;
Re: [PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter
On Tue, Feb 11, 2020 at 05:36:55PM -0600, Li Yang wrote:
> Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
> there is a side effect that the module name is changed from arm-smmu to
> arm-smmu-mod. So the kernel parameter for disable_bypass need to be
> changed too. Fix the Kconfig help and error message to the correct
> parameter name.

Hmm, this seems to be a user-visible change - so those of us who have
been booting with "arm-smmu.disable_bypass=0" now need to change that
depending on which kernel is being booted - which is not nice, and
makes the support side on platforms that need this kernel parameter
harder.

>
> Signed-off-by: Li Yang
> ---
>  drivers/iommu/Kconfig    | 2 +-
>  drivers/iommu/arm-smmu.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d2fade984999..fb54be903c60 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
>  	  hardcode the bypass disable in the code.
>
>  	  NOTE: the kernel command line parameter
> -	  'arm-smmu.disable_bypass' will continue to override this
> +	  'arm-smmu-mod.disable_bypass' will continue to override this
>  	  config.
>
>  config ARM_SMMU_V3
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 16c4b87af42b..2ffe8ff04393 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
>  	    (gfsr & ARM_SMMU_sGFSR_USF))
>  		dev_err(smmu->dev,
> -			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",
> +			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security implications\n",
>  			(u16)gfsynr1);
>  	else
>  		dev_err(smmu->dev,
> --
> 2.17.1
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
[PATCH] iommu/arm-smmu: fix the module name for disable_bypass parameter
Since commit cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module"),
there is a side effect that the module name is changed from arm-smmu to
arm-smmu-mod. So the kernel parameter for disable_bypass needs to be
changed too. Fix the Kconfig help and error message to use the correct
parameter name.

Signed-off-by: Li Yang
---
 drivers/iommu/Kconfig    | 2 +-
 drivers/iommu/arm-smmu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..fb54be903c60 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -415,7 +415,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
 	  hardcode the bypass disable in the code.
 
 	  NOTE: the kernel command line parameter
-	  'arm-smmu.disable_bypass' will continue to override this
+	  'arm-smmu-mod.disable_bypass' will continue to override this
 	  config.
 
 config ARM_SMMU_V3
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..2ffe8ff04393 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -512,7 +512,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
 	    (gfsr & ARM_SMMU_sGFSR_USF))
 		dev_err(smmu->dev,
-			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu.disable_bypass=0\" to allow, but this may have security implications\n",
+			"Blocked unknown Stream ID 0x%hx; boot with \"arm-smmu-mod.disable_bypass=0\" to allow, but this may have security implications\n",
 			(u16)gfsynr1);
 	else
 		dev_err(smmu->dev,
--
2.17.1
Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance
Hi Andre, thanks for trying this out.

> On Feb 11, 2020, at 3:50 PM, Andre Tomt wrote:
>
> On 11.02.2020 20:58, Chuck Lever wrote:
>> The @nents value that was passed to ib_dma_map_sg() has to be passed
>> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
>> concatenate sg entries, it will return a different nents value than
>> it was passed.
>>
>> The bug was exposed by recent changes to the AMD IOMMU driver.
>
> This seems to fail differently on my system; mount fails with:
> mount.nfs: mount system call failed
>
> and the kernel log reports:
> [   38.890344] NFS: Registering the id_resolver key type
> [   38.890351] Key type id_resolver registered
> [   38.890352] Key type id_legacy registered
> [   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
> [   38.901817] NFS4: Couldn't follow remote path
>
> amd_iommu=off still works.
>
> One detail I accidentally left out of the original report is that the
> server (intel system) is running Ubuntu 20.04 ("beta") userspace, and
> the AMD clients are Ubuntu 19.10 userspace. Although I don't believe
> this to matter at this point.

Next thing to try:

# trace-cmd record -e sunrpc -e rpcrdma

then issue the mount command. Once it completes, ^C the trace-cmd and send
me trace.dat. Try this with both the v5.4 kernel and the v5.5 kernel (and
note that trace-cmd overwrites trace.dat, so copy it out between tests).

>> Reported-by: Andre Tomt
>> Suggested-by: Robin Murphy
>> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
>> Signed-off-by: Chuck Lever
>> ---
>>  net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> Hey Andre, please try this out. It just reverts the bit of brokenness
>> that Robin observed this morning. I've done basic testing here with
>> Intel IOMMU systems, no change in behavior (ie, all good to go).
>>
>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>> index 095be887753e..449bb51e4fe8 100644
>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>>  			break;
>>  	}
>>  	mr->mr_dir = rpcrdma_data_dir(writing);
>> +	mr->mr_nents = i;
>>
>> -	mr->mr_nents =
>> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>> -	if (!mr->mr_nents)
>> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>>  		goto out_dmamap_err;
>>
>>  	ibmr = mr->frwr.fr_mr;

--
Chuck Lever
Re: [PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance
On 11.02.2020 20:58, Chuck Lever wrote:
> The @nents value that was passed to ib_dma_map_sg() has to be passed
> to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
> concatenate sg entries, it will return a different nents value than
> it was passed.
>
> The bug was exposed by recent changes to the AMD IOMMU driver.

This seems to fail differently on my system; mount fails with:
mount.nfs: mount system call failed

and the kernel log reports:
[   38.890344] NFS: Registering the id_resolver key type
[   38.890351] Key type id_resolver registered
[   38.890352] Key type id_legacy registered
[   38.901799] NFS: nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
[   38.901817] NFS4: Couldn't follow remote path

amd_iommu=off still works.

One detail I accidentally left out of the original report is that the
server (intel system) is running Ubuntu 20.04 ("beta") userspace, and the
AMD clients are Ubuntu 19.10 userspace. Although I don't believe this to
matter at this point.

> Reported-by: Andre Tomt
> Suggested-by: Robin Murphy
> Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
> Signed-off-by: Chuck Lever
> ---
>  net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> Hey Andre, please try this out. It just reverts the bit of brokenness
> that Robin observed this morning. I've done basic testing here with
> Intel IOMMU systems, no change in behavior (ie, all good to go).
>
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 095be887753e..449bb51e4fe8 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
>  			break;
>  	}
>  	mr->mr_dir = rpcrdma_data_dir(writing);
> +	mr->mr_nents = i;
>
> -	mr->mr_nents =
> -		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
> -	if (!mr->mr_nents)
> +	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
>  		goto out_dmamap_err;
>
>  	ibmr = mr->frwr.fr_mr;
[PATCH v1] xprtrdma: Fix DMA scatter-gather list mapping imbalance
The @nents value that was passed to ib_dma_map_sg() has to be passed to
the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
concatenate sg entries, it will return a different nents value than it
was passed.

The bug was exposed by recent changes to the AMD IOMMU driver.

Reported-by: Andre Tomt
Suggested-by: Robin Murphy
Fixes: 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/frwr_ops.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Hey Andre, please try this out. It just reverts the bit of brokenness
that Robin observed this morning. I've done basic testing here with
Intel IOMMU systems, no change in behavior (ie, all good to go).

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 095be887753e..449bb51e4fe8 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -313,10 +313,9 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
 			break;
 	}
 	mr->mr_dir = rpcrdma_data_dir(writing);
+	mr->mr_nents = i;
 
-	mr->mr_nents =
-		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
-	if (!mr->mr_nents)
+	if (!ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir))
 		goto out_dmamap_err;
 
 	ibmr = mr->frwr.fr_mr;
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
On 11.02.2020 17:36, Robin Murphy wrote:
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>> Robin, your explanation makes sense to me. I can post a fix for this
>> imbalance later today for Andre to try.
>
> FWIW here's a quick hack which *should* suppress the concatenation
> behaviour - if it makes Andre's system any happier then that would
> indeed point towards dma_map_sg() handling being the culprit.
>
> Robin.

This hack does indeed make things work again.

> ----->8-----
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>  	 * - but doesn't fall at a segment boundary
>  	 * - and wouldn't make the resulting output segment too long
>  	 */
> -	if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +	if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
>  	    (max_len - cur_len >= s_length)) {
>  		/* ...then concatenate it with the previous one */
>  		cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>  		if (s_length + s_iova_off < s_iova_len)
>  			cur_len = 0;
>  	}
> +	WARN_ON(count < nents);
>  	return count;
>  }
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
> On Feb 11, 2020, at 11:36 AM, Robin Murphy wrote:
>
> On 11/02/2020 4:03 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:32 AM, Robin Murphy wrote:
>>>
>>> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy wrote:
>>>>>
>>>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>>>> Andre- Thank you for the detailed report!
>>>>>> Tom- There is a rich set of trace points available in the RPC/RDMA
>>>>>> implementation in 5.4/5.5, fwiw.
>>>>>> Please keep me in the loop, let me know if there is anything I can
>>>>>> do to help.
>>>>>
>>>>> One aspect that may be worth checking is whether there's anywhere
>>>>> that assumes a successful return value from dma_map_sg() is always
>>>>> the same as the number of entries passed in - that's the most
>>>>> obvious way the iommu-dma code differs (legitimately) from the
>>>>> previous amd-iommu implementation.
>>>>
>>>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>>>>
>>>> 317         mr->mr_nents =
>>>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>>> 319         if (!mr->mr_nents)
>>>> 320                 goto out_dmamap_err;
>>>>
>>>> Should that rather be "if (mr->mr_nents != i)" ?
>>>
>>> No, that much is OK - the point is that dma_map_sg() may pack the DMA
>>> addresses such that sg_dma_len(sg) > sg->length - however,
>>> subsequently passing that mr->nents to dma_unmap_sg() in
>>> frwr_mr_recycle() (rather than the original value of i) looks at a
>>> glance like an example of how things may start to get out-of-whack.
>>
>> Robin, your explanation makes sense to me. I can post a fix for this
>> imbalance later today for Andre to try.
>
> FWIW here's a quick hack which *should* suppress the concatenation
> behaviour - if it makes Andre's system any happier then that would
> indeed point towards dma_map_sg() handling being the culprit.

Even so, 1f541895dae9 ("xprtrdma: Don't defer MR recovery if ro_map fails")
looks like it introduced this problem.

> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>  	 * - but doesn't fall at a segment boundary
>  	 * - and wouldn't make the resulting output segment too long
>  	 */
> -	if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +	if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
>  	    (max_len - cur_len >= s_length)) {
>  		/* ...then concatenate it with the previous one */
>  		cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>  		if (s_length + s_iova_off < s_iova_len)
>  			cur_len = 0;
>  	}
> +	WARN_ON(count < nents);
>  	return count;
>  }

--
Chuck Lever
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
On 11/02/2020 4:03 pm, Chuck Lever wrote:
>> On Feb 11, 2020, at 10:32 AM, Robin Murphy wrote:
>>
>> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy wrote:
>>>>
>>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>>> Andre- Thank you for the detailed report!
>>>>> Tom- There is a rich set of trace points available in the RPC/RDMA
>>>>> implementation in 5.4/5.5, fwiw.
>>>>> Please keep me in the loop, let me know if there is anything I can
>>>>> do to help.
>>>>
>>>> One aspect that may be worth checking is whether there's anywhere
>>>> that assumes a successful return value from dma_map_sg() is always
>>>> the same as the number of entries passed in - that's the most
>>>> obvious way the iommu-dma code differs (legitimately) from the
>>>> previous amd-iommu implementation.
>>>
>>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>>>
>>> 317         mr->mr_nents =
>>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>>> 319         if (!mr->mr_nents)
>>> 320                 goto out_dmamap_err;
>>>
>>> Should that rather be "if (mr->mr_nents != i)" ?
>>
>> No, that much is OK - the point is that dma_map_sg() may pack the DMA
>> addresses such that sg_dma_len(sg) > sg->length - however,
>> subsequently passing that mr->nents to dma_unmap_sg() in
>> frwr_mr_recycle() (rather than the original value of i) looks at a
>> glance like an example of how things may start to get out-of-whack.
>
> Robin, your explanation makes sense to me. I can post a fix for this
> imbalance later today for Andre to try.

FWIW here's a quick hack which *should* suppress the concatenation
behaviour - if it makes Andre's system any happier then that would indeed
point towards dma_map_sg() handling being the culprit.

Robin.

----->8-----
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..a6b71bad518e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 	 * - but doesn't fall at a segment boundary
 	 * - and wouldn't make the resulting output segment too long
 	 */
-	if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
+	if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
 	    (max_len - cur_len >= s_length)) {
 		/* ...then concatenate it with the previous one */
 		cur_len += s_length;
@@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (s_length + s_iova_off < s_iova_len)
 			cur_len = 0;
 	}
+	WARN_ON(count < nents);
 	return count;
 }
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
> On Feb 11, 2020, at 10:32 AM, Robin Murphy wrote:
>
> On 11/02/2020 3:24 pm, Chuck Lever wrote:
>>> On Feb 11, 2020, at 10:12 AM, Robin Murphy wrote:
>>>
>>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>>> Andre- Thank you for the detailed report!
>>>> Tom- There is a rich set of trace points available in the RPC/RDMA
>>>> implementation in 5.4/5.5, fwiw.
>>>> Please keep me in the loop, let me know if there is anything I can
>>>> do to help.
>>>
>>> One aspect that may be worth checking is whether there's anywhere
>>> that assumes a successful return value from dma_map_sg() is always
>>> the same as the number of entries passed in - that's the most obvious
>>> way the iommu-dma code differs (legitimately) from the previous
>>> amd-iommu implementation.
>>
>> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>>
>> 317         mr->mr_nents =
>> 318                 ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
>> 319         if (!mr->mr_nents)
>> 320                 goto out_dmamap_err;
>>
>> Should that rather be "if (mr->mr_nents != i)" ?
>
> No, that much is OK - the point is that dma_map_sg() may pack the DMA
> addresses such that sg_dma_len(sg) > sg->length - however, subsequently
> passing that mr->nents to dma_unmap_sg() in frwr_mr_recycle() (rather
> than the original value of i) looks at a glance like an example of how
> things may start to get out-of-whack.

Robin, your explanation makes sense to me. I can post a fix for this
imbalance later today for Andre to try.

--
Chuck Lever
Re: [PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed
Hi Daniel,

On Tue, 2020-02-11 at 17:13 +0800, Daniel Drake wrote:
> The PCI devices handled by intel-iommu may have a DMA requester on
> another bus, such as VMD subdevices needing to use the VMD endpoint.
>
> The real DMA device is now used for the DMA mapping, but one case was
> missed earlier, when allocating memory through (e.g.) intel_map_page().
> Confusion ensues if the iommu domain type for the subdevice does not
> match the iommu domain type for the real DMA device.

Is there a way to force this situation for my testing?

> For example, take the case of the subdevice handled by intel_map_page()
> in an IOMMU_DOMAIN_DMA, with the real DMA device in an
> IOMMU_DOMAIN_IDENTITY:
>
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>    iommu_need_mapping() on the subdevice. Result: mapping is needed.
> 2. __intel_map_single() is called to create the mapping:
>    - __intel_map_single() calls find_domain(). This function now returns
>      the IDENTITY domain corresponding to the real DMA device.
>    - __intel_map_single() then calls domain_get_iommu() on this "real"
>      domain. A failure is hit and the entire operation is aborted,
>      because this codepath is not intended to handle IDENTITY mappings:
>          if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
>                  return NULL;
>
> Fix this by using the real DMA device when checking if a mapping is
> needed. The above case will then directly fall back on
> dma_direct_map_page().
>
> Fixes: 2b0140c69637 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
> Signed-off-by: Daniel Drake
> ---
>
> Notes:
>     This problem was detected with a non-upstream patch
>     "PCI: Add Intel remapped NVMe device support"
>     (https://marc.info/?l=linux-ide=156015271021615=2)
>
>     This patch creates PCI devices in the same way as VMD, and hence
>     I believe VMD would hit this class of problem for any cases where
>     iommu domain type may mismatch between subdevice and real device,
>     which we have run into here.
>
>     However this hasn't actually been tested on VMD (don't have the
>     hardware) so if I've missed anything and/or it's not a real issue
>     then feel free to drop this patch.
>
>  drivers/iommu/intel-iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9dc37672bf89..713810f8350c 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3587,6 +3587,9 @@ static bool iommu_need_mapping(struct device *dev)
>  	if (iommu_dummy(dev))
>  		return false;
>  
> +	if (dev_is_pci(dev))
> +		dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
> +
>  	ret = identity_mapping(dev);
>  	if (ret) {
>  		u64 dma_mask = *dev->dma_mask;

This will be a problem. We really want to use the subdevice's dma mask,
in case there's a situation where the subdevice only supports 32-bit dma
(with the real dma requester having a 64-bit dma mask).

Would this work?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc3767..8f35e6b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,19 +3582,24 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
+	u64 dma_mask, required_dma_mask;
 	int ret;
 
 	if (iommu_dummy(dev))
 		return false;
 
-	ret = identity_mapping(dev);
-	if (ret) {
-		u64 dma_mask = *dev->dma_mask;
+	dma_mask = *dev->dma_mask;
+	if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
+		dma_mask = dev->coherent_dma_mask;
 
-		if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
-			dma_mask = dev->coherent_dma_mask;
+	required_dma_mask = dma_direct_get_required_mask(dev);
 
-		if (dma_mask >= dma_direct_get_required_mask(dev))
+	if (dev_is_pci(dev))
+		dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+	ret = identity_mapping(dev);
+	if (ret) {
+		if (dma_mask >= required_dma_mask)
 			return false;
 	/*
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
On 11/02/2020 3:24 pm, Chuck Lever wrote:
>> On Feb 11, 2020, at 10:12 AM, Robin Murphy wrote:
>>
>> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>>> Andre-
>>> Thank you for the detailed report!
>>> Tom-
>>> There is a rich set of trace points available in the RPC/RDMA
>>> implementation in 5.4/5.5, fwiw.
>>> Please keep me in the loop, let me know if there is anything I can do
>>> to help.
>>
>> One aspect that may be worth checking is whether there's anywhere that
>> assumes a successful return value from dma_map_sg() is always the same
>> as the number of entries passed in - that's the most obvious way the
>> iommu-dma code differs (legitimately) from the previous amd-iommu
>> implementation.
>
> net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()
>
> 317 	mr->mr_nents =
> 318 		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
> 319 	if (!mr->mr_nents)
> 320 		goto out_dmamap_err;
>
> Should that rather be "if (mr->mr_nents != i)" ?

No, that much is OK - the point is that dma_map_sg() may pack the DMA
addresses such that sg_dma_len(sg) > sg->length - however, subsequently
passing that mr->mr_nents to dma_unmap_sg() in frwr_mr_recycle() (rather
than the original value of i) looks at a glance like an example of how
things may start to get out-of-whack.

Robin.

>> On Feb 11, 2020, at 2:25 AM, Joerg Roedel wrote:
>>
>> Adding Tom's new email address.
>>
>> Tom, can you have a look, please?
>> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a
>> similar issue.
>>
>> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over
>>> RDMA stopped working. But only on my AMD Ryzen systems. And so far
>>> only NFS, curiously other RDMA diagnostic tools (like qperf -cm1
>>> rc_bw) work fine.
>>>
>>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7
>>> iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>>
>>> 5.5.3-rc1, 5.6-rc1 are also not working.
>>>
>>> I verified it by booting with amd_iommu=off on the kernel cmdline -
>>> it makes everything work again.
>>>
>>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup,
>>> running over RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw
>>> 2.42.5000). Nothing fancy besides the RoCEv1 and related network bits
>>> like PFC and storage VLAN. Bare metal, no virtualization.
>>>
>>> The impacted systems are:
>>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>>>
>>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>>> succeeds, and then a couple of NFS NULL calls comes through (over
>>> RoCE), both acked, and then the rest just never goes out from the
>>> client until the mount times out and CM is torn down.
>>>
>>> No messages shows up in the kernel log on either side. I was at least
>>> expecting some scary IOMMU warnings.
>>>
>>> More serious hardware is not available for RDMA testing currently, so
>>> I dont know if a EPYC system or newer mlx5 cards would have similar
>>> issues. Intel I've only tested as server so far, that worked fine, as
>>> expected given the bisect result.
>>>
>>>> git bisect start
>>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
>>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
>>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
>>>> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
>>>> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple PCI DMA
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
> On Feb 11, 2020, at 10:12 AM, Robin Murphy wrote:
>
> On 11/02/2020 1:48 pm, Chuck Lever wrote:
>> Andre-
>> Thank you for the detailed report!
>> Tom-
>> There is a rich set of trace points available in the RPC/RDMA
>> implementation in 5.4/5.5, fwiw.
>> Please keep me in the loop, let me know if there is anything I can do
>> to help.
>
> One aspect that may be worth checking is whether there's anywhere that
> assumes a successful return value from dma_map_sg() is always the same
> as the number of entries passed in - that's the most obvious way the
> iommu-dma code differs (legitimately) from the previous amd-iommu
> implementation.

net/sunrpc/xprtrdma/frwr_ops.c: frwr_map()

317 	mr->mr_nents =
318 		ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir);
319 	if (!mr->mr_nents)
320 		goto out_dmamap_err;

Should that rather be "if (mr->mr_nents != i)" ?

> Robin.
>
>>> On Feb 11, 2020, at 2:25 AM, Joerg Roedel wrote:
>>>
>>> Adding Tom's new email address.
>>>
>>> Tom, can you have a look, please?
>>> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a
>>> similar issue.
>>>
>>> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>>>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over
>>>> RDMA stopped working. But only on my AMD Ryzen systems. And so far
>>>> only NFS, curiously other RDMA diagnostic tools (like qperf -cm1
>>>> rc_bw) work fine.
>>>>
>>>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7
>>>> iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>>>
>>>> 5.5.3-rc1, 5.6-rc1 are also not working.
>>>>
>>>> I verified it by booting with amd_iommu=off on the kernel cmdline -
>>>> it makes everything work again.
>>>>
>>>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup,
>>>> running over RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw
>>>> 2.42.5000). Nothing fancy besides the RoCEv1 and related network
>>>> bits like PFC and storage VLAN. Bare metal, no virtualization.
>>>>
>>>> The impacted systems are:
>>>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>>>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>>>>
>>>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>>>> succeeds, and then a couple of NFS NULL calls comes through (over
>>>> RoCE), both acked, and then the rest just never goes out from the
>>>> client until the mount times out and CM is torn down.
>>>>
>>>> No messages shows up in the kernel log on either side. I was at
>>>> least expecting some scary IOMMU warnings.
>>>>
>>>> More serious hardware is not available for RDMA testing currently,
>>>> so I dont know if a EPYC system or newer mlx5 cards would have
>>>> similar issues. Intel I've only tested as server so far, that worked
>>>> fine, as expected given the bisect result.
>>>>
>>>>> git bisect start
>>>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
>>>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
>>>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
>>>>> git bisect good
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
On 11/02/2020 1:48 pm, Chuck Lever wrote:
> Andre-
> Thank you for the detailed report!
> Tom-
> There is a rich set of trace points available in the RPC/RDMA
> implementation in 5.4/5.5, fwiw.
> Please keep me in the loop, let me know if there is anything I can do to
> help.

One aspect that may be worth checking is whether there's anywhere that
assumes a successful return value from dma_map_sg() is always the same as
the number of entries passed in - that's the most obvious way the
iommu-dma code differs (legitimately) from the previous amd-iommu
implementation.

Robin.

>> On Feb 11, 2020, at 2:25 AM, Joerg Roedel wrote:
>>
>> Adding Tom's new email address.
>>
>> Tom, can you have a look, please?
>> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a
>> similar issue.
>>
>> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over
>>> RDMA stopped working. But only on my AMD Ryzen systems. And so far
>>> only NFS, curiously other RDMA diagnostic tools (like qperf -cm1
>>> rc_bw) work fine.
>>>
>>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7
>>> iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>>
>>> 5.5.3-rc1, 5.6-rc1 are also not working.
>>>
>>> I verified it by booting with amd_iommu=off on the kernel cmdline -
>>> it makes everything work again.
>>>
>>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup,
>>> running over RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw
>>> 2.42.5000). Nothing fancy besides the RoCEv1 and related network bits
>>> like PFC and storage VLAN. Bare metal, no virtualization.
>>>
>>> The impacted systems are:
>>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>>>
>>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>>> succeeds, and then a couple of NFS NULL calls comes through (over
>>> RoCE), both acked, and then the rest just never goes out from the
>>> client until the mount times out and CM is torn down.
>>>
>>> No messages shows up in the kernel log on either side. I was at least
>>> expecting some scary IOMMU warnings.
>>>
>>> More serious hardware is not available for RDMA testing currently, so
>>> I dont know if a EPYC system or newer mlx5 cards would have similar
>>> issues. Intel I've only tested as server so far, that worked fine, as
>>> expected given the bisect result.
>>>
>>>> git bisect start
>>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
>>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
>>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
>>>> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
>>>> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
>>>> git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
>>>> # bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>>> git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
>>>> # good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter to iommu_ops::map
>>>> git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
>>>> # good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the dev->coherent_dma_mask
>>>> git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
>>>> # first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api
>
> --
> Chuck Lever
Re: AMD IOMMU stops RDMA NFS from working since kernel 5.5 (bisected)
Andre-

Thank you for the detailed report!

Tom-

There is a rich set of trace points available in the RPC/RDMA
implementation in 5.4/5.5, fwiw.

Please keep me in the loop, let me know if there is anything I can do to
help.

> On Feb 11, 2020, at 2:25 AM, Joerg Roedel wrote:
>
> Adding Tom's new email address.
>
> Tom, can you have a look, please?
> https://bugzilla.kernel.org/show_bug.cgi?id=206461 seems to be a similar
> issue.
>
> On Tue, Feb 11, 2020 at 06:06:54AM +0100, Andre Tomt wrote:
>> Since upgrading my RDMA lab from kernel 5.4.x to 5.5.x, NFSv4 over RDMA
>> stopped working. But only on my AMD Ryzen systems. And so far only NFS,
>> curiously other RDMA diagnostic tools (like qperf -cm1 rc_bw) work
>> fine.
>>
>> A git bisect points to be62dbf554c5b50718a54a359372c148cd9975c7
>> iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>
>> 5.5.3-rc1, 5.6-rc1 are also not working.
>>
>> I verified it by booting with amd_iommu=off on the kernel cmdline - it
>> makes everything work again.
>>
>> The NFS config is a pretty simple NFSv4.x only, sec=sys setup, running
>> over RoCEv1 on Mellanox mlx4 hardware (ConnectX-3 Pro, fw 2.42.5000).
>> Nothing fancy besides the RoCEv1 and related network bits like PFC and
>> storage VLAN. Bare metal, no virtualization.
>>
>> The impacted systems are:
>> ASUS ROG STRIX X399-E GAMING, with a Threadripper 1950x, BIOS 1002
>> ASUS Pro WS X570-ACE, with a Ryzen 7 3700x, BIOS 1201
>>
>> pcaps off a mirror port can be provided. They show that on 5.5.x, CM
>> succeeds, and then a couple of NFS NULL calls comes through (over
>> RoCE), both acked, and then the rest just never goes out from the
>> client until the mount times out and CM is torn down.
>>
>> No messages shows up in the kernel log on either side. I was at least
>> expecting some scary IOMMU warnings.
>>
>> More serious hardware is not available for RDMA testing currently, so I
>> dont know if a EPYC system or newer mlx5 cards would have similar
>> issues. Intel I've only tested as server so far, that worked fine, as
>> expected given the bisect result.
>>
>>> git bisect start
>>> # bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
>>> git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
>>> # good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
>>> git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
>>> # good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>>> git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
>>> # bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>>> git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
>>> # good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
>>> git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
>>> # good: [e5b3fc125d768eacd73bb4dc5019f0ce95635af4] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> git bisect good e5b3fc125d768eacd73bb4dc5019f0ce95635af4
>>> # bad: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
>>> git bisect bad 937d6eefc716a9071f0e3bada19200de1bb9d048
>>> # bad: [1daa56bcfd8b329447e0c1b1e91c3925d08489b7] Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
>>> git bisect bad 1daa56bcfd8b329447e0c1b1e91c3925d08489b7
>>> # good: [937790699be9c8100e5358625e7dfa8b32bd33f2] mm/page_io.c: annotate refault stalls from swap_readpage
>>> git bisect good 937790699be9c8100e5358625e7dfa8b32bd33f2
>>> # good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
>>> git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
>>> # good: [34d1b0895dbd10713c73615d8f532e78509e12d9] iommu/arm-smmu: Remove duplicate error message
>>> git bisect good 34d1b0895dbd10713c73615d8f532e78509e12d9
>>> # bad: [3c124435e8dd516df4b2fc983f4415386fd6edae] iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
>>> git bisect bad 3c124435e8dd516df4b2fc983f4415386fd6edae
>>> # bad: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api
>>> git bisect bad be62dbf554c5b50718a54a359372c148cd9975c7
>>> # good: [781ca2de89bae1b1d2c96df9ef33e9a324415995] iommu: Add gfp parameter to iommu_ops::map
>>> git bisect good 781ca2de89bae1b1d2c96df9ef33e9a324415995
>>> # good: [6e2350207f40e24884da262976f7fd4fba387e8a] iommu/dma-iommu: Use the dev->coherent_dma_mask
>>> git bisect good 6e2350207f40e24884da262976f7fd4fba387e8a
>>> # first bad commit: [be62dbf554c5b50718a54a359372c148cd9975c7] iommu/amd: Convert AMD iommu driver to the dma-iommu api

--
Chuck Lever
Re: [PATCH 2/3] iommu: Add Allwinner H6 IOMMU driver
Hi Robin,

On Mon, Jan 27, 2020 at 07:01:02PM +0000, Robin Murphy wrote:
>>>> +static void sun50i_iommu_domain_free(struct iommu_domain *domain)
>>>> +{
>>>> +	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
>>>> +	struct sun50i_iommu *iommu = sun50i_domain->iommu;
>>>> +	unsigned long flags;
>>>> +	int i;
>>>> +
>>>> +	spin_lock_irqsave(&sun50i_domain->dt_lock, flags);
>>>> +
>>>> +	for (i = 0; i < NUM_DT_ENTRIES; i++) {
>>>> +		phys_addr_t pt_phys;
>>>> +		u32 *page_table;
>>>> +		u32 *dte_addr;
>>>> +		u32 dte;
>>>> +
>>>> +		dte_addr = &sun50i_domain->dt[i];
>>>> +		dte = *dte_addr;
>>>> +		if (!sun50i_dte_is_pt_valid(dte))
>>>> +			continue;
>>>> +
>>>> +		memset(dte_addr, 0, sizeof(*dte_addr));
>>>> +		sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);
>>>
>>> This shouldn't be necessary - freeing a domain while it's still live
>>> is an incredibly wrong thing to do, so the hardware should have
>>> already been programmed to no longer walk this table by this point.
>>
>> We never "garbage collect" and remove the dte for the page table we
>> don't use anymore elsewhere though, so couldn't we end up in a
>> situation where we don't have a page table (because it has been freed)
>> at the other end of our dte, but the IOMMU doesn't know about it since
>> we never flushed?
>
> Let me reiterate: at the point of freeing, the assumption is that this
> domain should be long dissociated from the hardware. Any actual
> invalidation should have already happened at the point the TTB was
> disabled or pointed to some other domain, therefore fiddling with
> pagetable pages just before you free them back to the kernel is just
> pointless busywork.

Thanks, that makes a lot of sense.

> If the TTB *was* still live here, then as soon as you call free_pages()
> below the DT memory could get reallocated by someone else and filled
> with data that happens to look like valid pagetables, so even if you
> invalidate the TLBs the hardware could just immediately go walk that
> data and refill them with nonsense, thus any pretence at invalidation
> is in vain anyway.
>
> The fly in the soup, however, is that default domains appear to be
> lacking proper detach notifications (I hadn't consciously noticed that
> before), so if you've been looking at the iommu_group_release() path it
> might have given the wrong impression. So what might be justifiable
> here is to check if the domain being freed is the one currently active
> in hardware, and if so perform a detach (i.e. disable the TTB and
> invalidate everything) first, then free everything as normal. Or just
> handwave that we essentially never free a default domain anyway so it's
> OK to just assume that we're not freeing anything live.

I'm still a bit unsure as of what to do exactly here. I haven't found a
hook that would be called when a given domain doesn't have any devices
attached to it. Or did you mean that I should just create a separate
function, not part of the IOMMU ops?

>>>> +
>>>> +	if (iommu->domain == domain)
>>>> +		return 0;
>>>> +
>>>> +	if (iommu->domain)
>>>> +		sun50i_iommu_detach_device(iommu->domain, dev);
>>>> +
>>>> +	iommu->domain = domain;
>>>> +	sun50i_domain->iommu = iommu;
>>>> +
>>>> +	return pm_runtime_get_sync(iommu->dev);
>>>
>>> Deferring all the actual hardware programming to the suspend/resume
>>> hooks is a fiendishly clever idea that took me more than a moment to
>>> make sense of, but how well does it work when RPM is compiled out or
>>> runtime-inhibited?
>>
>> We have a bunch of other controllers that require runtime_pm already,
>> so it's going to be enabled.
>
> But that should be expressed in Kconfig. And it can still be inhibited
> from sysfs, right? We don't want driver behaviour to be *unnecessarily*
> fragile to user actions, however silly they may be.

That's a good point :)

>>> Furthermore, basing RPM on having a domain attached means that you'll
>>> effectively never turn the IOMMU off, even when all the clients are
>>> idle. It would make more sense to use device links like most other
>>> drivers do to properly model the producer/consumer relationship.
>>
>> I'm not familiar with device links for runtime_pm, I thought this was
>> only useful for system-wide resume and suspend?
>
> See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking
> full advantage of this.

I'll look into it, thanks!

>>>> +static int __maybe_unused sun50i_iommu_resume(struct device *dev)
>>>> +{
>>>> +	struct sun50i_iommu_domain *sun50i_domain;
>>>> +	struct sun50i_iommu *iommu;
>>>> +	unsigned
[PATCH] iommu/vt-d: consider real PCI device when checking if mapping is needed
The PCI devices handled by intel-iommu may have a DMA requester on
another bus, such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was
missed earlier, when allocating memory through (e.g.) intel_map_page().
Confusion ensues if the iommu domain type for the subdevice does not
match the iommu domain type for the real DMA device.

For example, take the case of the subdevice handled by intel_map_page()
in an IOMMU_DOMAIN_DMA, with the real DMA device in an
IOMMU_DOMAIN_IDENTITY:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. Result: mapping is needed.
2. __intel_map_single() is called to create the mapping:
   - __intel_map_single() calls find_domain(). This function now returns
     the IDENTITY domain corresponding to the real DMA device.
   - __intel_map_single() then calls domain_get_iommu() on this "real"
     domain. A failure is hit and the entire operation is aborted,
     because this codepath is not intended to handle IDENTITY mappings:
         if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
                 return NULL;

Fix this by using the real DMA device when checking if a mapping is
needed. The above case will then directly fall back on
dma_direct_map_page().

Fixes: 2b0140c69637 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Daniel Drake
---

Notes:
    This problem was detected with a non-upstream patch
    "PCI: Add Intel remapped NVMe device support"
    (https://marc.info/?l=linux-ide=156015271021615=2)

    This patch creates PCI devices in the same way as VMD, and hence
    I believe VMD would hit this class of problem for any cases where
    iommu domain type may mismatch between subdevice and real device,
    which we have run into here.

    However this hasn't actually been tested on VMD (don't have the
    hardware) so if I've missed anything and/or it's not a real issue
    then feel free to drop this patch.

 drivers/iommu/intel-iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..713810f8350c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3587,6 +3587,9 @@ static bool iommu_need_mapping(struct device *dev)
 	if (iommu_dummy(dev))
 		return false;
 
+	if (dev_is_pci(dev))
+		dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
 	ret = identity_mapping(dev);
 	if (ret) {
 		u64 dma_mask = *dev->dma_mask;
-- 
2.20.1
Re: [PATCH] iommu/intel-iommu: set as DUMMY_DEVICE_DOMAIN_INFO if no IOMMU
On Sat, Feb 8, 2020 at 2:30 PM Lu Baolu wrote:
>> The devices under segment 1 are fake devices produced by
>> intel-nvme-remap mentioned here https://lkml.org/lkml/2020/2/5/139
>
> Has this series been accepted?

Sadly not - we didn't find any consensus on the right approach, and
further conversation is hindered by the questionable hardware design and
lack of further communication from Intel in explaining it.

It's one of the exceptional cases where we carry a significant
non-upstream kernel change, because unfortunately most of the current-day
consumer PCs we work with have this BIOS option on by default, and hence
unmodified Linux can't access the storage devices. On the offchance that
you have some energy to bubble this up inside Intel, please let me know
and we will talk more... :)

That said, this thread was indeed only opened since we thought we had
found a more general issue that would potentially affect other cases. The
issue described does seem to highlight a possible imperfection in code
flow, although it may also be reasonable to say that (without crazy
downstream patches at play) if intel_iommu_add_device() fails then we
have bigger problems to face anyway...

> Will this help here? https://www.spinics.net/lists/iommu/msg41300.html

Yes! Very useful info and a real improvement. We'll follow the same
approach here. That does solve the problem we were facing, although we
needed one more fixup which I've sent separately for your consideration:
iommu/vt-d: consider real PCI device when checking if mapping is needed

Thanks!
Daniel
[PATCH] iommu/dmar: ignore devices with out-of-spec domain number
VMD subdevices are created with a PCI domain ID of 0x10000 or higher.
These subdevices are also handled like all other PCI devices by
dmar_pci_bus_notifier().

However, when dmar_alloc_pci_notify_info() takes records of such devices,
it will truncate the domain ID to a u16 value (in info->seg). The device
at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if it is
0000:00:02.0.

In the unlucky event that a real device also exists at 0000:00:02.0 and
also has a device-specific entry in the DMAR table,
dmar_insert_dev_scope() will crash on:

    BUG_ON(i >= devices_cnt);

That's basically a sanity check that only one PCI device matches a
single DMAR entry; in this case we seem to have two matching devices.

Fix this by ignoring devices that have a domain number higher than what
can be looked up in the DMAR table.

This problem was carefully diagnosed by Jian-Hong Pan.

Signed-off-by: Daniel Drake
---
Notes:
    This problem was detected with a non-upstream patch "PCI: Add Intel
    remapped NVMe device support"
    (https://marc.info/?l=linux-ide&m=156015271021615&w=2)

    This patch creates PCI devices in the same way as VMD, and hence I
    believe VMD would hit the same problem that we encountered here, when
    a VMD-using product comes along that meets the mentioned conditions.
    However this hasn't actually been tested on VMD (don't have the
    hardware) so if I've missed anything and/or it's not a real issue
    then feel free to drop this patch.

 drivers/iommu/dmar.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 071bb42bbbc5..8f94c817a7b5 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/iommu.h>
 #include <linux/numa.h>
+#include <linux/limits.h>
 #include <asm/irq_remapping.h>
 #include <asm/iommu_table.h>
 
@@ -128,6 +129,13 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
 
 	BUG_ON(dev->is_virtfn);
 
+	/*
+	 * Ignore devices that have a domain number higher than what can
+	 * be looked up in DMAR, e.g. VMD subdevices with domain 0x10000
+	 */
+	if (pci_domain_nr(dev->bus) > U16_MAX)
+		return NULL;
+
 	/* Only generate path[] for device addition event */
 	if (event == BUS_NOTIFY_ADD_DEVICE)
 		for (tmp = dev; tmp; tmp = tmp->bus->self)
-- 
2.20.1
Re: [PATCH v12 2/4] uacce: add uacce driver
On 2020/2/11 7:37 AM, Greg Kroah-Hartman wrote:
> On Wed, Jan 15, 2020 at 10:12:46PM +0800, Zhangfei Gao wrote:
>> From: Kenneth Lee
>>
>> Uacce (Unified/User-space-access-intended Accelerator Framework)
>> targets to provide Shared Virtual Addressing (SVA) between
>> accelerators and processes, so an accelerator can access any data
>> structure of the main CPU. This differs from data sharing between a
>> CPU and an I/O device, which shares only data content rather than
>> addresses. Thanks to the unified address space, the hardware and the
>> user space of a process can share the same virtual address in their
>> communication.
>>
>> Uacce creates a chrdev for every registration; a queue is allocated
>> to the process when the chrdev is opened. The process can then access
>> the hardware resource by interacting with the queue file. By mmap-ing
>> the queue file space to user space, the process can directly submit
>> requests to the hardware without a syscall into kernel space.
>>
>> The IOMMU core only tracks mm<->device bonds at the moment, because
>> it only needs to handle IOTLB invalidation and PASID table entries.
>> However uacce needs a finer granularity since multiple queues from
>> the same device can be bound to an mm. When the mm exits, all bound
>> queues must be stopped so that the IOMMU can safely clear the PASID
>> table entry and reallocate the PASID.
>>
>> An intermediate struct uacce_mm links uacce devices and queues. Note
>> that an mm may be bound to multiple devices, but an uacce_mm
>> structure only ever belongs to a single device, because we don't need
>> anything more complex (if multiple devices are bound to one mm, then
>> we'll create one uacce_mm for each bond).
>>
>>         uacce_device --+-- uacce_mm --+-- uacce_queue
>>                        |              '-- uacce_queue
>>                        |
>>                        '-- uacce_mm --+-- uacce_queue
>>                                       +-- uacce_queue
>>                                       '-- uacce_queue
>>
>> Reviewed-by: Jonathan Cameron
>> Signed-off-by: Kenneth Lee
>> Signed-off-by: Zaibo Xu
>> Signed-off-by: Zhou Wang
>> Signed-off-by: Jean-Philippe Brucker
>> Signed-off-by: Zhangfei Gao
>
> Looks much saner now, thanks for all of the work on this:
>
> Reviewed-by: Greg Kroah-Hartman
>
> Or am I supposed to take this in my tree? If so, I can, but I need an
> ack for the crypto parts.

That's great, thanks Greg.

For convenience, I rebased the patchset on 5.6-rc1. I am not sure
whether there is any conflict with the crypto tree. How about just
picking the uacce part, patches 1 and 2? We can resend the crypto part
to the crypto tree.

Thanks