Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode
On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:
> Hello,
>
> On 7/8/20 5:24 PM, Christoph Hellwig wrote:
> > Use the DMA API bypass mechanism for direct window mappings. This uses
> > common code and speeds up the direct mapping case by avoiding indirect
> > calls when not using dma ops at all. It also fixes a problem where
> > the sync_* methods were using the bypass check for DMA allocations, but
> > those are part of the streaming ops.
> >
> > Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
> > has never been well defined, and is only used by a few drivers, which
> > IIRC never showed up in the typical Cell blade setups that are affected
> > by the ordering workaround.
> >
> > Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
> > Signed-off-by: Christoph Hellwig
> > ---
> >  arch/powerpc/Kconfig              |  1 +
> >  arch/powerpc/include/asm/device.h |  5 --
> >  arch/powerpc/kernel/dma-iommu.c   | 90 ---
> >  3 files changed, 10 insertions(+), 86 deletions(-)
>
> I am seeing corruptions on a couple of POWER9 systems (boston) when
> stressed with IO. stress-ng gives some results, but I first saw the
> problem when compiling the kernel in a guest, and this is still the
> best way to reproduce it.
>
> These systems have a SAS Adaptec controller:
>
>   0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
>
> When the failure occurs, the POWERPC EEH interrupt fires and dumps
> low-level PHB4 registers, among which:
>
>   [ 2179.251069490,3] PHB#0003[0:3]:      phbErrorStatus = 0280
>   [ 2179.251117476,3] PHB#0003[0:3]: phbFirstErrorStatus = 0200
>
> The bits raised identify a PPC 'TCE' error, which means it is related
> to DMAs. See below for more details.
>
> Reverting this patch "fixes" the issue, but the bug is probably
> elsewhere, in some other layer or in the aacraid driver. How should I
> proceed to get more information?

The aacraid DMA masks look like a mess.
Can you try the hack below and see if it helps?

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 769af4ca9ca97e..79c6b744dbb66c 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -2228,18 +2228,6 @@ int aac_get_adapter_info(struct aac_dev* dev)
 			expose_physicals = 0;
 	}
 
-	if (dev->dac_support) {
-		if (!pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(64))) {
-			if (!dev->in_reset)
-				dev_info(&dev->pdev->dev, "64 Bit DAC enabled\n");
-		} else if (!pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(32))) {
-			dev_info(&dev->pdev->dev, "DMA mask set failed, 64 Bit DAC disabled\n");
-			dev->dac_support = 0;
-		} else {
-			dev_info(&dev->pdev->dev, "No suitable DMA available\n");
-			rcode = -ENOMEM;
-		}
-	}
 	/*
 	 * Deal with configuring for the individualized limits of each packet
 	 * interface.
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index adbdc3b7c7a706..dbb23b351a4e7d 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1479,7 +1479,6 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 	struct Scsi_Host *host = aac->scsi_host_ptr;
 	int jafo = 0;
 	int bled;
-	u64 dmamask;
 	int num_of_fibs = 0;
 
 	/*
@@ -1558,22 +1557,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 	kfree(aac->fsa_dev);
 	aac->fsa_dev = NULL;
 
-	dmamask = DMA_BIT_MASK(32);
 	quirks = aac_get_driver_ident(index)->quirks;
-	if (quirks & AAC_QUIRK_31BIT)
-		retval = pci_set_dma_mask(aac->pdev, dmamask);
-	else if (!(quirks & AAC_QUIRK_SRC))
-		retval = pci_set_dma_mask(aac->pdev, dmamask);
-	else
-		retval = pci_set_consistent_dma_mask(aac->pdev, dmamask);
-
-	if (quirks & AAC_QUIRK_31BIT && !retval) {
-		dmamask = DMA_BIT_MASK(31);
-		retval = pci_set_consistent_dma_mask(aac->pdev, dmamask);
-	}
-
-	if (retval)
-		goto out;
 
 	if ((retval = (*(aac_get_driver_ident(index)->init))(aac)))
 		goto out;
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 8588da0a065551..d897a9d59e24a1 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1634,8 +1634,6 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct list_head *insert = &aac_devices;
 	int error;
 	int unique_id = 0;
-	u64 dmamask;
-	int mask_bits = 0;
 	extern int aac_sync_
Re: [PATCH v1] iommu/vt-d: Move intel_iommu_ops to header file
On Sat, Aug 29, 2020 at 07:58:46AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 28, 2020 at 07:05:02PM +0300, Andy Shevchenko wrote:
> > Compiler is not happy about hidden declaration of intel_iommu_ops.
> >
> >   .../drivers/iommu/intel/iommu.c:414:24: warning: symbol 'intel_iommu_ops'
> >   was not declared. Should it be static?
> >
> > Move the declaration to a header file to make the compiler happy.
>
> What about factoring out a helper that does iommu_device_sysfs_add +
> iommu_device_set_ops + iommu_device_register and then marking
> intel_iommu_ops static?

I am okay with this proposal, but I think it would be better if the
IOMMU folks could answer this before I invest time into it.

-- 
With Best Regards,
Andy Shevchenko

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/vt-d: Use device numa domain if RHSA is missing
Hi Kevin,

Thanks a lot for looking at my patch.

On 8/28/20 10:13 AM, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Thursday, August 27, 2020 1:57 PM
>>
>> If there are multiple NUMA domains but the RHSA is missing in the
>> ACPI/DMAR table, we could default to the device NUMA domain as fall
>> back. This also benefits the vIOMMU use case where only a single
>> vIOMMU is exposed, hence no RHSA will be present but the device numa
>> domain can be correct.
>
> this benefits vIOMMU but is not necessarily only applicable to the
> single-vIOMMU case. The logic still holds in multiple-vIOMMU cases as
> long as RHSA is not provided.

Yes. Will refine the description.

>> Cc: Jacob Pan
>> Cc: Kevin Tian
>> Cc: Ashok Raj
>> Signed-off-by: Lu Baolu
>> ---
>>  drivers/iommu/intel/iommu.c | 31 +--
>>  1 file changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index e0516d64d7a3..bce158468abf 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -700,12 +700,41 @@ static int domain_update_iommu_superpage(struct dmar_domain *domain,
>>  	return fls(mask);
>>  }
>>
>> +static int domain_update_device_node(struct dmar_domain *domain)
>> +{
>> +	struct device_domain_info *info;
>> +	int nid = NUMA_NO_NODE;
>> +
>> +	assert_spin_locked(&device_domain_lock);
>> +
>> +	if (list_empty(&domain->devices))
>> +		return NUMA_NO_NODE;
>> +
>> +	list_for_each_entry(info, &domain->devices, link) {
>> +		if (!info->dev)
>> +			continue;
>> +
>> +		nid = dev_to_node(info->dev);
>> +		if (nid != NUMA_NO_NODE)
>> +			break;
>> +	}
>
> There could be multiple device numa nodes as devices within the same
> domain may sit behind different IOMMUs. Of course there is no perfect
> answer in such a situation, and this patch is still an obvious
> improvement on the current always-on-node0 policy. But some comment
> about such implication is welcomed.

I will add some comments here. Without more knowledge, currently we
choose to use the first hit.

>> +
>> +	return nid;
>> +}
>> +
>>  /* Some capabilities may be different across iommus */
>>  static void domain_update_iommu_cap(struct dmar_domain *domain)
>>  {
>>  	domain_update_iommu_coherency(domain);
>>  	domain->iommu_snooping = domain_update_iommu_snooping(NULL);
>>  	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
>> +
>> +	/*
>> +	 * If RHSA is missing, we should default to the device numa domain
>> +	 * as fall back.
>> +	 */
>> +	if (domain->nid == NUMA_NO_NODE)
>> +		domain->nid = domain_update_device_node(domain);
>>  }
>>
>>  struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
>> @@ -5086,8 +5115,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
>>  		if (type == IOMMU_DOMAIN_DMA)
>>  			intel_init_iova_domain(dmar_domain);
>>
>> -		domain_update_iommu_cap(dmar_domain);
>> -
>
> Is it intended or by mistake? If the former, it looks like a separate fix...

It's a cleanup. When a domain is newly created, this function is
actually a no-op.

>>  		domain = &dmar_domain->domain;
>>  		domain->geometry.aperture_start = 0;
>>  		domain->geometry.aperture_end =
>> --
>> 2.17.1

Best regards,
baolu
Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI
Hi Thomas,

On 8/26/20 7:16 PM, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI
> (non PCI based) and on top of that support for IMS (Interrupt Message
> Storm) based devices in a halfways architecture independent way.

After applying this patch series, the dmar_alloc_hwirq() helper doesn't
work anymore during boot. This causes the IOMMU driver to fail to
register the DMA fault handler and abort the IOMMU probe processing. Is
this a known issue?

Best regards,
baolu

> The first version can be found here:
>
>     https://lore.kernel.org/r/20200821002424.119492...@linutronix.de
>
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.
>
> There are quite a bunch of issues to solve:
>
>   - X86 does not use the device::msi_domain pointer for historical reasons
>     and due to XEN, which makes it impossible to create an architecture
>     agnostic device MSI infrastructure.
>
>   - X86 has it's own msi_alloc_info data type which is pointlessly
>     different from the generic version and does not allow to share code.
>
>   - The logic of composing MSI messages in an hierarchy is busted at the
>     core level and of course some (x86) drivers depend on that.
>
>   - A few minor shortcomings as usual
>
> This series addresses that in several steps:
>
>  1) Accidental bug fixes
>
>       iommu/amd: Prevent NULL pointer dereference
>
>  2) Janitoring
>
>       x86/init: Remove unused init ops
>       PCI: vmd: Dont abuse vector irqomain as parent
>       x86/msi: Remove pointless vcpu_affinity callback
>
>  3) Sanitizing the composition of MSI messages in a hierarchy
>
>       genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
>       x86/msi: Move compose message callback where it belongs
>
>  4) Simplification of the x86 specific interrupt allocation mechanism
>
>       x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
>       x86/irq: Add allocation type for parent domain retrieval
>       iommu/vt-d: Consolidate irq domain getter
>       iommu/amd: Consolidate irq domain getter
>       iommu/irq_remapping: Consolidate irq domain lookup
>
>  5) Consolidation of the X86 specific interrupt allocation mechanism to be
>     as close as possible to the generic MSI allocation mechanism which
>     allows to get rid of quite a bunch of x86'isms which are pointless
>
>       x86/irq: Prepare consolidation of irq_alloc_info
>       x86/msi: Consolidate HPET allocation
>       x86/ioapic: Consolidate IOAPIC allocation
>       x86/irq: Consolidate DMAR irq allocation
>       x86/irq: Consolidate UV domain allocation
>       PCI/MSI: Rework pci_msi_domain_calc_hwirq()
>       x86/msi: Consolidate MSI allocation
>       x86/msi: Use generic MSI domain ops
>
>  6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()
>
>       x86/irq: Move apic_post_init() invocation to one place
>       x86/pci: Reducde #ifdeffery in PCI init code
>       x86/irq: Initialize PCI/MSI domain at PCI init time
>       irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
>       PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
>       PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
>       x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
>       x86/xen: Rework MSI teardown
>       x86/xen: Consolidate XEN-MSI init
>       irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
>       x86/xen: Wrap XEN MSI management into irqdomain
>       iommm/vt-d: Store irq domain in struct device
>       iommm/amd: Store irq domain in struct device
>       x86/pci: Set default irq domain in pcibios_add_device()
>       PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
>       x86/irq: Cleanup the arch_*_msi_irqs() leftovers
>       x86/irq: Make most MSI ops XEN private
>       iommu/vt-d: Remove domain search for PCI/MSI[X]
>       iommu/amd: Remove domain search for PCI/MSI
>
>  7) X86 specific preparation for device MSI
>
>       x86/irq: Add DEV_MSI allocation type
>       x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
>
>  8) Generic device MSI infrastructure
>
>       platform-msi: Provide default irq_chip::ack
>       genirq/proc: Take buslock on affinity write
>       genirq/msi: Provide and use msi_domain_set_default_info_flags()
>       platform-msi: Add device MSI infrastructure
>       irqdomain/msi: Provide msi_alloc/free_store() callbacks
>
>  9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
>     implementations for both device array and queue storage.
>
>       irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING
>
> Changes vs. V1:
>
>   - Addressed various review comments and addressed the 0day fallout.
>   - Corrected the XEN logic (Jürgen)
>   - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)
>   - Fixed the compose MSI message inconsistency
>   - Ensure that the necessary flags are set for device MSI
>   - Make the irq bus logic work for affinity setting to prepare suppor
Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround
27.08.2020 18:54, Thierry Reding wrote:
...
>> The Tegra DRM has a very special quirk for ARM32 that was added in this
>> commit [2], and the driver relies on checking whether an explicit or
>> implicit IOMMU is used in order to activate the quirk.
>>
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=273da5a046965ccf0ec79eb63f2d5173467e20fa
>>
>> Once the implicit IOMMU is used for the DRM driver, the quirk no longer
>> works (if I'm not missing something). This problem needs to be resolved
>> before the implicit IOMMU can be used by the Tegra DRM on ARM32.
...
> I do have a patch lying around somewhere that implements the mapping
> cache that was referenced in the above commit. Let me know if I should
> dig that up and send it out.

Hello, Thierry!

It certainly will be interesting to take a look at your patch!

I think that the caching shouldn't be strictly necessary for keeping the
current workaround working, and it should be possible to keep the code
as-is by replacing the domain-type check with a SoC-generation check in
the Tegra DRM driver.

In general, IMO it would be better to stash the complex changes until we
get closer to adopting the new UAPI, as it will certainly touch the
aspect of per-device mappings. But if your patch is less than 100 LOC,
then maybe we could consider applying it right now!
Re: [GIT PULL] dma-mapping fix for 5.9
The pull request you sent on Sun, 30 Aug 2020 08:31:35 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-2

has been merged into torvalds/linux.git:

https://git.kernel.org/torvalds/c/c4011283a7d5d64a50991dd3baa9acdf3d49092c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode
Hello,

On 7/8/20 5:24 PM, Christoph Hellwig wrote:
> Use the DMA API bypass mechanism for direct window mappings. This uses
> common code and speeds up the direct mapping case by avoiding indirect
> calls when not using dma ops at all. It also fixes a problem where
> the sync_* methods were using the bypass check for DMA allocations, but
> those are part of the streaming ops.
>
> Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
> has never been well defined, and is only used by a few drivers, which
> IIRC never showed up in the typical Cell blade setups that are affected
> by the ordering workaround.
>
> Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
> Signed-off-by: Christoph Hellwig
> ---
>  arch/powerpc/Kconfig              |  1 +
>  arch/powerpc/include/asm/device.h |  5 --
>  arch/powerpc/kernel/dma-iommu.c   | 90 ---
>  3 files changed, 10 insertions(+), 86 deletions(-)

I am seeing corruptions on a couple of POWER9 systems (boston) when
stressed with IO. stress-ng gives some results, but I first saw the
problem when compiling the kernel in a guest, and this is still the best
way to reproduce it.

These systems have a SAS Adaptec controller:

  0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)

When the failure occurs, the POWERPC EEH interrupt fires and dumps
low-level PHB4 registers, among which:

  [ 2179.251069490,3] PHB#0003[0:3]:      phbErrorStatus = 0280
  [ 2179.251117476,3] PHB#0003[0:3]: phbFirstErrorStatus = 0200

The bits raised identify a PPC 'TCE' error, which means it is related to
DMAs. See below for more details.

Reverting this patch "fixes" the issue, but the bug is probably
elsewhere, in some other layer or in the aacraid driver. How should I
proceed to get more information?

Thanks,

C.
[ 2054.970339] EEH: Frozen PE#1fd on PHB#3 detected
[ 2054.970375] EEH: PE location: UOPWR.BOS0019-Node0-Onboard SAS, PHB location: N/A
[ 2179.249415973,3] PHB#0003[0:3]:                  brdgCtl = 0002
[ 2179.249515795,3] PHB#0003[0:3]:             deviceStatus = 00060040
[ 2179.249596452,3] PHB#0003[0:3]:               slotStatus = 00402000
[ 2179.249633728,3] PHB#0003[0:3]:               linkStatus = a0830008
[ 2179.249674807,3] PHB#0003[0:3]:             devCmdStatus = 00100107
[ 2179.249725974,3] PHB#0003[0:3]:             devSecStatus = 00100107
[ 2179.249773550,3] PHB#0003[0:3]:          rootErrorStatus =
[ 2179.249809823,3] PHB#0003[0:3]:          corrErrorStatus =
[ 2179.249850439,3] PHB#0003[0:3]:        uncorrErrorStatus =
[ 2179.249887411,3] PHB#0003[0:3]:                   devctl = 0040
[ 2179.249928677,3] PHB#0003[0:3]:                  devStat = 0006
[ 2179.249967150,3] PHB#0003[0:3]:                  tlpHdr1 =
[ 2179.250054987,3] PHB#0003[0:3]:                  tlpHdr2 =
[ 2179.250146600,3] PHB#0003[0:3]:                  tlpHdr3 =
[ 2179.250262780,3] PHB#0003[0:3]:                  tlpHdr4 =
[ 2179.250343101,3] PHB#0003[0:3]:                 sourceId =
[ 2179.250514264,3] PHB#0003[0:3]:                     nFir =
[ 2179.250717971,3] PHB#0003[0:3]:                 nFirMask = 0030001c
[ 2179.250791474,3] PHB#0003[0:3]:                  nFirWOF =
[ 2179.250842054,3] PHB#0003[0:3]:                 phbPlssr = 001c
[ 2179.250886003,3] PHB#0003[0:3]:                   phbCsr = 001c
[ 2179.250929859,3] PHB#0003[0:3]:                   lemFir = 00010080
[ 2179.250973720,3] PHB#0003[0:3]:             lemErrorMask =
[ 2179.251018622,3] PHB#0003[0:3]:                   lemWOF = 0080
[ 2179.251069490,3] PHB#0003[0:3]:           phbErrorStatus = 0280
[ 2179.251117476,3] PHB#0003[0:3]:      phbFirstErrorStatus = 0200
[ 2179.251162218,3] PHB#0003[0:3]:             phbErrorLog0 = 214898000240
[ 2179.251206251,3] PHB#0003[0:3]:             phbErrorLog1 = a0084000
[ 2179.251253096,3] PHB#0003[0:3]:        phbTxeErrorStatus =
[ 2179.265087656,3] PHB#0003[0:3]:   phbTxeFirstErrorStatus =
[ 2179.265142247,3] PHB#0003[0:3]:          phbTxeErrorLog0 =
[ 2179.265189734,3] PHB#0003[0:3]:          phbTxeErrorLog1 =
[ 2179.266335296,3] PHB#0003[0:3]:     phbRxeArbErrorStatus =
[ 2179.266380366,3] PHB#0003[0:3]: phbRxeArbFrstErrorStatus =
[ 2179.266426523,3] PHB#0003[0:3]:       phbRxeArbErrorLog0 =
[ 2179.267537283,3] PHB#0003[0:3]:       phbRxeArbErrorLog1 =
[ 2179.267581708,3] PHB#0003[0:3]:     phbRxeMrgErrorStatus =
[ 2179.267628324,3] PHB#0003[0:3]: phbRxeMrgFrstErrorStatus =
[ 2179.268748771,3] PHB#0003[0:3]:       phbRxeMrgErrorLog