Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

2020-08-30 Thread Christoph Hellwig
On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:
> Hello,
> 
> On 7/8/20 5:24 PM, Christoph Hellwig wrote:
> > Use the DMA API bypass mechanism for direct window mappings.  This uses
> > common code and speed up the direct mapping case by avoiding indirect
> > calls just when not using dma ops at all.  It also fixes a problem where
> > the sync_* methods were using the bypass check for DMA allocations, but
> > those are part of the streaming ops.
> > 
> > Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
> > has never been well defined, as is only used by a few drivers, which
> > IIRC never showed up in the typical Cell blade setups that are affected
> > by the ordering workaround.
> > 
> > Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  arch/powerpc/Kconfig  |  1 +
> >  arch/powerpc/include/asm/device.h |  5 --
> >  arch/powerpc/kernel/dma-iommu.c   | 90 ---
> >  3 files changed, 10 insertions(+), 86 deletions(-)
> 
> I am seeing corruptions on a couple of POWER9 systems (boston) when
> stressed with IO. stress-ng gives some results but I have first seen
> it when compiling the kernel in a guest and this is still the best way
> to raise the issue.
> 
> These systems have of a SAS Adaptec controller :
> 
>   0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 
> 3 (rev 01)
> 
> When the failure occurs, the POWERPC EEH interrupt fires and dumps
> lowlevel PHB4 registers among which :
> 
>   [ 2179.251069490,3] PHB#0003[0:3]:   phbErrorStatus = 
> 0280
>   [ 2179.251117476,3] PHB#0003[0:3]:  phbFirstErrorStatus = 
> 0200
> 
> The bits raised identify a PPC 'TCE' error, which means it is related
> to DMAs. See below for more details.
> 
> 
> Reverting this patch "fixes" the issue but it is probably else where,
> in some other layers or in the aacraid driver. How should I proceed 
> to get more information ?

The aacraid DMA masks look like a mess.  Can you try the hack
below and see it it helps?

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 769af4ca9ca97e..79c6b744dbb66c 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -2228,18 +2228,6 @@ int aac_get_adapter_info(struct aac_dev* dev)
expose_physicals = 0;
}
 
-   if (dev->dac_support) {
-   if (!pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(64))) {
-   if (!dev->in_reset)
-   dev_info(&dev->pdev->dev, "64 Bit DAC 
enabled\n");
-   } else if (!pci_set_dma_mask(dev->pdev, DMA_BIT_MASK(32))) {
-   dev_info(&dev->pdev->dev, "DMA mask set failed, 64 Bit 
DAC disabled\n");
-   dev->dac_support = 0;
-   } else {
-   dev_info(&dev->pdev->dev, "No suitable DMA 
available\n");
-   rcode = -ENOMEM;
-   }
-   }
/*
 * Deal with configuring for the individualized limits of each packet
 * interface.
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index adbdc3b7c7a706..dbb23b351a4e7d 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1479,7 +1479,6 @@ static int _aac_reset_adapter(struct aac_dev *aac, int 
forced, u8 reset_type)
struct Scsi_Host *host = aac->scsi_host_ptr;
int jafo = 0;
int bled;
-   u64 dmamask;
int num_of_fibs = 0;
 
/*
@@ -1558,22 +1557,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int 
forced, u8 reset_type)
kfree(aac->fsa_dev);
aac->fsa_dev = NULL;
 
-   dmamask = DMA_BIT_MASK(32);
quirks = aac_get_driver_ident(index)->quirks;
-   if (quirks & AAC_QUIRK_31BIT)
-   retval = pci_set_dma_mask(aac->pdev, dmamask);
-   else if (!(quirks & AAC_QUIRK_SRC))
-   retval = pci_set_dma_mask(aac->pdev, dmamask);
-   else
-   retval = pci_set_consistent_dma_mask(aac->pdev, dmamask);
-
-   if (quirks & AAC_QUIRK_31BIT && !retval) {
-   dmamask = DMA_BIT_MASK(31);
-   retval = pci_set_consistent_dma_mask(aac->pdev, dmamask);
-   }
-
-   if (retval)
-   goto out;
 
if ((retval = (*(aac_get_driver_ident(index)->init))(aac)))
goto out;
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 8588da0a065551..d897a9d59e24a1 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1634,8 +1634,6 @@ static int aac_probe_one(struct pci_dev *pdev, const 
struct pci_device_id *id)
struct list_head *insert = &aac_devices;
int error;
int unique_id = 0;
-   u64 dmamask;
-   int mask_bits = 0;
extern int aac_sync_

Re: [PATCH v1] iommu/vt-d: Move intel_iommu_ops to header file

2020-08-30 Thread Andy Shevchenko
On Sat, Aug 29, 2020 at 07:58:46AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 28, 2020 at 07:05:02PM +0300, Andy Shevchenko wrote:
> > Compiler is not happy about hidden declaration of intel_iommu_ops.
> > 
> > .../drivers/iommu/intel/iommu.c:414:24: warning: symbol 'intel_iommu_ops' 
> > was not declared. Should it be static?
> > 
> > Move declaration to header file to make compiler happy.
> 
> What about a factoring out a helper that does iommu_device_sysfs_add +
> iommu_device_set_ops + iommu_device_register and then mark
> intel_iommu_ops static?

I am okay with this proposal, but I think the better if IOMMU folks can answer
to this before I'm going to invest time into it.

-- 
With Best Regards,
Andy Shevchenko


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: Use device numa domain if RHSA is missing

2020-08-30 Thread Lu Baolu

Hi Kevin,

Thanks a lot for looking at my patch.

On 8/28/20 10:13 AM, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, August 27, 2020 1:57 PM

If there are multiple NUMA domains but the RHSA is missing in ACPI/DMAR
table, we could default to the device NUMA domain as fall back. This also
benefits the vIOMMU use case where only a single vIOMMU is exposed,
hence
no RHSA will be present but device numa domain can be correct.


this benefits vIOMMU but not necessarily only applied to single-vIOMMU
case. The logic still holds in multiple vIOMMU cases as long as RHSA is
not provided.


Yes. Will refine the description.





Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Ashok Raj 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c | 31 +--
  1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e0516d64d7a3..bce158468abf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -700,12 +700,41 @@ static int
domain_update_iommu_superpage(struct dmar_domain *domain,
return fls(mask);
  }

+static int domain_update_device_node(struct dmar_domain *domain)
+{
+   struct device_domain_info *info;
+   int nid = NUMA_NO_NODE;
+
+   assert_spin_locked(&device_domain_lock);
+
+   if (list_empty(&domain->devices))
+   return NUMA_NO_NODE;
+
+   list_for_each_entry(info, &domain->devices, link) {
+   if (!info->dev)
+   continue;
+
+   nid = dev_to_node(info->dev);
+   if (nid != NUMA_NO_NODE)
+   break;
+   }


There could be multiple device numa nodes as devices within the
same domain may sit behind different IOMMUs. Of course there
is no perfect answer in such situation, and this patch is still an
obvious improvement on current always-on-node0 policy. But
some comment about such implication is welcomed.


I will add some comments here. Without more knowledge, currently we
choose to use the first hit.




+
+   return nid;
+}
+
  /* Some capabilities may be different across iommus */
  static void domain_update_iommu_cap(struct dmar_domain *domain)
  {
domain_update_iommu_coherency(domain);
domain->iommu_snooping =
domain_update_iommu_snooping(NULL);
domain->iommu_superpage =
domain_update_iommu_superpage(domain, NULL);
+
+   /*
+* If RHSA is missing, we should default to the device numa domain
+* as fall back.
+*/
+   if (domain->nid == NUMA_NO_NODE)
+   domain->nid = domain_update_device_node(domain);
  }

  struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8
bus,
@@ -5086,8 +5115,6 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
if (type == IOMMU_DOMAIN_DMA)
intel_init_iova_domain(dmar_domain);

-   domain_update_iommu_cap(dmar_domain);
-


Is it intended or by mistake? If the former, looks it is a separate fix...


It's a cleanup. When a domain is newly created, this function is
actually a no-op.




domain = &dmar_domain->domain;
domain->geometry.aperture_start = 0;
domain->geometry.aperture_end   =
--
2.17.1




Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-08-30 Thread Lu Baolu

Hi Thomas,

On 8/26/20 7:16 PM, Thomas Gleixner wrote:

This is the second version of providing a base to support device MSI (non
PCI based) and on top of that support for IMS (Interrupt Message Storm)
based devices in a halfways architecture independent way.


After applying this patch series, the dmar_alloc_hwirq() helper doesn't
work anymore during boot. This causes the IOMMU driver to fail to
register the DMA fault handler and abort the IOMMU probe processing.
Is this a known issue?

Best regards,
baolu



The first version can be found here:

 https://lore.kernel.org/r/20200821002424.119492...@linutronix.de

It's still a mixed bag of bug fixes, cleanups and general improvements
which are worthwhile independent of device MSI.

There are quite a bunch of issues to solve:

   - X86 does not use the device::msi_domain pointer for historical reasons
 and due to XEN, which makes it impossible to create an architecture
 agnostic device MSI infrastructure.

   - X86 has it's own msi_alloc_info data type which is pointlessly
 different from the generic version and does not allow to share code.

   - The logic of composing MSI messages in an hierarchy is busted at the
 core level and of course some (x86) drivers depend on that.

   - A few minor shortcomings as usual

This series addresses that in several steps:

  1) Accidental bug fixes

   iommu/amd: Prevent NULL pointer dereference

  2) Janitoring

   x86/init: Remove unused init ops
   PCI: vmd: Dont abuse vector irqomain as parent
   x86/msi: Remove pointless vcpu_affinity callback

  3) Sanitizing the composition of MSI messages in a hierarchy
  
   genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

   x86/msi: Move compose message callback where it belongs

  4) Simplification of the x86 specific interrupt allocation mechanism

   x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
   x86/irq: Add allocation type for parent domain retrieval
   iommu/vt-d: Consolidate irq domain getter
   iommu/amd: Consolidate irq domain getter
   iommu/irq_remapping: Consolidate irq domain lookup

  5) Consolidation of the X86 specific interrupt allocation mechanism to be as 
close
 as possible to the generic MSI allocation mechanism which allows to get rid
 of quite a bunch of x86'isms which are pointless

   x86/irq: Prepare consolidation of irq_alloc_info
   x86/msi: Consolidate HPET allocation
   x86/ioapic: Consolidate IOAPIC allocation
   x86/irq: Consolidate DMAR irq allocation
   x86/irq: Consolidate UV domain allocation
   PCI/MSI: Rework pci_msi_domain_calc_hwirq()
   x86/msi: Consolidate MSI allocation
   x86/msi: Use generic MSI domain ops

   6) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()

   x86/irq: Move apic_post_init() invocation to one place
   x86/pci: Reducde #ifdeffery in PCI init code
   x86/irq: Initialize PCI/MSI domain at PCI init time
   irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
   PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
   PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
   x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
   x86/xen: Rework MSI teardown
   x86/xen: Consolidate XEN-MSI init
   irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
   x86/xen: Wrap XEN MSI management into irqdomain
   iommm/vt-d: Store irq domain in struct device
   iommm/amd: Store irq domain in struct device
   x86/pci: Set default irq domain in pcibios_add_device()
   PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
   x86/irq: Cleanup the arch_*_msi_irqs() leftovers
   x86/irq: Make most MSI ops XEN private
   iommu/vt-d: Remove domain search for PCI/MSI[X]
   iommu/amd: Remove domain search for PCI/MSI

   7) X86 specific preparation for device MSI

   x86/irq: Add DEV_MSI allocation type
   x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI

   8) Generic device MSI infrastructure
   platform-msi: Provide default irq_chip:: Ack
   genirq/proc: Take buslock on affinity write
   genirq/msi: Provide and use msi_domain_set_default_info_flags()
   platform-msi: Add device MSI infrastructure
   irqdomain/msi: Provide msi_alloc/free_store() callbacks

   9) POC of IMS (Interrupt Message Storm) irq domain and irqchip
  implementations for both device array and queue storage.

   irqchip: Add IMS (Interrupt Message Storm) driver - NOT FOR MERGING

Changes vs. V1:

- Addressed various review comments and addressed the 0day fallout.
  - Corrected the XEN logic (Jürgen)
  - Make the arch fallback in PCI/MSI opt-in not opt-out (Bjorn)

- Fixed the compose MSI message inconsistency

- Ensure that the necessary flags are set for device SMI

- Make the irq bus logic work for affinity setting to prepare
  suppor

Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-30 Thread Dmitry Osipenko
27.08.2020 18:54, Thierry Reding пишет:
...
>> The Tegra DRM has a very special quirk for ARM32 that was added in this
>> commit [2] and driver relies on checking of whether explicit or implicit
>> IOMMU is used in order to activate the quirk.
>>
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=273da5a046965ccf0ec79eb63f2d5173467e20fa
>>
>> Once the implicit IOMMU is used for the DRM driver, the quirk no longer
>> works (if I'm not missing something). This problem needs to be resolved
>> before implicit IOMMU could be used by the Tegra DRM on ARM32.
...
> I do have a patch lying around somewhere that implements the mapping
> cache that was referenced in the above commit. Let me know if I should
> dig that up and send it out.

Hello, Thierry! It certainly will be interesting to take a look at yours
patch!

I think that the caching shouldn't be strictly necessary for keeping the
current workaround working and it should be possible to keep the code
as-is by replacing the domain-type checking with the SoC-generation
check in the Tegra DRM driver.

In general, IMO it should be better to stash the complex changes until
we'll get closer to adopting the new UAPI as it will certainly touch the
aspect of the per-device mappings. But if yours patch is less than 100
LOC, then maybe we could consider applying it right now!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [GIT PULL] dma-mapping fix for 5.9

2020-08-30 Thread pr-tracker-bot
The pull request you sent on Sun, 30 Aug 2020 08:31:35 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c4011283a7d5d64a50991dd3baa9acdf3d49092c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

2020-08-30 Thread Cédric Le Goater
Hello,

On 7/8/20 5:24 PM, Christoph Hellwig wrote:
> Use the DMA API bypass mechanism for direct window mappings.  This uses
> common code and speed up the direct mapping case by avoiding indirect
> calls just when not using dma ops at all.  It also fixes a problem where
> the sync_* methods were using the bypass check for DMA allocations, but
> those are part of the streaming ops.
> 
> Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
> has never been well defined, as is only used by a few drivers, which
> IIRC never showed up in the typical Cell blade setups that are affected
> by the ordering workaround.
> 
> Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/Kconfig  |  1 +
>  arch/powerpc/include/asm/device.h |  5 --
>  arch/powerpc/kernel/dma-iommu.c   | 90 ---
>  3 files changed, 10 insertions(+), 86 deletions(-)

I am seeing corruptions on a couple of POWER9 systems (boston) when
stressed with IO. stress-ng gives some results but I have first seen
it when compiling the kernel in a guest and this is still the best way
to raise the issue.

These systems have of a SAS Adaptec controller :

  0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 
(rev 01)

When the failure occurs, the POWERPC EEH interrupt fires and dumps
lowlevel PHB4 registers among which :
  
  [ 2179.251069490,3] PHB#0003[0:3]:   phbErrorStatus = 0280
  [ 2179.251117476,3] PHB#0003[0:3]:  phbFirstErrorStatus = 0200

The bits raised identify a PPC 'TCE' error, which means it is related
to DMAs. See below for more details.


Reverting this patch "fixes" the issue but it is probably else where,
in some other layers or in the aacraid driver. How should I proceed 
to get more information ?

Thanks,

C.


[ 2054.970339] EEH: Frozen PE#1fd on PHB#3 detected
[ 2054.970375] EEH: PE location: UOPWR.BOS0019-Node0-Onboard SAS, PHB location: 
N/A
[ 2179.249415973,3] PHB#0003[0:3]:  brdgCtl = 0002
[ 2179.249515795,3] PHB#0003[0:3]: deviceStatus = 00060040
[ 2179.249596452,3] PHB#0003[0:3]:   slotStatus = 00402000
[ 2179.249633728,3] PHB#0003[0:3]:   linkStatus = a0830008
[ 2179.249674807,3] PHB#0003[0:3]: devCmdStatus = 00100107
[ 2179.249725974,3] PHB#0003[0:3]: devSecStatus = 00100107
[ 2179.249773550,3] PHB#0003[0:3]:  rootErrorStatus = 
[ 2179.249809823,3] PHB#0003[0:3]:  corrErrorStatus = 
[ 2179.249850439,3] PHB#0003[0:3]:uncorrErrorStatus = 
[ 2179.249887411,3] PHB#0003[0:3]:   devctl = 0040
[ 2179.249928677,3] PHB#0003[0:3]:  devStat = 0006
[ 2179.249967150,3] PHB#0003[0:3]:  tlpHdr1 = 
[ 2179.250054987,3] PHB#0003[0:3]:  tlpHdr2 = 
[ 2179.250146600,3] PHB#0003[0:3]:  tlpHdr3 = 
[ 2179.250262780,3] PHB#0003[0:3]:  tlpHdr4 = 
[ 2179.250343101,3] PHB#0003[0:3]: sourceId = 
[ 2179.250514264,3] PHB#0003[0:3]: nFir = 
[ 2179.250717971,3] PHB#0003[0:3]: nFirMask = 0030001c
[ 2179.250791474,3] PHB#0003[0:3]:  nFirWOF = 
[ 2179.250842054,3] PHB#0003[0:3]: phbPlssr = 001c
[ 2179.250886003,3] PHB#0003[0:3]:   phbCsr = 001c
[ 2179.250929859,3] PHB#0003[0:3]:   lemFir = 00010080
[ 2179.250973720,3] PHB#0003[0:3]: lemErrorMask = 
[ 2179.251018622,3] PHB#0003[0:3]:   lemWOF = 0080
[ 2179.251069490,3] PHB#0003[0:3]:   phbErrorStatus = 0280
[ 2179.251117476,3] PHB#0003[0:3]:  phbFirstErrorStatus = 0200
[ 2179.251162218,3] PHB#0003[0:3]: phbErrorLog0 = 214898000240
[ 2179.251206251,3] PHB#0003[0:3]: phbErrorLog1 = a0084000
[ 2179.251253096,3] PHB#0003[0:3]:phbTxeErrorStatus = 
[ 2179.265087656,3] PHB#0003[0:3]:   phbTxeFirstErrorStatus = 
[ 2179.265142247,3] PHB#0003[0:3]:  phbTxeErrorLog0 = 
[ 2179.265189734,3] PHB#0003[0:3]:  phbTxeErrorLog1 = 
[ 2179.266335296,3] PHB#0003[0:3]: phbRxeArbErrorStatus = 
[ 2179.266380366,3] PHB#0003[0:3]: phbRxeArbFrstErrorStatus = 
[ 2179.266426523,3] PHB#0003[0:3]:   phbRxeArbErrorLog0 = 
[ 2179.267537283,3] PHB#0003[0:3]:   phbRxeArbErrorLog1 = 
[ 2179.267581708,3] PHB#0003[0:3]: phbRxeMrgErrorStatus = 
[ 2179.267628324,3] PHB#0003[0:3]: phbRxeMrgFrstErrorStatus = 
[ 2179.268748771,3] PHB#0003[0:3]:   phbRxeMrgErrorLog