RE: [PATCH] Documentation: x86: rework IOMMU documentation

2022-04-27 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Jacob Pan 
> Sent: Tuesday, April 26, 2022 12:45 PM
> To: Deucher, Alexander 
> Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org;
> cor...@lwn.net; h...@zytor.com; x...@kernel.org;
> dave.han...@linux.intel.com; b...@alien8.de; mi...@redhat.com;
> t...@linutronix.de; j...@8bytes.org; Suthikulpanit, Suravee
> ; w...@kernel.org; iommu@lists.linux-
> foundation.org; robin.mur...@arm.com; Hegde, Vasant
> ; jacob.jun@intel.com; Lu, Baolu
> 
> Subject: Re: [PATCH] Documentation: x86: rework IOMMU documentation
> 
> Hi Alex,
> 
> Thanks for doing this, it really helps to catch up on the current state.
> Please see my comments inline.
> 
> On Fri, 22 Apr 2022 16:06:07 -0400, Alex Deucher
>  wrote:
> 
> > Add preliminary documentation for AMD IOMMU and combine with the
> > existing Intel IOMMU documentation and clean up and modernize some of
> > the existing documentation to align with the current state of the
> > kernel.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >
> > V2: Incorporate feedback from Robin to clarify IOMMU vs DMA engine (e.g.,
> > a device) and document proper DMA API.  Also correct the fact that
> > the AMD IOMMU is not limited to managing PCI devices.
> > v3: Fix spelling and rework text as suggested by Vasant
> > v4: Combine Intel and AMD documents into a single document as suggested
> > by Dave Hansen
> > v5: Clarify that keywords are related to ACPI, grammatical fixes
> > v6: Make more stuff common based on feedback from Robin
> >
> >  Documentation/x86/index.rst       |   2 +-
> >  Documentation/x86/intel-iommu.rst | 115
> >  Documentation/x86/iommu.rst       | 143 ++
> >  3 files changed, 144 insertions(+), 116 deletions(-)
> >  delete mode 100644 Documentation/x86/intel-iommu.rst
> >  create mode 100644 Documentation/x86/iommu.rst
> >
> > diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> > index f498f1d36cd3..6f8409fe0674 100644
> > --- a/Documentation/x86/index.rst
> > +++ b/Documentation/x86/index.rst
> > @@ -21,7 +21,7 @@ x86-specific Documentation
> > tlb
> > mtrr
> > pat
> > -   intel-iommu
> > +   iommu
> > intel_txt
> > amd-memory-encryption
> > pti
> > diff --git a/Documentation/x86/intel-iommu.rst b/Documentation/x86/intel-iommu.rst
> > deleted file mode 100644
> > index 099f13d51d5f..
> > --- a/Documentation/x86/intel-iommu.rst
> > +++ /dev/null
> > @@ -1,115 +0,0 @@
> > -===
> > -Linux IOMMU Support
> > -===
> > -
> > -The architecture spec can be obtained from the below location.
> > -
> > -
> > -http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
> > -
> > -This guide gives a quick cheat sheet for some basic understanding.
> > -
> > -Some Keywords
> > -
> > -- DMAR - DMA remapping
> > -- DRHD - DMA Remapping Hardware Unit Definition
> > -- RMRR - Reserved memory Region Reporting Structure
> > -- ZLR  - Zero length reads from PCI devices
> > -- IOVA - IO Virtual address.
> > -
> I feel this combined document only focuses on IOVA and the DMA APIs, which
> are considered legacy DMA now that Intel has introduced scalable mode to
> support DMA with PASID and shared virtual addressing (SVA).
> Perhaps we can also combine ./Documentation/x86/sva.rst?

I think it would make sense to take that up in a separate patch set.  

> 
> With scalable mode, it affects boot messages, fault reporting, etc. I am not
> saying no to this document, just suggesting. I don't know where AMD is at in
> terms of PASID support but there are lots of things in common between VT-d
> and ARM's SMMU in terms of PASID/SVA. Should we broaden the purpose of
> this document even further?

I think that would make sense for a future clean up.  I'd like to land the 
current clean up first.

AMD's IOMMU driver has supported PASID for probably 8-10 years.  When we 
originally 

RE: [PATCH v4] Documentation: x86: rework IOMMU documentation

2022-04-22 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Robin Murphy 
> Sent: Friday, April 22, 2022 3:41 PM
> To: Deucher, Alexander ; linux-
> d...@vger.kernel.org; linux-ker...@vger.kernel.org; cor...@lwn.net;
> h...@zytor.com; x...@kernel.org; dave.han...@linux.intel.com;
> b...@alien8.de; mi...@redhat.com; t...@linutronix.de; j...@8bytes.org;
> Suthikulpanit, Suravee ; w...@kernel.org;
> iommu@lists.linux-foundation.org; Hegde, Vasant 
> Subject: Re: [PATCH v4] Documentation: x86: rework IOMMU documentation
> 
> On 2022-04-22 18:54, Alex Deucher wrote:
> [...]
> > +Intel Specific Notes
> > +
> > +
> > +Graphics Problems?
> > +^^
> > +
> > +If you encounter issues with graphics devices, you can try adding
> > +option intel_iommu=igfx_off to turn off the integrated graphics engine.
> > +If this fixes anything, please ensure you file a bug reporting the problem.
> > +
> > +Some exceptions to IOVA
> > +^^^
> > +
> > +Interrupt ranges are not address translated, (0xfee00000 - 0xfeefffff).
> > +The same is true for peer to peer transactions. Hence we reserve the
> > +address from PCI MMIO ranges so they are not allocated for IOVA addresses.
> 
> Note that this should be true for both drivers.
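
As a rough illustration of the exception described above (an editorial sketch, not code from either driver): the check below assumes the interrupt range meant here is the usual x86 window 0xfee00000 - 0xfeefffff and reports whether a candidate IOVA range overlaps it. The names IRQ_WINDOW_START, IRQ_WINDOW_END and iova_overlaps_irq_window() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of the x86 interrupt address window (not taken from a driver). */
#define IRQ_WINDOW_START 0xfee00000ULL
#define IRQ_WINDOW_END   0xfeefffffULL

/*
 * Return true if [iova, iova + size) overlaps the interrupt window, i.e.
 * the range should not be handed out as an IOVA.
 * size must be non-zero and iova + size must not wrap around.
 */
bool iova_overlaps_irq_window(uint64_t iova, uint64_t size)
{
	uint64_t last = iova + size - 1;

	return iova <= IRQ_WINDOW_END && last >= IRQ_WINDOW_START;
}
```

An allocator built along these lines would skip any candidate range for which the check returns true, which is one way to picture "reserving" the window.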
> 
> > +
> > +AMD Specific Notes
> > +--
> > +
> > +Graphics Problems?
> > +^^
> > +
> > +If you encounter issues with integrated graphics devices, you can try
> > +adding option iommu=pt to the kernel command line to use a 1:1 mapping
> > +for the IOMMU.  If this fixes anything, please ensure you file a bug
> > +reporting the problem.
> 
> And indeed this is a generic option. I reckon we could simply merge these two
> sections together, with the first paragraph being something like:
> 
> If you encounter issues with integrated graphics devices, you can try adding
> the option "iommu.passthrough=1", or the equivalent "iommu=pt", to the
> kernel command line to use a 1:1 mapping for the IOMMU in general.  On
> Intel you can also try "intel_iommu=igfx_off" to turn off translation
> specifically for the integrated graphics engine only.  If this fixes
> anything, please ensure you file a bug reporting the problem.
> 
> > +
> > +Fault reporting
> > +---
> > +When errors are reported, the IOMMU signals via an interrupt. The
> > +fault reason and device that caused it is printed on the console.
> > +
> > +
> > +Kernel Log Samples
> > +--
> > +
> > +Intel Boot Messages
> > +^^^
> > +
> > +Something like this gets printed indicating presence of DMAR tables
> > +in ACPI.
> > +
> > +::
> > +
> > +   ACPI: DMAR (v001 A M I  OEMDMAR  0x0001 MSFT 0x0097) @ 0x7f5b5ef0
> > +
> > +When DMAR is being processed and initialized by ACPI, prints DMAR
> > +locations and any RMRR's processed
> > +
> > +::
> > +
> > +   ACPI DMAR:Host address width 36
> > +   ACPI DMAR:DRHD (flags: 0x)base: 0xfed9
> > +   ACPI DMAR:DRHD (flags: 0x)base: 0xfed91000
> > +   ACPI DMAR:DRHD (flags: 0x0001)base: 0xfed93000
> > +   ACPI DMAR:RMRR base: 0x000ed000 end: 0x000e
> > +   ACPI DMAR:RMRR base: 0x7f60 end: 0x7fff
> > +
> > +When DMAR is enabled for use, you will notice
> > +
> > +::
> > +
> > +   PCI-DMA: Using DMAR IOMMU
> > +
> > +Intel Fault reporting
> > +^
> > +
> > +::
> > +
> > +   DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
> > +   DMAR:[fault reason 05] PTE Write access is not set
> > +   DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
> > +   DMAR:[fault reason 05] PTE Write access is not set
> > +
> > +AMD Boot Messages
> > +^
> > +
> > +Something like this gets printed indicating presence of the IOMMU.
> > +
> > +::
> > +
> > +   iommu: Default domain type: Translated
> > +   iommu: DMA domain TLB invalidation policy: lazy mode
> 
> Similarly, that's common IOMMU API reporting which will be seen on all
> architectures (let alone IOMMU drivers). Maybe some of the messages from
> print_iommu_info() might be better AMD-specific examples?
> 

All good points.  I've integrated these suggestions and will send out a new 
version.

Thanks!

Alex

> Cheers,
> Robin.
> 
> > +
> > +AMD Fault reporting
> > +^^^
> > +
> > +::
> > +
> > +   AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xc02000 flags=0x]
> > +   AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 address=0xc02000 flags=0x]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4] Documentation: x86: rework IOMMU documentation

2022-04-22 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Deucher, Alexander 
> Sent: Friday, April 22, 2022 1:54 PM
> To: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org;
> cor...@lwn.net; h...@zytor.com; x...@kernel.org;
> dave.han...@linux.intel.com; b...@alien8.de; mi...@redhat.com;
> t...@linutronix.de; j...@8bytes.org; Suthikulpanit, Suravee
> ; w...@kernel.org; iommu@lists.linux-
> foundation.org; robin.mur...@arm.com; Hegde, Vasant
> 
> Cc: Deucher, Alexander 
> Subject: [PATCH v4] Documentation: x86: rework IOMMU documentation
> 
> Add preliminary documentation for AMD IOMMU and combine with the
> existing Intel IOMMU documentation and clean up and modernize some of the
> existing documentation to align with the current state of the kernel.
> 
> Signed-off-by: Alex Deucher 
> ---
> 
> V2: Incorporate feedback from Robin to clarify IOMMU vs DMA engine (e.g.,
> a device) and document proper DMA API.  Also correct the fact that
> the AMD IOMMU is not limited to managing PCI devices.
> v3: Fix spelling and rework text as suggested by Vasant
> v4: Combine Intel and AMD documents into a single document as suggested
> by Dave Hansen
> 
>  Documentation/x86/index.rst       |   2 +-
>  Documentation/x86/intel-iommu.rst | 115 --
>  Documentation/x86/iommu.rst       | 153 ++
>  3 files changed, 154 insertions(+), 116 deletions(-)
>  delete mode 100644 Documentation/x86/intel-iommu.rst
>  create mode 100644 Documentation/x86/iommu.rst
> 
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index f498f1d36cd3..6f8409fe0674 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -21,7 +21,7 @@ x86-specific Documentation
> tlb
> mtrr
> pat
> -   intel-iommu
> +   iommu
> intel_txt
> amd-memory-encryption
> pti
> diff --git a/Documentation/x86/intel-iommu.rst b/Documentation/x86/intel-iommu.rst
> deleted file mode 100644
> index 099f13d51d5f..
> --- a/Documentation/x86/intel-iommu.rst
> +++ /dev/null
> @@ -1,115 +0,0 @@
> -===
> -Linux IOMMU Support
> -===
> -
> -The architecture spec can be obtained from the below location.
> -
> -http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
> -
> -This guide gives a quick cheat sheet for some basic understanding.
> -
> -Some Keywords
> -
> -- DMAR - DMA remapping
> -- DRHD - DMA Remapping Hardware Unit Definition
> -- RMRR - Reserved memory Region Reporting Structure
> -- ZLR  - Zero length reads from PCI devices
> -- IOVA - IO Virtual address.
> -
> -Basic stuff
> ------------
> -
> -ACPI enumerates and lists the different DMA engines in the platform, and
> -device scope relationships between PCI devices and which DMA engine controls
> -them.
> -
> -What is RMRR?
> --------------
> -
> -There are some devices the BIOS controls, for e.g USB devices to perform
> -PS2 emulation. The regions of memory used for these devices are marked
> -reserved in the e820 map. When we turn on DMA translation, DMA to those
> -regions will fail. Hence BIOS uses RMRR to specify these regions along with
> -devices that need to access these regions. OS is expected to setup
> -unity mappings for these regions for these devices to access these regions.
> -
> -How is IOVA generated?
> -----------------------
> -
> -Well behaved drivers call pci_map_*() calls before sending command to device
> -that needs to perform DMA. Once DMA is completed and mapping is no longer
> -required, device performs a pci_unmap_*() calls to unmap the region.
> -
> -The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
> -device has its own domain (hence protection). Devices under p2p bridges
> -share the virtual address with all devices under the p2p bridge due to
> -transaction id aliasing for p2p bridges.
> -
> -IOVA generation is pretty generic. We used the same technique as vmalloc()
> -but these are not global address spaces, but separate for each domain.
> -Different DMA engines may support different number of domains.
> -
> -We also allocate guard pages with each mapping, so we can attempt to catch
> -any overflow that might happen.
> -
> -
> -Graphics Problems?
> -------------------
> -If you encounter issues with graphics devices, you can try adding
> -option intel_iommu=igfx_off to turn off the integrated graphics engine.
> -If this fixes anything, please ensure you file a bug reporting the problem.
> -
> -Some exceptions to IOVA
> ------------------------
> -Interrupt ranges a

RE: [PATCH V3 1/2] Documentation: x86: Add documentation for AMD IOMMU

2022-03-30 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Dave Hansen 
> Sent: Tuesday, March 29, 2022 11:25 AM
> To: Deucher, Alexander ; linux-
> d...@vger.kernel.org; linux-ker...@vger.kernel.org; cor...@lwn.net;
> h...@zytor.com; x...@kernel.org; dave.han...@linux.intel.com;
> b...@alien8.de; mi...@redhat.com; t...@linutronix.de; j...@8bytes.org;
> Suthikulpanit, Suravee ; w...@kernel.org;
> iommu@lists.linux-foundation.org; robin.mur...@arm.com; Hegde, Vasant
> 
> Subject: Re: [PATCH V3 1/2] Documentation: x86: Add documentation for
> AMD IOMMU
> 
> On 3/28/22 10:28, Alex Deucher wrote:
> > +How is IOVA generated?
> > +--
> > +
> > +Well behaved drivers call dma_map_*() calls before sending command to
> > +device that needs to perform DMA. Once DMA is completed and mapping
> > +is no longer required, driver performs dma_unmap_*() calls to unmap the region.
> > +
> > +Fault reporting
> > +---
> > +
> > +When errors are reported, the IOMMU signals via an interrupt. The
> > +fault reason and device that caused it is printed on the console.
> 
> Just scanning this, it looks *awfully* generic.  Is anything in here
> AMD-specific?  Should this be in an AMD-specific file?

There is some information about the ACPI tables used to enumerate the IOMMUs 
and a link to the AMD IOMMU programming documentation.  Would you prefer I just 
create a combined x86 IOMMU document?

Alex


RE: [PATCH] Documentation: x86: add documenation for AMD IOMMU

2022-03-09 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Robin Murphy 
> Sent: Tuesday, March 8, 2022 3:09 PM
> To: Deucher, Alexander ; linux-
> d...@vger.kernel.org; linux-ker...@vger.kernel.org; cor...@lwn.net;
> h...@zytor.com; x...@kernel.org; dave.han...@linux.intel.com;
> b...@alien8.de; mi...@redhat.com; t...@linutronix.de; j...@8bytes.org;
> Suthikulpanit, Suravee ; w...@kernel.org;
> iommu@lists.linux-foundation.org
> Subject: Re: [PATCH] Documentation: x86: add documenation for AMD
> IOMMU
> 
> On 2022-03-08 19:04, Alex Deucher via iommu wrote:
> > Add preliminary documenation for AMD IOMMU.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >   Documentation/x86/amd-iommu.rst   | 85 +++
> >   Documentation/x86/index.rst   |  1 +
> >   Documentation/x86/intel-iommu.rst |  2 +-
> >   3 files changed, 87 insertions(+), 1 deletion(-)
> >   create mode 100644 Documentation/x86/amd-iommu.rst
> >
> > diff --git a/Documentation/x86/amd-iommu.rst b/Documentation/x86/amd-iommu.rst
> > new file mode 100644
> > index ..89820140fefa
> > --- /dev/null
> > +++ b/Documentation/x86/amd-iommu.rst
> > @@ -0,0 +1,85 @@
> > +=
> > +AMD IOMMU Support
> > +=
> > +
> > +The architecture spec can be obtained from the below location.
> > +
> > +https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
> > +
> > +This guide gives a quick cheat sheet for some basic understanding.
> > +
> > +Some Keywords
> > +
> > +- IVRS - I/O Virtualization Reporting Structure
> > +- IVDB - I/O Virtualization Definition Block
> > +- IVHD - I/O Virtualization Hardware Definition
> > +- IOVA - I/O Virtual Address.
> > +
> > +Basic stuff
> > +---
> > +
> > +ACPI enumerates and lists the different DMA engines in the platform,
> > +and device scope relationships between PCI devices and which DMA
> > +engine controls them.
> 
> "DMA engine" typically means a dedicated device for peripheral-to-memory
> or memory-to-memory transfers, or the responsible block within a general
> DMA-capable endpoint. In the context of the original Intel doc from whence I
> see this is copied, this probably should have said "DMAR unit"
> or similar; here I'd suggest picking your favourite vendor-appropriate term
> for "instance of IOMMU translation hardware". Let's not promote confusion
> more than necessary.
> 
> > +
> > +What is IVRS?
> > +-
> > +
> > +The architecture defines an ACPI-compatible data structure called an
> > +I/O Virtualization Reporting Structure (IVRS) that is used to convey
> > +information related to I/O virtualization to system software.  The
> > +IVRS describes the configuration and capabilities of the IOMMUs
> > +contained in the platform as well as information about the devices that
> > +each IOMMU virtualizes.
> > +
> > +The IVRS provides information about the following:
> > +- IOMMUs present in the platform including their capabilities and
> > +proper configuration
> > +- System I/O topology relevant to each IOMMU
> > +- Peripheral devices that cannot be otherwise enumerated
> > +- Memory regions used by SMI/SMM, platform firmware, and platform
> > +hardware. These are generally exclusion ranges to be configured by
> > +system software.
> > +
> > +How is IOVA generated?
> > +--
> > +
> > +Well behaved drivers call pci_map_*() calls before sending command to
> > +device
> 
> Horribly out-of-date drivers call pci_map_*(). Modern well-behaved drivers
> call dma_map_*() ;)
> 
> > +that needs to perform DMA. Once DMA is completed and mapping is no
> > +longer required, device performs a pci_unmap_*() calls to unmap the
> > +region.
> > +
> > +The AMD IOMMU driver allocates a virtual address per domain. Each
> > +PCIE device has its own domain (hence protection). Devices under p2p
> > +bridges share the virtual address with all devices under the p2p
> > +bridge due to transaction id aliasing for p2p bridg

RE: [PATCH] iommu/iova: kmemleak when disable SRIOV.

2021-08-03 Thread Deucher, Alexander via iommu
[Public]

> -Original Message-
> From: Zhou, Peng Ju 
> Sent: Tuesday, August 3, 2021 1:51 AM
> To: Chris Wilson ; Robin Murphy
> ; iommu@lists.linux-foundation.org
> Cc: Deucher, Alexander ; Wang, Yin
> ; w...@kernel.org; Chang, HaiJun
> ; Deng, Emily 
> Subject: RE: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> 
> [AMD Official Use Only]
> 
> Hi Chris
> I hit a kmemleak with the following patch of yours. Can you help fix it?
> 
> According to this thread, the patch was never merged into the iommu
> mainline branch, but it is present in my kernel (5.11.0).

If this patch is not upstream, it probably ended up in our tree via drm-tip or 
drm-misc last time we synced up.  If that is the case and the patch is not 
upstream, you can just revert the patch from our tree.

Alex

> 
> 
> commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
> Author: Chris Wilson 
> Date:   Sat Jan 16 11:10:35 2021 +
> 
> iommu/iova: Use bottom-up allocation for DMA32
> 
> Since DMA32 allocations are currently allocated top-down from the 4G
> boundary, this causes fragmentation and reduces the maximise allocation
> size. On some systems, the dma address space may be extremely
> constrained by PCIe windows, requiring a stronger anti-fragmentation
> strategy.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
> Signed-off-by: Chris Wilson 
> 
> 
> --
> BW
> Pengju Zhou
> 
> 
> 
> 
> 
> > -Original Message-
> > From: Robin Murphy 
> > Sent: Tuesday, July 27, 2021 10:23 PM
> > To: Zhou, Peng Ju ; iommu@lists.linux-
> > foundation.org
> > Cc: Deucher, Alexander ; Wang, Yin
> > ; w...@kernel.org
> > Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> >
> > On 2021-07-27 15:05, Zhou, Peng Ju wrote:
> > > [AMD Official Use Only]
> > >
> > > Hi Robin
> > > The patch which add "head" :
> > >
> > > commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
> > > Author: Chris Wilson 
> > > Date:   Sat Jan 16 11:10:35 2021 +
> > >
> > >  iommu/iova: Use bottom-up allocation for DMA32
> > >
> > >  Since DMA32 allocations are currently allocated top-down from the 4G
> > >  boundary, this causes fragmentation and reduces the maximise allocation
> > >  size. On some systems, the dma address space may be extremely
> > >  constrained by PCIe windows, requiring a stronger anti-fragmentation
> > >  strategy.
> > >
> > >  Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
> > >  Signed-off-by: Chris Wilson 
> >
> > ...which is not in mainline. I've never even seen it posted for review.
> > In fact two search engines can't seem to find any trace of that SHA or patch
> > subject on the internet at all.
> >
> > By all means tell Chris that his patch, wherever you got it from, is broken,
> > but once again there's nothing the upstream maintainers/reviewers can do
> > about code which isn't upstream.
> >
> > Thanks,
> > Robin.
> >
> > > --
> > > BW
> > > Pengju Zhou
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: Robin Murphy 
> > >> Sent: Tuesday, July 27, 2021 4:52 PM
> > >> To: Zhou, Peng Ju ; iommu@lists.linux-
> > >> foundation.org
> > >> Cc: Deucher, Alexander ; Wang, Yin
> > >> ; w...@kernel.org
> > >> Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> > >>
> > >> On 2021-07-27 05:46, Zhou, Peng Ju wrote:
> > >>> [AMD Official Use Only]
> > >>>
> > >>> Hi Robin
> > >>> 1. it is not a good manner to free a statically allocated object (in
> > >>> this case, it is iovad->head) dynamically even though the free only
> > >>> occurred when

RE: [PATCH] iommu/amd: Fix section mismatch warning for detect_ivrs()

2021-06-08 Thread Deucher, Alexander via iommu
[AMD Public Use]

> -Original Message-
> From: Joerg Roedel 
> Sent: Tuesday, June 8, 2021 8:29 AM
> To: Joerg Roedel ; Will Deacon 
> Cc: Deucher, Alexander ;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Joerg
> Roedel 
> Subject: [PATCH] iommu/amd: Fix section mismatch warning for
> detect_ivrs()
> 
> From: Joerg Roedel 
> 
> A recent commit introduced this section mismatch warning:
> 
>   WARNING: modpost: vmlinux.o(.text.unlikely+0x22a1f): Section
> mismatch in reference from the function detect_ivrs() to the variable
> .init.data:amd_iommu_force_enable
> 
> The reason is that detect_ivrs() is not marked __init while it should be,
> because it is only called from another __init function. Mark
> detect_ivrs() __init to get rid of the warning.
> 
> Fixes: b1e650db2cc4 ("iommu/amd: Add amd_iommu=force_enable option")
> Signed-off-by: Joerg Roedel 

Acked-by: Alex Deucher 

> ---
>  drivers/iommu/amd/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 4e4fb0f4e412..46280e6e1535 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2817,7 +2817,7 @@ static int amd_iommu_enable_interrupts(void)
>   return ret;
>  }
> 
> -static bool detect_ivrs(void)
> +static bool __init detect_ivrs(void)
>  {
>   struct acpi_table_header *ivrs_base;
>   acpi_status status;
> --
> 2.31.1


RE: [PATCH] iommu/amd: Add amd_iommu=force_enable option

2021-04-22 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Joerg Roedel 
> Sent: Thursday, April 22, 2021 9:07 AM
> To: iommu@lists.linux-foundation.org
> Cc: Joerg Roedel ; Will Deacon ;
> Deucher, Alexander ;
> d1nu...@protonmail.com; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Joerg Roedel 
> Subject: [PATCH] iommu/amd: Add amd_iommu=force_enable option
> 
> From: Joerg Roedel 
> 
> Add this option to enable the IOMMU on platforms like AMD Stoney, where
> the kernel usually disables it because it may cause problems in some
> scenarios.
> 
> Signed-off-by: Joerg Roedel 

Acked-by: Alex Deucher 

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 3 +++
>  drivers/iommu/amd/init.c| 7 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..c9573f726707 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -303,6 +303,9 @@
>                          allowed anymore to lift isolation
>                          requirements as needed. This option
>                          does not override iommu=pt
> +                force_enable - Force enable the IOMMU on platforms known
> +                               to be buggy with IOMMU enabled. Use this
> +                               option with care.
> 
>   amd_iommu_dump= [HW,X86-64]
>                   Enable AMD IOMMU driver option to dump the ACPI table
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 321f5906e6ed..3e2295d3b3e2 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -155,6 +155,7 @@ static int amd_iommu_xt_mode = IRQ_REMAP_XAPIC_MODE;
> 
>  static bool amd_iommu_detected;
>  static bool __initdata amd_iommu_disabled;
> +static bool __initdata amd_iommu_force_enable;
>  static int amd_iommu_target_ivhd_type;
> 
>  u16 amd_iommu_last_bdf;  /* largest PCI device id we have
> @@ -2882,6 +2883,9 @@ static bool detect_ivrs(void)
> 
>   acpi_put_table(ivrs_base);
> 
> + if (amd_iommu_force_enable)
> + goto out;
> +
>   /* Don't use IOMMU if there is Stoney Ridge graphics */
>   for (i = 0; i < 32; i++) {
>   u32 pci_id;
> @@ -2893,6 +2897,7 @@ static bool detect_ivrs(void)
>   }
>   }
> 
> +out:
>   /* Make sure ACS will be enabled during PCI probe */
>   pci_request_acs();
> 
> @@ -3148,6 +3153,8 @@ static int __init parse_amd_iommu_options(char *str)
>   for (; *str; ++str) {
>   if (strncmp(str, "fullflush", 9) == 0)
>   amd_iommu_unmap_flush = true;
> + if (strncmp(str, "force_enable", 12) == 0)
> + amd_iommu_force_enable = true;
>   if (strncmp(str, "off", 3) == 0)
>   amd_iommu_disabled = true;
>   if (strncmp(str, "force_isolation", 15) == 0)
> --
> 2.31.1
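
For readers following the last hunk, here is a minimal userspace sketch of the same scanning pattern. It is an illustration only, not the driver code: struct amd_iommu_opts and parse_opts() are hypothetical names, the driver keeps these flags in file-scope variables, and the "force_isolation" keyword is left out for brevity.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the flags; the driver uses file-scope variables. */
struct amd_iommu_opts {
	bool unmap_flush;   /* set by "fullflush" */
	bool force_enable;  /* set by "force_enable" */
	bool disabled;      /* set by "off" */
};

/*
 * Walk the option string one character at a time and compare every
 * offset against each keyword, mirroring the loop in the quoted hunk.
 */
struct amd_iommu_opts parse_opts(const char *str)
{
	struct amd_iommu_opts opts = { false, false, false };

	for (; *str; ++str) {
		if (strncmp(str, "fullflush", 9) == 0)
			opts.unmap_flush = true;
		if (strncmp(str, "force_enable", 12) == 0)
			opts.force_enable = true;
		if (strncmp(str, "off", 3) == 0)
			opts.disabled = true;
	}
	return opts;
}
```

Because every offset is tested, the separator between keywords does not matter in this sketch: "fullflush,off" sets both flags, and a keyword is also recognised when it appears inside a longer string.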


RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-10 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Sent: Thursday, December 10, 2020 5:48 AM
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Alright. Done that.
> I believe this should finally be it.
> Which kernel version will be the first to include it?

Looks good to me.  Bjorn, can you pick this up for PCI?

Alex

> 
> -----Original Message-
> From: Deucher, Alexander 
> Sent: Mittwoch, 9. Dezember 2020 15:24
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> Huang, Ray ; Kuehling, Felix 
> 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Wednesday, December 9, 2020 2:59 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Alex,
> >
> > I had to revise the patch. Please see the attachment. There are actually
> > two more SSIDs affected by this.
> 
> Other than some minor whitespace issues, the patch looks fine to me.
> Please align the subsystem_device lines and put the closing 
> parenthesis on the same line as the last check.
> 
> Thanks!
> 
> Alex
> 
> >
> > Best regards,
> > Edgar
> >
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Dienstag, 8. Dezember 2020 09:23
> > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > ; 'Kuehling, Felix' 
> > Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org'
> > ; 'linux-...@vger.kernel.org'  > p...@vger.kernel.org>; 'iommu@lists.linux-foundation.org'
> > ; 'Bjorn Helgaas'
> > ; 'Joerg Roedel' ; 'Zhu, 
> > Changfeng' 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Applied the patch as in attachment. Verified that ATS for GPU-Device 
> > had been disabled. See attachment "dmesg_ATS.log".
> >
> > Was running that build over night successfully.
> >
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Montag, 7. Dezember 2020 05:53
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Hi Alex,
> >
> > I believe in the patch file, this
> > +   (pdev->subsystem_device == 0x0c19 ||
> > +    pdev->subsystem_device == 0x0c10))
> >
> > Has to be changed to:
> > +   (pdev->subsystem_device == 0xce19 ||
> > +pdev->subsystem_device == 0xcc10))
> >
> > Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively, and
> > another one would be "ea50:cc08".
> >
> > I will apply that patch and report the results soon, along with the patch
> > file that I actually applied.
> >
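
To make the SSID correction above concrete, here is a small standalone sketch of a table-driven match. It is not the quirk from the patch: struct ssid, affected_ssids[] and ssid_is_affected() are hypothetical names, and only the vendor:device pairs named in this thread are listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subsystem vendor:device pair, as in "ea50:ce19". */
struct ssid {
	uint16_t vendor;
	uint16_t device;
};

/* The three pairs mentioned in the mail above. */
static const struct ssid affected_ssids[] = {
	{ 0xea50, 0xce19 },
	{ 0xea50, 0xcc10 },
	{ 0xea50, 0xcc08 },
};

/* Return true if the given subsystem IDs match one of the listed pairs. */
bool ssid_is_affected(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(affected_ssids) / sizeof(affected_ssids[0]); i++) {
		if (affected_ssids[i].vendor == vendor &&
		    affected_ssids[i].device == device)
			return true;
	}
	return false;
}
```

With the values from the first version of the patch (0x0c19 and 0x0c10) the lookup returns false, which is the mismatch being pointed out.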
> >
> > -Original Message-----
> > From: Deucher, Alexander 
> > Sent: Montag, 30. November 2020 19:36
> > To: Merger, Edgar [AUTOSOL/MAS/AUGS]
> ;
> > Huang, Ray ; Kuehling, Felix 
> > 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > [AMD Public Use]
> >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > 
> > > Sent: Thursday, November 26, 2020 4:24 AM
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > >

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-09 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Sent: Wednesday, December 9, 2020 2:59 AM
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Alex,
> 
> I had to revise the patch. Please see the attachment. Two more SSIDs
> are actually affected by this.

Other than some minor whitespace issues, the patch looks fine to me.  Please 
align the subsystem_device lines and put the closing parenthesis on the same 
line as the last check.
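
For illustration only, the layout I have in mind looks roughly like the sketch
below.  It is a userspace model, not the actual patch: struct fake_pci_dev and
raven_ats_quirk_applies() are hypothetical stand-ins (the real check would sit
in drivers/pci/quirks.c and use struct pci_dev), and the IDs are simply the
ones mentioned in this thread (device 0x15d8, SSIDs ea50:ce19, ea50:cc10 and
ea50:cc08).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct fake_pci_dev {
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/*
 * Returns true only for the Raven/Picasso iGPU (device 0x15d8) on the
 * boards reported in this thread (SSIDs ea50:ce19, ea50:cc10, ea50:cc08).
 */
static bool raven_ats_quirk_applies(const struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);

	if (pdev->device != 0x15d8)
		return false;

	/* subsystem_device lines aligned, closing parenthesis on the last check */
	return pdev->subsystem_vendor == 0xea50 &&
	       (pdev->subsystem_device == 0xce19 ||
		pdev->subsystem_device == 0xcc10 ||
		pdev->subsystem_device == 0xcc08);
}
```

Any other board with the same device ID falls through and keeps ATS enabled,
which is the point of keeping the quirk narrow.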

Thanks!

Alex

> 
> Best regards,
> Edgar
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Tuesday, 8 December 2020 09:23
> To: 'Deucher, Alexander' ; 'Huang, Ray'
> ; 'Kuehling, Felix' 
> Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org' 
> ; 'linux-...@vger.kernel.org'  p...@vger.kernel.org>; 'iommu@lists.linux-foundation.org'
> ; 'Bjorn Helgaas'
> ; 'Joerg Roedel' ; 'Zhu, 
> Changfeng' 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Applied the patch as in attachment. Verified that ATS for GPU-Device 
> had been disabled. See attachment "dmesg_ATS.log".
> 
> Was running that build over night successfully.
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Monday, 7 December 2020 05:53
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Hi Alex,
> 
> I believe in the patch file, this
> + (pdev->subsystem_device == 0x0c19 ||
> +  pdev->subsystem_device == 0x0c10))
> 
> Has to be changed to:
> + (pdev->subsystem_device == 0xce19 ||
> +  pdev->subsystem_device == 0xcc10))
> 
> Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively, and
> another one would be "ea50:cc08".
> 
> I will apply that patch and feedback the results soon plus the patch 
> file that I actually had applied.
> 
> 
> -Original Message-
> From: Deucher, Alexander 
> Sent: Monday, 30 November 2020 19:36
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> Huang, Ray ; Kuehling, Felix 
> 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Thursday, November 26, 2020 4:24 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Alex,
> >
> > This is pretty much the same patch as what I have received from 
> > Joerg previously, except that it is tied to the particular Emerson 
> > platform and its derivatives (listed with Subsystem IDs).
> 
> Right.  As per my original point, I don't want to disable ATS on all 
> Picasso chips because doing so would break GPU compute on them, so I'd 
> like to apply this quirk as narrowly as possible.
> 
> >
> > Below patch was what Joerg provided me and I successfully tested.
> >
> > This diff to the kernel should do that:
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f70692ac79c5..3911b0ec57ba 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
> >  /* AMD Navi14 dGPU */
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> > +/* AMD Raven platform iGPU */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
> >  #endif /* CONFIG_PC

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-30 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Thursday, November 26, 2020 4:24 AM
> To: Deucher, Alexander ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Alex,
> 
> This is pretty much the same patch as what I have received from Joerg
> previously, except that it is tied to the particular Emerson platform and its
> derivatives (listed with Subsystem IDs).

Right.  As per my original point, I don't want to disable ATS on all Picasso 
chips because doing so would break GPU compute on them, so I'd like to apply 
this quirk as narrowly as possible.

> 
> Below patch was what Joerg provided me and I successfully tested.
> 
> This diff to the kernel should do that:
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f70692ac79c5..3911b0ec57ba 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> +/* AMD Raven platform iGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
> 
> So far I have seen this issue on two instances of this chip, but I must admit
> that I tested only two of them to this extent. So I guess it is not one bad
> chip in particular, but the chips we use are from the same production lot, so
> it might be a systematic problem of that production lot?
> 
> UEFI-Setup shows:
> Processor Family: 17h
> Processor Model: 20h - 2Fh
> CPUID: 00820F01
> Microcode Patch Level: 8200103
> 
> Looking at the chip die, I found that this is a fully qualified IP silicon
> (according to Ryzen Embedded R1000 SOC Interlock).
> YE1305C9T20FG
> BI2015SUY
> 9JB6496P00123
> 2016 AMD
> DIFFUSED IN USA
> MADE IN CHINA
> 
> Currently used SBIOS is a branch from "EmbeddedPI-FP5 1.2.0.3RC3".
> 
> In the future our SBIOS might merge with EmbeddedPI-FP5_1.2.0.5RC3.
> 

I think it's more likely an sbios issue, so hopefully the new release fixes it.

Alex

> 
> 
> 
> -Original Message-
> From: Deucher, Alexander 
> Sent: Wednesday, 25 November 2020 17:08
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ;
> Huang, Ray ; Kuehling, Felix
> 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Wednesday, November 25, 2020 5:04 AM
> > To: Deucher, Alexander ; Huang, Ray
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn
> > Helgaas ; Joerg Roedel ; Zhu,
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> > broken
> >
> > I do have also other problems with this unit, when IOMMU is enabled
> > and pci=noats is not set as kernel parameter.
> >
> > [ 2004.265906] amdgpu :0b:00.0: [drm:amdgpu_ib_ring_tests
> > [amdgpu]]
> > *ERROR* IB test failed on gfx (-110).
> > [ 2004.266024] [drm:amdgpu_device_delayed_init_work_handler
> [amdgpu]]
> > *ERROR* ib ring test failed (-110).
> >
> 
> Is this seen on all instances of this chip or only specific silicon?  I.e.,
> could this be a bad chip?  Would it be possible to test a newer sbios?  I think
> the attached patch should work if we can't get it fixed on the platform side.
> It should only enable the quirk on your particular platform.
> 
> Alex
> 
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Wednesday, 25 November 2020 10:16
> > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > ; 'Kuehling, Felix' 
> > Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org'
> > ; 'linux-...@vger.kernel.org'  > p...@vger.kernel.org>; 'iommu@lists.linux-foundation.org'
> > ; 'Bjorn Helgaas'
> > ; 'Joerg Roedel' ; 'Zhu,
> > Chan

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-25 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Wednesday, November 25, 2020 5:04 AM
> To: Deucher, Alexander ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> I do have also other problems with this unit, when IOMMU is enabled and
> pci=noats is not set as kernel parameter.
> 
> [ 2004.265906] amdgpu :0b:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]]
> *ERROR* IB test failed on gfx (-110).
> [ 2004.266024] [drm:amdgpu_device_delayed_init_work_handler [amdgpu]]
> *ERROR* ib ring test failed (-110).
> 

Is this seen on all instances of this chip or only specific silicon?  I.e., 
could this be a bad chip?  Would it be possible to test a newer sbios?  I think 
the attached patch should work if we can't get it fixed on the platform side.  
It should only enable the quirk on your particular platform.

Alex


> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Wednesday, 25 November 2020 10:16
> To: 'Deucher, Alexander' ; 'Huang, Ray'
> ; 'Kuehling, Felix' 
> Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org'  ker...@vger.kernel.org>; 'linux-...@vger.kernel.org'  p...@vger.kernel.org>; 'iommu@lists.linux-foundation.org'
> ; 'Bjorn Helgaas'
> ; 'Joerg Roedel' ; 'Zhu,
> Changfeng' 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Remark:
> 
> Systems with R1305G APU (which show the issue) have the following VGA-
> Controller:
> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Picasso (rev cf)
> 
> Systems with V1404I APU (which do not show the issue) have the following
> VGA-Controller:
> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
> 
> "rev cf" vs. "rev 83" is probably what you were referring to with PCI
> Revision ID.
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Wednesday, 25 November 2020 07:05
> To: 'Deucher, Alexander' ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> I see that problem only on systems that use a R1305G APU
> 
> sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
> 
> shows
> 
> VCE feature version: 0, firmware version: 0x
> UVD feature version: 0, firmware version: 0x
> MC feature version: 0, firmware version: 0x
> ME feature version: 50, firmware version: 0x00a3
> PFP feature version: 50, firmware version: 0x00bb
> CE feature version: 50, firmware version: 0x004f
> RLC feature version: 1, firmware version: 0x0049
> RLC SRLC feature version: 1, firmware version: 0x0001
> RLC SRLG feature version: 1, firmware version: 0x0001
> RLC SRLS feature version: 1, firmware version: 0x0001
> MEC feature version: 50, firmware version: 0x01b5
> MEC2 feature version: 50, firmware version: 0x01b5
> SOS feature version: 0, firmware version: 0x
> ASD feature version: 0, firmware version: 0x2130
> TA XGMI feature version: 0, firmware version: 0x
> TA RAS feature version: 0, firmware version: 0x
> SMC feature version: 0, firmware version: 0x2527
> SDMA0 feature version: 41, firmware version: 0x00a9
> VCN feature version: 0, firmware version: 0x0110901c
> DMCU feature version: 0, firmware version: 0x0001
> VBIOS version: 113-RAVEN2-117
> 
> We are also using the V1404I APU on the same boards and I haven't seen the
> issue on those boards.
> 
> These boards give me slightly different info: sudo cat
> /sys/kernel/debug/dri/0/amdgpu_firmware_info
> 
> VCE feature version: 0, firmware version: 0x
> UVD feature version: 0, firmware version: 0x
> MC feature version: 0, firmware version: 0x
> ME feature version: 47, firmware version: 0x00a2
> PFP feature version: 47, firmware version: 0x00b9
> CE feature version: 47, firmware version: 0x004e
> RLC feature version: 1, firmware version: 0x0213
> RLC SRLC feature version: 1, firmware version: 0x0001
> RLC SRLG feature version: 1, firmware version: 0x0001
> RLC SRLS feature version: 1, firmware version: 0x0001
> MEC feature version: 47, firmware version: 0x01ab
> MEC2 feature 

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-24 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Tuesday, November 24, 2020 2:29 AM
> To: Huang, Ray ; Kuehling, Felix
> 
> Cc: Will Deacon ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Module Version : PiccasoCpu 10
> AGESA Version   : PiccasoPI 100A
> 
> I did not try to enter the system in any other way (like via ssh) than via
> Desktop.

You can get this information from the amdgpu driver.  E.g., sudo cat 
/sys/kernel/debug/dri/0/amdgpu_firmware_info .  Also what is the PCI revision 
id of your chip (from lspci)?  Also are you just seeing this on specific 
versions of the sbios?
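
If lspci is not handy, the revision can also be read from sysfs.  A rough
sketch, assuming the GPU sits at 0000:0b:00.0 (the 0000 domain is my
assumption) and that the kernel exposes a "revision" attribute under
/sys/bus/pci/devices; parse_pci_revision() and read_pci_revision() are names
made up for this example:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a sysfs "revision" file (e.g. "0xcf\n"); -1 on error. */
static int parse_pci_revision(const char *text)
{
	char *end;
	unsigned long val;

	assert(text != NULL);
	val = strtoul(text, &end, 16);
	if (end == text || val > 0xff)
		return -1;
	return (int)val;
}

/* Read the revision of one device, e.g. bdf = "0000:0b:00.0"; -1 on error. */
static int read_pci_revision(const char *bdf)
{
	char path[128], buf[16];
	FILE *f;
	int rev = -1;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/revision", bdf);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		rev = parse_pci_revision(buf);
	fclose(f);
	return rev;
}
```

The helper returns -1 when the attribute is missing, so it is safe to run on
a system without that device.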

Thanks,

Alex


> 
> -Original Message-
> From: Huang Rui 
> Sent: Tuesday, 24 November 2020 07:43
> To: Kuehling, Felix 
> Cc: Will Deacon ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org; linux-
> p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> ; Merger, Edgar [AUTOSOL/MAS/AUGS]
> ; Joerg Roedel ;
> Changfeng Zhu 
> Subject: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> On Tue, Nov 24, 2020 at 06:51:11AM +0800, Kuehling, Felix wrote:
> > On 2020-11-23 5:33 p.m., Will Deacon wrote:
> > > On Mon, Nov 23, 2020 at 09:04:14PM +, Deucher, Alexander wrote:
> > >> [AMD Public Use]
> > >>
> > >>> -Original Message-
> > >>> From: Will Deacon 
> > >>> Sent: Monday, November 23, 2020 8:44 AM
> > >>> To: linux-ker...@vger.kernel.org
> > >>> Cc: linux-...@vger.kernel.org; iommu@lists.linux-foundation.org;
> > >>> Will Deacon ; Bjorn Helgaas
> > >>> ; Deucher, Alexander
> > >>> ; Edgar Merger
> > >>> ; Joerg Roedel 
> > >>> Subject: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> > >>>
> > >>> Edgar Merger reports that the AMD Raven GPU does not work reliably
> > >>> on his system when the IOMMU is enabled:
> > >>>
> > >>>| [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> > >>> signaled seq=1, emitted seq=3
> > >>>| [...]
> > >>>| amdgpu :0b:00.0: GPU reset begin!
> > >>>| AMD-Vi: Completion-Wait loop timed out
> > >>>| iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> > >>> device=0b:00.0 address=0x38edc0970]
> > >>>
> > >>> This is indicative of a hardware/platform configuration issue so,
> > >>> since disabling ATS has been shown to resolve the problem, add a
> > >>> quirk to match this particular device while Edgar follows-up with AMD
> > >>> for more information.
> > >>>
> > >>> Cc: Bjorn Helgaas 
> > >>> Cc: Alex Deucher 
> > >>> Reported-by: Edgar Merger 
> > >>> Suggested-by: Joerg Roedel 
> > >>> Link: https://lore.kernel.org/linux-iommu/MWHPR10MB1310F042A30661D4158520B589FC0@MWHPR10MB1310.namprd10.prod.outlook.com
> > >>> Signed-off-by: Will Deacon 
> > >>> ---
> > >>>
> > >>> Hi all,
> > >>>
> > >>> Since Joerg is away at the moment, I'm posting this to try to mak

RE: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-23 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Will Deacon 
> Sent: Monday, November 23, 2020 8:44 AM
> To: linux-ker...@vger.kernel.org
> Cc: linux-...@vger.kernel.org; iommu@lists.linux-foundation.org; Will
> Deacon ; Bjorn Helgaas ;
> Deucher, Alexander ; Edgar Merger
> ; Joerg Roedel 
> Subject: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> Edgar Merger reports that the AMD Raven GPU does not work reliably on his
> system when the IOMMU is enabled:
> 
>   | [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> signaled seq=1, emitted seq=3
>   | [...]
>   | amdgpu :0b:00.0: GPU reset begin!
>   | AMD-Vi: Completion-Wait loop timed out
>   | iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> device=0b:00.0 address=0x38edc0970]
> 
> This is indicative of a hardware/platform configuration issue so, since
> disabling ATS has been shown to resolve the problem, add a quirk to match
> this particular device while Edgar follows-up with AMD for more information.
> 
> Cc: Bjorn Helgaas 
> Cc: Alex Deucher 
> Reported-by: Edgar Merger 
> Suggested-by: Joerg Roedel 
> Link: https://lore.kernel.org/linux-iommu/MWHPR10MB1310F042A30661D4158520B589FC0@MWHPR10MB1310.namprd10.prod.outlook.com
> Signed-off-by: Will Deacon 
> ---
> 
> Hi all,
> 
> Since Joerg is away at the moment, I'm posting this to try to make some
> progress with the thread in the Link: tag.

+ Felix

What system is this?  Can you provide more details?  Does a sbios update fix 
this?  Disabling ATS for all Ravens will break GPU compute for a lot of people. 
 I'd prefer to just black list this particular system (e.g., just SSIDs or 
revision) if possible.

Alex

> 
> Cheers,
> 
> Will
> 
>  drivers/pci/quirks.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f70692ac79c5..3911b0ec57ba 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> +/* AMD Raven platform iGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
> --
> 2.29.2.454.gaff20da3a2-goog
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-09-06 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Joerg Roedel 
> Sent: Friday, September 4, 2020 6:06 AM
> To: Deucher, Alexander 
> Cc: jroe...@suse.de; Kuehling, Felix ;
> iommu@lists.linux-foundation.org; Huang, Ray ;
> Koenig, Christian ; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> On Fri, Aug 28, 2020 at 03:47:07PM +, Deucher, Alexander wrote:
> > Ah, right.  So CZ and ST are not an issue.  Raven is paired with Zen-based
> > CPUs.
>
> Okay, so for the Raven case, can you add code to the amdgpu driver which
> makes it fail to initialize on Raven when SME is active? There is a global
> checking function for that, so that shouldn't be hard to do.
>

Sure.  How about the attached patch?

Alex

From f479b9da353c2547c26ebac8930a5dcd9a134eb7 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Sun, 6 Sep 2020 12:05:12 -0400
Subject: [PATCH] drm/amdgpu: Fail to load on RAVEN if SME is active

Due to hardware bugs, scatter/gather display on raven requires
a 1:1 IOMMU mapping, however, SME (System Memory Encryption)
requires an indirect IOMMU mapping because the encryption bit
is beyond the DMA mask of the chip.  As such, the two are
incompatible.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 12e16445df7c..d87d37c25329 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1102,6 +1102,16 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 		return -ENODEV;
 	}
 
+	/* Due to hardware bugs, S/G Display on raven requires a 1:1 IOMMU mapping,
+	 * however, SME requires an indirect IOMMU mapping because the encryption
+	 * bit is beyond the DMA mask of the chip.
+	 */
+	if (mem_encrypt_active() && ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
+		dev_info(&pdev->dev,
+			 "SME is not compatible with RAVEN\n");
+		return -ENOTSUPP;
+	}
+
 #ifdef CONFIG_DRM_AMDGPU_SI
 	if (!amdgpu_si_support) {
 		switch (flags & AMD_ASIC_MASK) {
-- 
2.25.4


RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: jroe...@suse.de 
> Sent: Friday, August 28, 2020 11:30 AM
> To: Deucher, Alexander 
> Cc: Kuehling, Felix ; Joerg Roedel
> ; iommu@lists.linux-foundation.org; Huang, Ray
> ; Koenig, Christian ;
> Lendacky, Thomas ; Suthikulpanit, Suravee
> ; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> On Fri, Aug 28, 2020 at 03:11:32PM +, Deucher, Alexander wrote:
> > There are hw bugs on Raven and probably Carrizo/Stoney where they need
> > 1:1 mapping to avoid bugs in some corner cases with the displays.
> > Other GPUs should be fine.  The VID is 0x1002 and the DIDs are 0x15dd
> > and 0x15d8 for raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> > 0x9877 and 0x98e4 for carrizo and stoney.  As long as we preserve the
> > 1:1 mapping for those asics, we should be fine.
> 
> Okay, Stoney at least has no Zen-based CPU, so no support for memory
> encryption anyway. How about Raven, is it paired with a Zen CPU?

Ah, right.  So CZ and ST are not an issue.  Raven is paired with Zen-based CPUs.

Thanks,

Alex


RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, August 28, 2020 9:55 AM
> To: jroe...@suse.de; Deucher, Alexander 
> Cc: Joerg Roedel ; iommu@lists.linux-foundation.org;
> Huang, Ray ; Koenig, Christian
> ; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> On 2020-08-28 at 9:46 a.m., jroe...@suse.de wrote:
> > On Wed, Aug 26, 2020 at 03:25:58PM +, Deucher, Alexander wrote:
> >>> Alex, do you know if anyone has tested amdgpu on an APU with SME
> >>> enabled? Is this considered something we support?
> >> It's not something we've tested.  I'm not even sure the GPU portion
> >> of APUs will work properly without an identity mapping.  SME should
> >> work properly with dGPUs however, so this is a proper fix for them.
> >> We don't use the IOMMUv2 path on dGPUs at all.
> > Is it possible to make the IOMMUv2 paths optional on iGPUs as well
> > when SME is active (or better, when the GPU is not identity mapped)?
> 
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not needed
> for graphics. And we're making it optional for KFD as well.
> 
> The question Alex and I raised here is more general. We may have some
> assumptions in the amdgpu driver that are broken when the framebuffer is
> not identity mapped. This would break the iGPU in a more general sense,
> regardless of KFD and IOMMUv2. In that case, we don't really need to worry
> about breaking KFD because we have a much bigger problem.

There are hw bugs on Raven, and probably Carrizo/Stoney, where they need a 1:1
mapping to avoid problems in some corner cases with the displays.  Other GPUs
should be fine.  The VID is 0x1002 and the DIDs are 0x15dd and 0x15d8 for
Raven variants and 0x9870, 0x9874, 0x9875, 0x9876, 0x9877 and 0x98e4 for
Carrizo and Stoney.  As long as we preserve the 1:1 mapping for those asics, we
should be fine.
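
In case it helps, here is a minimal sketch of how that ID list could be
checked.  It is plain userspace C for illustration, not a proposed kernel
change; wants_identity_mapping() and the table name are hypothetical, and only
the IDs come from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AMD_GPU_VID 0x1002

/* DIDs listed above: Raven variants first, then Carrizo and Stoney. */
static const uint16_t identity_map_dids[] = {
	0x15dd, 0x15d8,
	0x9870, 0x9874, 0x9875, 0x9876, 0x9877, 0x98e4,
};

/* True if the device is one of the ASICs that want a 1:1 mapping. */
static bool wants_identity_mapping(uint16_t vendor, uint16_t device)
{
	size_t i;
	size_t n = sizeof(identity_map_dids) / sizeof(identity_map_dids[0]);

	if (vendor != AMD_GPU_VID)
		return false;

	for (i = 0; i < n; i++) {
		if (identity_map_dids[i] == device)
			return true;
	}
	return false;
}
```

Everything not in the table, dGPUs included, would be free to use a
translated mapping.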

Alex

> 
> Regards,
>   Felix
> 
> 
> >
> > Regards,
> >
> > Joerg

RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-26 Thread Deucher, Alexander
[AMD Public Use]

 + Christian

> -Original Message-
> From: Kuehling, Felix 
> Sent: Wednesday, August 26, 2020 11:22 AM
> To: Deucher, Alexander ; Joerg Roedel
> ; iommu@lists.linux-foundation.org; Huang, Ray
> 
> Cc: jroe...@suse.de; Lendacky, Thomas ;
> Suthikulpanit, Suravee ; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> [+Ray]
> 
> 
> Thanks for the heads up. Currently KFD won't work on APUs when IOMMUv2
> is disabled. But Ray is working on fallbacks that will allow KFD to work on
> APUs even without IOMMUv2, similar to our dGPUs. Along with changes in
> ROCm user mode, those fallbacks are necessary for making ROCm on APUs
> generally useful.
> 
> 
> How common is SME on typical PCs or laptops that would use AMD APUs?

I think the hw supports it, but as far as I know it's not formally
productized on client parts.

> 
> 
> Alex, do you know if anyone has tested amdgpu on an APU with SME
> enabled? Is this considered something we support?

It's not something we've tested.  I'm not even sure the GPU portion of APUs 
will work properly without an identity mapping.  SME should work properly with 
dGPUs however, so this is a proper fix for them.  We don't use the IOMMUv2 path 
on dGPUs at all.

Alex

> 
> 
> Thanks,
>   Felix
> 
> 
> On 2020-08-26 at 10:14 a.m., Deucher, Alexander wrote:
> >
> > [AMD Official Use Only - Internal Distribution Only]
> >
> >
> > + Felix
> > --
> > --
> > *From:* Joerg Roedel 
> > *Sent:* Monday, August 24, 2020 6:54 AM
> > *To:* iommu@lists.linux-foundation.org
> > 
> > *Cc:* Joerg Roedel ; jroe...@suse.de
> > ; Lendacky, Thomas ;
> > Suthikulpanit, Suravee ; Deucher,
> > Alexander ; linux-ker...@vger.kernel.org
> > 
> > *Subject:* [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> > active
> >
> > From: Joerg Roedel 
> >
> > Hi,
> >
> > Some IOMMUv2 capable devices do not work correctly when SME is active,
> > because their DMA mask does not include the encryption bit, so that
> > they can not DMA to encrypted memory directly.
> >
> > The IOMMU can jump in here, but the AMD IOMMU driver puts IOMMUv2
> > capable devices into an identity mapped domain. Fix that by not
> > forcing an identity mapped domain on devices when SME is active and
> > forbid using their IOMMUv2 functionality.
> >
> > Please review.
> >
> > Thanks,
> >
> >     Joerg
> >
> > Joerg Roedel (2):
> >   iommu/amd: Do not force direct mapping when SME is active
> >   iommu/amd: Do not use IOMMUv2 functionality when SME is active
> >
> >  drivers/iommu/amd/iommu.c    | 7 ++-
> >  drivers/iommu/amd/iommu_v2.c | 7 +++
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > --
> > 2.28.0
> >

RE: [PATCH 1/2] iommu/amd: Do not force direct mapping when SME is active

2020-08-26 Thread Deucher, Alexander
[AMD Public Use]

+ Felix, Christian

> -Original Message-
> From: Joerg Roedel 
> Sent: Monday, August 24, 2020 6:54 AM
> To: iommu@lists.linux-foundation.org
> Cc: Joerg Roedel ; jroe...@suse.de; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org
> Subject: [PATCH 1/2] iommu/amd: Do not force direct mapping when SME is
> active
> 
> From: Joerg Roedel 
> 
> Do not force devices supporting IOMMUv2 to be direct mapped when
> memory encryption is active. This might cause them to be unusable because
> their DMA mask does not include the encryption bit.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/iommu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index ba9f3dbc5b94..77e4268e41cf 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2659,7 +2659,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
>   if (!dev_data)
>   return 0;
> 
> - if (dev_data->iommu_v2)
> + /*
> +  * Do not identity map IOMMUv2 capable devices when memory encryption is
> +  * active, because some of those devices (AMD GPUs) don't have the
> +  * encryption bit in their DMA-mask and require remapping.
> +  */

I think on the integrated GPUs in APUs I'd prefer to have the identity mapping 
over SME, but I guess this is fine because you have to explicitly enable SME 
and if you do that you know what you are getting into.
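
To spell out the resulting behaviour, a small userspace model of the condition
in the quoted hunk.  The enum and function names are hypothetical stand-ins
that I made up for this sketch; only the boolean logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IOMMU core's domain type values. */
enum sketch_domain_type {
	SKETCH_DOMAIN_DEFAULT = 0,	/* let the core pick (the patch returns 0) */
	SKETCH_DOMAIN_IDENTITY,		/* 1:1 mapping */
};

/* Mirrors the patched condition: identity mapping only when SME is off. */
static enum sketch_domain_type
sketch_def_domain_type(bool mem_encrypt_active, bool iommu_v2_capable)
{
	if (!mem_encrypt_active && iommu_v2_capable)
		return SKETCH_DOMAIN_IDENTITY;

	return SKETCH_DOMAIN_DEFAULT;
}
```

So an IOMMUv2-capable iGPU keeps its identity mapping unless the user has
explicitly turned SME on.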

Alex

> + if (!mem_encrypt_active() && dev_data->iommu_v2)
>   return IOMMU_DOMAIN_IDENTITY;
> 
>   return 0;
> --
> 2.28.0


RE: [PATCH 2/2] iommu/amd: Do not use IOMMUv2 functionality when SME is active

2020-08-26 Thread Deucher, Alexander
[AMD Public Use]

+ Felix, Christian

> -----Original Message-----
> From: Joerg Roedel 
> Sent: Monday, August 24, 2020 6:54 AM
> To: iommu@lists.linux-foundation.org
> Cc: Joerg Roedel ; jroe...@suse.de; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org
> Subject: [PATCH 2/2] iommu/amd: Do not use IOMMUv2 functionality when
> SME is active
> 
> From: Joerg Roedel 
> 
> When memory encryption is active the device is likely not in a direct mapped
> domain. Forbid using IOMMUv2 functionality for now until finer grained
> checks for this have been implemented.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/iommu_v2.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu_v2.c
> b/drivers/iommu/amd/iommu_v2.c index c259108ab6dd..0d175aed1d92
> 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -737,6 +737,13 @@ int amd_iommu_init_device(struct pci_dev *pdev,
> int pasids)
> 
>   might_sleep();
> 
> + /*
> +  * When memory encryption is active the device is likely not in a
> +  * direct-mapped domain. Forbid using IOMMUv2 functionality for
> now.
> +  */
> + if (mem_encrypt_active())
> + return -ENODEV;
> +
>   if (!amd_iommu_v2_supported())
>   return -ENODEV;
> 
> --
> 2.28.0


Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-26 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

+ Felix

From: Joerg Roedel 
Sent: Monday, August 24, 2020 6:54 AM
To: iommu@lists.linux-foundation.org 
Cc: Joerg Roedel ; jroe...@suse.de ; 
Lendacky, Thomas ; Suthikulpanit, Suravee 
; Deucher, Alexander 
; linux-ker...@vger.kernel.org 

Subject: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

From: Joerg Roedel 

Hi,

Some IOMMUv2 capable devices do not work correctly when SME is
active, because their DMA mask does not include the encryption bit, so
that they can not DMA to encrypted memory directly.

The IOMMU can jump in here, but the AMD IOMMU driver puts IOMMUv2
capable devices into an identity mapped domain. Fix that by not
forcing an identity mapped domain on devices when SME is active and
forbid using their IOMMUv2 functionality.

Please review.

Thanks,

Joerg

Joerg Roedel (2):
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Do not use IOMMUv2 functionality when SME is active

 drivers/iommu/amd/iommu.c| 7 ++-
 drivers/iommu/amd/iommu_v2.c | 7 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

--
2.28.0
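
Taken together, the two patches in this series amount to a pair of small decisions keyed on whether memory encryption is active. The following is a minimal user-space sketch of that logic, not the driver code: `enum domain_type`, `struct dev_caps`, `def_domain_type()` and `init_v2_device()` are hypothetical stand-ins for the kernel's types and functions, and the per-device `iommu_v2` flag is used here in place of the system-wide amd_iommu_v2_supported() check that patch 2/2 sits next to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's default-domain return values. */
enum domain_type {
	DOMAIN_DEFAULT = 0,	/* no preference: the IOMMU core decides */
	DOMAIN_IDENTITY = 1,	/* direct-mapped (identity) domain */
};

/* Hypothetical per-device data; only the flag the patches look at. */
struct dev_caps {
	bool iommu_v2;		/* device is IOMMUv2 capable */
};

/* Patch 1/2: only force identity mapping when SME is not active. */
enum domain_type def_domain_type(const struct dev_caps *dev, bool sme_active)
{
	if (dev == NULL)
		return DOMAIN_DEFAULT;

	if (!sme_active && dev->iommu_v2)
		return DOMAIN_IDENTITY;

	return DOMAIN_DEFAULT;
}

/* Patch 2/2: refuse IOMMUv2 functionality while SME is active. */
int init_v2_device(const struct dev_caps *dev, bool sme_active)
{
	if (sme_active)
		return -ENODEV;

	if (dev == NULL || !dev->iommu_v2)
		return -ENODEV;

	return 0;
}
```

With SME off, an IOMMUv2 capable device still gets the identity domain and can use IOMMUv2 features; with SME on, it falls back to the default (remapped) domain and IOMMUv2 initialization is rejected.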


RE: [BUG] "Pre-boot DMA Protection" makes AMDGPU stop working

2020-07-02 Thread Deucher, Alexander
[AMD Public Use]

> -----Original Message-----
> From: Kai-Heng Feng 
> Sent: Thursday, July 2, 2020 8:04 AM
> To: Joerg Roedel 
> Cc: Deucher, Alexander ;
> iommu@lists.linux-foundation.org; open list 
> Subject: [BUG] "Pre-boot DMA Protection" makes AMDGPU stop working
> 
> Hi,
> 
> A more detailed bug report can be found at [1].
> 
> I have an AMD Renoir system that can't enter a graphical session because there
> are many IOMMU splats.
> 
> Alex suggested to disable "Pre-boot DMA Protection", I can confirm once it's
> disabled, AMDGPU starts working with IOMMU enabled.
> So raise the issue here because I have no knowledge on how to reset the
> IOMMU.

+ Suravee

This is part of MS's Secure Core initiative.  We are investigating how to 
handle this properly on Linux.  Stay tuned.

Alex

> 
> [1]
> https://gitlab.freedesktop.org/drm/amd/-/issues/1204
> 
> Kai-Heng


RE: [PATCH v3] iommu/amd: Disable IOMMU on Stoney Ridge systems

2020-02-10 Thread Deucher, Alexander
[AMD Public Use]

> -----Original Message-----
> From: Kai-Heng Feng 
> Sent: Monday, February 10, 2020 2:51 AM
> To: j...@8bytes.org
> Cc: Kai-Heng Feng ; Deucher, Alexander
> ; open list:AMD IOMMU (AMD-VI)
> ; open list  ker...@vger.kernel.org>
> Subject: [PATCH v3] iommu/amd: Disable IOMMU on Stoney Ridge systems
> 
> Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> 
> Use identity-mapping and PCI ATS doesn't help this issue.
> 
> According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do the
> same here to avoid screen flickering on 4K monitor.
> 
> Cc: Alex Deucher 
> Bug:
> https://gitlab.freedesktop.org/drm/amd/issues/961
> Signed-off-by: Kai-Heng Feng 

Acked-by: Alex Deucher 


> ---
> v3:
>  - Update commit message to mention identity-mapping and ATS don't help.
> 
> v2:
>  - Find Stoney graphics instead of host bridge.
> 
>  drivers/iommu/amd_iommu_init.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c
> b/drivers/iommu/amd_iommu_init.c index 2759a8d57b7f..6be3853a5d97
> 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2523,6 +2523,7 @@ static int __init early_amd_iommu_init(void)
>   struct acpi_table_header *ivrs_base;
>   acpi_status status;
>   int i, remap_cache_sz, ret = 0;
> + u32 pci_id;
> 
>   if (!amd_iommu_detected)
>   return -ENODEV;
> @@ -2610,6 +2611,16 @@ static int __init early_amd_iommu_init(void)
>   if (ret)
>   goto out;
> 
> + /* Disable IOMMU if there's Stoney Ridge graphics */
> + for (i = 0; i < 32; i++) {
> + pci_id = read_pci_config(0, i, 0, 0);
> + if ((pci_id & 0xffff) == 0x1002 && (pci_id >> 16) == 0x98e4) {
> + pr_info("Disable IOMMU on Stoney Ridge\n");
> + amd_iommu_disabled = true;
> + break;
> + }
> + }
> +
>   /* Disable any previously enabled IOMMUs */
>   if (!is_kdump_kernel() || amd_iommu_disabled)
>   disable_iommus();
> @@ -2718,7 +2729,7 @@ static int __init state_next(void)
>   ret = early_amd_iommu_init();
>   init_state = ret ? IOMMU_INIT_ERROR :
> IOMMU_ACPI_FINISHED;
>   if (init_state == IOMMU_ACPI_FINISHED &&
> amd_iommu_disabled) {
> - pr_info("AMD IOMMU disabled on kernel command-
> line\n");
> + pr_info("AMD IOMMU disabled\n");
>   init_state = IOMMU_CMDLINE_DISABLED;
>   ret = -EINVAL;
>   }
> --
> 2.17.1


RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-12-19 Thread Deucher, Alexander
> -----Original Message-----
> From: Kai-Heng Feng 
> Sent: Wednesday, December 18, 2019 12:45 PM
> To: Joerg Roedel 
> Cc: Christoph Hellwig ; Deucher, Alexander
> ; iommu@lists.linux-foundation.org; Kernel
> development list 
> Subject: Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
> systems
> 
> 
> 
> > On Dec 17, 2019, at 17:53, Joerg Roedel  wrote:
> >
> > On Fri, Dec 06, 2019 at 01:57:41PM +0800, Kai-Heng Feng wrote:
> >> Hi Joerg,
> >>
> >>> On Dec 3, 2019, at 01:00, Christoph Hellwig  wrote:
> >>>
> >>> On Fri, Nov 29, 2019 at 10:21:54PM +0800, Kai-Heng Feng wrote:
> >>>> Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> >>>>
> >>>> According to Alex Deucher, IOMMU isn't enabled on Windows, so let's
> >>>> do the same here to avoid screen flickering on 4K monitor.
> >>>
> >>> Disabling the IOMMU entirely seem pretty severe.  Isn't it enough to
> >>> identity map the GPU device?
> >>
> >> Ok, there's set_device_exclusion_range() to exclude the device from
> IOMMU.
> >> However I don't know how to generate range_start and range_length,
> which are read from ACPI.
> >
> > set_device_exclusion_range() is not the solution here. The best is if
> > the GPU device is put into a passthrough domain at boot, in which it
> > will be identity mapped. DMA still goes through the IOMMU in this
> > case, but it only needs to lookup the device-table, page-table walks
> > will not be done anymore.
> >
> > The best way to implement this is to put it into the
> > amd_iommu_add_device() in drivers/iommu/amd_iommu.c. There is this
> > check:
> >
> >if (dev_data->iommu_v2)
> > iommu_request_dm_for_dev(dev);
> >
> > The iommu_request_dm_for_dev() function causes the device to be
> > identity mapped. The check can be extended to also check for a device
> > white-list for devices that need identity mapping.
> 
> My patch looks like this but the original behavior (4K screen flickering) is
> still the same:

Does reverting the patch to disable ATS along with this patch help?

Alex

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index bd25674ee4db..f913a25c9e92 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "amd_iommu_proto.h"
>  #include "amd_iommu_types.h"
> @@ -2159,6 +2160,8 @@ static int amd_iommu_add_device(struct device
> *dev)
> struct iommu_domain *domain;
> struct amd_iommu *iommu;
> int ret, devid;
> +   bool need_identity_mapping = false;
> +   u32 header;
> 
> if (!check_device(dev) || get_dev_data(dev))
> return 0;
> @@ -2184,7 +2187,11 @@ static int amd_iommu_add_device(struct device
> *dev)
> 
> BUG_ON(!dev_data);
> 
> -   if (dev_data->iommu_v2)
> +   header = read_pci_config(0, PCI_BUS_NUM(devid), PCI_SLOT(devid),
> PCI_FUNC(devid));
> +   if ((header & 0xffff) == 0x1002 && (header >> 16) == 0x98e4)
> +   need_identity_mapping = true;
> +
> +   if (dev_data->iommu_v2 || need_identity_mapping)
> iommu_request_dm_for_dev(dev);
> 
> /* Domains are initialized for this device - have a look what we 
> ended up
> with */
> 
> 
> $ dmesg | grep -i direct
> [0.011446] Using GB pages for direct mapping
> [0.703369] pci :00:01.0: Using iommu direct mapping
> [0.703830] pci :00:08.0: Using iommu direct mapping
> 
> So the graphics device (pci :00:01.0:) is using direct mapping after the
> change.
> 
> Kai-Heng
> 
> >
> > HTH,
> >
> > Joerg
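
Joerg's point that a passthrough domain still routes DMA through the IOMMU, but stops after the device-table lookup, can be illustrated with a toy model. This is a sketch under heavy simplification: `struct toy_dte`, `toy_translate()` and the single-level table are invented for illustration, whereas the real AMD IOMMU uses multi-level page tables and a hardware device table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)

enum toy_mode { TOY_PASSTHROUGH, TOY_TRANSLATE };

/* Toy device-table entry: a mode plus a single-level page table. */
struct toy_dte {
	enum toy_mode mode;
	const uint64_t *pfn_table;	/* index: IOVA page number, value: physical frame */
	size_t nr_entries;
};

/*
 * Translate an I/O virtual address. *walks reports how many page-table
 * lookups were needed. Returns 0 on success, -1 on a translation fault.
 */
int toy_translate(const struct toy_dte *dte, uint64_t iova,
		  uint64_t *pa, unsigned int *walks)
{
	uint64_t idx;

	*walks = 0;

	/* Passthrough: the device-table lookup alone decides, IOVA == PA. */
	if (dte->mode == TOY_PASSTHROUGH) {
		*pa = iova;
		return 0;
	}

	idx = iova >> TOY_PAGE_SHIFT;
	if (idx >= dte->nr_entries)
		return -1;

	*walks = 1;
	*pa = (dte->pfn_table[idx] << TOY_PAGE_SHIFT) | (iova & TOY_PAGE_MASK);
	return 0;
}
```

In the passthrough case the walk count stays at zero, which is the saving Joerg describes. Kai-Heng's result above suggests that, at least on this system, removing the page-table walk alone was not enough to stop the flickering.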



RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-12-04 Thread Deucher, Alexander
> -----Original Message-----
> From: Deucher, Alexander
> Sent: Monday, December 2, 2019 11:37 AM
> To: Lucas Stach ; Kai-Heng Feng
> ; j...@8bytes.org; Koenig, Christian
> (christian.koe...@amd.com) 
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
> systems
> 
> > -----Original Message-----
> > From: Lucas Stach 
> > Sent: Sunday, December 1, 2019 7:43 AM
> > To: Kai-Heng Feng ; j...@8bytes.org
> > Cc: Deucher, Alexander ;
> > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> > Subject: Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
> > systems
> >
> > Am Freitag, den 29.11.2019, 22:21 +0800 schrieb Kai-Heng Feng:
> > > Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> > >
> > > According to Alex Deucher, IOMMU isn't enabled on Windows, so let's
> > > do the same here to avoid screen flickering on 4K monitor.
> >
> > This doesn't seem like a good solution, especially if there isn't a
> > method for the user to opt-out.  Some users might prefer having the
> > IOMMU support to 4K display output.
> >
> > But before using the big hammer of disabling or breaking one of those
> > features, we should take a look at what's the issue here. Screen
> > flickering caused by the IOMMU being active hints to the IOMMU not
> > being able to sustain the translation bandwidth required by the high-
> > bandwidth isochronous transfers caused by 4K scanout, most likely due
> > to insufficient TLB space.
> >
> > As far as I know the framebuffer memory for the display buffers is
> > located in stolen RAM, and thus contiguous in memory. I don't know the
> > details of the GPU integration on those APUs, but maybe there even is
> > a way to bypass the IOMMU for the stolen VRAM regions?
> >
> > If there isn't and all GPU traffic passes through the IOMMU when
> > active, we should check if the stolen RAM is mapped with hugepages on
> > the IOMMU side. All the stolen RAM can most likely be mapped with a
> > few hugepage mappings, which should reduce IOMMU TLB demand by a
> large margin.
> 
> There is no issue when we scan out of the carve out region.  The issue occurs
> when we scan out of regular system memory (scatter/gather).  Many newer
> laptops have very small carve out regions (e.g., 32 MB), so we have to use
> regular system pages to support multiple high resolution displays.  The
> problem is, the latency gets too high at some point when the IOMMU is
> involved.  Huge pages would probably help in this case, but I'm not sure if
> there is any way to guarantee that we get huge pages for system memory.  I
> guess we could use CMA or something like that.

Thomas recently sent out a patch set to add huge page support to ttm:
https://patchwork.freedesktop.org/series/70090/
We'd still need a way to guarantee huge pages for the display buffer.

Alex

> 
> Alex
> 
> >
> > Regards,
> > Lucas
> >
> > > Cc: Alex Deucher 
> > > Bug:
> > >
> > > https://gitlab.freedesktop.org/drm/amd/issues/961
> > > Signed-off-by: Kai-Heng Feng 
> > > ---
> > > v2:
> > > - Find Stoney graphics instead of host bridge.
> > >
> > >  drivers/iommu/amd_iommu_init.c | 13 -
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/amd_iommu_init.c
> > > b/drivers/iommu/amd_iommu_init.c index 568c52317757..139aa6fdadda
> > > 100644
> > > --- a/drivers/iommu/amd_iommu_init.c
> > > +++ b/drivers/iommu/amd_iommu_init.c
> > > @@ -2516,6 +2516,7 @@ static int __init early_amd_iommu_init(void)
> > >   struct acpi_table_header *ivrs_base;
> > >   acpi_status status;
> > >   int i, remap_cache_sz, ret = 0;
> > > + u32 pci_id;
> > >
> > >   if (!amd_iommu_detected)
> > >   return -ENODEV;
> > > @@ -2603,6 +2604,16 @@ static int __init early_amd_iommu_init(void)
> > >   if (ret)
> > >   goto out;
> > >
> > > + /* Disable IOMMU if there's Stoney Ridge graphics */
> > &

RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-12-02 Thread Deucher, Alexander
> -----Original Message-----
> From: Lucas Stach 
> Sent: Sunday, December 1, 2019 7:43 AM
> To: Kai-Heng Feng ; j...@8bytes.org
> Cc: Deucher, Alexander ;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
> systems
> 
> Am Freitag, den 29.11.2019, 22:21 +0800 schrieb Kai-Heng Feng:
> > Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> >
> > According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do
> > the same here to avoid screen flickering on 4K monitor.
> 
> This doesn't seem like a good solution, especially if there isn't a method for
> the user to opt-out.  Some users might prefer having the IOMMU support to
> 4K display output.
> 
> But before using the big hammer of disabling or breaking one of those
> features, we should take a look at what's the issue here. Screen flickering
> caused by the IOMMU being active hints to the IOMMU not being able to
> sustain the translation bandwidth required by the high- bandwidth
> isochronous transfers caused by 4K scanout, most likely due to insufficient
> TLB space.
> 
> As far as I know the framebuffer memory for the display buffers is located in
> stolen RAM, and thus contiguous in memory. I don't know the details of the
> GPU integration on those APUs, but maybe there even is a way to bypass the
> IOMMU for the stolen VRAM regions?
> 
> If there isn't and all GPU traffic passes through the IOMMU when active, we
> should check if the stolen RAM is mapped with hugepages on the IOMMU
> side. All the stolen RAM can most likely be mapped with a few hugepage
> mappings, which should reduce IOMMU TLB demand by a large margin.

There is no issue when we scan out of the carve-out region.  The issue occurs 
when we scan out of regular system memory (scatter/gather).  Many newer laptops 
have very small carve-out regions (e.g., 32 MB), so we have to use regular 
system pages to support multiple high-resolution displays.  The problem is that 
the latency gets too high at some point when the IOMMU is involved.  Huge pages 
would probably help in this case, but I'm not sure if there is any way to 
guarantee that we get huge pages for system memory.  I guess we could use CMA 
or something like that.

Alex

> 
> Regards,
> Lucas
> 
> > Cc: Alex Deucher 
> > Bug:
> > https://gitlab.freedesktop.org/drm/amd/issues/961
> > Signed-off-by: Kai-Heng Feng 
> > ---
> > v2:
> > - Find Stoney graphics instead of host bridge.
> >
> >  drivers/iommu/amd_iommu_init.c | 13 -
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c
> > b/drivers/iommu/amd_iommu_init.c index 568c52317757..139aa6fdadda
> > 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2516,6 +2516,7 @@ static int __init early_amd_iommu_init(void)
> > struct acpi_table_header *ivrs_base;
> > acpi_status status;
> > int i, remap_cache_sz, ret = 0;
> > +   u32 pci_id;
> >
> > if (!amd_iommu_detected)
> > return -ENODEV;
> > @@ -2603,6 +2604,16 @@ static int __init early_amd_iommu_init(void)
> > if (ret)
> > goto out;
> >
> > +   /* Disable IOMMU if there's Stoney Ridge graphics */
> > +   for (i = 0; i < 32; i++) {
> > +   pci_id = read_pci_config(0, i, 0, 0);
> > +   if ((pci_id & 0xffff) == 0x1002 && (pci_id >> 16) == 0x98e4) {
> > +   pr_info("Disable IOMMU on Stoney Ridge\n");
> > +   amd_iommu_disabled = true;
> > +   break;
> > +   }
> > +   }
> > +
> > /* Disable any previously enabled IOMMUs */
> > if (!is_kdump_kernel() || amd_iommu_disabled)
> > disable_iommus();
> > @@ -2711,7 +2722,7 @@ static int __init state_next(void)
> > ret = early_amd_iommu_init();
> > init_state = ret ? IOMMU_INIT_ERROR :
> IOMMU_ACPI_FINISHED;
> > if (init_state == IOMMU_ACPI_FINISHED &&
> amd_iommu_disabled) {
> > -   pr_info("AMD IOMMU disabled on kernel command-
> line\n");
> > +   pr_info("AMD IOMMU disabled\n");
> > init_state = IOMMU_CMDLINE_DISABLED;
> > ret = -EINVAL;
> > }
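
The carve-out and huge-page discussion above comes down to simple arithmetic. The sketch below works it through under assumptions that are not stated in the thread: a 3840x2160 scanout buffer at 4 bytes per pixel, 4 KiB base pages, and 2 MiB huge pages. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Size in bytes of one linear scanout buffer. */
uint64_t framebuffer_bytes(uint32_t width, uint32_t height,
			   uint32_t bytes_per_pixel)
{
	return (uint64_t)width * height * bytes_per_pixel;
}

/*
 * Best-case number of IOMMU mappings needed to cover a buffer,
 * assuming it is aligned to and fully backed by pages of page_size.
 */
uint64_t mappings_needed(uint64_t bytes, uint64_t page_size)
{
	return div_round_up(bytes, page_size);
}
```

Under those assumptions a single 4K buffer is 33,177,600 bytes: 8100 mappings with 4 KiB pages versus 16 with 2 MiB pages. It also nearly fills a 32 MiB (33,554,432 byte) carve-out on its own, which fits the point that multiple high-resolution displays spill into regular system pages. The huge-page figure is a best case; scatter/gather memory gives no such contiguity guarantee, which is the open question raised above.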



RE: [PATCH] iommu/amd: Disable IOMMU on Stoney Ridge systems

2019-11-25 Thread Deucher, Alexander
> -----Original Message-----
> From: Kai-Heng Feng 
> Sent: Sunday, November 24, 2019 4:43 AM
> To: j...@8bytes.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Kai-
> Heng Feng ; Deucher, Alexander
> 
> Subject: [PATCH] iommu/amd: Disable IOMMU on Stoney Ridge systems
> 
> Serious screen flickering when Stoney Ridge outputs to a 4K monitor.
> 
> According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do the
> same here to avoid screen flickering on 4K monitor.
> 
> Cc: Alex Deucher 
> Bug:
> https://gitlab.freedesktop.org/drm/amd/issues/961
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/iommu/amd_iommu_init.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c
> b/drivers/iommu/amd_iommu_init.c index 568c52317757..e05f1b269be6
> 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2516,6 +2516,7 @@ static int __init early_amd_iommu_init(void)
>   struct acpi_table_header *ivrs_base;
>   acpi_status status;
>   int i, remap_cache_sz, ret = 0;
> + u32 pci_id;
> 
>   if (!amd_iommu_detected)
>   return -ENODEV;
> @@ -2603,6 +2604,13 @@ static int __init early_amd_iommu_init(void)
>   if (ret)
>   goto out;
> 
> + /* Get the host bridge VID/PID and disable IOMMU if it's Stoney
> Ridge */
> + pci_id = read_pci_config(0, 0, 0, 0);
> + if ((pci_id & 0xffff) == 0x1022 && (pci_id >> 16) == 0x1576) {

I'm not sure if the IOMMU device ID is unique to Stoney.  I think it's the same 
DID for the entire APU generation.  I think it would be better to walk the bus 
and try to find the Stoney GPU and only in that case disable the IOMMU.  
E.g., if the user has disabled the GPU portion of the APU or has a dGPU 
installed, they may well want to use the IOMMU.  It's only the integrated GPU 
that has a problem when trying to display high-res modes out of system memory 
with the IOMMU due to the added latency.
The Stoney GPU is VID 0x1002, DID 0x98E4.

Alex

> + pr_info("Disable IOMMU on Stoney Ridge\n");
> + amd_iommu_disabled = true;
> + }
> +
>   /* Disable any previously enabled IOMMUs */
>   if (!is_kdump_kernel() || amd_iommu_disabled)
>   disable_iommus();
> @@ -2711,7 +2719,7 @@ static int __init state_next(void)
>   ret = early_amd_iommu_init();
>   init_state = ret ? IOMMU_INIT_ERROR :
> IOMMU_ACPI_FINISHED;
>   if (init_state == IOMMU_ACPI_FINISHED &&
> amd_iommu_disabled) {
> - pr_info("AMD IOMMU disabled on kernel command-
> line\n");
> + pr_info("AMD IOMMU disabled\n");
>   init_state = IOMMU_CMDLINE_DISABLED;
>   ret = -EINVAL;
>   }
> --
> 2.17.1
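
The suggestion above, walking the bus for the Stoney GPU rather than matching the host bridge, can be sketched in user space as follows. This is only an illustration: the `cfg_read_fn` callback is a hypothetical stand-in for the kernel's read_pci_config(bus, slot, func, offset), and `fake_cfg_read()` is an invented config space with the host bridge from the patch in slot 0 and the GPU in slot 1. The first configuration dword carries the vendor ID in the low 16 bits and the device ID in the high 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STONEY_GPU_VID	0x1002u	/* VID given in the message above */
#define STONEY_GPU_DID	0x98e4u	/* DID given in the message above */

/* Hypothetical stand-in for read_pci_config(bus, slot, func, offset). */
typedef uint32_t (*cfg_read_fn)(uint8_t bus, uint8_t slot,
				uint8_t func, uint8_t offset);

static uint16_t id_vendor(uint32_t id)
{
	return (uint16_t)(id & 0xffffu);
}

static uint16_t id_device(uint32_t id)
{
	return (uint16_t)(id >> 16);
}

bool is_stoney_gpu(uint32_t id)
{
	return id_vendor(id) == STONEY_GPU_VID &&
	       id_device(id) == STONEY_GPU_DID;
}

/* Scan function 0 of each slot on bus 0; return the slot or -1. */
int find_stoney_gpu_slot(cfg_read_fn read_cfg)
{
	for (int slot = 0; slot < 32; slot++) {
		if (is_stoney_gpu(read_cfg(0, (uint8_t)slot, 0, 0)))
			return slot;
	}
	return -1;
}

/* Invented config space: host bridge in slot 0, Stoney GPU in slot 1. */
uint32_t fake_cfg_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset)
{
	(void)bus;
	(void)func;
	(void)offset;

	switch (slot) {
	case 0:
		return 0x15761022u;	/* DID 0x1576, VID 0x1022 */
	case 1:
		return 0x98e41002u;	/* DID 0x98e4, VID 0x1002 */
	default:
		return 0xffffffffu;	/* no device present */
	}
}
```

Matching on the GPU means a system with the integrated GPU disabled would not hit the quirk, which is the behavior asked for above.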



RE: Feature Request: Ability to decode bus/dma address back into physical address

2017-08-01 Thread Deucher, Alexander
> -----Original Message-----
> From: StDenis, Tom
> Sent: Tuesday, August 01, 2017 4:27 PM
> To: Jerome Glisse
> Cc: iommu@lists.linux-foundation.org; Deucher, Alexander; Koenig, Christian
> Subject: Re: Feature Request: Ability to decode bus/dma address back into
> physical address
> 
> Was trying to walk away from this ... :-) (all in good fun).
> 
> 
> On 01/08/17 03:55 PM, Jerome Glisse wrote:
> > On Tue, Aug 01, 2017 at 03:28:26PM -0400, Tom St Denis wrote:
> >> Adding the AMDGPU maintainers to get their opinions.
> >>
> >> Context:
> >> https://lists.linuxfoundation.org/pipermail/iommu/2017-
> August/023489.html
> >>
> >> (and others in case you missed it on the list).
> >>
> >>
> >> On 01/08/17 03:03 PM, Jerome Glisse wrote:
> >>>>> You would need to leverage thing like uevent to get event when
> something
> >>>>> happen like a bo being destroy or command submission ...
> >>>>
> >>>> The problem with this approach is when I'm reading an IB I'm not given
> user
> >>>> space addresses but bus addresses.  So I can't correlate anything I'm
> seeing
> >>>> in the hardware with the user task if I wanted to.
> >>>>
> >>>> In fact, to augment [say] OpenGL debugging I would have to correlate a
> >>>> buffer handle/pointer's page backing with the bus address in the IB so I
> >>>> could correlate the two (e.g. dump an IB and print out user process
> variable
> >>>> names that correspond to the IB contents...).
> >>>
> >>> When you read IB you are provided with GPU virtual address, you can
> get the
> >>> GPU virtual address from the same snoop ioctl just add a field in bo_info
> >>> above. So i don't see any issue here.
> >>
> >> This is effectively what I'm doing with the patch except via a trace not a
> >> bolted on snooping ioctl.
> >>
> >> Tracers are a bit better since there's not really any overhead in the 
> >> normal
> >> case which is desirable (and the code is relatively very simple).
> >>
> >> In an ideal case we could simply search the ttm pages for a given device to
> >> look for a match for a given dma address.  But that also has the problem
> >> that ...
> >>
> >> This will fail for all the other allocations e.g. for our GART, ring
> >> buffers, etc.
> >>
> >> So a "complete" solution would be nice where any bus address mapping
> that is
> >> still valid for a device could be resolved to the physical page that is
> >> backing it.
> >
> > So you agree that you can get what you want with ioctl ? Now you object
> > that the ioctl overhead is too important ?
> 
> No, I object that the overhead of keeping track of it in some sort of
> snoop structure is too much (and adds too much new code to the driver
> for the sole purpose of this one feature).
> 
> Not to mention we can't really invent ioctls without co-ordinating with
> with the other drm users (re: not just AMD).  umr is a lot less coupled
> since it's not really versioned just yet (still bringing up a lot of
> features so users use git not packages).
> 
> > I do not believe ioctl overhead to be an issue. Like i say in the model
> > i adviced you mix ioctl and thing like uevent or trace so you get "real
> > time" notification of what is going on.
> 
> Unless you can nop that in a config invariant fashion (like you can for
> tracers) that's a NAK from the get go.  And we'd need to buffer them to
> be practical since you might run the debugger out of sync with the
> application (e.g. app hangs then you fire up umr to see what's going on).
> 
> > Why do i say that your approach of tracking individual page mapping is
> > insane. There is several obstacle:
> >- to get from GPU virtual address to physical memory address you need
> >  to first walk down the GPU page table to get the GPU pte entry. From
> >  that you either get an address inside the GPU memory (VRAM of PCIE
> >  device) or a DMA bus address. If it is the latter (DMA bus address)
> >  you need to walk down the IOMMU page table to find the physical
> address
> >  of the memory. So this is 2 page table walk down
> 
> We already do this in umr.  give a GPUVM address we can decode it down
> to a PTE on both AI and VI platforms.  It's decoding the PTE to a
> physical page (outside of vram) that is the issue.
> 
> In fact being able to VM decode is important to the kernel team who ha

RE: [PATCH v1 3/3] iommu/amd: Optimize the IOMMU queue flush

2017-06-28 Thread Deucher, Alexander
> -----Original Message-----
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Wednesday, June 28, 2017 4:37 AM
> To: Jan Vesely; Deucher, Alexander
> Cc: Lendacky, Thomas; Nath, Arindam; Craig Stein; iommu@lists.linux-
> foundation.org; Duran, Leo; Suthikulpanit, Suravee
> Subject: Re: [PATCH v1 3/3] iommu/amd: Optimize the IOMMU queue flush
> 
> [Adding Alex Deucher]
> 
> Hey Alex,
> 
> On Tue, Jun 27, 2017 at 12:24:35PM -0400, Jan Vesely wrote:
> > On Mon, 2017-06-26 at 14:14 +0200, Joerg Roedel wrote:
> 
> > > How does that 'dGPU goes to sleep' work? Do you put it to sleep
> manually
> > > via sysfs or something? Or is that something that amdgpu does on its
> > > own?
> >
> > AMD folks should be able to provide more details. afaik, the driver
> > uses ACPI methods to power on/off the device. Driver routines wake the
> > device up before accessing it and there is a timeout to turn it off
> > after few seconds of inactivity.
> >
> > >
> > > It looks like the GPU just switches the ATS unit off when it goes to
> > > sleep and doesn't answer the invalidation anymore, which explains the
> > > completion-wait timeouts.
> >
> > Both MMIO regs and PCIe config regs are turned off so it would not
> > surprise me if all PCIe requests were ignored by the device in off
> > state. it should be possible to request device wake up before
> > invalidating the relevant IOMMU domain. I'll leave to more
> > knowledgeable ppl to judge whether it's a good idea (we can also
> > postpone such invalidations until the device is woken by other means)
> 
> Can you maybe shed some light on how the sleep mode of the GPUs works?
> Is it initiated by the GPU driver or from somewhere else? In the case
> discussed here it looks like the ATS unit of the GPU is switched off,
> causing IOTLB invalidation timeouts on the IOMMU side.
> 
> If that is the case we might need some sort of dma-api extension so that
> the GPU driver can tell the iommu driver that the device is going to be
> quiet.

I assume you are talking about Hybrid/PowerXpress laptops where the dGPU can be 
powered down dynamically?  That is done via the runtime PM subsystem in the 
kernel.  We register several callbacks with it, and it then takes care of the 
power-down auto timers and such.  The actual mechanism to power down the GPU 
varies from platform to platform (platform-specific ACPI methods on early 
systems, D3cold on newer ones).

Alex



RE: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)

2017-05-23 Thread Deucher, Alexander
> -----Original Message-----
> From: Arindam Nath [mailto:anath@gmail.com] On Behalf Of
> arindam.n...@amd.com
> Sent: Monday, May 22, 2017 3:48 AM
> To: iommu@lists.linux-foundation.org
> Cc: amd-...@lists.freedesktop.org; Joerg Roedel; Deucher, Alexander;
> Bridgman, John; dr...@endlessm.com; Suthikulpanit, Suravee;
> li...@endlessm.com; Craig Stein; mic...@daenzer.net; Kuehling, Felix;
> sta...@vger.kernel.org; Nath, Arindam
> Subject: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
> 
> From: Arindam Nath <arindam.n...@amd.com>
> 
> Change History
> --
> 
> v3:
> - add Fixes and CC tags
> - add link to Bugzilla
> 
> v2: changes suggested by Joerg
> - add flush flag to improve efficiency of flush operation
> 
> v1:
> - The idea behind flush queues is to defer the IOTLB flushing
>   for domains for which the mappings are no longer valid. We
>   add such domains in queue_add(), and when the queue size
>   reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
> 
>   Since we have already taken lock before __queue_flush()
>   is called, we need to make sure the IOTLB flushing is
>   performed as quickly as possible.
> 
>   In the current implementation, we perform IOTLB flushing
>   for all domains irrespective of which ones were actually
>   added in the flush queue initially. This can be quite
>   expensive especially for domains for which unmapping is
>   not required at this point of time.
> 
>   This patch makes use of domain information in
>   'struct flush_queue_entry' to make sure we only flush
>   IOTLBs for domains who need it, skipping others.
> 
> Bugzilla: https://bugs.freedesktop.org/101029
> Fixes: b1516a14657a ("iommu/amd: Implement flush queue")
> Cc: sta...@vger.kernel.org
> Suggested-by: Joerg Roedel <j...@8bytes.org>
> Signed-off-by: Arindam Nath <arindam.n...@amd.com>

Acked-by: Alex Deucher <alexander.deuc...@amd.com>

> ---
>  drivers/iommu/amd_iommu.c   | 27 ---
>  drivers/iommu/amd_iommu_types.h |  2 ++
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5..1edeebec 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2227,15 +2227,26 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
> 
>  static void __queue_flush(struct flush_queue *queue)
>  {
> -	struct protection_domain *domain;
> -	unsigned long flags;
>  	int idx;
> 
> -	/* First flush TLB of all known domains */
> -	spin_lock_irqsave(&amd_iommu_pd_lock, flags);
> -	list_for_each_entry(domain, &amd_iommu_pd_list, list)
> -		domain_flush_tlb(domain);
> -	spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
> +	/* First flush TLB of all domains which were added to flush queue */
> +	for (idx = 0; idx < queue->next; ++idx) {
> +		struct flush_queue_entry *entry;
> +
> +		entry = queue->entries + idx;
> +
> +		/*
> +		 * There might be cases where multiple IOVA entries for the
> +		 * same domain are queued in the flush queue. To avoid
> +		 * flushing the same domain again, we check whether the
> +		 * flag is set or not. This improves the efficiency of
> +		 * flush operation.
> +		 */
> +		if (!entry->dma_dom->domain.already_flushed) {
> +			entry->dma_dom->domain.already_flushed = true;
> +			domain_flush_tlb(&entry->dma_dom->domain);
> +		}
> +	}
> 
>  	/* Wait until flushes have completed */
>  	domain_flush_complete(NULL);
> @@ -2289,6 +2300,8 @@ static void queue_add(struct dma_ops_domain *dma_dom,
>  	pages = __roundup_pow_of_two(pages);
>  	address >>= PAGE_SHIFT;
> 
> +	dma_dom->domain.already_flushed = false;
> +
>  	queue = get_cpu_ptr(&flush_queue);
>  	spin_lock_irqsave(&queue->lock, flags);
> 
> diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
> index 4de8f41..4f5519d 100644
> --- a/drivers/iommu/amd_iommu_types.h
> +++ b/drivers/iommu/amd_iommu_types.h
> @@ -454,6 +454,8 @@ struct protection_domain {
>  	bool updated;		/* complete domain flush required */
>  	unsigned dev_cnt;	/* devices assigned to this domain */
>  	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
> +	bool already_flushed;	/* flag to avoid flushing the same domain again
> +				   in a single invocation of __queue_flush() */
>  };
> 
>  /*
> --
> 2.7.4
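
For anyone following the patch without the driver source at hand, the
per-domain flag can be sketched in plain userspace C roughly as below.
This is only an illustration under stated assumptions, not the driver
code: 'struct domain', 'flush_count', and the value 8 for
FLUSH_QUEUE_SIZE are made up for the example, locking and per-CPU
queues are left out, and domain_flush_tlb() just counts calls instead
of touching hardware.

```c
#include <stdbool.h>
#include <stddef.h>

#define FLUSH_QUEUE_SIZE 8	/* illustrative value, not the kernel's */

/* Hypothetical stand-in for struct protection_domain. */
struct domain {
	bool already_flushed;	/* set once flushed in the current pass */
	unsigned flush_count;	/* number of simulated IOTLB flushes */
};

struct flush_queue_entry {
	struct domain *dom;
};

struct flush_queue {
	size_t next;		/* number of valid entries */
	struct flush_queue_entry entries[FLUSH_QUEUE_SIZE];
};

/* Stand-in for the real flush: only counts how often it is called. */
void domain_flush_tlb(struct domain *dom)
{
	dom->flush_count++;
}

/* Flush each queued domain at most once, then empty the queue. */
void queue_flush(struct flush_queue *q)
{
	for (size_t i = 0; i < q->next; ++i) {
		struct domain *dom = q->entries[i].dom;

		if (!dom->already_flushed) {
			dom->already_flushed = true;
			domain_flush_tlb(dom);
		}
	}
	q->next = 0;
}

/* Queue a domain; a full queue is flushed before the new entry is added. */
void queue_add(struct flush_queue *q, struct domain *dom)
{
	if (q->next == FLUSH_QUEUE_SIZE)
		queue_flush(q);

	dom->already_flushed = false;	/* domain needs a flush again */
	q->entries[q->next++].dom = dom;
}
```

Queuing the same domain several times and then calling queue_flush()
results in a single domain_flush_tlb() call for it, and domains that
were never queued are not touched at all.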

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

2017-03-21 Thread Deucher, Alexander
> -----Original Message-----
> From: 'j...@8bytes.org' [mailto:j...@8bytes.org]
> Sent: Tuesday, March 21, 2017 12:26 PM
> To: Deucher, Alexander
> Cc: Bridgman, John; Alex Deucher; Daniel Drake; Chris Chiu;
> amd-g...@lists.freedesktop.org; Nath, Arindam;
> iommu@lists.linux-foundation.org; Suthikulpanit, Suravee;
> Linux Upstreaming Team
> Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> On Tue, Mar 21, 2017 at 04:17:40PM +0000, Deucher, Alexander wrote:
> > > -----Original Message-----
> > > From: 'j...@8bytes.org' [mailto:j...@8bytes.org]
> > > Sent: Tuesday, March 21, 2017 12:11 PM
> > > To: Deucher, Alexander
> > > Cc: Alex Deucher; Daniel Drake; Chris Chiu;
> > > amd-g...@lists.freedesktop.org; Nath, Arindam;
> > > iommu@lists.linux-foundation.org; Suthikulpanit, Suravee;
> > > Linux Upstreaming Team
> > > Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi:
> > > Completion-Wait loop timed out
> > >
> > > On Tue, Mar 21, 2017 at 04:01:53PM +0000, Deucher, Alexander wrote:
> > > > It seems to only affect Stoney systems, but not others (Carrizo,
> > > > Bristol, etc.).  Maybe we could just disable it on Stoney until we
> > > > root cause it.
> > >
> > > Completion-wait loop timeouts indicate something is seriously wrong.
> > > How can I detect whether I am running on a 'Stoney' system?
> >
> > + John
> >
> > I'm not sure if the IOMMU IDs are different on Stoney systems compared
> > to Carrizo/Bristol systems.  The PCI IDs of the GPUs are different.
> > Stoney parts have 0x98E4 as the PCI ID for the GPU.
> >
> > >
> > > Other question, a shot in the dark: does the GPU on these systems
> > > have ATS? Probably yes, as they are likely HSA compatible.
> >
> > Stoney is a small APU.  Kind of a mini Carrizo.  While it may claim to
> > support ATS, I don't think it was ever validated on Stoney, only
> > Carrizo/Bristol.
> 
> Okay, so maybe ATS is broken in some way on these chips. When queue
> flushes happen it will also send the ATS-invalidates, and a queue flush
> can cause a storm of those. This may be the issue.
> 
> I am preparing a debug-patch that disables ATS for these GPUs so someone
> with such a chip can test it.

Thanks Joerg.

Alex



RE: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

2017-03-21 Thread Deucher, Alexander
> -----Original Message-----
> From: 'j...@8bytes.org' [mailto:j...@8bytes.org]
> Sent: Tuesday, March 21, 2017 12:11 PM
> To: Deucher, Alexander
> Cc: Alex Deucher; Daniel Drake; Chris Chiu; amd-...@lists.freedesktop.org;
> Nath, Arindam; iommu@lists.linux-foundation.org; Suthikulpanit, Suravee;
> Linux Upstreaming Team
> Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> On Tue, Mar 21, 2017 at 04:01:53PM +0000, Deucher, Alexander wrote:
> > It seems to only affect Stoney systems, but not others (Carrizo,
> > Bristol, etc.).  Maybe we could just disable it on Stoney until we
> > root cause it.
> 
> Completion-wait loop timeouts indicate something is seriously wrong. How
> can I detect whether I am running on a 'Stoney' system?

+ John

I'm not sure if the IOMMU IDs are different on Stoney systems compared to
Carrizo/Bristol systems.  The PCI IDs of the GPUs are different.  Stoney parts
have 0x98E4 as the PCI ID for the GPU.
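
As a rough illustration of keying on the GPU rather than the IOMMU, a
check along the following lines could work. It is a minimal sketch,
not a proposed patch: is_stoney_gpu() and STONEY_GPU_DEVICE_ID are
hypothetical names, the device ID is the one given above, and 0x1002
is the ATI/AMD PCI vendor ID. Whether a workaround should be keyed on
the GPU ID at all is still an open question.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_ATI	0x1002	/* ATI/AMD graphics vendor ID */
#define STONEY_GPU_DEVICE_ID	0x98E4	/* Stoney GPU device ID quoted above */

/* Hypothetical helper: true if the vendor/device pair is a Stoney GPU. */
bool is_stoney_gpu(uint16_t vendor, uint16_t device)
{
	return vendor == PCI_VENDOR_ID_ATI &&
	       device == STONEY_GPU_DEVICE_ID;
}
```

In the kernel the vendor and device values would come from the GPU's
struct pci_dev; here they are plain arguments so the check can be
tried in isolation.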

> 
> Other question, a shot in the dark: does the GPU on these systems have
> ATS? Probably yes, as they are likely HSA compatible.

Stoney is a small APU.  Kind of a mini Carrizo.  While it may claim to support 
ATS, I don't think it was ever validated on Stoney, only Carrizo/Bristol.

Alex




RE: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

2017-03-21 Thread Deucher, Alexander
> -----Original Message-----
> From: j...@8bytes.org [mailto:j...@8bytes.org]
> Sent: Tuesday, March 21, 2017 11:57 AM
> To: Alex Deucher
> Cc: Daniel Drake; Deucher, Alexander; Chris Chiu;
> amd-g...@lists.freedesktop.org; Nath, Arindam;
> iommu@lists.linux-foundation.org; Suthikulpanit, Suravee;
> Linux Upstreaming Team
> Subject: Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> On Fri, Mar 17, 2017 at 11:53:09AM -0400, Alex Deucher wrote:
> > On Fri, Mar 17, 2017 at 8:15 AM, Daniel Drake <dr...@endlessm.com> wrote:
> > > Hi,
> > >
> > > On Mon, Mar 13, 2017 at 2:01 PM, Deucher, Alexander
> > > <alexander.deuc...@amd.com> wrote:
> > >> > We are unable to boot Acer Aspire E5-553G (AMD FX-9800P RADEON R7)
> > >> > nor Acer Aspire E5-523 with standard configurations because during
> > >> > boot the screen is flooded with the following error message over
> > >> > and over:
> > >> >
> > >> >   AMD-Vi: Completion-Wait loop timed out
> > >>
> > >> We ran into similar issues and bisected it to commit
> > >> b1516a14657acf81a587e9a6e733a881625eee53.  I'm not familiar enough
> > >> with the IOMMU hardware to know yet whether this is an IOMMU or
> > >> display driver issue.
> > >
> > > We can confirm that reverting this commit solves the issue.
> > >
> > > Given that that commit is an optimization, but it has introduced a
> > > regression on multiple platforms, and has been like this for 8 months,
> > > it would be common practice to now revert this patch upstream until
> > > the regression is fixed. Could you please send a new patch to do this?
> > >
> > > Also, we would be happy to test any real solutions to this issue while
> > > we still have the affected units in hand.
> >
> > No objections to a revert here.
> 
> Big objection here. Since this only happens with amdgpu so far we
> shouldn't rule out a display-driver issue.
> 
> Reverting that patch basically destroys IOMMU performance on AMD
> systems. Doing this for all devices just to make amdgpu work is
> overkill at this stage of the debugging.

It seems to only affect Stoney systems, but not others (Carrizo, Bristol, 
etc.).  Maybe we could just disable it on Stoney until we root cause it.

Alex



RE: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

2017-03-13 Thread Deucher, Alexander
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Daniel Drake
> Sent: Monday, March 13, 2017 3:50 PM
> To: j...@8bytes.org
> Cc: Chris Chiu; iommu@lists.linux-foundation.org; Linux Upstreaming Team;
> amd-...@lists.freedesktop.org
> Subject: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait
> loop timed out
> 
> Hi,
> 
> We are unable to boot Acer Aspire E5-553G (AMD FX-9800P RADEON R7) nor
> Acer Aspire E5-523 with standard configurations because during boot
> the screen is flooded with the following error message over and over:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> We have left the system for quite a while but the message spam does
> not stop and the system doesn't complete the boot sequence.
> 
> We have reproduced on Linux 4.8 and Linux 4.10.
> 
> To avoid this, we can boot with iommu=soft or just disable the amdgpu
> display driver.
> 
> Looks like this may also affect HP 15-ba012no :
> https://bugzilla.redhat.com/show_bug.cgi?id=1409201
> 
> Earlier during boot the iommu is detected as:
> 
> [1.274518] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
> [1.274519] AMD-Vi: Extended features (0x37ef22294ada):
> [1.274519]  PPR NX GT IA GA PC GA_vAPIC
> [1.274523] AMD-Vi: Interrupt remapping enabled
> [1.274523] AMD-Vi: virtual APIC enabled
> [1.275144] AMD-Vi: Lazy IO/TLB flushing enabled
> [1.276498] perf: AMD NB counters detected
> [1.278096] LVT offset 0 assigned for vector 0x400
> [1.278963] perf: AMD IBS detected (0x07ff)
> [1.278977] perf: amd_iommu: Detected. (0 banks, 0 counters/bank)
> 
> Any suggestions for how we can fix this, or get more useful debug info?

+Suravee and Arindam

We ran into similar issues and bisected it to commit
b1516a14657acf81a587e9a6e733a881625eee53.  I'm not familiar enough with the
IOMMU hardware to know yet whether this is an IOMMU or display driver issue.

Alex

> 
> Thanks
> Daniel
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx