Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2021-01-15 Thread jroe...@suse.de
Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Just wanted to follow up on that topic.
> Is that quirk already in the upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will
take care of sending this fix upstream next week.

Regards,

Joerg



Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 02:28:27PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Alright, so is this going to make it into an upstream kernel?

Yes, but please test it first. It should apply on top of a 5.9.3 kernel.
If it works I can send a patch and will Cc you as well as a few other
folks.

Regards,

Joerg



Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate
> on what that quirk would be?

The GPU seems to have broken ATS, or to require driver setup to make ATS
work. Either way, ATS is too unstable for Linux to use on this device, so
it must not be enabled.

This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
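
For context, here is a rough sketch of what a handler like
quirk_amd_harvest_no_ats does. The actual upstream body in
drivers/pci/quirks.c also checks device revisions for some Navi parts; the
simplified version below only shows the essential effect, which is to hide
the device's ATS capability so it never gets enabled:

#include <linux/pci.h>

/* Simplified sketch, not the verbatim upstream implementation. */
static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)
{
	pci_info(pdev, "disabling ATS\n");
	/*
	 * Clearing the cached ATS capability offset makes pci_enable_ats()
	 * fail for this device, so the IOMMU driver never turns ATS on.
	 */
	pdev->ats_cap = 0;
}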


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> With kernel 5.9.3 and the kernel parameter pci=noats the system has been
> running for 19 hours now in the reboot test without the error occurring.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you
please send me the output of lspci -n -s "0b:00.0" (given that 0b:00.0
is your GPU)?

Thanks,

Joerg


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-05 Thread jroe...@suse.de
On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> One remark:
> With the kernel parameter pci=noats, in dmesg there is
> 
> [   10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread jroe...@suse.de
On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> The "AMD-Vi: Completion-Wait loop timed out" message is at [65499.964105] but
> the amdgpu error is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting
with "pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the
machine where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI setup are concerned, the only thing that
> I know of right now is that we haven't entered a PCI subsystem vendor and
> device ID yet. It is still "Advanced Micro Devices". We will change that soon
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -Original Message-
> From: jroe...@suse.de  
> Sent: Mittwoch, 4. November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot problem that is reproducible roughly every
> > ~50th boot.
> > The system is accessible via ssh and works fine except for the
> > graphics. The graphics is off. We don't see a screen. Please see
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports
> > drm/amdgpu errors. It even tries to reset the GPU, but that fails too.
> > I tried to reset amdgpu also with the command “sudo cat
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg

Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread jroe...@suse.de
Hi Edgar,

On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> With one board we have a boot problem that is reproducible roughly every ~50th boot.
> The system is accessible via ssh and works fine except for the graphics. The
> graphics is off. We don't see a screen. Please see attached “dmesg.log”. From
> [52.772273] onwards the kernel reports drm/amdgpu errors. It even tries to
> reset the GPU, but that fails too. I tried to reset amdgpu also with the command
> “sudo cat /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.

Can you reproduce the problem with an upstream kernel too?

These messages in dmesg indicate some problem in the platform setup:

AMD-Vi: Completion-Wait loop timed out

Might there be some inconsistencies in the PCI setup between the bridges
and the endpoints or something?

Regards,

Joerg

Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread jroe...@suse.de
On Fri, Aug 28, 2020 at 03:11:32PM +, Deucher, Alexander wrote:
> There are hw bugs on Raven and probably Carrizo/Stoney where they need a
> 1:1 mapping to avoid bugs in some corner cases with the displays.
> Other GPUs should be fine.  The VID is 0x1002 and the DIDs are 0x15dd
> and 0x15d8 for Raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> 0x9877 and 0x98e4 for Carrizo and Stoney.  As long as we
> preserve the 1:1 mapping for those ASICs, we should be fine.

Okay, Stoney at least has no Zen-based CPU, so no support for memory
encryption anyway. How about Raven, is it paired with a Zen CPU?
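
To make the constraint above concrete, an allow-list of this kind could look
roughly like the sketch below. The table and helper are hypothetical, not the
actual amdgpu or AMD IOMMU code; the IDs are the ones listed in the quoted
mail:

#include <linux/pci.h>

/* Hypothetical table of APU iGPUs that must keep a 1:1 (identity) mapping. */
static const struct pci_device_id identity_map_gpus[] = {
	{ PCI_DEVICE(0x1002, 0x15dd) },	/* Raven variants */
	{ PCI_DEVICE(0x1002, 0x15d8) },
	{ PCI_DEVICE(0x1002, 0x9870) },	/* Carrizo/Stoney variants */
	{ PCI_DEVICE(0x1002, 0x9874) },
	{ PCI_DEVICE(0x1002, 0x9875) },
	{ PCI_DEVICE(0x1002, 0x9876) },
	{ PCI_DEVICE(0x1002, 0x9877) },
	{ PCI_DEVICE(0x1002, 0x98e4) },
	{ }
};

/* Hypothetical check: such devices would keep an identity default domain. */
static bool needs_identity_mapping(struct pci_dev *pdev)
{
	return pci_match_id(identity_map_gpus, pdev) != NULL;
}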

Regards,

Joerg


Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread jroe...@suse.de
Hi Felix,

On Fri, Aug 28, 2020 at 09:54:59AM -0400, Felix Kuehling wrote:
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not
> needed for graphics. And we're making it optional for KFD as well.

Okay, KFD should fail gracefully because it can't initialize the
device's iommuv2 functionality.


Regards,

Joerg


Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread jroe...@suse.de
On Wed, Aug 26, 2020 at 03:25:58PM +, Deucher, Alexander wrote:
> > Alex, do you know if anyone has tested amdgpu on an APU with SME
> > enabled? Is this considered something we support?
> 
> It's not something we've tested.  I'm not even sure the GPU portion of
> APUs will work properly without an identity mapping.  SME should work
> properly with dGPUs however, so this is a proper fix for them.  We
> don't use the IOMMUv2 path on dGPUs at all.

Is it possible to make the IOMMUv2 paths optional on iGPUs as well when
SME is active (or better, when the GPU is not identity mapped)?

Regards,

Joerg


Re: Device specific pass through in host systems - discuss user interface

2019-07-01 Thread jroe...@suse.de
On Tue, Jun 11, 2019 at 05:27:15PM +, Prakhya, Sai Praneeth wrote:
> 1. Since we already have the "type" file, which is "read-only", we could make
> it R/W.
> 
> The present value shows the existing type of the default domain.
> If the user wants to change it (e.g. from DMA to IDENTITY or vice versa), he
> attempts to write the new value.
> The kernel performs checks to make sure that the driver is unbound and it's safe
> to change the default domain type.
> After successfully changing the default_domain type internally, the kernel
> reflects the new value in the file.
> Any errors in the process will be reported in dmesg.

I prefer this way. Writing to the file should fail with -EBUSY when it
is not safe to change the default domain-type. Writing should only
succeed when no device in the group is assigned to a device driver.
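
A minimal sketch of what such a writable "type" attribute could look like.
The names and sysfs plumbing here are hypothetical; the point is only the
-EBUSY rule described above:

#include <linux/device.h>
#include <linux/iommu.h>

/* Helper for the hypothetical store path: is any device in the group bound? */
static int group_device_is_bound(struct device *dev, void *unused)
{
	return dev->driver ? 1 : 0;
}

/* Hypothetical store handler for the iommu group "type" attribute. */
static ssize_t type_store(struct iommu_group *group,
			  const char *buf, size_t count)
{
	/* Refuse the change while any device in the group has a driver bound. */
	if (iommu_group_for_each_dev(group, NULL, group_device_is_bound))
		return -EBUSY;

	/* ... tear down the old default domain and set up the new one ... */

	return count;
}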

Regards,

Joerg


Re: [PATCH] iommu/amd: print out "tag" in INVALID_PPR_REQUEST

2019-05-28 Thread jroe...@suse.de
On Tue, May 14, 2019 at 10:55:46AM -0400, Qian Cai wrote:
> Jroedel, I am wondering what the plan is for 41e59a41fc5d1 (iommu tree) or this
> patch to be pushed to linux-next or mainline...

Looks like I applied that patch directly to the master branch, which is
not what goes upstream. I cherry-picked it to x86/amd, so it will go
upstream for v5.3.


Regards,

Joerg


Re: [PATCH 2/4] iommu: add a bitmap based dma address allocator

2015-11-25 Thread jroe...@suse.de
On Tue, Nov 24, 2015 at 02:05:12PM -0800, Shaohua Li wrote:
> The lib/iommu-common.c uses a bitmap and a lock. This implementation
> actually uses a percpu_ida which completely avoids locking. It would be
> possible to make lib/iommu-common.c use percpu_ida too if somebody wants
> to do it, but I think this shouldn't be a blocker for these patches
> given that it has a huge performance gain.

It doesn't "completely avoids locking", the percpu_ida code uses a lock
internally too. Also, what is the memory and device address space
overhead per cpu?


Joerg



Re: [PATCH 10/22] iommu: Introduce direct mapped region handling

2015-06-05 Thread jroe...@suse.de
Hi Will,

On Fri, Jun 05, 2015 at 03:17:50PM +0100, Will Deacon wrote:
> On Thu, May 28, 2015 at 05:41:33PM +0100, Joerg Roedel wrote:
> > +/**
> > + * struct iommu_dm_region - descriptor for a direct mapped memory region
> > + * @list: Linked list pointers
> > + * @start: System physical start address of the region
> > + * @length: Length of the region in bytes
> > + * @prot: IOMMU Protection flags (READ/WRITE/...)
> > + */
> > +struct iommu_dm_region {
> > +	struct list_head	list;
> > +	phys_addr_t		start;
> > +	size_t			length;
> > +	int			prot;
> > +};
>
> I'm slightly puzzled about this. It looks to me like we're asking the
> IOMMU driver to construct a description of the system's physical address
> space, but this information tends to be known elsewhere for things like
> initialising lowmem on the CPU using memblock.

Well, this is not about the general memory layout of the machine, it is
more about the requirements of the firmware. The firmware might have
its own mapping requirements, for example a USB controller that is
handled by the BIOS. Other devices (be2net adapters with special
firmware) might have such requirements too. On x86 these requirements
are described in the IOMMU ACPI tables (RMRR entries on Intel, Unity
mappings on AMD).

> Also, it looks like we just use these regions to create the default
> domain using iommu_map calls -- why don't we just have an IOMMU callback
> to initialise the default domain instead? That would allow IOMMUs with a
> per-master bypass mode to avoid allocating page tables altogether.

In theory yes, but this information is not only needed for the creation
of default domains, but also for a generic DMA-API implementation for
IOMMU drivers. A DMA-API implementation has to mark these address ranges
as reserved in its address allocator, so it is better to export this
information than to handle it in each IOMMU driver.
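
A short sketch of that DMA-API side, assuming the
iommu_get_dm_regions()/iommu_put_dm_regions() accessors proposed in this
series and the generic IOVA allocator; other implementations would reserve
the ranges in their own allocator:

#include <linux/iommu.h>
#include <linux/iova.h>
#include <linux/list.h>

/* Sketch: keep the direct-mapped ranges out of the dynamic IOVA space. */
static void reserve_dm_regions(struct device *dev, struct iova_domain *iovad)
{
	struct iommu_dm_region *region;
	LIST_HEAD(dm_regions);

	iommu_get_dm_regions(dev, &dm_regions);

	list_for_each_entry(region, &dm_regions, list) {
		unsigned long lo = region->start >> PAGE_SHIFT;
		unsigned long hi = (region->start + region->length - 1) >> PAGE_SHIFT;

		/* Mark the range as unavailable for dynamic DMA-API allocations. */
		reserve_iova(iovad, lo, hi);
	}

	iommu_put_dm_regions(dev, &dm_regions);
}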


Joerg



Re: [PATCH 00/22 v2] Introduce default domains for iommu groups

2015-06-05 Thread jroe...@suse.de
Hi Will,

Thanks for having a look!

On Fri, Jun 05, 2015 at 03:22:06PM +0100, Will Deacon wrote:
>
> Most of this looks fine to me, modulo my comments about the dm regions
> (which I'm not sure how to implement for ARM).

When there are no direct mapping requirements from the firmware on ARM,
you can just return an empty list in these call-backs.

> > A major change is that now the default domain has to be
> > allocated by the code that allocates the iommu group. For
> > PCI devices this happens in the IOMMU core, but drivers
> > allocating the group on their own can now implement a policy
> > that fits their needs (e.g. not allocate one domain per
> > group but let multiple groups share one domain).
>
> Makes sense. I really think we should be moving group allocation out of
> the IOMMU drivers and into the bus code, like we already have for PCI.
> Once we've got a way to describe groups of platform devices (e.g. in the
> device-tree), then we can have the group creation happen automatically
> as part of Laurent's of_iommu work.

Yes, that makes sense. And PCI is pretty hardcoded into the iommu-groups
implementation right now. This probably needs to be more generic too.


Joerg



Re: [RFC PATCH v3 5/7] dma-mapping: detect and configure IOMMU in of_dma_configure

2014-10-27 Thread jroe...@suse.de
On Mon, Oct 27, 2014 at 04:02:16PM +, Will Deacon wrote:
> On Mon, Oct 27, 2014 at 11:30:33AM +, Laurent Pinchart wrote:
> > I'm not sure I follow you here. Aren't we already exposing masters that
> > master through multiple IOMMUs as single instances of struct device?
>
> Hmm, yes, now you've confused me too! The conclusion was certainly that
> dma-mapping should not be the one dealing with the I/O topology. Domain
> allocation would then be an iommu callback (something like
> ->get_default_domain), but the rest of the details weren't fleshed out.

The idea is that the IOMMU core code will allocate a default domain for
each iommu-group at initialization time. This domain can be requested
later by a new iommu-api function and used for DMA-API mappings.

A device can still be assigned to another domain by driver code (like
VFIO). But if the device is later de-assigned, the IOMMU core code
automatically puts it back into the default domain.
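
Roughly, the flow looks like the sketch below. The call that hands out the
default domain is hypothetical here -- it stands for the "new iommu-api
function" mentioned above, which did not exist yet at this point; attach and
detach are the existing IOMMU-API calls:

#include <linux/iommu.h>

static void default_domain_flow(struct device *dev,
				struct iommu_domain *vfio_domain)
{
	/*
	 * 1. The core has set up a default domain per iommu-group at init
	 *    time; the DMA-API implementation just asks for it.
	 *    (Hypothetical function name.)
	 */
	struct iommu_domain *def = iommu_get_default_domain_for_dev(dev);

	/* ... dma_map_*() calls for 'dev' are backed by mappings in 'def' ... */

	/* 2. A driver like VFIO can still move the device to its own domain. */
	iommu_attach_device(vfio_domain, dev);

	/* 3. On detach, the core puts the device back into 'def'. */
	iommu_detach_device(vfio_domain, dev);
}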


Joerg



Re: [PATCH 0/3] iommu: replace IOMMU_EXEC with IOMMU_NOEXEC and update ARM SMMU driver

2014-10-22 Thread jroe...@suse.de
On Mon, Oct 20, 2014 at 07:42:01PM +0100, Will Deacon wrote:
> On Mon, Oct 20, 2014 at 04:39:15PM +0100, Will Deacon wrote:
> > On Mon, Oct 13, 2014 at 02:06:15PM +0100, Antonios Motakis wrote:
> > > This patch series applies to Joerg Roedel's iommu/next branch, commit
> > > 09b5269a.
> > > It replaces the IOMMU_EXEC flag used by the ARM SMMU driver with
> > > IOMMU_NOEXEC.
> > > This is more enforceable, since the lack of the flag on hardware that
> > > doesn't support it implies that the target memory will be executable.
> >
> > Looks good to me; I'll take this via the arm-smmu tree and send it to Joerg
> > along with anything else that gets queued for 3.19.
>
> The 0-day builder spotted a new warning from this series:
>
>   drivers/iommu/amd_iommu.c: In function 'amd_iommu_capable':
>   drivers/iommu/amd_iommu.c:3409:2: warning: enumeration value
>   'IOMMU_CAP_NOEXEC' not handled in switch [-Wswitch]
>     switch (cap) {
>     ^
>
> I fixed it with the patch below, but I'd appreciate you and Joerg taking
> a look too.
>
> Cheers,
>
> Will
>
> ---8<---
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 505a9adac2d5..3d78a8fb5a6a 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3411,6 +3411,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
>  		return true;
>  	case IOMMU_CAP_INTR_REMAP:
>  		return (irq_remapping_enabled == 1);
> +	case IOMMU_CAP_NOEXEC:
> +		return false;
>  	}

Looks good to me.



Re: [RFC PATCH 4/7] iommu: provide helper function to configure an IOMMU for an of master

2014-09-02 Thread jroe...@suse.de
On Tue, Sep 02, 2014 at 04:01:32PM +0200, Arnd Bergmann wrote:
> This is an artifact of the API being single-instance at the moment.
> We might not in fact need it, I was just trying to think of things
> that naturally fit in there and that are probably already linked
> together in the individual iommu drivers.

I am not sure what you mean by single-instance. Is it that currently the
API only supports one type of iommu_ops per bus? That should be fine as
long as there is only one type of IOMMU on the bus.

Besides that, it is a feature of the IOMMU-API to hide the details about
all the hardware IOMMUs in the system from its users.
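
As a sketch of that single-instance model: an IOMMU driver registers one
iommu_ops structure for a whole bus type, and IOMMU-API users never see
which hardware instance serves a given device. my_iommu_ops below is a
placeholder for a driver's real callbacks:

#include <linux/init.h>
#include <linux/iommu.h>
#include <linux/platform_device.h>

static const struct iommu_ops my_iommu_ops = {
	/* .attach_dev, .map, .unmap, ... filled in by the real driver */
};

static int __init my_iommu_driver_init(void)
{
	/* One ops structure serves every IOMMU instance on this bus type. */
	return bus_set_iommu(&platform_bus_type, &my_iommu_ops);
}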


Joerg
