Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into an upstream kernel?

Sorry for the late reply; I had to take an extended sick leave. I will take care of sending this fix upstream next week.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 02:28:27PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Alright, so is this going to make it into an upstream kernel?

Yes, but please test it first. It should apply on top of a 5.9.3 kernel. If it works I can send a patch and will Cc you as well as a few other folks.

Regards,

Joerg
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate
> on what that quirk would be?

The GPU seems to have broken ATS, or to require driver setup to make ATS work. Anyhow, ATS is too unstable for Linux to use on this device, so it must not be enabled. This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> With kernel 5.9.3 and kernel parameter pci=noats the system has now been
> running for 19 hours in the reboot test without the error occurring.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please send me the output of

	lspci -n -s "0b:00.0"

(given that 0b:00.0 is your GPU)?

Thanks,

Joerg
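For reference, a quirk entry keys on exactly the vendor:device pair that `lspci -n` prints. A minimal sketch of pulling that pair out of one line of `lspci -n` output (the sample line in the docstring is hypothetical, modeled on a Raven iGPU at 0b:00.0):

```python
import re

def pci_ids(lspci_n_line):
    """Extract the (vendor, device) ID pair from one line of `lspci -n` output.

    Example line format: "0b:00.0 0300: 1002:15d8 (rev c1)"
    (slot, class code, then vendor:device).
    """
    # The vendor:device pair is the only "xxxx:xxxx" token with four hex
    # digits on both sides of the colon; the slot and class code don't match.
    m = re.search(r'\b([0-9a-f]{4}):([0-9a-f]{4})\b', lspci_n_line)
    if m is None:
        raise ValueError("no vendor:device pair found")
    return m.group(1), m.group(2)
```

Vendor 1002 is ATI/AMD; the device ID is what a `DECLARE_PCI_FIXUP_FINAL` entry like the one in the quirk above matches on.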
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> One remark:
> With kernel-parameter pci=noats in dmesg there is
>
> [   10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> The "AMD-Vi: Completion-Wait loop timed out" is at [65499.964105] but the
> amdgpu error is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you
> recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens.

Regards,

Joerg

> As far as inconsistencies in the PCI setup are concerned, the only thing
> that I know of right now is that we haven't entered a PCI subsystem
> vendor and device ID yet. It is still "Advanced Micro Devices". We will
> change that soon to "General Electric" or "Emerson".
>
> Best regards,
> Edgar
>
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
>
> Hi Edgar,
>
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> > With one board we have a boot-problem that is reproducible at every
> > ~50th boot. The system is accessible via ssh and works fine except for
> > the graphics. The graphics is off. We don't see a screen. Please see
> > attached "dmesg.log". From [52.772273] onwards the kernel reports
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command "sudo cat
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover". That did not help either.
>
> Can you reproduce the problem with an upstream kernel too?
>
> These messages in dmesg indicate some problem in the platform setup:
>
>	AMD-Vi: Completion-Wait loop timed out
>
> Might there be some inconsistencies in the PCI setup between the bridges
> and the endpoints or something?
>
> Regards,
>
> Joerg
Re: amdgpu error whenever IOMMU is enabled
Hi Edgar,

On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> With one board we have a boot-problem that is reproducible at every ~50th
> boot. The system is accessible via ssh and works fine except for the
> graphics. The graphics is off. We don't see a screen. Please see attached
> “dmesg.log”. From [52.772273] onwards the kernel reports drm/amdgpu
> errors. It even tries to reset the GPU but that fails too. I tried to
> reset amdgpu also by command “sudo cat
> /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.

Can you reproduce the problem with an upstream kernel too?

These messages in dmesg indicate some problem in the platform setup:

	AMD-Vi: Completion-Wait loop timed out

Might there be some inconsistencies in the PCI setup between the bridges and the endpoints or something?

Regards,

Joerg
Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active
On Fri, Aug 28, 2020 at 03:11:32PM +, Deucher, Alexander wrote:
> There are hw bugs on Raven and probably Carrizo/Stoney where they need
> a 1:1 mapping to avoid bugs in some corner cases with the displays.
> Other GPUs should be fine. The VID is 0x1002 and the DIDs are 0x15dd
> and 0x15d8 for Raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> 0x9877 and 0x98e4 for Carrizo and Stoney. As long as we preserve the
> 1:1 mapping for those asics, we should be fine.

Okay, Stoney at least has no Zen-based CPU, so no support for memory encryption anyway. How about Raven, is it paired with a Zen CPU?

Regards,

Joerg
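Alex's ID list translates directly into a lookup table. This is an illustrative sketch of the check such a quirk would perform (the names are invented; the kernel's actual mechanism is a PCI quirk table, not a set like this):

```python
# (vendor, device) pairs that need a 1:1 (identity) mapping, per the list above.
IDENTITY_MAP_IDS = {
    (0x1002, 0x15dd), (0x1002, 0x15d8),                     # Raven variants
    (0x1002, 0x9870), (0x1002, 0x9874), (0x1002, 0x9875),   # Carrizo/Stoney
    (0x1002, 0x9876), (0x1002, 0x9877), (0x1002, 0x98e4),
}

def needs_identity_mapping(vendor, device):
    """True if this ASIC must keep a 1:1 IOMMU mapping for display stability."""
    return (vendor, device) in IDENTITY_MAP_IDS
```

A dGPU like Navi14 (0x1002:0x7340) would not be in the set, matching the statement that only these iGPU ASICs need the identity mapping preserved.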
Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active
Hi Felix,

On Fri, Aug 28, 2020 at 09:54:59AM -0400, Felix Kuehling wrote:
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not
> needed for graphics. And we're making it optional for KFD as well.

Okay, KFD should fail gracefully because it can't initialize the device's iommuv2 functionality.

Regards,

Joerg
Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active
On Wed, Aug 26, 2020 at 03:25:58PM +, Deucher, Alexander wrote:
> > Alex, do you know if anyone has tested amdgpu on an APU with SME
> > enabled? Is this considered something we support?
>
> It's not something we've tested. I'm not even sure the GPU portion of
> APUs will work properly without an identity mapping. SME should work
> properly with dGPUs however, so this is a proper fix for them. We
> don't use the IOMMUv2 path on dGPUs at all.

Is it possible to make the IOMMUv2 paths optional on iGPUs as well when SME is active (or better, when the GPU is not identity mapped)?

Regards,

Joerg
Re: Device specific pass through in host systems - discuss user interface
On Tue, Jun 11, 2019 at 05:27:15PM +, Prakhya, Sai Praneeth wrote:
> 1. Since we already have the "type" file, which is read-only, we could
> make it R/W.
>
> The present value shows the existing type of the default domain.
> If the user wants to change it (e.g. from DMA to IDENTITY or vice
> versa), he attempts to write the new value.
> The kernel performs checks to make sure that the driver is unbound and
> it's safe to change the default domain type.
> After successfully changing the default_domain type internally, the
> kernel reflects the new value in the file.
> Any errors in the process will be reported in dmesg.

I prefer this way. Writing to the file should fail with -EBUSY when it is not safe to change the default domain type. Writing should only succeed when no device in the group is assigned to a device driver.

Regards,

Joerg
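The acceptance rule Joerg states — refuse the write with -EBUSY while any device in the group is bound to a driver — can be modeled in a few lines. This is a sketch of the policy only, not the sysfs implementation (class and method names are invented for illustration):

```python
import errno

class IommuGroup:
    """Toy model of an iommu group with a changeable default domain type."""

    def __init__(self, devices):
        # device name -> bound driver name, or None if unbound
        self.devices = dict(devices)
        self.default_domain = "DMA"

    def set_default_domain(self, new_type):
        """Model a write to the group's "type" file.

        Fails with -EBUSY while any device in the group is bound to a
        driver; only then is it safe to change the default domain type.
        """
        if any(drv is not None for drv in self.devices.values()):
            return -errno.EBUSY
        self.default_domain = new_type
        return 0
```

The real interface would do this check under the group mutex, but the ordering is the same: unbind the driver first, then write the new type.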
Re: [PATCH] iommu/amd: print out "tag" in INVALID_PPR_REQUEST
On Tue, May 14, 2019 at 10:55:46AM -0400, Qian Cai wrote:
> Jroedel, I am wondering what the plan is for 41e59a41fc5d1 (iommu tree)
> or this patch to be pushed to linux-next or mainline...

Looks like I applied that patch directly to the master branch, which is not what goes upstream. I cherry-picked it to x86/amd, so it will go upstream for v5.3.

Regards,

Joerg
Re: [PATCH 2/4] iommu: add a bitmap based dma address allocator
On Tue, Nov 24, 2015 at 02:05:12PM -0800, Shaohua Li wrote:
> The lib/iommu-common.c uses a bitmap and a lock. This implementation
> actually uses a percpu_ida which completely avoids locking. It would be
> possible to make lib/iommu-common.c use percpu_ida too if somebody wants
> to do it, but I think this shouldn't be a blocker for these patches,
> given the huge performance gain.

It doesn't "completely avoid locking": the percpu_ida code uses a lock internally too. Also, what is the memory and device address space overhead per CPU?

Joerg
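For comparison, the bitmap-and-lock scheme being discussed works roughly like this minimal sketch (not the actual lib/iommu-common.c code): every allocation and free takes one global lock, which is exactly the contention a per-CPU allocator tries to avoid:

```python
import threading

class BitmapDmaAllocator:
    """Allocate IOVA page slots from a bitmap under a single global lock."""

    def __init__(self, npages):
        self.bits = [False] * npages     # False = slot free, True = in use
        self.lock = threading.Lock()

    def alloc(self):
        """Return the index of a free slot, or -1 if the space is exhausted."""
        with self.lock:                  # every caller serializes here
            for i, used in enumerate(self.bits):
                if not used:
                    self.bits[i] = True
                    return i
            return -1

    def free(self, i):
        with self.lock:
            self.bits[i] = False
```

The trade-off Joerg is probing: a per-CPU scheme removes this serialization point, but each CPU's cached IDs cost memory and fragment the device address space.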
Re: [PATCH 10/22] iommu: Introduce direct mapped region handling
Hi Will,

On Fri, Jun 05, 2015 at 03:17:50PM +0100, Will Deacon wrote:
> On Thu, May 28, 2015 at 05:41:33PM +0100, Joerg Roedel wrote:
> > +/**
> > + * struct iommu_dm_region - descriptor for a direct mapped memory region
> > + * @list: Linked list pointers
> > + * @start: System physical start address of the region
> > + * @length: Length of the region in bytes
> > + * @prot: IOMMU Protection flags (READ/WRITE/...)
> > + */
> > +struct iommu_dm_region {
> > +	struct list_head	list;
> > +	phys_addr_t		start;
> > +	size_t			length;
> > +	int			prot;
> > +};
>
> I'm slightly puzzled about this. It looks to me like we're asking the
> IOMMU driver to construct a description of the system's physical address
> space, but this information tends to be known elsewhere for things like
> initialising lowmem on the CPU using memblock.

Well, this is not about the general memory layout of the machine, it is more about the requirements of the firmware. The firmware might have its own mapping requirements, for example for a USB controller that is handled by the BIOS. Other devices (be2net adapters with special firmware) might have such requirements too. On x86 these requirements are described in the IOMMU ACPI tables (RMRR entries on Intel, unity mappings on AMD).

> Also, it looks like we just use these regions to create the default
> domain using iommu_map calls -- why don't we just have an IOMMU callback
> to initialise the default domain instead? That would allow IOMMUs with a
> per-master bypass mode to avoid allocating page tables altogether.

In theory yes, but this information is not only needed for the creation of default domains, but also for a generic DMA-API implementation for IOMMU drivers. A DMA-API implementation has to mark these address ranges as reserved in its address allocator, so it is better to export this information than doing the handling in the iommu drivers.

Joerg
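The DMA-API point in the last paragraph can be illustrated concretely: direct-mapped regions are carved out of the IOVA space before the allocator hands out any addresses, so a mapping can never land on top of a firmware range. A sketch under those assumptions (names invented, page-granular for simplicity):

```python
class IovaAllocator:
    """Toy page-granular IOVA allocator that honors direct-mapped regions."""

    PAGE = 4096

    def __init__(self, size, dm_regions):
        # Start with every page in [0, size) free...
        self.free_pages = set(range(size // self.PAGE))
        # ...then reserve the pages covered by each (start, length) dm region,
        # the equivalent of an RMRR entry / unity mapping.
        for start, length in dm_regions:
            first = start // self.PAGE
            last = (start + length - 1) // self.PAGE
            self.free_pages -= set(range(first, last + 1))

    def alloc_page(self):
        """Return the lowest free IOVA, or None if the space is exhausted."""
        if not self.free_pages:
            return None
        page = min(self.free_pages)
        self.free_pages.remove(page)
        return page * self.PAGE
```

With a direct-mapped region covering the first page, the first allocation skips straight past it, which is the behavior the exported iommu_dm_region list enables.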
Re: [PATCH 00/22 v2] Introduce default domains for iommu groups
Hi Will,

Thanks for having a look!

On Fri, Jun 05, 2015 at 03:22:06PM +0100, Will Deacon wrote:
> Most of this looks fine to me, modulo my comments about the dm regions
> (which I'm not sure how to implement for ARM).

When there are no direct mapping requirements from the firmware on ARM, you can just return an empty list in these call-backs.

> > A major change is that now the default domain has to be allocated by
> > the code that allocates the iommu group. For PCI devices this happens
> > in the IOMMU core, but drivers allocating the group on their own can
> > now implement a policy that fits their needs (e.g. not allocate one
> > domain per group but let multiple groups share one domain).
>
> Makes sense. I really think we should be moving group allocation out of
> the IOMMU drivers and into the bus code, like we already have for PCI.
> Once we've got a way to describe groups of platform devices (e.g. in the
> device-tree), then we can have the group creation happen automatically
> as part of Laurent's of_iommu work.

Yes, that makes sense. And PCI is pretty hardcoded into the iommu-groups implementation right now. This probably needs to be more generic too.

Joerg
Re: [RFC PATCH v3 5/7] dma-mapping: detect and configure IOMMU in of_dma_configure
On Mon, Oct 27, 2014 at 04:02:16PM +, Will Deacon wrote:
> On Mon, Oct 27, 2014 at 11:30:33AM +, Laurent Pinchart wrote:
> > I'm not sure I follow you here. Aren't we already exposing masters
> > that master through multiple IOMMUs as single instances of struct
> > device?
>
> Hmm, yes, now you've confused me too! The conclusion was certainly that
> dma-mapping should not be the one dealing with the I/O topology. Domain
> allocation would then be an iommu callback (something like
> ->get_default_domain), but the rest of the details weren't fleshed out.

The idea is that the IOMMU core code will allocate a default domain for each iommu-group at initialization time. This domain can be requested later by a new iommu-api function and used for DMA-API mappings. A device still can be assigned to another domain by driver code (like VFIO). But if the device is later de-assigned the IOMMU core code automatically puts it back into the default domain.

Joerg
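The life cycle described above — a default domain allocated per group, a driver such as VFIO attaching its own domain, and de-assignment falling back to the default — can be modeled like this (illustrative names only, not the iommu-api):

```python
class IommuGroupDomains:
    """Toy model of a group's default domain and driver-attached domains."""

    def __init__(self):
        # The core allocates the default domain at group init time.
        self.default_domain = "default-dma"
        self.current = self.default_domain

    def attach(self, domain):
        # e.g. VFIO attaching its own domain for device assignment
        self.current = domain

    def detach(self):
        # De-assignment: the core puts the group back into the default
        # domain rather than leaving it unattached.
        self.current = self.default_domain
```

The key property the model captures is that a group is never domain-less: detaching restores the default, so DMA-API mappings keep working after a VFIO device is handed back.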
Re: [PATCH 0/3] iommu: replace IOMMU_EXEC with IOMMU_NOEXEC and update ARM SMMU driver
On Mon, Oct 20, 2014 at 07:42:01PM +0100, Will Deacon wrote:
> On Mon, Oct 20, 2014 at 04:39:15PM +0100, Will Deacon wrote:
> > On Mon, Oct 13, 2014 at 02:06:15PM +0100, Antonios Motakis wrote:
> > > This patch series applies to Joerg Roedel's iommu/next branch,
> > > commit 09b5269a. It replaces the IOMMU_EXEC flag used by the ARM
> > > SMMU driver with IOMMU_NOEXEC. This is more enforceable, since the
> > > lack of the flag on hardware that doesn't support it implies that
> > > the target memory will be executable.
> >
> > Looks good to me; I'll take this via the arm-smmu tree and send it to
> > Joerg along with anything else that gets queued for 3.19.
>
> The 0-day builder spotted a new warning from this series:
>
>   drivers/iommu/amd_iommu.c: In function 'amd_iommu_capable':
>   drivers/iommu/amd_iommu.c:3409:2: warning: enumeration value 'IOMMU_CAP_NOEXEC' not handled in switch [-Wswitch]
>     switch (cap) {
>     ^
>
> I fixed it with the patch below, but I'd appreciate you and Joerg taking
> a look too.
>
> Cheers,
>
> Will
>
> ---8<---
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 505a9adac2d5..3d78a8fb5a6a 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3411,6 +3411,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
>  		return true;
>  	case IOMMU_CAP_INTR_REMAP:
>  		return (irq_remapping_enabled == 1);
> +	case IOMMU_CAP_NOEXEC:
> +		return false;
>  	}

Looks good to me.
Re: [RFC PATCH 4/7] iommu: provide helper function to configure an IOMMU for an of master
On Tue, Sep 02, 2014 at 04:01:32PM +0200, Arnd Bergmann wrote:
> This is an artifact of the API being single-instance at the moment. We
> might not in fact need it, I was just trying to think of things that
> naturally fit in there and that are probably already linked together in
> the individual iommu drivers.

I am not sure what you mean by single-instance. Is it that currently the API only supports one type of iommu_ops per bus? That should be fine as long as there is only one type of IOMMU on the bus.

Besides that, it is a feature of the IOMMU-API to hide the details about all the hardware IOMMUs in the system from its users.

Joerg