RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Joerg,

Thanks. Hope you are doing well now.

Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, January 15, 2021 09:18
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will take care of sending this fix upstream next week.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will take care of sending this fix upstream next week.

Regards,

Joerg
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 02:28:27PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Alright, so is this going to make it into an upstream-Kernel?

Yes, but please test it first. It should apply on-top of a 5.9.3 kernel. If it works I can send a patch and will Cc you as well as a few other folks.

Regards,

Joerg
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Alright, so is this going to make it into an upstream kernel?

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, November 6, 2020 15:06
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate
> on what that quirk would be?

The GPU seems to have broken ATS, or require driver setup to make ATS work. Anyhow, ATS is unstable for Linux to use, so it must not be enabled. This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate
> on what that quirk would be?

The GPU seems to have broken ATS, or require driver setup to make ATS work. Anyhow, ATS is unstable for Linux to use, so it must not be enabled. This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
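To make the effect of the diff above easier to follow, here is a minimal userspace sketch in Python of the idea behind the fixup entries: a table of (vendor, device) pairs for which ATS is switched off. This is only a model and not kernel code. The names NO_ATS_QUIRKS and ats_disabled_by_quirk are made up for illustration, and the real quirk_amd_harvest_no_ats handler may apply further checks (for example on the device revision) that this model ignores.

```python
# Simplified model of the quirk table touched by the diff; NOT kernel code.
# NO_ATS_QUIRKS and ats_disabled_by_quirk are hypothetical names.

PCI_VENDOR_ID_ATI = 0x1002  # matches the vendor part of "1002:15d8"

# (vendor, device) pairs visible in the diff context, plus the new entry.
NO_ATS_QUIRKS = {
    (PCI_VENDOR_ID_ATI, 0x6900),
    (PCI_VENDOR_ID_ATI, 0x7312),
    (PCI_VENDOR_ID_ATI, 0x7340),  # AMD Navi14 dGPU
    (PCI_VENDOR_ID_ATI, 0x15D8),  # AMD Raven platform iGPU (added by the diff)
}


def ats_disabled_by_quirk(vendor: int, device: int) -> bool:
    """Return True if this model's table lists the device as 'no ATS'."""
    return (vendor, device) in NO_ATS_QUIRKS
```

With the new entry in place, the GPU reported in this thread (1002:15d8) is matched by the table, while an unrelated device ID is not.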
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please
> send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0 is your GPU)?

Thank you. I do think that this is the GPU. Would you please elaborate on what that quirk would be?

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, November 6, 2020 13:19
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> With Kernel 5.9.3 kernel-parameter pci=noats the system is running for
> 19 hours now in reboot-test without the error occurring.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0 is your GPU)?

Thanks,

Joerg

0b:00.0 0300: 1002:15d8 (rev cf)
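The lspci -n line quoted in this message carries the numeric IDs that the quirk has to match. As a small sketch, assuming the "slot class: vendor:device (rev xx)" layout seen in that line, the following Python helper (parse_lspci_n is a hypothetical name) splits it into its fields:

```python
import re

# Layout assumed from the line in this thread:
#   0b:00.0 0300: 1002:15d8 (rev cf)
LSPCI_N = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-fA-F]{2})\))?"
)


def parse_lspci_n(line: str) -> dict:
    """Parse one 'lspci -n' line into slot, class, vendor, device, revision."""
    m = LSPCI_N.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized lspci -n line: {line!r}")
    return {
        "slot": m["slot"],
        "class": int(m["cls"], 16),
        "vendor": int(m["vendor"], 16),
        "device": int(m["device"], 16),
        # The revision is optional in this sketch.
        "revision": int(m["rev"], 16) if m["rev"] else None,
    }
```

For the line above this yields vendor 0x1002 and device 0x15d8, which are the values a PCI fixup entry for this GPU would need to name.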
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> With Kernel 5.9.3 kernel-parameter pci=noats the system is running for
> 19 hours now in reboot-test without the error occurring.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0 is your GPU)?

Thanks,

Joerg
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
With kernel 5.9.3 and the kernel parameter pci=noats, the system has now been running the reboot test for 19 hours without the error occurring.

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Thursday, November 5, 2020 13:33
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> One remark:
> With kernel-parameter pci=noats in dmesg there is
>
> [ 10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg
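When running a long reboot test like this one, it can help to confirm on each boot that pci=noats really was on the kernel command line. A minimal sketch, assuming the command line is available as a string (for example read from /proc/cmdline) and that pci= takes a comma-separated option list; pci_options and ats_disabled_on_cmdline are hypothetical helper names:

```python
def pci_options(cmdline: str) -> list:
    """Collect all comma-separated options given via pci=... tokens."""
    opts = []
    for token in cmdline.split():
        if token.startswith("pci="):
            opts.extend(o for o in token[len("pci="):].split(",") if o)
    return opts


def ats_disabled_on_cmdline(cmdline: str) -> bool:
    """True if 'noats' appears among the pci= options."""
    return "noats" in pci_options(cmdline)
```

On a live system the input could be obtained with open("/proc/cmdline").read(); the function itself only inspects the string it is given.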
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> One remark:
> With kernel-parameter pci=noats in dmesg there is
>
> [ 10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Joerg, One remark: With kernel-parameter pci=noats in dmesg there is [ 10.128463] kfd kfd: Error initializing iommuv2 Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Donnerstag, 5. November 2020 12:16 To: 'jroe...@suse.de' Cc: 'iommu@lists.linux-foundation.org' Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled Joerg, I did run with 5.9.3. After about 2 hours in a reboot-cycle the system failed again with amdgpu-problems. > please try booting with "pci=noats" on the kernel command line. This I will do next. Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Mittwoch, 4. November 2020 15:36 To: 'jroe...@suse.de' Cc: 'iommu@lists.linux-foundation.org' Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled Joerg, One remark: > However I found out that with Kernel 5.9.3 the amdgpu kernel module is > not loaded/installed That is likely my fault because I was compiling that linux kernel on a faster machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just now on the target machine to avoid any problems. Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Mittwoch, 4. November 2020 15:19 To: jroe...@suse.de Cc: iommu@lists.linux-foundation.org Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > Yes, but it could be the same underlying reason. There is no PCI setup issue that we are aware of. > For a first try, use 5.9.3. If it reproduces there, please try booting with > "pci=noats" on the kernel command line. Did compile the kernel 5.9.3 and started a reboot test to see if it is going to fail again. However I found out that with Kernel 5.9.3 the amdgpu kernel module is not loaded/installed. So this way I don´t see it makes sense for further investigation. I might did something wrong when compiling the linux kernel 5.9.3. 
I did reuse my .config file that I used with 5.4.0-47 for configuration of the kernel 5.9.3. However I do not know why it did not install amdgpu. > Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine > where this happens. For comparison I attached the logs when using 5.4.0-47 and 5.9.3. Best regards, Edgar -Original Message- From: jroe...@suse.de Sent: Mittwoch, 4. November 2020 11:15 To: Merger, Edgar [AUTOSOL/MAS/AUGS] Cc: iommu@lists.linux-foundation.org Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error > is at [ 52.772273], hence much earlier. Yes, but it could be the same underlying reason. > Have not tried to use an upstream kernel yet. Which one would you recommend? For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line. Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens. Regards, Joerg > > As far as inconsistencies in the PCI-setup is concerned, the only thing that > I know of right now is that we haven´t entered a PCI subsystem vendor and > device ID yet. It is still "Advanced Micro Devices". We will change that soon > to "General Electric" or "Emerson". > > Best regards, > Edgar > > -Original Message- > From: jroe...@suse.de > Sent: Mittwoch, 4. November 2020 09:53 > To: Merger, Edgar [AUTOSOL/MAS/AUGS] > Cc: iommu@lists.linux-foundation.org > Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > > Hi Edgar, > > On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] > wrote: > > With one board we have a boot-problem that is reproducible at every ~50 > > boot. > > The system is accessible via ssh and works fine except for the > > Graphics. The graphics is off. We don´t see a screen. Please see > > attached “dmesg.log”. 
From [52.772273] onwards the kernel reports > > drm/amdgpu errors. It even tries to reset the GPU but that fails too. > > I tried to reset amdgpu also by command “sudo cat > > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either. > > Can you reproduce the problem with an upstream kernel too? > > These messages in dmesg indicate some problem in the platform setup: > > AMD-Vi: Completion-Wait loop timed out > > Might there be some inconsistencies in the PCI setup between the bridges and > the endpoints or something? > > Regards, > > Joerg dmesg_pci_noats.log Description: dmesg_pci_noats.log
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Joerg, I did run with 5.9.3. After about 2 hours in a reboot-cycle the system failed again with amdgpu-problems. > please try booting with "pci=noats" on the kernel command line. This I will do next. Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Mittwoch, 4. November 2020 15:36 To: 'jroe...@suse.de' Cc: 'iommu@lists.linux-foundation.org' Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled Joerg, One remark: > However I found out that with Kernel 5.9.3 the amdgpu kernel module is > not loaded/installed That is likely my fault because I was compiling that linux kernel on a faster machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just now on the target machine to avoid any problems. Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Mittwoch, 4. November 2020 15:19 To: jroe...@suse.de Cc: iommu@lists.linux-foundation.org Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > Yes, but it could be the same underlying reason. There is no PCI setup issue that we are aware of. > For a first try, use 5.9.3. If it reproduces there, please try booting with > "pci=noats" on the kernel command line. Did compile the kernel 5.9.3 and started a reboot test to see if it is going to fail again. However I found out that with Kernel 5.9.3 the amdgpu kernel module is not loaded/installed. So this way I don´t see it makes sense for further investigation. I might did something wrong when compiling the linux kernel 5.9.3. I did reuse my .config file that I used with 5.4.0-47 for configuration of the kernel 5.9.3. However I do not know why it did not install amdgpu. > Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine > where this happens. For comparison I attached the logs when using 5.4.0-47 and 5.9.3. Best regards, Edgar -Original Message- From: jroe...@suse.de Sent: Mittwoch, 4. 
November 2020 11:15 To: Merger, Edgar [AUTOSOL/MAS/AUGS] Cc: iommu@lists.linux-foundation.org Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error > is at [ 52.772273], hence much earlier. Yes, but it could be the same underlying reason. > Have not tried to use an upstream kernel yet. Which one would you recommend? For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line. Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens. Regards, Joerg > > As far as inconsistencies in the PCI-setup is concerned, the only thing that > I know of right now is that we haven´t entered a PCI subsystem vendor and > device ID yet. It is still "Advanced Micro Devices". We will change that soon > to "General Electric" or "Emerson". > > Best regards, > Edgar > > -Original Message- > From: jroe...@suse.de > Sent: Mittwoch, 4. November 2020 09:53 > To: Merger, Edgar [AUTOSOL/MAS/AUGS] > Cc: iommu@lists.linux-foundation.org > Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > > Hi Edgar, > > On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] > wrote: > > With one board we have a boot-problem that is reproducible at every ~50 > > boot. > > The system is accessible via ssh and works fine except for the > > Graphics. The graphics is off. We don´t see a screen. Please see > > attached “dmesg.log”. From [52.772273] onwards the kernel reports > > drm/amdgpu errors. It even tries to reset the GPU but that fails too. > > I tried to reset amdgpu also by command “sudo cat > > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either. > > Can you reproduce the problem with an upstream kernel too? 
> > These messages in dmesg indicate some problem in the platform setup: > > AMD-Vi: Completion-Wait loop timed out > > Might there be some inconsistencies in the PCI setup between the bridges and > the endpoints or something? > > Regards, > > Joerg
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Joerg, One remark: > However I found out that with Kernel 5.9.3 the amdgpu kernel module is not > loaded/installed That is likely my fault because I was compiling that linux kernel on a faster machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just now on the target machine to avoid any problems. Best regards, Edgar -Original Message- From: Merger, Edgar [AUTOSOL/MAS/AUGS] Sent: Mittwoch, 4. November 2020 15:19 To: jroe...@suse.de Cc: iommu@lists.linux-foundation.org Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > Yes, but it could be the same underlying reason. There is no PCI setup issue that we are aware of. > For a first try, use 5.9.3. If it reproduces there, please try booting with > "pci=noats" on the kernel command line. Did compile the kernel 5.9.3 and started a reboot test to see if it is going to fail again. However I found out that with Kernel 5.9.3 the amdgpu kernel module is not loaded/installed. So this way I don´t see it makes sense for further investigation. I might did something wrong when compiling the linux kernel 5.9.3. I did reuse my .config file that I used with 5.4.0-47 for configuration of the kernel 5.9.3. However I do not know why it did not install amdgpu. > Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine > where this happens. For comparison I attached the logs when using 5.4.0-47 and 5.9.3. Best regards, Edgar -Original Message- From: jroe...@suse.de Sent: Mittwoch, 4. November 2020 11:15 To: Merger, Edgar [AUTOSOL/MAS/AUGS] Cc: iommu@lists.linux-foundation.org Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error > is at [ 52.772273], hence much earlier. Yes, but it could be the same underlying reason. > Have not tried to use an upstream kernel yet. Which one would you recommend? 
For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line. Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens. Regards, Joerg > > As far as inconsistencies in the PCI-setup is concerned, the only thing that > I know of right now is that we haven´t entered a PCI subsystem vendor and > device ID yet. It is still "Advanced Micro Devices". We will change that soon > to "General Electric" or "Emerson". > > Best regards, > Edgar > > -Original Message- > From: jroe...@suse.de > Sent: Mittwoch, 4. November 2020 09:53 > To: Merger, Edgar [AUTOSOL/MAS/AUGS] > Cc: iommu@lists.linux-foundation.org > Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > > Hi Edgar, > > On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] > wrote: > > With one board we have a boot-problem that is reproducible at every ~50 > > boot. > > The system is accessible via ssh and works fine except for the > > Graphics. The graphics is off. We don´t see a screen. Please see > > attached “dmesg.log”. From [52.772273] onwards the kernel reports > > drm/amdgpu errors. It even tries to reset the GPU but that fails too. > > I tried to reset amdgpu also by command “sudo cat > > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either. > > Can you reproduce the problem with an upstream kernel too? > > These messages in dmesg indicate some problem in the platform setup: > > AMD-Vi: Completion-Wait loop timed out > > Might there be some inconsistencies in the PCI setup between the bridges and > the endpoints or something? > > Regards, > > Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> Yes, but it could be the same underlying reason. There is no PCI setup issue that we are aware of. > For a first try, use 5.9.3. If it reproduces there, please try booting with > "pci=noats" on the kernel command line. Did compile the kernel 5.9.3 and started a reboot test to see if it is going to fail again. However I found out that with Kernel 5.9.3 the amdgpu kernel module is not loaded/installed. So this way I don´t see it makes sense for further investigation. I might did something wrong when compiling the linux kernel 5.9.3. I did reuse my .config file that I used with 5.4.0-47 for configuration of the kernel 5.9.3. However I do not know why it did not install amdgpu. > Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine > where this happens. For comparison I attached the logs when using 5.4.0-47 and 5.9.3. Best regards, Edgar -Original Message- From: jroe...@suse.de Sent: Mittwoch, 4. November 2020 11:15 To: Merger, Edgar [AUTOSOL/MAS/AUGS] Cc: iommu@lists.linux-foundation.org Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error > is at [ 52.772273], hence much earlier. Yes, but it could be the same underlying reason. > Have not tried to use an upstream kernel yet. Which one would you recommend? For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line. Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens. Regards, Joerg > > As far as inconsistencies in the PCI-setup is concerned, the only thing that > I know of right now is that we haven´t entered a PCI subsystem vendor and > device ID yet. It is still "Advanced Micro Devices". We will change that soon > to "General Electric" or "Emerson". 
> > Best regards, > Edgar > > -Original Message- > From: jroe...@suse.de > Sent: Mittwoch, 4. November 2020 09:53 > To: Merger, Edgar [AUTOSOL/MAS/AUGS] > Cc: iommu@lists.linux-foundation.org > Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > > Hi Edgar, > > On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] > wrote: > > With one board we have a boot-problem that is reproducible at every ~50 > > boot. > > The system is accessible via ssh and works fine except for the > > Graphics. The graphics is off. We don´t see a screen. Please see > > attached “dmesg.log”. From [52.772273] onwards the kernel reports > > drm/amdgpu errors. It even tries to reset the GPU but that fails too. > > I tried to reset amdgpu also by command “sudo cat > > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either. > > Can you reproduce the problem with an upstream kernel too? > > These messages in dmesg indicate some problem in the platform setup: > > AMD-Vi: Completion-Wait loop timed out > > Might there be some inconsistencies in the PCI setup between the bridges and > the endpoints or something? > > Regards, > > Joerg Linux-logs.tar.gz Description: Linux-logs.tar.gz ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error > is at [ 52.772273], hence much earlier. Yes, but it could be the same underlying reason. > Have not tried to use an upstream kernel yet. Which one would you recommend? For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line. Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens. Regards, Joerg > > As far as inconsistencies in the PCI-setup is concerned, the only thing that > I know of right now is that we haven´t entered a PCI subsystem vendor and > device ID yet. It is still "Advanced Micro Devices". We will change that soon > to "General Electric" or "Emerson". > > Best regards, > Edgar > > -Original Message- > From: jroe...@suse.de > Sent: Mittwoch, 4. November 2020 09:53 > To: Merger, Edgar [AUTOSOL/MAS/AUGS] > Cc: iommu@lists.linux-foundation.org > Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled > > Hi Edgar, > > On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] > wrote: > > With one board we have a boot-problem that is reproducible at every ~50 > > boot. > > The system is accessible via ssh and works fine except for the > > Graphics. The graphics is off. We don´t see a screen. Please see > > attached “dmesg.log”. From [52.772273] onwards the kernel reports > > drm/amdgpu errors. It even tries to reset the GPU but that fails too. > > I tried to reset amdgpu also by command “sudo cat > > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either. > > Can you reproduce the problem with an upstream kernel too? > > These messages in dmesg indicate some problem in the platform setup: > > AMD-Vi: Completion-Wait loop timed out > > Might there be some inconsistencies in the PCI setup between the bridges and > the endpoints or something? 
> > Regards, > > Joerg
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Jörg,

The "AMD-Vi: Completion-Wait loop timed out" message is at [65499.964105], but the amdgpu error is at [ 52.772273], hence much earlier.

I have not tried an upstream kernel yet. Which one would you recommend?

As far as inconsistencies in the PCI setup are concerned, the only thing I know of right now is that we haven't entered a PCI subsystem vendor and device ID yet. It is still "Advanced Micro Devices". We will change that soon to "General Electric" or "Emerson".

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, November 4, 2020 09:53
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Hi Edgar,

On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> With one board we have a boot problem that is reproducible roughly every 50th boot.
> The system is accessible via ssh and works fine except for the
> graphics. The graphics is off. We don't see a screen. Please see
> attached “dmesg.log”. From [52.772273] onwards the kernel reports
> drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> I also tried to reset amdgpu with the command “sudo cat
> /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.

Can you reproduce the problem with an upstream kernel too?

These messages in dmesg indicate some problem in the platform setup:

AMD-Vi: Completion-Wait loop timed out

Might there be some inconsistencies in the PCI setup between the bridges and the endpoints or something?

Regards,

Joerg
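The timestamp comparison made in this message can be automated across many boot logs. Below is a minimal Python sketch, assuming dmesg lines of the form "[seconds.micros] message". The helper name first_occurrence is hypothetical, and the amdgpu line in SAMPLE is a placeholder, since the actual error text is only in the attached dmesg.log and not in this thread.

```python
import re

# Matches "[   52.772273] some message" style dmesg lines.
DMESG_LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")


def first_occurrence(lines, needle):
    """Return the timestamp (seconds) of the first line containing needle."""
    for line in lines:
        m = DMESG_LINE.match(line)
        if m and needle in m["msg"]:
            return float(m["ts"])
    return None


# Timestamps are the ones quoted in the thread; the amdgpu message text
# is a placeholder, not real log output.
SAMPLE = [
    "[   52.772273] amdgpu: <placeholder for the drm/amdgpu error text>",
    "[65499.964105] AMD-Vi: Completion-Wait loop timed out",
]

gpu_ts = first_occurrence(SAMPLE, "amdgpu")
iommu_ts = first_occurrence(SAMPLE, "AMD-Vi: Completion-Wait loop timed out")
```

Comparing gpu_ts with iommu_ts reproduces the observation that the amdgpu error appears much earlier in the log than the AMD-Vi timeout.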