RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2021-01-15 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Hi Joerg,

Thanks. Hope you are doing well now.

Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, 15 January 2021 09:18
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will take 
care of sending this fix upstream next week.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2021-01-15 Thread jroe...@suse.de
Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will
take care of sending this fix upstream next week.

Regards,

Joerg



Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 02:28:27PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Alright, so is this going to make it into an upstream-Kernel?

Yes, but please test it first. It should apply on-top of a 5.9.3 kernel.
If it works I can send a patch and will Cc you as well as a few other
folks.

Regards,

Joerg



RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Alright, so is this going to make it into an upstream-Kernel?

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, 6 November 2020 15:06
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate 
> on what that quirk would be?

The GPU seems to have broken ATS, or require driver setup to make ATS work. 
Anyhow, ATS is unstable for Linux to use, so it must not be enabled.

This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 01:03:22PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> Thank you. I do think that this is the GPU. Would you please elaborate
> on what that quirk would be?

The GPU seems to have broken ATS, or require driver setup to make ATS
work. Anyhow, ATS is unstable for Linux to use, so it must not be
enabled.

This diff to the kernel should do that:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f70692ac79c5..3911b0ec57ba 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
 /* AMD Navi14 dGPU */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
+/* AMD Raven platform iGPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
 #endif /* CONFIG_PCI_ATS */
 
 /* Freescale PCIe doesn't support MSI in RC mode */
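For readers who do not work on the kernel: the diff above adds one more (vendor, device) pair to a list of devices for which ATS is not enabled. The following is only a userspace sketch of that lookup idea, not kernel code. The names NO_ATS_DEVICES and ats_allowed are made up for illustration, the set holds only the entries visible in the diff, and 0x1002 is the numeric value of PCI_VENDOR_ID_ATI.

```python
PCI_VENDOR_ID_ATI = 0x1002  # numeric value of the kernel constant of the same name

# Only the (vendor, device) pairs visible in the diff above. The real list in
# drivers/pci/quirks.c may contain further entries that the hunk does not show.
NO_ATS_DEVICES = {
    (PCI_VENDOR_ID_ATI, 0x6900),
    (PCI_VENDOR_ID_ATI, 0x7312),
    (PCI_VENDOR_ID_ATI, 0x7340),  # AMD Navi14 dGPU
    (PCI_VENDOR_ID_ATI, 0x15D8),  # AMD Raven platform iGPU (added by the diff)
}


def ats_allowed(vendor: int, device: int) -> bool:
    """Hypothetical helper: False if the device is on the no-ATS list."""
    return (vendor, device) not in NO_ATS_DEVICES


if __name__ == "__main__":
    # The GPU reported in this thread is 1002:15d8.
    print("ATS allowed for 1002:15d8:", ats_allowed(0x1002, 0x15D8))
```

The match is on the exact vendor and device ID, so other devices keep ATS; this is the difference from booting with pci=noats, which turns ATS off for everything.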


RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
> Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please 
> send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0 is your GPU)?

Thank you. I do think that this is the GPU. Would you please elaborate on what 
that quirk would be?

-----Original Message-----
From: jroe...@suse.de
Sent: Friday, 6 November 2020 13:19
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> With kernel 5.9.3 and the kernel parameter pci=noats, the system has now been
> running for 19 hours in the reboot test without the error occurring.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you please 
send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0 is your GPU)?

Thanks,

Joerg
0b:00.0 0300: 1002:15d8 (rev cf)
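The lspci -n line above carries the slot, the class code, and the vendor:device pair that a quirk entry matches on. As a rough illustration, here is a minimal Python sketch that splits such a line into its fields. It assumes the default one-line `lspci -n` layout shown in this thread; PciId and parse_lspci_n are names invented for this example.

```python
import re
from typing import NamedTuple, Optional


class PciId(NamedTuple):
    slot: str                # bus:device.function, e.g. "0b:00.0"
    dev_class: int           # 16-bit class code, 0x0300 is a VGA controller
    vendor: int              # 16-bit vendor ID
    device: int              # 16-bit device ID
    revision: Optional[int]  # revision byte, if lspci printed one


# Assumed layout: "<slot> <class>: <vendor>:<device> (rev <xx>)"
_LSPCI_N = re.compile(
    r"^(?P<slot>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)


def parse_lspci_n(line: str) -> PciId:
    """Parse one line of `lspci -n` output into numeric fields."""
    m = _LSPCI_N.match(line.strip())
    if m is None:
        raise ValueError(f"not an `lspci -n` line: {line!r}")
    rev = m.group("rev")
    return PciId(
        slot=m.group("slot"),
        dev_class=int(m.group("cls"), 16),
        vendor=int(m.group("vendor"), 16),
        device=int(m.group("device"), 16),
        revision=int(rev, 16) if rev is not None else None,
    )


gpu = parse_lspci_n("0b:00.0 0300: 1002:15d8 (rev cf)")
print(f"{gpu.vendor:04x}:{gpu.device:04x}")
```

For the line reported here this yields vendor 0x1002 and device 0x15d8, which are the two values a quirk entry for this GPU would need.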

Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-06 Thread jroe...@suse.de
On Fri, Nov 06, 2020 at 05:51:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> With Kernel 5.9.3 kernel-parameter pci=noats the system is running for
> 19hours now in reboot-test without the error to occur.

Thanks. So I guess the GPU needs a quirk to disable ATS on it. Can you
please send me the output of lspci -n -s "0b:00.0" (Given that 0b:00.0
is your GPU)?

Thanks,

Joerg


RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-05 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
With kernel 5.9.3 and the kernel parameter pci=noats, the system has now been
running for 19 hours in the reboot test without the error occurring.

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Thursday, 5 November 2020 13:33
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> One remark:
> With kernel-parameter pci=noats in dmesg there is
> 
> [   10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg


Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-05 Thread jroe...@suse.de
On Thu, Nov 05, 2020 at 11:58:30AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> One remark:
> With kernel-parameter pci=noats in dmesg there is
> 
> [   10.128463] kfd kfd: Error initializing iommuv2

That is expected. IOMMUv2 depends on ATS support.

Regards,

Joerg
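When comparing test runs it helps to confirm that a given boot really had pci=noats set. Below is a small sketch that checks a kernel command line string for it. It assumes the usual pci=opt[,opt...] syntax; the function names are invented for this example and the sample command line is made up.

```python
from typing import List


def pci_options(cmdline: str) -> List[str]:
    """Collect every option passed via pci=opt[,opt...] on a kernel command line."""
    options: List[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def ats_disabled(cmdline: str) -> bool:
    """True if the command line asks the PCI core not to use ATS."""
    return "noats" in pci_options(cmdline)


if __name__ == "__main__":
    # Made-up command line, for illustration only.
    sample = "root=/dev/sda1 ro quiet pci=noats"
    print(ats_disabled(sample))
```

On a running system the string can be read from /proc/cmdline, for example ats_disabled(open("/proc/cmdline").read()).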


RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-05 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Joerg,

One remark:
With kernel-parameter pci=noats in dmesg there is

[   10.128463] kfd kfd: Error initializing iommuv2

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Thursday, 5 November 2020 12:16
To: 'jroe...@suse.de' 
Cc: 'iommu@lists.linux-foundation.org' 
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

I did run with 5.9.3. After about 2 hours in a reboot-cycle the system failed 
again with amdgpu-problems.

> please try booting with "pci=noats" on the kernel command line.
This I will do next.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:36
To: 'jroe...@suse.de' 
Cc: 'iommu@lists.linux-foundation.org' 
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

One remark: 
> However I found out that with Kernel 5.9.3 the amdgpu kernel module is 
> not loaded/installed
That is likely my fault because I was compiling that linux kernel on a faster 
machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just 
now on the target machine to avoid any problems.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:19
To: jroe...@suse.de
Cc: iommu@lists.linux-foundation.org
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

> Yes, but it could be the same underlying reason.
There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with 
> "pci=noats" on the kernel command line.
I compiled kernel 5.9.3 and started a reboot test to see whether it fails
again. However, I found that with kernel 5.9.3 the amdgpu kernel module is
not loaded/installed, so further investigation does not make sense in this
state. I might have done something wrong when compiling the Linux kernel
5.9.3. I reused the .config file from 5.4.0-47 to configure kernel 5.9.3,
but I do not know why amdgpu was not installed.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
> where this happens.
For comparison I attached the logs when using 5.4.0-47 and 5.9.3. 

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with 
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI-setup is concerned, the only thing that 
> I know of right now is that we haven´t entered a PCI subsystem vendor and 
> device ID yet. It is still "Advanced Micro Devices". We will change that soon 
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 
> > boot.
> > The system is accessible via ssh and works fine except for the 
> > Graphics. The graphics is off. We don´t see a screen. Please see 
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg


dmesg_pci_noats.log
Description: dmesg_pci_noats.log

RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-05 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Joerg,

I did run with 5.9.3. After about 2 hours in a reboot-cycle the system failed 
again with amdgpu-problems.

> please try booting with "pci=noats" on the kernel command line.
This I will do next.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:36
To: 'jroe...@suse.de' 
Cc: 'iommu@lists.linux-foundation.org' 
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

One remark: 
> However I found out that with Kernel 5.9.3 the amdgpu kernel module is 
> not loaded/installed
That is likely my fault because I was compiling that linux kernel on a faster 
machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just 
now on the target machine to avoid any problems.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:19
To: jroe...@suse.de
Cc: iommu@lists.linux-foundation.org
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

> Yes, but it could be the same underlying reason.
There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with 
> "pci=noats" on the kernel command line.
I compiled kernel 5.9.3 and started a reboot test to see whether it fails
again. However, I found that with kernel 5.9.3 the amdgpu kernel module is
not loaded/installed, so further investigation does not make sense in this
state. I might have done something wrong when compiling the Linux kernel
5.9.3. I reused the .config file from 5.4.0-47 to configure kernel 5.9.3,
but I do not know why amdgpu was not installed.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
> where this happens.
For comparison I attached the logs when using 5.4.0-47 and 5.9.3. 

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with 
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI-setup is concerned, the only thing that 
> I know of right now is that we haven´t entered a PCI subsystem vendor and 
> device ID yet. It is still "Advanced Micro Devices". We will change that soon 
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 
> > boot.
> > The system is accessible via ssh and works fine except for the 
> > Graphics. The graphics is off. We don´t see a screen. Please see 
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg

RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Joerg,

One remark: 
> However I found out that with Kernel 5.9.3 the amdgpu kernel module is not 
> loaded/installed
That is likely my fault because I was compiling that linux kernel on a faster 
machine (V1807B CPU against R1305G CPU (target)). I restarted that compile just 
now on the target machine to avoid any problems.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:19
To: jroe...@suse.de
Cc: iommu@lists.linux-foundation.org
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

> Yes, but it could be the same underlying reason.
There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with 
> "pci=noats" on the kernel command line.
I compiled kernel 5.9.3 and started a reboot test to see whether it fails
again. However, I found that with kernel 5.9.3 the amdgpu kernel module is
not loaded/installed, so further investigation does not make sense in this
state. I might have done something wrong when compiling the Linux kernel
5.9.3. I reused the .config file from 5.4.0-47 to configure kernel 5.9.3,
but I do not know why amdgpu was not installed.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
> where this happens.
For comparison I attached the logs when using 5.4.0-47 and 5.9.3. 

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with 
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI-setup is concerned, the only thing that 
> I know of right now is that we haven´t entered a PCI subsystem vendor and 
> device ID yet. It is still "Advanced Micro Devices". We will change that soon 
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 
> > boot.
> > The system is accessible via ssh and works fine except for the 
> > Graphics. The graphics is off. We don´t see a screen. Please see 
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg

RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
> Yes, but it could be the same underlying reason.
There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with 
> "pci=noats" on the kernel command line.
I compiled kernel 5.9.3 and started a reboot test to see whether it fails
again. However, I found that with kernel 5.9.3 the amdgpu kernel module is
not loaded/installed, so further investigation does not make sense in this
state. I might have done something wrong when compiling the Linux kernel
5.9.3. I reused the .config file from 5.4.0-47 to configure kernel 5.9.3,
but I do not know why amdgpu was not installed.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
> where this happens.
For comparison I attached the logs when using 5.4.0-47 and 5.9.3. 

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with 
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI-setup is concerned, the only thing that 
> I know of right now is that we haven´t entered a PCI subsystem vendor and 
> device ID yet. It is still "Advanced Micro Devices". We will change that soon 
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 
> > boot.
> > The system is accessible via ssh and works fine except for the 
> > Graphics. The graphics is off. We don´t see a screen. Please see 
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg


Linux-logs.tar.gz
Description: Linux-logs.tar.gz

Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread jroe...@suse.de
On Wed, Nov 04, 2020 at 09:21:35AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting
with "pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the
machine where this happens.

Regards,

Joerg


> 
> As far as inconsistencies in the PCI-setup is concerned, the only thing that 
> I know of right now is that we haven´t entered a PCI subsystem vendor and 
> device ID yet. It is still "Advanced Micro Devices". We will change that soon 
> to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: jroe...@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 
> > boot.
> > The system is accessible via ssh and works fine except for the 
> > Graphics. The graphics is off. We don´t see a screen. Please see 
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too. 
> > I tried to reset amdgpu also by command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>   AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>   Joerg

RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

2020-11-04 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Hi Jörg,

AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error is 
at [   52.772273], hence much earlier.
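The ordering argument above rests on the bracketed dmesg timestamps. As a rough sketch, the earliest occurrence of each message can be pulled out of a log like this. It assumes the default "[seconds.micros] message" prefix; first_occurrence is a name invented here, and the amdgpu line in the sample is a placeholder because the actual error text is not quoted in this thread.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed dmesg prefix: "[   52.772273] message text"
_TS = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")


def first_occurrence(lines: Iterable[str],
                     needles: Iterable[str]) -> Dict[str, Optional[float]]:
    """Return the earliest timestamp at which each needle appears, or None."""
    needle_list = list(needles)
    found: Dict[str, Optional[float]] = {n: None for n in needle_list}
    for line in lines:
        m = _TS.match(line)
        if m is None:
            continue  # skip lines without a timestamp prefix
        ts = float(m.group("ts"))
        for n in needle_list:
            if n in m.group("msg") and (found[n] is None or ts < found[n]):
                found[n] = ts
    return found


# Timestamps are the ones quoted in this thread; the amdgpu message text is a
# placeholder, since the real line is only in the attached dmesg.log.
sample = [
    "[   52.772273] amdgpu: (error text not quoted in the thread)",
    "[65499.964105] AMD-Vi: Completion-Wait loop timed out",
]
seen = first_occurrence(sample, ["amdgpu", "AMD-Vi: Completion-Wait loop timed out"])
print(seen)
```

Run against a full dmesg.log, this makes it easy to see which of the two messages shows up first on each failing boot.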

Have not tried to use an upstream kernel yet. Which one would you recommend?

As far as inconsistencies in the PCI setup are concerned, the only thing I
know of right now is that we haven't entered a PCI subsystem vendor and device
ID yet. It is still "Advanced Micro Devices". We will change that soon to
"General Electric" or "Emerson".

Best regards,
Edgar

-----Original Message-----
From: jroe...@suse.de
Sent: Wednesday, 4 November 2020 09:53
To: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Cc: iommu@lists.linux-foundation.org
Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Hi Edgar,

On Fri, Oct 30, 2020 at 02:26:23PM +, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> With one board we have a boot-problem that is reproducible at every ~50 boot.
> The system is accessible via ssh and works fine except for the 
> Graphics. The graphics is off. We don´t see a screen. Please see 
> attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> drm/amdgpu errors. It even tries to reset the GPU but that fails too. 
> I tried to reset amdgpu also by command “sudo cat 
> /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.

Can you reproduce the problem with an upstream kernel too?

These messages in dmesg indicate some problem in the platform setup:

AMD-Vi: Completion-Wait loop timed out

Might there be some inconsistencies in the PCI setup between the bridges and 
the endpoints or something?

Regards,

Joerg