Re: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly

2022-05-10 Thread Andrey Grodzovsky



On 2022-05-09 14:03, Deucher, Alexander wrote:

[Public]


-----Original Message-----
From: Bjorn Helgaas
Sent: Monday, May 9, 2022 12:23 PM
To: Linux PCI
Cc: r087...@yahoo.it; Deucher, Alexander; Koenig, Christian; Pan, Xinhui; amd-gfx mailing list; dri-devel
Subject: Re: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly

On Sun, May 8, 2022 at 3:29 PM  wrote:

https://bugzilla.kernel.org/show_bug.cgi?id=215958

        Bug ID: 215958
       Summary: thunderbolt3 egpu cannot disconnect cleanly
       Product: Drivers
       Version: 2.5
Kernel Version: 5.17.0-1003-oem #3-Ubuntu SMP PREEMPT
      Hardware: All
            OS: Linux
          Tree: Mainline
        Status: NEW
      Severity: normal
      Priority: P1
     Component: PCI
      Assignee: drivers_...@kernel-bugs.osdl.org
      Reporter: r087...@yahoo.it
    Regression: No

I assume this is not a regression, right?  If it is a regression, what previous
kernel worked correctly?


I have an external egpu (Radeon 6600 RX) connected through
thunderbolt3 to my Thinkpad X1 carbon 6th Gen.. When I disconnect the
thunderbolt3 cable I get the following error in dmesg:

[21874.194994] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.195006] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.195123] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.195129] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.195271] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.195276] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.195406] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.195411] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.195544] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:51 param:0x message:GetPptLimit?
[21874.195550] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
[21874.195582] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.195587] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.227454] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.227463] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.227532] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.227536] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.227618] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.227621] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.227700] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
[21874.227703] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.227784] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
[21874.227804] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
[21874.514661] snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -5
[21874.568360] amdgpu :0c:00.0: amdgpu: Failed to switch to AC mode!
[21874.599292] amdgpu :0c:00.0: amdgpu: Failed to switch to AC mode!
[21874.718562] amdgpu :0c:00.0: amdgpu: amdgpu: finishing device.
[21878.722376] amdgpu: cp queue pipe 4 queue 0 preemption failed
[21878.722422] amdgpu :0c:00.0: amdgpu: Failed to disable gfxoff!
[21879.134918] amdgpu :0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[21879.135144] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[21879.338158] amdgpu :0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[21879.338402] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[21879.543318] [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
[21879.544216] __smu_cmn_reg_print_error: 5 callbacks suppressed
[21879.544220] amdgpu :0c:00.0: amdgpu: SMU

Re: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly

2022-05-10 Thread Bjorn Helgaas
On Sun, May 8, 2022 at 3:29 PM  wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215958
>
>         Bug ID: 215958
>        Summary: thunderbolt3 egpu cannot disconnect cleanly
>        Product: Drivers
>        Version: 2.5
> Kernel Version: 5.17.0-1003-oem #3-Ubuntu SMP PREEMPT
>       Hardware: All
>             OS: Linux
>           Tree: Mainline
>         Status: NEW
>       Severity: normal
>       Priority: P1
>      Component: PCI
>       Assignee: drivers_...@kernel-bugs.osdl.org
>       Reporter: r087...@yahoo.it
>     Regression: No

I assume this is not a regression, right?  If it is a regression, what
previous kernel worked correctly?

> I have an external egpu (Radeon 6600 RX) connected through thunderbolt3
> to my Thinkpad X1 carbon 6th Gen.. When I disconnect the thunderbolt3
> cable I get the following error in dmesg:
>
> [21874.194994] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.195006] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.195123] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.195129] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.195271] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.195276] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.195406] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.195411] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.195544] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:51 param:0x message:GetPptLimit?
> [21874.195550] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
> [21874.195582] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.195587] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.227454] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.227463] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.227532] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.227536] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.227618] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.227621] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.227700] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> [21874.227703] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.227784] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
> [21874.227804] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> [21874.514661] snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -5
> [21874.568360] amdgpu :0c:00.0: amdgpu: Failed to switch to AC mode!
> [21874.599292] amdgpu :0c:00.0: amdgpu: Failed to switch to AC mode!
> [21874.718562] amdgpu :0c:00.0: amdgpu: amdgpu: finishing device.
> [21878.722376] amdgpu: cp queue pipe 4 queue 0 preemption failed
> [21878.722422] amdgpu :0c:00.0: amdgpu: Failed to disable gfxoff!
> [21879.134918] amdgpu :0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
> [21879.135144] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
> [21879.338158] amdgpu :0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
> [21879.338402] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
> [21879.543318] [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
> [21879.544216] __smu_cmn_reg_print_error: 5 callbacks suppressed
> [21879.544220] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:7 param:0x message:DisableAllSmuFeatures?
> [21879.544226] amdgpu :0c:00.0: amdgpu: Failed to disable smu features.
> [21879.544230] amdgpu :0c:00.0: amdgpu: Fail to disable dpm features!
> [21879.544238] [drm] free PSP TMR buffer

The above looks like what amdgpu would see when the GPU is no longer
accessible (writes are dropped and reads return 0x).  It's
possible amdgpu could notice this and shut down more gracefully, but I
don't think it's the main problem here and it probably wouldn't force
you to reboot.
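
To illustrate the "notice this and shut down more gracefully" idea, here is a minimal userspace sketch in C. It is not amdgpu code. It assumes the value returned by reads to a missing device is the usual all-ones pattern, and every name in it (reg_read_fn, read_present, read_unplugged, device_looks_removed, send_shutdown_cmd) is hypothetical. The kernel has its own helpers for this kind of check, for example pci_device_is_present(); the sketch does not use them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed pattern returned by a PCI read that no device answered. */
#define ALL_ONES_32 UINT32_MAX

/* Hypothetical register accessor: returns the 32-bit value at 'offset'. */
typedef uint32_t (*reg_read_fn)(uint32_t offset);

/* Simulated reads for illustration: a healthy device and an unplugged one. */
static uint32_t read_present(uint32_t offset) { return 0x1002u + offset; }
static uint32_t read_unplugged(uint32_t offset) { (void)offset; return ALL_ONES_32; }

/*
 * Treat the device as gone only if two different registers both read
 * all-ones; a single register can legitimately hold that value.
 */
static bool device_looks_removed(reg_read_fn read_reg)
{
    assert(read_reg != NULL);
    return read_reg(0x00) == ALL_ONES_32 && read_reg(0x04) == ALL_ONES_32;
}

/* Hypothetical teardown step: skip the hardware command if the device is gone. */
static int send_shutdown_cmd(reg_read_fn read_reg)
{
    if (device_looks_removed(read_reg))
        return -1; /* nothing to talk to; caller frees software state only */
    return 0;      /* a real driver would issue the command here */
}
```

With a check like this at the top of each teardown step, a driver could skip the hardware commands and avoid waiting on timeouts for a device that cannot answer, which might cut down the flood of SMU and ring-test errors quoted above.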

> [21880.455935] i915 :00:02.0: vgaarb: 

RE: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly

2022-05-09 Thread Deucher, Alexander
[Public]

> -----Original Message-----
> From: Bjorn Helgaas
> Sent: Monday, May 9, 2022 12:23 PM
> To: Linux PCI
> Cc: r087...@yahoo.it; Deucher, Alexander; Koenig, Christian; Pan, Xinhui; amd-gfx mailing list; dri-devel
> Subject: Re: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly
> 
> On Sun, May 8, 2022 at 3:29 PM  wrote:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=215958
> >
> >         Bug ID: 215958
> >        Summary: thunderbolt3 egpu cannot disconnect cleanly
> >        Product: Drivers
> >        Version: 2.5
> > Kernel Version: 5.17.0-1003-oem #3-Ubuntu SMP PREEMPT
> >       Hardware: All
> >             OS: Linux
> >           Tree: Mainline
> >         Status: NEW
> >       Severity: normal
> >       Priority: P1
> >      Component: PCI
> >       Assignee: drivers_...@kernel-bugs.osdl.org
> >       Reporter: r087...@yahoo.it
> >     Regression: No
> 
> I assume this is not a regression, right?  If it is a regression, what
> previous kernel worked correctly?
> 
> > I have an external egpu (Radeon 6600 RX) connected through
> > thunderbolt3 to my Thinkpad X1 carbon 6th Gen.. When I disconnect the
> > thunderbolt3 cable I get the following error in dmesg:
> >
> > [21874.194994] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.195006] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.195123] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.195129] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.195271] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.195276] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.195406] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.195411] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.195544] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:51 param:0x message:GetPptLimit?
> > [21874.195550] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
> > [21874.195582] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.195587] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.227454] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.227463] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.227532] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.227536] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.227618] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.227621] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.227700] amdgpu :0c:00.0: amdgpu: SMU: response:0x for index:18 param:0x0005 message:TransferTableSmu2Dram?
> > [21874.227703] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.227784] amdgpu :0c:00.0: amdgpu: [smu_v11_0_get_current_power_limit] get PPT limit failed!
> > [21874.227804] amdgpu :0c:00.0: amdgpu: Failed to export SMU metrics table!
> > [21874.514661] snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -5
> > [21874.568360] amdg