[Kernel-packages] [Bug 1919508] Re: AMDGPU lockup on every computer sleep if monitor is already asleep

2021-11-07 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-signed-oem-5.10 (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-oem-5.10 in Ubuntu.
https://bugs.launchpad.net/bugs/1919508

Title:
  AMDGPU lockup on every computer sleep if monitor is already asleep

Status in linux-signed-oem-5.10 package in Ubuntu:
  Confirmed

Bug description:
  The system always locks up, requiring a reboot.

  Steps to reproduce:
  1. Configure power saving to turn off the monitor after a period of inactivity.
  2. Configure power saving to suspend the PC automatically after a certain delay which is longer than the above one.
  3. Wait.
  4. The system will lock up with no way of returning to an operational state.

  === Hardware tested ===
  GPU: Radeon RX 5600XT
  Monitor: Samsung LS24H850QFU (FreeSync enabled)
  Connection: DisplayPort

  === Previous kernels ===
  Same issue on 5.4 and 5.8 kernels (from Ubuntu packages).

  === Relevant crash info ===
  Mar 17 19:48:56 laptop kernel: [ 8692.935426] [drm] free PSP TMR buffer
  Mar 17 19:49:04 laptop kernel: [ 8700.925536] [drm] PCIE GART of 512M enabled (table at 0x0080).
  Mar 17 19:49:04 laptop kernel: [ 8700.925549] [drm] PSP is resuming...
  Mar 17 19:49:04 laptop kernel: [ 8701.107392] [drm] reserve 0xa0 from 0x803f40 for PSP TMR
  Mar 17 19:49:04 laptop kernel: [ 8701.295353] amdgpu :0a:00.0: amdgpu: RAS: optional ras ta ucode is not available
  Mar 17 19:49:04 laptop kernel: [ 8701.319351] amdgpu :0a:00.0: amdgpu: RAP: optional rap ta ucode is not available
  Mar 17 19:49:04 laptop kernel: [ 8701.319353] amdgpu :0a:00.0: amdgpu: SMU is resuming...
  Mar 17 19:49:04 laptop kernel: [ 8701.319358] amdgpu :0a:00.0: amdgpu: smu driver if version = 0x0036, smu fw if version = 0x0035, smu fw version = 0x002a3200 (42.50.0)
  Mar 17 19:49:04 laptop kernel: [ 8701.319358] amdgpu :0a:00.0: amdgpu: SMU driver if version not matched
  Mar 17 19:49:07 laptop kernel: [ 8703.784372] amdgpu :0a:00.0: amdgpu: failed send message: EnableAllSmuFeatures (6)   param: 0x response 0xffc2
  Mar 17 19:49:07 laptop kernel: [ 8703.784375] amdgpu :0a:00.0: amdgpu: Failed to enable requested dpm features!
  Mar 17 19:49:07 laptop kernel: [ 8703.784377] amdgpu :0a:00.0: amdgpu: Failed to setup smc hw!
  Mar 17 19:49:07 laptop kernel: [ 8703.784458] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -62
  Mar 17 19:49:07 laptop kernel: [ 8703.784460] amdgpu :0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
  Mar 17 19:49:07 laptop kernel: [ 8703.803693] snd_hda_intel :0a:00.1: refused to change power state from D3hot to D0
  Mar 17 19:49:07 laptop kernel: [ 8703.907879] snd_hda_intel :0a:00.1: CORB reset timeout#2, CORBRP = 65535
  Mar 17 19:49:07 laptop kernel: [ 8703.929045] amdgpu: Move buffer fallback to memcpy unavailable
  Mar 17 19:49:07 laptop kernel: [ 8703.929137] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
  Mar 17 19:49:09 laptop kernel: [ 8705.904322] amdgpu: Move buffer fallback to memcpy unavailable
  Mar 17 19:49:09 laptop kernel: [ 8705.904385] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
  Mar 17 19:49:19 laptop kernel: [ 8715.651540] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=11267, emitted seq=11269
  Mar 17 19:49:19 laptop kernel: [ 8715.651633] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
  Mar 17 19:49:19 laptop kernel: [ 8715.651638] amdgpu :0a:00.0: amdgpu: GPU reset begin!
  Mar 17 19:49:19 laptop kernel: [ 8715.651667] [ cut here ]
  Mar 17 19:49:19 laptop kernel: [ 8715.651668] kernel BUG at mm/slub.c:304!
  Mar 17 19:49:19 laptop kernel: [ 8715.651674] invalid opcode:  [#1] SMP NOPTI
  Mar 17 19:49:19 laptop kernel: [ 8715.651677] CPU: 11 PID: 8470 Comm: kworker/11:1 Tainted: PW  O  5.10.0-1016-oem #17-Ubuntu
  Mar 17 19:49:19 laptop kernel: [ 8715.651678] Hardware name: Gigabyte Technology Co., Ltd. B550I AORUS PRO AX/B550I AORUS PRO AX, BIOS F12 01/18/2021
  Mar 17 19:49:19 laptop kernel: [ 8715.651684] Workqueue: events drm_sched_job_timedout [gpu_sched]
  Mar 17 19:49:19 laptop kernel: [ 8715.651689] RIP: 0010:__slab_free+0x1c9/0x380
  Mar 17 19:49:19 laptop kernel: [ 8715.651691] Code: 41 5e 41 5f 5d c3 41 f7 46 08 00 0d 21 00 0f 85 f0 fe ff ff 4d 85 ed 0f 85 e7 fe ff ff 80 4c 24 5b 80 45 31 c0 e9 2a ff ff ff <0f> 0b 49 3b 5c 24 28 75 97 4c 89 c0 41 89 f0 44 89 fe 49 89 4c 24
  Mar 17 19:49:19 laptop kernel: [ 8715.651693] RSP: 0018:bb0cc06ffbc0 EFLAGS: 00010246
  Mar 17 19:49:19 laptop kernel: [ 8715.651695] RAX: 96a256699d00 RBX: 8020001f RCX: 96a256699c00
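
The negative numbers in the log above (-62 and -19) look like negated errno values, which is the usual kernel convention for return codes. As a reading aid, here is a minimal sketch that pulls those codes out of log text and names them. It assumes the standard Linux errno numbering (19 = ENODEV, 62 = ETIME); the helper name decode_errors and the regular expression are illustrative only and are not part of any amdgpu or Launchpad tooling.

```python
import re

# Linux errno values, hard-coded (an assumption: standard asm-generic numbering)
# so the sketch gives the same answer regardless of the OS it runs on.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    62: ("ETIME", "Timer expired"),
}

# Matches the negative return codes in messages such as
# "failed -62", "failed (-62)." and "buffer list -19!".
CODE_RE = re.compile(r"(?:failed|buffer list)\s+\(?-(\d+)\)?")


def decode_errors(log_text):
    """Return (code, name, description) for each distinct known code,
    in order of first appearance in the log text."""
    decoded = []
    seen_codes = set()
    for match in CODE_RE.finditer(log_text):
        code = int(match.group(1))
        if code in LINUX_ERRNO and code not in seen_codes:
            seen_codes.add(code)
            decoded.append((code, *LINUX_ERRNO[code]))
    return decoded


# Three lines copied from the crash info in this report (timestamps dropped).
SAMPLE = """\
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -62
amdgpu :0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
"""

if __name__ == "__main__":
    for code, name, text in decode_errors(SAMPLE):
        print(f"-{code}: {name} ({text})")
```

Read this way, the resume of the GPU appears to time out (ETIME) and later command submissions find no usable device (ENODEV), which is consistent with the SMU failing to come back after suspend.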

[Kernel-packages] [Bug 1919508] Re: AMDGPU lockup on every computer sleep if monitor is already asleep

2021-11-07 Thread Michael Gratton
I am getting this with kernel 5.13 on 21.10, with a Radeon RX 6800 XT and
an AMD Ryzen 5 3600.
