Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-23 Thread Harvey

Alex,

thanks for the hint, but...

Is this patch intended for kernel 5.11.8?

I applied the patch against 5.11.8 and it is freezing again:


Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail 
[amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail 
[amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] 
*ERROR* ring sdma0 timeout, signaled seq=615, emitted seq=617
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] 
*ERROR* Process information: process  pid 0 thread  pid 0

Mär 23 16:18:51 obelix kernel: amdgpu :03:00.0: amdgpu: GPU reset begin!
Mär 23 16:18:51 obelix kernel: BUG: kernel NULL pointer dereference, 
address: 0029

Mär 23 16:18:51 obelix kernel: #PF: supervisor read access in kernel mode
Mär 23 16:18:51 obelix kernel: #PF: error_code(0x) - not-present page
Mär 23 16:18:51 obelix kernel: PGD 0 P4D 0
Mär 23 16:18:51 obelix kernel: Oops:  [#1] PREEMPT SMP NOPTI
Mär 23 16:18:51 obelix kernel: CPU: 12 PID: 178 Comm: kworker/12:1 Not 
tainted 5.11.8-arch1-1-custom #1
Mär 23 16:18:51 obelix kernel: Hardware name: Micro-Star International 
Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Mär 23 16:18:51 obelix kernel: Workqueue: events drm_sched_job_timedout 
[gpu_sched]
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 
[amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 
84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 
8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 
48 8b 7f 08 4c

Mär 23 16:18:51 obelix kernel: RSP: 0018:a35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0001 RBX: 
8b044c5ee000 RCX: 0080005b
Mär 23 16:18:51 obelix kernel: RDX: 0080005c RSI: 
0001 RDI: 8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: 8b044a877bc0 R08: 
0001 R09: 
Mär 23 16:18:51 obelix kernel: R10:  R11: 
afccba00 R12: 8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: 8b044bf6 R14: 
8b04414a1000 R15: 8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS:  () 
GS:8b075f90() knlGS:
Mär 23 16:18:51 obelix kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Mär 23 16:18:51 obelix kernel: CR2: 0029 CR3: 
0001ab01 CR4: 00350ee0

Mär 23 16:18:51 obelix kernel: Call Trace:
Mär 23 16:18:51 obelix kernel:  stop_cpsch+0xa0/0xc0 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Mär 23 16:18:51 obelix kernel: 
amdgpu_device_gpu_recover.cold+0x36e/0x95d [amdgpu]

Mär 23 16:18:51 obelix kernel:  amdgpu_job_timedout+0x121/0x140 [amdgpu]
Mär 23 16:18:51 obelix kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Mär 23 16:18:51 obelix kernel:  process_one_work+0x214/0x3e0
Mär 23 16:18:51 obelix kernel:  worker_thread+0x4d/0x3d0
Mär 23 16:18:51 obelix kernel:  ? rescuer_thread+0x3c0/0x3c0
Mär 23 16:18:51 obelix kernel:  kthread+0x133/0x150
Mär 23 16:18:51 obelix kernel:  ? __kthread_bind_mask+0x60/0x60
Mär 23 16:18:51 obelix kernel:  ret_from_fork+0x22/0x30
Mär 23 16:18:51 obelix kernel: Modules linked in: rfcomm 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio 
snd_hda_codec_hdmi cmac algif_hash snd_hda_intel algif_skcipher 
snd_intel_dspcfg soundwire_intel af_alg soundwire_ge>
Mär 23 16:18:51 obelix kernel:  sr_mod cdrom uas usb_storage dm_crypt 
cbc encrypted_keys dm_mod trusted tpm crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd 
glue_helper serio_raw ccp xhc>

Mär 23 16:18:51 obelix kernel: CR2: 0029
Mär 23 16:18:51 obelix kernel: ---[ end trace 8a72c5e07cbe6b63 ]---
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 
[amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 
84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 
8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 
48 8b 7f 08 4c

Mär 23 16:18:51 obelix kernel: RSP: 0018:a35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0001 RBX: 
8b044c5ee000 RCX: 0080005b
Mär 23 16:18:51 obelix kernel: RDX: 0080005c RSI: 
0001 RDI: 8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: 8b044a877bc0 R08: 
0001 R09: 
Mär 23 16:18:51 obelix kernel: R10:  R11: 
afccba00 R12: 8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: 8b044bf6 R14: 
8b04414a1000 R15: 8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS:  () 
GS:8b075f90() knlGS:
Mär 23 16:18:51 obelix kernel: 
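
A reading of the oops above, offered as a sketch rather than a diagnosis: 
the byte marked <8b> in the Code: line starts the faulting instruction. 
Assuming the archive stripped the leading zeros from the register dump 
(so RAX is 0x1 and CR2 is 0x29), decoding the bytes 8b 50 28 by hand 
gives a 32-bit load from [rax+0x28], and 0x1 + 0x28 matches the reported 
fault address. The helper name below is made up for illustration, and the 
decoder only handles this one addressing form.

```python
# Hypothetical helper: decodes only "8b /r" with an 8-bit displacement,
# i.e. mov r32, [r64 + disp8]. That is enough for the marked bytes above.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code: bytes):
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("only 'mov r32, [r64+disp8]' without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[reg], REG64[rm], disp


# Bytes starting at the <8b> marker in the Code: line.
dest, base, disp = decode_mov_load(bytes.fromhex("8b5028"))

# Register values as printed, ASSUMING leading zeros were stripped.
rax = 0x1
cr2 = 0x29

fault_address = rax + disp
print(f"mov {dest}, [{base}{disp:+#x}] -> fault at {fault_address:#x}")
```

Under that reading, the instruction just before it (48 8b 47 10, a load 
of RAX from offset 0x10 of RDI) fetched the value 1 instead of a valid 
pointer. That would fit kernel_queue_uninit being handed a queue that 
was already torn down or never set up, but that interpretation is a guess.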

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-22 Thread Alex Deucher
On Thu, Mar 18, 2021 at 8:19 AM Harvey  wrote:
>
> Alex,
>
> I waited for kernel 5.11.7 to hit our repos yesterday evening and tested
> again:
>
> 1. The suspend issue is gone - suspend and resume now work as expected.
>
> 2. System hibernation seems to be a different beast - still freezing

You need this patch:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/711c13547aad08f2cfe996e0cddc3d56f1233081

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-22 Thread Harvey

Still freezing on 5.11.8 and 5.12-rc4.

Log on 5.12-rc4 looks a little different:


Mär 22 17:40:26 obelix systemd[1]: Reached target Sleep.
Mär 22 17:40:26 obelix systemd[1]: Starting Hibernate...
Mär 22 17:40:26 obelix kernel: PM: hibernation: hibernation entry
Mär 22 17:40:26 obelix systemd-sleep[2380]: Suspending system...
Mär 22 17:40:46 obelix kernel: Filesystems sync: 0.012 seconds
Mär 22 17:40:46 obelix kernel: Freezing user space processes ...
Mär 22 17:40:46 obelix kernel: Freezing of tasks failed after 20.003 
seconds (1 tasks refusing to freeze, wq_busy=0):
Mär 22 17:40:46 obelix kernel: task:Xorg state:D stack:0 
pid: 1635 ppid:  1633 flags:0x0004

Mär 22 17:40:46 obelix kernel: Call Trace:
Mär 22 17:40:46 obelix kernel:  __schedule+0x2fc/0x8b0
Mär 22 17:40:46 obelix kernel:  schedule+0x5b/0xc0
Mär 22 17:40:46 obelix kernel:  rpm_resume+0x18c/0x810
Mär 22 17:40:46 obelix kernel:  ? wait_woken+0x80/0x80
Mär 22 17:40:46 obelix kernel:  __pm_runtime_resume+0x4a/0x80
Mär 22 17:40:46 obelix kernel:  amdgpu_drm_ioctl+0x33/0x80 [amdgpu]
Mär 22 17:40:46 obelix kernel:  __x64_sys_ioctl+0x83/0xb0
Mär 22 17:40:46 obelix kernel:  do_syscall_64+0x33/0x40
Mär 22 17:40:46 obelix kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Mär 22 17:40:46 obelix kernel: RIP: 0033:0x7f7647d4de6b
Mär 22 17:40:46 obelix kernel: RSP: 002b:7ffec3671e88 EFLAGS: 
0246 ORIG_RAX: 0010
Mär 22 17:40:46 obelix kernel: RAX: ffda RBX: 
7ffec3671ec0 RCX: 7f7647d4de6b
Mär 22 17:40:46 obelix kernel: RDX: 7ffec3671ec0 RSI: 
c06864a2 RDI: 000d
Mär 22 17:40:46 obelix kernel: RBP: c06864a2 R08: 
 R09: 
Mär 22 17:40:46 obelix kernel: R10:  R11: 
0246 R12: 5609594eedf0
Mär 22 17:40:46 obelix kernel: R13: 000d R14: 
 R15: 

Mär 22 17:40:46 obelix kernel:
Mär 22 17:40:46 obelix kernel: OOM killer enabled.
Mär 22 17:40:46 obelix kernel: Restarting tasks ... done.
Mär 22 17:40:46 obelix kernel: thermal thermal_zone1: failed to read out 
thermal zone (-61)
Mär 22 17:40:46 obelix rtkit-daemon[1381]: The canary thread is 
apparently starving. Taking action.

Mär 22 17:40:46 obelix rtkit-daemon[1381]: Demoting known real-time threads.
Mär 22 17:40:46 obelix rtkit-daemon[1381]: Successfully demoted thread 
2346 of process 1780.
Mär 22 17:40:46 obelix rtkit-daemon[1381]: Successfully demoted thread 
1811 of process 1780.
Mär 22 17:40:46 obelix rtkit-daemon[1381]: Successfully demoted thread 
1810 of process 1780.
Mär 22 17:40:46 obelix rtkit-daemon[1381]: Successfully demoted thread 
1780 of process 1780.

Mär 22 17:40:46 obelix rtkit-daemon[1381]: Demoted 4 threads.
Mär 22 17:40:46 obelix systemd-sleep[2380]: Failed to suspend system. 
System resumed again: Device or resource busy
Mär 22 17:40:46 obelix systemd[1]: systemd-hibernate.service: Main 
process exited, code=exited, status=1/FAILURE
Mär 22 17:40:46 obelix systemd[1]: systemd-hibernate.service: Failed 
with result 'exit-code'.

Mär 22 17:40:46 obelix systemd[1]: Failed to start Hibernate.
Mär 22 17:40:46 obelix kernel: PM: hibernation: hibernation exit
Mär 22 17:40:46 obelix systemd[1]: Dependency failed for Hibernate.
Mär 22 17:40:46 obelix audit[1]: SERVICE_START pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=systemd-hibernate 
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? 
terminal=? res=failed'
Mär 22 17:40:46 obelix systemd[1]: hibernate.target: Job 
hibernate.target/start failed with result 'dependency'.

Mär 22 17:40:46 obelix systemd-logind[1091]: Operation 'sleep' finished.
Mär 22 17:40:46 obelix systemd[1]: Stopped target Sleep.
Mär 22 17:40:46 obelix NetworkManager[1089]:   [1616431246.8706] 
manager: sleep: wake requested (sleeping: yes  enabled: yes)
Mär 22 17:40:46 obelix kernel: audit: type=1130 
audit(1616431246.867:108): pid=1 uid=0 auid=4294967295 ses=4294967295 
msg='unit=systemd-hibernate comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Mär 22 17:40:46 obelix NetworkManager[1089]:   [1616431246.8708] 
device (wlp4s0): state change: unmanaged -> unavailable (reason 
'managed', sys-iface-state: 'external')
Mär 22 17:40:47 obelix NetworkManager[1089]:   [1616431247.1288] 
device (p2p-dev-wlp4s0): state change: unmanaged -> unavailable (reason 
'managed', sys-iface-state: 'external')
Mär 22 17:40:47 obelix NetworkManager[1089]:   [1616431247.1296] 
manager: NetworkManager state is now DISCONNECTED
Mär 22 17:40:47 obelix NetworkManager[1089]:   [1616431247.2208] 
device (wlp4s0): supplicant interface state: internal-starting -> 
disconnected
Mär 22 17:40:47 obelix NetworkManager[1089]:   [1616431247.2209] 
device (p2p-dev-wlp4s0): state change: unavailable -> unmanaged (reason 
'removed', sys-iface-state: 'removed')
Mär 22 17:40:47 obelix NetworkManager[1089]:   [1616431247.2216] 
Wi-Fi P2P device controlled by 
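
One detail that can be read out of the register dump above: for an ioctl 
syscall on x86-64 the request number is passed in RSI, here c06864a2. A 
minimal sketch of splitting it into the standard Linux _IOC fields 
follows. The function name is made up for illustration.

```python
def decode_ioctl(request: int) -> dict:
    """Split a Linux ioctl request number into its _IOC fields
    (asm-generic layout, as used on x86-64)."""
    return {
        "nr": request & 0xFF,                 # bits 0-7: command number
        "type": chr((request >> 8) & 0xFF),   # bits 8-15: subsystem magic character
        "size": (request >> 16) & 0x3FFF,     # bits 16-29: argument size in bytes
        "dir": (request >> 30) & 0x3,         # bits 30-31: 1 = write, 2 = read, 3 = both
    }


fields = decode_ioctl(0xC06864A2)  # RSI value from the trace above
print(fields)
```

Type 'd' is the DRM ioctl range. Number 0xa2 with a 104-byte read/write 
argument looks like DRM_IOCTL_MODE_SETCRTC, though that mapping is from 
memory of drm.h and should be checked against the headers. If it holds, 
Xorg was blocked in a mode-setting call, waiting for the GPU to 
runtime-resume, when the freezer gave up after 20 seconds.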

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-19 Thread Harvey

Evan,

this is a laptop with RENOIR hardware (Ryzen 4800H) and a discrete GPU 
RX5500. There is an external monitor connected to the HDMI port (which 
is attached to the iGPU afaict).


I would be happy to test further and help in nailing this one ;)

Greetings
Harvey



RE: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-18 Thread Quan, Evan
[AMD Public Use]

Hi Harvey,

According to the error logs below, resuming after the mode1 reset failed.
According to the lspci output in your last email, this happened on a Navi14 ASIC.
However, I cannot reproduce it on my desktop platform with 2 x Navi14 ASICs.

Mär 18 13:00:43 obelix kernel: amdgpu :03:00.0: amdgpu: MODE1 reset
Mär 18 13:00:43 obelix kernel: amdgpu :03:00.0: amdgpu: GPU psp 
mode1 reset
Mär 18 13:00:43 obelix kernel: [drm] psp mode1 reset succeed

...
Mär 18 13:00:43 obelix kernel: [drm:psp_v11_0_ring_create [amdgpu]] 
*ERROR* Failed to wait for sOS ready for ring creation
Mär 18 13:00:43 obelix kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP 
create ring failed!
Mär 18 13:00:43 obelix kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP 
resume failed
Mär 18 13:00:43 obelix kernel: [drm:amdgpu_device_fw_loading [amdgpu]] 
*ERROR* resume of IP block  failed -62
Mär 18 13:00:43 obelix kernel: amdgpu :03:00.0: amdgpu: 
amdgpu_device_ip_resume failed (-62).


It seems you are not running our latest driver, judging by the complaint 
below, so it may be worth trying it (@Deucher, Alexander: where can 
Harvey get our latest code?).
Mär 18 12:51:36 obelix kernel: amdgpu :03:00.0: amdgpu: smu driver 
if version = 0x0036, smu fw if version = 0x0038, smu fw version 
= 0x00352100 (53.33.0)

BR
Evan
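
The version numbers in the SMU log line above can be cross-checked with 
a few shifts. This is a minimal sketch: the field layout (16-bit major, 
8-bit minor, 8-bit debug) is inferred from the "(53.33.0)" that the 
driver prints next to 0x00352100, and the function name is made up for 
illustration.

```python
def split_smu_fw_version(version: int) -> tuple:
    """Split a packed SMU firmware version into (major, minor, debug).
    The layout is an assumption inferred from the log line itself."""
    major = (version >> 16) & 0xFFFF
    minor = (version >> 8) & 0xFF
    debug = version & 0xFF
    return major, minor, debug


# Values as printed in the log (leading zeros may have been stripped).
fw_version = 0x00352100
driver_if_version = 0x36
fw_if_version = 0x38

major, minor, debug = split_smu_fw_version(fw_version)
print(f"{major}.{minor}.{debug}")
print("interface versions match:", driver_if_version == fw_if_version)
```

The decoded triple agrees with the one in the log, and the two interface 
versions differ (0x36 in the driver, 0x38 in the firmware), which is the 
mismatch referred to above.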

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-18 Thread Harvey

Alex,

I waited for kernel 5.11.7 to hit our repos yesterday evening and tested 
again:


1. The suspend issue is gone - suspend and resume now work as expected.

2. System hibernation seems to be a different beast - still freezing

When invoking 'systemctl hibernate' the system does not power off (I 
waited for 5 minutes) and I have to hard reset it to start up again. It 
then tries to resume from the swap partition and comes back up with only 
the external monitor connected to the HDMI port showing a picture, while 
the built-in screen of the laptop stays black. Even so, the system is 
frozen and responds to neither mouse nor keyboard. After another hard 
reset I managed to get the following log from journalctl (trimmed to the 
relevant part):


Mär 18 12:51:11 obelix NetworkManager[866]:   [1616068271.3466] 
manager: sleep: sleep requested (sleeping: no  enabled: yes)
Mär 18 12:51:11 obelix NetworkManager[866]:   [1616068271.3473] 
device (wlp4s0): state change: disconnected -> unmanaged (reason 
'sleeping', sys-iface-state: 'managed')
Mär 18 12:51:11 obelix NetworkManager[866]:   [1616068271.3508] 
device (wlp4s0): set-hw-addr: reset MAC address to 14:F6:D8:18:8C:EC 
(unmanage)
Mär 18 12:51:11 obelix NetworkManager[866]:   [1616068271.3575] 
device (p2p-dev-wlp4s0): state change: disconnected -> unmanaged (reason 
'sleeping', sys-iface-state: 'managed')
Mär 18 12:51:11 obelix NetworkManager[866]:   [1616068271.3580] 
manager: NetworkManager state is now ASLEEP
Mär 18 12:51:11 obelix wpa_supplicant[954]: nl80211: deinit 
ifname=p2p-dev-wlp4s0 disabled_11b_rates=0
Mär 18 12:51:11 obelix wpa_supplicant[954]: nl80211: deinit 
ifname=wlp4s0 disabled_11b_rates=0

Mär 18 12:51:12 obelix gsd-media-keys[1691]: Unable to get default sink
Mär 18 12:51:15 obelix gnome-shell[1496]: 
../glib/gobject/gsignal.c:2732: instance '0x560b86c67b50' has no handler 
with id '15070'
Mär 18 12:51:16 obelix gsd-usb-protect[1724]: Error calling USBGuard 
DBus to change the protection after a screensaver event: 
GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name 
org.usbguard1 was not provided by any .service files

Mär 18 12:51:16 obelix systemd[1]: Reached target Sleep.
Mär 18 12:51:16 obelix systemd[1]: Starting Suspend...
Mär 18 12:51:16 obelix systemd-sleep[2000]: Suspending system...
Mär 18 12:51:16 obelix kernel: PM: suspend entry (deep)
Mär 18 12:51:16 obelix kernel: Filesystems sync: 0.005 seconds
Mär 18 12:51:36 obelix kernel: Freezing user space processes ... 
(elapsed 0.002 seconds) done.

Mär 18 12:51:36 obelix kernel: OOM killer disabled.
Mär 18 12:51:36 obelix kernel: Freezing remaining freezable tasks ... 
(elapsed 0.001 seconds) done.
Mär 18 12:51:36 obelix kernel: printk: Suspending console(s) (use 
no_console_suspend to debug)

Mär 18 12:51:36 obelix kernel: [drm] free PSP TMR buffer
Mär 18 12:51:36 obelix kernel: [drm] free PSP TMR buffer
Mär 18 12:51:36 obelix kernel: ACPI: EC: interrupt blocked
Mär 18 12:51:36 obelix kernel: ACPI: Preparing to enter system sleep 
state S3

Mär 18 12:51:36 obelix kernel: ACPI: EC: event blocked
Mär 18 12:51:36 obelix kernel: ACPI: EC: EC stopped
Mär 18 12:51:36 obelix kernel: PM: Saving platform NVS memory
Mär 18 12:51:36 obelix kernel: Disabling non-boot CPUs ...
Mär 18 12:51:36 obelix kernel: IRQ 86: no longer affine to CPU1
Mär 18 12:51:36 obelix kernel: smpboot: CPU 1 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 87: no longer affine to CPU2
Mär 18 12:51:36 obelix kernel: smpboot: CPU 2 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 88: no longer affine to CPU3
Mär 18 12:51:36 obelix kernel: smpboot: CPU 3 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 89: no longer affine to CPU4
Mär 18 12:51:36 obelix kernel: smpboot: CPU 4 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 90: no longer affine to CPU5
Mär 18 12:51:36 obelix kernel: smpboot: CPU 5 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 91: no longer affine to CPU6
Mär 18 12:51:36 obelix kernel: smpboot: CPU 6 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 92: no longer affine to CPU7
Mär 18 12:51:36 obelix kernel: smpboot: CPU 7 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 93: no longer affine to CPU8
Mär 18 12:51:36 obelix kernel: smpboot: CPU 8 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 94: no longer affine to CPU9
Mär 18 12:51:36 obelix kernel: smpboot: CPU 9 is now offline
Mär 18 12:51:36 obelix kernel: IRQ 95: no longer affine to CPU10
Mär 18 12:51:36 obelix kernel: smpboot: CPU 10 is now offline
Mär 18 12:51:36 obelix kernel: smpboot: CPU 11 is now offline
Mär 18 12:51:36 obelix kernel: smpboot: CPU 12 is now offline
Mär 18 12:51:36 obelix kernel: smpboot: CPU 13 is now offline
Mär 18 12:51:36 obelix kernel: smpboot: CPU 14 is now offline
Mär 18 12:51:36 obelix kernel: smpboot: CPU 15 is now offline
Mär 18 12:51:36 obelix kernel: ACPI: Low-level resume complete
Mär 18 12:51:36 obelix kernel: ACPI: EC: EC started
Mär 18 12:51:36 obelix kernel: PM: Restoring 
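
The journal excerpts in this thread are hard-wrapped by the mail client 
and have blank lines inserted by the archive, which makes them awkward to 
grep. A minimal sketch of joining the continuation lines back together 
follows. It assumes every entry starts with "<month> <day> <time> <host> " 
and that a wrapped line can be rejoined with a single space. The function 
name is made up for illustration.

```python
import re

# A new journal entry starts with "<month> <day> <HH:MM:SS> <host> ".
ENTRY_START = re.compile(r"^\S+ \d{1,2} \d{2}:\d{2}:\d{2} \S+ ")


def unwrap_journal(lines):
    """Join hard-wrapped continuation lines onto the entry they belong to."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines were inserted by the archive
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


sample = [
    "Mär 18 12:51:36 obelix kernel: Freezing user space processes ... ",
    "(elapsed 0.002 seconds) done.",
    "",
    "Mär 18 12:51:36 obelix kernel: OOM killer disabled.",
    "Mär 18 12:51:36 obelix kernel: ACPI: Preparing to enter system sleep ",
    "state S3",
]

entries = unwrap_journal(sample)
for entry in entries:
    print(entry)
```

This does not restore text the archive cut off, so entries that end in 
">" or stop mid-sentence stay incomplete.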

Re: Amdgpu kernel oops and freezing on system suspend and hibernate

2021-03-17 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Please try the latest patch set on this bug:
https://gitlab.freedesktop.org/drm/amd/-/issues/1230

Alex

From: amd-gfx  on behalf of Harvey 

Sent: Wednesday, March 17, 2021 12:20 PM
To: amd-gfx@lists.freedesktop.org 
Subject: Amdgpu kernel oops and freezing on system suspend and hibernate

Hello,

I own a laptop, an MSI Bravo 17 A4DDR/MS-17FK
with a Ryzen 7 4800H and hybrid graphics with a Radeon RX 5500M.

DMI: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS
E17FKAMS.117 10/29/2020

The system does not hibernate, it just freezes. After a hard reset it
resumes from the swap partition and comes back up, but freezes again
shortly afterwards.

Even suspending is not working properly - on Arch Linux with kernel
5.11.6 and on 5.12-rc1 I see the following kernel oopses after resume:

The output of dmesg -l err,warn is:

[11020.188925] ------------[ cut here ]------------
[11020.188929] WARNING: CPU: 0 PID: 7736 at
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2574
dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[11020.189314] Modules linked in: rfcomm snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi cmac algif_hash
algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common iwlmvm
snd_hda_intel snd_intel_dspcfg soundwire_intel
soundwire_generic_allocation soundwire_cadence nls_iso8859_1 vfat
mac80211 snd_hda_codec fat edac_mce_amd uvcvideo btusb snd_hda_core
kvm_amd btrtl libarc4 videobuf2_vmalloc btbcm snd_hwdep videobuf2_memops
hid_multitouch soundwire_bus videobuf2_v4l2 btintel pktcdvd iwlwifi
snd_soc_core kvm videobuf2_common bluetooth snd_compress videodev
ac97_bus snd_pcm_dmaengine snd_pcm snd_timer irqbypass msi_wmi
ecdh_generic joydev mousedev cfg80211 mc ecc rapl snd psmouse
snd_rn_pci_acp3x pcspkr sparse_keymap k10temp i2c_piix4 snd_pci_acp3x
soundcore rfkill tpm_crb tpm_tis tpm_tis_core pinctrl_amd i2c_hid
acpi_cpufreq mac_hid soc_button_array vboxnetflt(OE) vboxnetadp(OE)
vboxdrv(OE) usbip_host usbip_core sg fuse crypto_user bpf_preload
ip_tables x_tables
[11020.189400]  ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom uas
usb_storage dm_crypt cbc encrypted_keys dm_mod trusted tpm
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
aesni_intel crypto_simd cryptd glue_helper serio_raw ccp xhci_pci
xhci_pci_renesas rng_core wmi video usbhid r8168(OE) amdgpu
drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops cec drm agpgart
[11020.189445] CPU: 0 PID: 7736 Comm: systemd-sleep Tainted: G
  OE 5.11.6-arch1-1 #1
[11020.189450] Hardware name: Micro-Star International Co., Ltd. Bravo
17 A4DDR/MS-17FK, BIOS E17FKAMS.117 10/29/2020
[11020.189453] RIP: 0010:dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[11020.189792] Code: 88 03 00 00 31 c0 48 8d 96 f0 01 00 00 48 8b 0a 48
85 c9 74 06 48 3b 59 08 74 20 83 c0 01 48 81 c2 d0 04 00 00 83 f8 06 75
e3 <0f> 0b 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d 41 5e c3 48 98 48 69 c0
[11020.189795] RSP: 0018:c1f003373c38 EFLAGS: 00010246
[11020.189799] RAX: 0006 RBX: 9e244e0ea800 RCX:

[11020.189802] RDX: 9e2582fe1ed0 RSI: 9e2582fe RDI:

[11020.189804] RBP: 9e244e0f R08: 00f9 R09:
9e244323a000
[11020.189806] R10: 9e244323ae40 R11: 01320122 R12:
fa01
[11020.189808] R13:  R14: fa42 R15:
0003
[11020.189810] FS:  7f6219470a40() GS:9e275f60()
knlGS:
[11020.189813] CS:  0010 DS:  ES:  CR0: 80050033
[11020.189815] CR2: 7fb7a8980180 CR3: 000109cae000 CR4:
00350ef0
[11020.189818] Call Trace:
[11020.189828]  amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
[11020.190185]  backlight_suspend+0x6a/0x80
[11020.190192]  ? brightness_store+0x80/0x80
[11020.190197]  dpm_run_callback+0x4c/0x150
[11020.190202]  __device_suspend+0x11c/0x4d0
[11020.190205]  dpm_suspend+0xef/0x230
[11020.190209]  dpm_suspend_start+0x77/0x80
[11020.190213]  suspend_devices_and_enter+0x109/0x800
[11020.190219]  pm_suspend.cold+0x329/0x374
[11020.190225]  state_store+0x71/0xd0
[11020.190230]  kernfs_fop_write_iter+0x124/0x1b0
[11020.190236]  new_sync_write+0x159/0x1f0
[11020.190241]  vfs_write+0x1fc/0x2a0
[11020.190245]  ksys_write+0x67/0xe0
[11020.190249]  do_syscall_64+0x33/0x40
[11020.190255]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[11020.190261] RIP: 0033:0x7f6219de10f7
[11020.190265] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f
1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[11020.190268] RSP: 002b:7fff7ae91318 EFLAGS: 0246 ORIG_RAX:
0001
[11020.190272] RAX: ffda RBX: