Another occurrence today (30 May), barely 24 hours after the previous freeze,
again on the 6.19.10 mainline kernel. This time I captured debug data live,
while the display was still frozen and before recovering. I'm posting it raw,
without interpretation.

System: same as my previous comment - Ubuntu 26.04, RX 7900 XT (Navi 31),
kernel 6.19.10-061910-generic, GNOME Wayland.

What happened:
- Monitor 1: Counter-Strike 2 (fullscreen). Monitor 2: Chrome with a Twitch
  stream playing in one tab and a second tab open.
- Monitor 2 froze on a single frame and stayed frozen for 11+ minutes. Monitor 1
  kept working normally the whole time.
- I did not recover via Settings this time. I left it frozen and collected the
  data below.
- I recovered by switching VT (Ctrl+Alt+F1): both screens went black for
  ~15-30 seconds, then the login manager appeared, and after that monitor 2 was
  working again.
- Observation: both of my long (multi-minute) freezes so far have happened
  while video was playing on monitor 2.

Live kernel log during the freeze (sudo journalctl -k; apparmor audit lines
removed for brevity):

May 30 19:09:38 kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:367:crtc-1] flip_done timed out
May 30 19:09:38 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* [CRTC:367:crtc-1] hw_done or flip_done timed out
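
For anyone who wants to trim a kernel log the same way, here is a minimal
sketch of one possible filter. It is not necessarily the exact command used for
the excerpt above. The function name strip_apparmor is hypothetical, and the
pattern 'apparmor=' is an assumption about how the AppArmor audit lines look in
this log.

```shell
# Sketch: drop AppArmor audit lines from kernel log text read on stdin.
# Assumption: those lines contain the literal text 'apparmor=' (for example
# apparmor="DENIED"); adjust the pattern if they look different.
strip_apparmor() {
    grep -v 'apparmor='
}
```

It could be used as: sudo journalctl -k -b | strip_apparmor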

Relevant dmesg lines from this boot (apparmor audit lines and one unrelated
mmput_async_fn workqueue line removed):

[12199.091506] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
[27450.056954] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
[34609.489796] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[34745.939967] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
[37445.522679] amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:367:crtc-1] flip_done timed out
[37445.522708] amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* [CRTC:367:crtc-1] hw_done or flip_done timed out

Note: the "dc_dmub_srv_log_diagnostic_data: DMCUB error" line above did not
appear in any of my earlier captures; this is the first time I have seen it.

DMUB debug interfaces, read while the display was still frozen:

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_state 2>/dev/null || echo "not available"
not available

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm
0

$ sudo ls /sys/kernel/debug/dri/0/ | grep -i dmub
amdgpu_dm_dmub_fw_state
amdgpu_dm_dmub_trace_mask
amdgpu_dm_dmub_tracebuffer

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_fw_state
cat: /sys/kernel/debug/dri/0/amdgpu_dm_dmub_fw_state: Operation not permitted

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_trace_mask
0x0

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_tracebuffer
trace_code=1 tick_count=413008612 param0=83897600 param1=0
trace_code=239 tick_count=413009028 param0=29 param1=5
trace_code=236 tick_count=413009348 param0=0 param1=4
trace_code=237 tick_count=413009368 param0=0 param1=0
trace_code=238 tick_count=413009376 param0=0 param1=0
trace_code=236 tick_count=413009684 param0=2 param1=4
trace_code=237 tick_count=413009692 param0=0 param1=0
trace_code=238 tick_count=413009700 param0=0 param1=0
trace_code=236 tick_count=413010188 param0=3 param1=4
trace_code=237 tick_count=413010196 param0=0 param1=0
trace_code=238 tick_count=413010204 param0=6 param1=1
trace_code=236 tick_count=413010624 param0=4 param1=4
trace_code=237 tick_count=413010632 param0=0 param1=0
trace_code=238 tick_count=413010640 param0=6 param1=1
trace_code=17 tick_count=413011792 param0=0 param1=0
trace_code=33 tick_count=413018256 param0=3 param1=1792
trace_code=34 tick_count=413018448 param0=3 param1=0
trace_code=33 tick_count=413018648 param0=3 param1=0
trace_code=34 tick_count=413018660 param0=3 param1=0
trace_code=33 tick_count=413024864 param0=3 param1=1
trace_code=34 tick_count=413024892 param0=3 param1=0
trace_code=33 tick_count=413025036 param0=3 param1=2
trace_code=34 tick_count=413025044 param0=3 param1=0
trace_code=33 tick_count=413031204 param0=3 param1=3
trace_code=34 tick_count=413031232 param0=3 param1=0
trace_code=33 tick_count=413098892 param0=26 param1=3
trace_code=427 tick_count=413098928 param0=3 param1=3
trace_code=428 tick_count=413099108 param0=3 param1=1
trace_code=429 tick_count=413099128 param0=1 param1=0
trace_code=34 tick_count=413099156 param0=26 param1=0
trace_code=33 tick_count=413105292 param0=26 param1=4
trace_code=427 tick_count=413105316 param0=4 param1=4
trace_code=428 tick_count=413105476 param0=4 param1=1
trace_code=429 tick_count=413105484 param0=1 param1=0
trace_code=34 tick_count=413105512 param0=26 param1=0

Notes on the captures, stated as facts only:
- amdgpu_dm_dmub_fw_state returned "Operation not permitted" even under sudo.
- amdgpu_dm_dmub_trace_mask was 0x0, so the tracebuffer above is whatever is
  captured under the default trace mask.
- I read the tracebuffer only once, so I cannot say whether it was still
  advancing or had stopped.
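
To close the gap in the last note, the tracebuffer could be read twice during
the next freeze and the two snapshots compared. The following is only a sketch
under stated assumptions: the default path is the one shown in the listing
above, while the function name tracebuffer_advancing and the 5-second default
interval are my own arbitrary choices.

```shell
# Sketch: snapshot the DMUB tracebuffer twice and report whether it changed.
# $1 = file to read (defaults to the debugfs path shown above)
# $2 = seconds to wait between the two reads (default 5, an arbitrary choice)
tracebuffer_advancing() {
    trace="${1:-/sys/kernel/debug/dri/0/amdgpu_dm_dmub_tracebuffer}"
    interval="${2:-5}"
    # Return status 2 if the file cannot be read at all.
    first=$(cat "$trace") || return 2
    sleep "$interval"
    second=$(cat "$trace") || return 2
    if [ "$first" = "$second" ]; then
        echo "unchanged"
    else
        echo "advancing"
    fi
}
```

The function has to be defined and run in a root shell, since these debugfs
files were only readable with sudo here. "unchanged" only means the two
snapshots were identical over that interval; it does not say why.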

This reproduces every couple of days. If specific debug files, a particular
trace mask value, or any other data would be useful to capture during the next
occurrence, tell me exactly what to run and I will collect it. I can also
provide the full untrimmed journal/dmesg on request.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2150776

Title:
  Ubuntu 26.04 GNOME Wayland: random short display/presentation freezes
  on AMD RX 7900 XT while apps continue running

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2150776/+subscriptions

