Another occurrence today (30 May), barely 24 hours after the previous freeze, again on the 6.19.10 mainline kernel. This time I captured debug data live, while the display was still frozen, before recovering. I'm posting it raw, without interpretation.

System: same as my previous comment - Ubuntu 26.04, RX 7900 XT (Navi 31), kernel 6.19.10-061910-generic, GNOME Wayland.

What happened:
- Monitor 1: Counter-Strike 2 (fullscreen). Monitor 2: Chrome with a Twitch stream playing in one tab and a second tab open.
- Monitor 2 froze on a single frame and stayed frozen for 11+ minutes. Monitor 1 kept working normally the whole time.
- I did not recover via Settings this time. I left it frozen and collected the data below.
- I recovered by switching VT (Ctrl+Alt+F1): both screens went black for ~15-30 seconds, then the login manager appeared, and after that monitor 2 was working again.
- Observation: both of my long (multi-minute) freezes so far have happened while video was playing on monitor 2.

Live kernel log during the freeze (sudo journalctl -k; apparmor audit lines removed for brevity):

May 30 19:09:38 kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:367:crtc-1] flip_done timed out
May 30 19:09:38 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* [CRTC:367:crtc-1] hw_done or flip_done timed out

Relevant dmesg lines from this boot (apparmor audit lines and one unrelated mmput_async_fn workqueue line removed):

[12199.091506] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
[27450.056954] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
[34609.489796] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[34745.939967] workqueue: dm_handle_vmin_vmax_update [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
[37445.522679] amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:367:crtc-1] flip_done timed out
[37445.522708] amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* [CRTC:367:crtc-1] hw_done or flip_done timed out

Note: the "dc_dmub_srv_log_diagnostic_data: DMCUB error" line above did not appear in any of my earlier captures; this is the first time I have seen it.

DMUB debug interfaces, read while the display was still frozen:

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_state 2>/dev/null || echo "not available"
not available

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm
0

$ sudo ls /sys/kernel/debug/dri/0/ | grep -i dmub
amdgpu_dm_dmub_fw_state
amdgpu_dm_dmub_trace_mask
amdgpu_dm_dmub_tracebuffer

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_fw_state
cat: /sys/kernel/debug/dri/0/amdgpu_dm_dmub_fw_state: Operation not permitted

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_trace_mask
0x0

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_tracebuffer
trace_code=1 tick_count=413008612 param0=83897600 param1=0
trace_code=239 tick_count=413009028 param0=29 param1=5
trace_code=236 tick_count=413009348 param0=0 param1=4
trace_code=237 tick_count=413009368 param0=0 param1=0
trace_code=238 tick_count=413009376 param0=0 param1=0
trace_code=236 tick_count=413009684 param0=2 param1=4
trace_code=237 tick_count=413009692 param0=0 param1=0
trace_code=238 tick_count=413009700 param0=0 param1=0
trace_code=236 tick_count=413010188 param0=3 param1=4
trace_code=237 tick_count=413010196 param0=0 param1=0
trace_code=238 tick_count=413010204 param0=6 param1=1
trace_code=236 tick_count=413010624 param0=4 param1=4
trace_code=237 tick_count=413010632 param0=0 param1=0
trace_code=238 tick_count=413010640 param0=6 param1=1
trace_code=17 tick_count=413011792 param0=0 param1=0
trace_code=33 tick_count=413018256 param0=3 param1=1792
trace_code=34 tick_count=413018448 param0=3 param1=0
trace_code=33 tick_count=413018648 param0=3 param1=0
trace_code=34 tick_count=413018660 param0=3 param1=0
trace_code=33 tick_count=413024864 param0=3 param1=1
trace_code=34 tick_count=413024892 param0=3 param1=0
trace_code=33 tick_count=413025036 param0=3 param1=2
trace_code=34 tick_count=413025044 param0=3 param1=0
trace_code=33 tick_count=413031204 param0=3 param1=3
trace_code=34 tick_count=413031232 param0=3 param1=0
trace_code=33 tick_count=413098892 param0=26 param1=3
trace_code=427 tick_count=413098928 param0=3 param1=3
trace_code=428 tick_count=413099108 param0=3 param1=1
trace_code=429 tick_count=413099128 param0=1 param1=0
trace_code=34 tick_count=413099156 param0=26 param1=0
trace_code=33 tick_count=413105292 param0=26 param1=4
trace_code=427 tick_count=413105316 param0=4 param1=4
trace_code=428 tick_count=413105476 param0=4 param1=1
trace_code=429 tick_count=413105484 param0=1 param1=0
trace_code=34 tick_count=413105512 param0=26 param1=0

Notes on the captures, stated as facts only:
- amdgpu_dm_dmub_fw_state returned "Operation not permitted" even under sudo.
- amdgpu_dm_dmub_trace_mask was 0x0, so the tracebuffer above is whatever is captured under the default trace mask.
- I read the tracebuffer only once, so I cannot say whether it was still advancing or had stopped.

This reproduces every couple of days. If specific debug files, a particular trace mask value, or any other data would be useful to capture during the next occurrence, tell me exactly what to run and I will collect it. I can also provide the full untrimmed journal/dmesg on request.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2150776

Title:
  Ubuntu 26.04 GNOME Wayland: random short display/presentation freezes on AMD RX 7900 XT while apps continue running

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2150776/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
