Bug#1053864: libdrm-amdgpu1: gpu crash on graphics start with Radeon 760M (both sway and gdm3)
Oops, my bad. I was wondering why I hadn't seen it go through on the bug report...

The issue is still present in apt package linux-image-6.5.0-3 (kernel 6.5.8-1), and linux-image-6.5.0-4 (kernel 6.5.10-1). Same messages, as far as I can see, but here's the dmesg output from the 6.5.10-1 kernel in case there's something subtly different.

Thanks,
Simon

[ 7.490078] ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5)
[ 7.605873] ucsi_acpi USBC000:00: possible UCSI driver bug 1
[ 7.605903] ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
[ 13.555707] pipewire[1065]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 23.808871] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=23, emitted seq=25
[ 23.809320] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 23.809592] amdgpu :c1:00.0: amdgpu: GPU reset begin!
[ 23.990678] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 23.990842] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.124228] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.124374] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.257754] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.257918] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.391326] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.391555] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.525068] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.525211] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.658617] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.658758] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.792155] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.792326] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 24.925815] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 24.925961] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 25.059344] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 25.059488] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 25.061023] amdgpu :c1:00.0: amdgpu: MODE2 reset
[ 25.090107] amdgpu :c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 25.090767] [drm] PCIE GART of 512M enabled (table at 0x00801FD0).
[ 25.090889] amdgpu :c1:00.0: amdgpu: SMU is resuming...
[ 25.092526] amdgpu :c1:00.0: amdgpu: SMU is resumed successfully!
[ 25.094267] [drm] DMUB hardware initialized: version=0x08000E00
[ 25.101834] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 25.104428] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 25.107025] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 25.109617] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 25.117187] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 25.119782] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 25.122380] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 25.124993] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 25.534004] [drm] kiq ring mec 3 pipe 1 q 0
[ 25.536314] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 25.536470] amdgpu :c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 25.537196] amdgpu :c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 25.537200] amdgpu :c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 25.537202] amdgpu :c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 25.537204] amdgpu :c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 25.537206] amdgpu :c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 25.537208] amdgpu :c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 25.537210] amdgpu :c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 25.537212]
Bug#1053864: libdrm-amdgpu1: gpu crash on graphics start with Radeon 760M (both sway and gdm3)
Control: tag -1 moreinfo

On Fri, 13 Oct 2023 00:47:57 -0400 Simon Heath wrote:
> Package: libdrm-amdgpu1
> Version: 2.4.115-1
>
> When GDM3 starts, or when I turn it off and log into the console by hand
> and then start sway or another WM, often the graphics mode switch will
> hang for a few seconds on an unresponsive black screen, then go back to
> a text console for an instant and try again. This seems to repeat 0-3
> times until eventually it works successfully. Sometimes it works on the
> first try, often on the second try, etc.
>
> Once Sway or GDM3 and Xorg have actually started, it *seems* perfectly
> stable, as far as I've seen so far.
>
> I also see the following errors in dmesg associated with the
> apparent-crash-and-restart:
>
> [ 26.625039] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
> signaled seq=23, emitted seq=25
> [ 26.625482] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process pid 0 thread pid 0
> [ 26.625820] amdgpu :c1:00.0: amdgpu: GPU reset begin!
> [ 26.810595] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0
> [amdgpu]] *ERROR* MES failed to response msg=3
> [ 26.810761] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to
> unmap legacy queue
> ...
> Kernel: Linux 6.5.0-1-amd64 (SMP w/12 CPU threads; PREEMPT)

Those messages are actually from the kernel driver.
Can you test whether the issue is still present with kernel 6.5.8-1 (Testing) and if so, also try it with 6.5.10-1 from Unstable?
Bug#1053864: libdrm-amdgpu1: gpu crash on graphics start with Radeon 760M (both sway and gdm3)
Package: libdrm-amdgpu1
Version: 2.4.115-1
Severity: normal
X-Debbugs-Cc: ice...@dreamquest.io

Dear Maintainer,

When GDM3 starts, or when I turn it off and log into the console by hand and then start sway or another WM, often the graphics mode switch will hang for a few seconds on an unresponsive black screen, then go back to a text console for an instant and try again. This seems to repeat 0-3 times until eventually it works successfully. Sometimes it works on the first try, often on the second try, etc.

Once Sway or GDM3 and Xorg have actually started, it *seems* perfectly stable, as far as I've seen so far.

This is a brand new GPU chipset afaik so graphics bugs are pretty understandable.

CPU: AMD Ryzen 5 7640U w/ Radeon 760M Graphics

Extended renderer info from `glxinfo`:
Device: AMD Radeon Graphics (gfx1103_r1, LLVM 16.0.6, DRM 3.54, 6.5.0-1-amd64) (0x15bf)
Version: 23.2.1

I also see the following errors in dmesg associated with the apparent-crash-and-restart:

[ 26.625039] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=23, emitted seq=25
[ 26.625482] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 26.625820] amdgpu :c1:00.0: amdgpu: GPU reset begin!
[ 26.810595] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 26.810761] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 26.944169] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 26.944310] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.077693] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.077834] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.211163] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.211303] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.344634] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.344776] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.478028] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.478175] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.611499] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.611640] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.744960] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.745097] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.878425] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 27.878564] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 27.880086] amdgpu :c1:00.0: amdgpu: MODE2 reset
[ 27.909811] amdgpu :c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 27.910426] [drm] PCIE GART of 512M enabled (table at 0x00801FD0).
[ 27.910540] amdgpu :c1:00.0: amdgpu: SMU is resuming...
[ 27.911480] amdgpu :c1:00.0: amdgpu: SMU is resumed successfully!
[ 27.913327] [drm] DMUB hardware initialized: version=0x08000E00
[ 27.918776] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 27.921376] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 27.923969] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 27.926566] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 27.934650] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[ 27.937248] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[ 27.939841] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[ 27.942439] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[ 28.328853] [drm] kiq ring mec 3 pipe 1 q 0
[ 28.331133] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 28.331252] amdgpu :c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 28.331965] amdgpu :c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 28.331968] amdgpu :c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 28.331971] amdgpu :c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 28.331973] amdgpu :c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 28.331975] amdgpu