Bug#1092624: marked as done (linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels)

2025-05-22 Thread Debian Bug Tracking System
Your message dated Thu, 22 May 2025 21:00:10 +
with message-id 
and subject line Bug#1092624: fixed in linux 6.15~rc7-1~exp1
has caused the Debian Bug report #1092624,
regarding linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus 
errors, not present in 6.10 or 6.11 series kernels
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: src:linux
Version: 6.12.8-1
Severity: normal
X-Debbugs-Cc: gib...@debian.org

  I have been running testing on this machine for the past four months,
and the AMD GPU has worked just fine with the 6.10 and 6.11 series
kernels. However, a couple of weeks ago when 6.12.5 migrated to
testing, I began to notice numerous kernel errors:

> [ 2269.495240] amdgpu :e3:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> [ 2269.495250] amdgpu :e3:00.0:   device [1002:744c] error status/mask=2000/
> [ 2269.495256] amdgpu :e3:00.0:[13] NonFatalErr
> [ 2269.495275] pcieport :e0:01.1: AER: Correctable error message received from :e3:00.0
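
To count these messages in a saved kernel log, something like the following Python sketch may help. It assumes the line shape shown in the excerpt above. The names AER_RE, parse_aer and count_by_device are made up for illustration, and the PCI domain prefix is treated as optional because the addresses in the excerpt appear without one.

```python
import re

# Assumed line shape, taken from the excerpt in this report. The PCI domain
# (normally four hex digits before the first colon) is optional here because
# the archived copy shows the address as ":e3:00.0".
AER_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<driver>\S+)\s+"
    r"(?P<dev>(?:[0-9a-f]{4})?:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"PCIe Bus Error: severity=(?P<severity>[^,]+),\s*"
    r"type=(?P<layer>[^,]+)"
)


def parse_aer(line):
    """Return the fields of a 'PCIe Bus Error' line, or None if it is not one."""
    m = AER_RE.search(line)
    return m.groupdict() if m else None


def count_by_device(lines):
    """Count correctable PCIe bus errors per reporting device address."""
    counts = {}
    for line in lines:
        info = parse_aer(line)
        if info and info["severity"] == "Correctable":
            counts[info["dev"]] = counts.get(info["dev"], 0) + 1
    return counts
```

Feeding it the output of "dmesg" or "journalctl -k", one line at a time, gives a rough per-device tally that can be compared between a good and a bad kernel.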

  While the GPU still works, I have also observed over the past couple
of weeks that sometimes when I turn on my monitor after a period of
inactivity (such as overnight) the GPU fails to "wake up" and nothing
is displayed. The system is still responsive (if I switch to a virtual
terminal I can CTRL+ALT+DEL to reboot the system), but there's no
graphical output.

  I did try downgrading firmware-amd-graphics to 20240909-2 just in
case there might have been something caused by the latest firmware
updates, but the issue persists.

  I am running the KDE Plasma 6 desktop with wayland, but have been
doing so for a couple of months now without issue.

Mathias

-- Package-specific info:
** Version:
Linux version 6.12.8-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-12) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.50.20241230) #1 SMP PREEMPT_DYNAMIC Debian 6.12.8-1 (2025-01-02)

** Command line:
BOOT_IMAGE=/vmlinuz-6.12.8-amd64 root=/dev/mapper/olorin--vg-root ro quiet nosplash

** Tainted: PO (4097)
 * proprietary module was loaded
 * externally-built ("out-of-tree") module was loaded
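
The number in parentheses is the kernel's taint bitmask: 4097 is 4096 + 1, that is, bits 12 and 0. A small Python sketch of that decoding follows. It covers only the two flags listed above; TAINT_BITS and decode_taint are illustrative names, and any other bit is reported as unknown rather than guessed.

```python
# Subset of the kernel taint flags: bit position -> (letter, meaning).
# Only the two bits set in the value from this report are listed here.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
}


def decode_taint(value):
    """Return (bit, letter, meaning) for every bit set in the taint value."""
    flags = []
    for bit in range(value.bit_length()):
        if value & (1 << bit):
            letter, meaning = TAINT_BITS.get(bit, ("?", "flag not in this table"))
            flags.append((bit, letter, meaning))
    return flags


for bit, letter, meaning in decode_taint(4097):
    print(f"bit {bit:2d} ({letter}): {meaning}")
```

In this report both flags are consistent with the zfs(PO) and spl(O) entries in the module list.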

** Kernel log:
See attached files, 6.11.10-good and 6.12.8-bad.

** Model information
sys_vendor: ASUS
product_name: System Product Name
product_version: System Version
chassis_vendor: Default string
chassis_version: Default string
bios_vendor: American Megatrends Inc.
bios_version: 0803
board_vendor: ASUSTeK COMPUTER INC.
board_name: Pro WS WRX90E-SAGE SE
board_version: Rev 1.xx

** Loaded modules:
cfg80211
veth
snd_seq_dummy
snd_hrtimer
snd_seq
snd_seq_device
nft_masq
nft_ct
nft_reject_ipv4
nf_reject_ipv4
nft_reject
act_csum
cls_u32
sch_htb
nft_chain_nat
wireguard
nf_nat
libchacha20poly1305
nf_conntrack
chacha_x86_64
nf_defrag_ipv6
poly1305_x86_64
nf_defrag_ipv4
curve25519_x86_64
libcurve25519_generic
libchacha
ip6_udp_tunnel
nf_tables
udp_tunnel
libcrc32c
bridge
stp
llc
vhost_vsock
vmw_vsock_virtio_transport_common
vhost
vhost_iotlb
qrtr
vsock
sunrpc
binfmt_misc
nls_ascii
nls_cp437
vfat
fat
snd_hda_codec_realtek
snd_hda_codec_generic
eeepc_wmi
snd_hda_scodec_component
asus_wmi
snd_hda_codec_hdmi
amd_atl
intel_rapl_msr
sparse_keymap
snd_hda_intel
intel_rapl_common
platform_profile
snd_intel_dspcfg
amd64_edac
battery
snd_intel_sdw_acpi
edac_mce_amd
snd_hda_codec
snd_hda_core
kvm_amd
snd_hwdep
zfs(PO)
snd_pcm
kvm
snd_timer
spd5118
ucsi_acpi
snd
sp5100_tco
typec_ucsi
rfkill
rapl
wmi_bmof
pcspkr
soundcore
ccp
k10temp
watchdog
typec
joydev
spl(O)
roles
sg
evdev
msr
parport_pc
ppdev
lp
parport
efi_pstore
configfs
nfnetlink
efivarfs
ip_tables
x_tables
autofs4
ext4
mbcache
jbd2
crc32c_generic
dm_crypt
dm_mod
amdgpu
amdxcp
drm_exec
gpu_sched
drm_buddy
video
i2c_algo_bit
drm_suballoc_helper
drm_display_helper
sr_mod
cec
cdrom
sd_mod
crct10dif_pclmul
rc_core
crc32_pclmul
hid_generic
usbhid
hid
drm_ttm_helper
crc32c_intel
ttm
ahci
ghash_clmulni_intel
xhci_pci
libahci
drm_kms_helper
sha512_ssse3
nvme
xhci_hcd
libata
sha256_ssse3
drm
nvme_core
i40e
usbcore
i2c_piix4
scsi_mod
gpio_amdpt
thunderbolt
sha1_ssse3
crc16
nvme_auth
libie
usb_common
i2c_smbus
scsi_common
wmi
gpio_generic
button
aesni_intel
gf128mul
crypto_simd
cryptd

** PCI devices:
e3:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 
31 [Radeon RX 7900 XT/7900 XTX/7900 GR

Bug#1092624: marked as done (linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels)

2025-05-18 Thread Debian Bug Tracking System
Your message dated Sun, 18 May 2025 17:00:12 +
with message-id 
and subject line Bug#1092624: fixed in linux 6.12.29-1
has caused the Debian Bug report #1092624,
regarding linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus 
errors, not present in 6.10 or 6.11 series kernels
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-12 Thread Mathias Gibbens
  Looks like the fix was included in the 6.15-rc6 kernel:
https://lkml.org/lkml/2025/5/11/398. So just need to wait for the 6.15
stable release and then inclusion into 6.12.y.
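
The closing notices in this thread name linux 6.12.29-1 and 6.15~rc7-1~exp1 as the fixed uploads. To check whether a given package version is at or past one of those, the comparison has to follow dpkg ordering, where "~" sorts before everything and digit runs compare numerically. Below is a rough Python sketch of that ordering; the function names are made up for illustration, and on a real system "dpkg --compare-versions" is the authoritative tool.

```python
def _order(ch):
    """Sort weight of one non-digit character (or end of string)."""
    if ch == "~":
        return -1          # tilde sorts before everything, even the end
    if ch == "" or ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)     # letters sort before other punctuation
    return ord(ch) + 256


def _cmp_part(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit runs character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Compare the following digit runs numerically (empty run counts as 0).
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_part(ua, ub) or _cmp_part(ra, rb)


def includes_fix(installed, fixed):
    """True if the installed version is at or past the fixed version."""
    return compare_versions(installed, fixed) >= 0
```

For example, includes_fix("6.12.25-1", "6.12.29-1") is False, and 6.15~rc7-1~exp1 sorts before a later 6.15-1 because of the tilde.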

Mathias



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-03 Thread Salvatore Bonaccorso
Hi Mathias,

On Sat, May 03, 2025 at 05:14:15PM +, Mathias Gibbens wrote:
> control: tags -1 - moreinfo
> 
>   I have verified that the patch
> "0004-drm-amdgpu-hdp6-use-memcfg-register-to-post-the-writ.patch" posted at
> https://gitlab.freedesktop.org/drm/amd/-/issues/4119 fixes the issue
> for me with the 6.12.25 kernel.

Thanks for confirming, and for updating upstream as well. The next step
is to watch for it entering mainline and getting backported to the
6.12.y series.

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-03 Thread Mathias Gibbens
control: tags -1 - moreinfo

  I have verified that the patch
"0004-drm-amdgpu-hdp6-use-memcfg-register-to-post-the-writ.patch" posted at
https://gitlab.freedesktop.org/drm/amd/-/issues/4119 fixes the issue
for me with the 6.12.25 kernel.

Mathias



Processed: Re: Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-03 Thread Debian Bug Tracking System
Processing control commands:

> tags -1 - moreinfo
Bug #1092624 [src:linux] linux-image-6.12.8-amd64: Numerous amdgpu correctable 
PCIe bus errors, not present in 6.10 or 6.11 series kernels
Bug #1093217 [src:linux] linux-image-6.12.9-amd64: dmesg is spammed full of 
"AER: Correctable error message received from :e3:00.0"
Removed tag(s) moreinfo.
Removed tag(s) moreinfo.

-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
1093217: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093217
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-01 Thread Salvatore Bonaccorso
Hi Mathias,

On Thu, May 01, 2025 at 12:30:32PM +, Mathias Gibbens wrote:
> On Thu, 2025-05-01 at 13:15 +0200, Salvatore Bonaccorso wrote:
> > There is some progress in the upstream issue
> > https://gitlab.freedesktop.org/drm/amd/-/issues/3908 and the
> > discussion there suggests we should look at the commits for
> > https://gitlab.freedesktop.org/drm/amd/-/issues/4119 . Do you have the
> > resources to test a build with those patches to see if they fix the
> > issue? If so, that would be great.
> 
>   Yes, I saw that update as well. I should have time to test it this
> weekend and will report back the results.

Thanks a lot! Please also remove the moreinfo tag once you have the
information, so that it appears on our radar again.

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-01 Thread Mathias Gibbens
On Thu, 2025-05-01 at 13:15 +0200, Salvatore Bonaccorso wrote:
> There is some progress in the upstream issue
> https://gitlab.freedesktop.org/drm/amd/-/issues/3908 and the
> discussion there suggests we should look at the commits for
> https://gitlab.freedesktop.org/drm/amd/-/issues/4119 . Do you have the
> resources to test a build with those patches to see if they fix the
> issue? If so, that would be great.

  Yes, I saw that update as well. I should have time to test it this
weekend and will report back the results.

Mathias



Processed: Re: Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-01 Thread Debian Bug Tracking System
Processing control commands:

> tags -1 + moreinfo
Bug #1092624 [src:linux] linux-image-6.12.8-amd64: Numerous amdgpu correctable 
PCIe bus errors, not present in 6.10 or 6.11 series kernels
Bug #1093217 [src:linux] linux-image-6.12.9-amd64: dmesg is spammed full of 
"AER: Correctable error message received from :e3:00.0"
Ignoring request to alter tags of bug #1092624 to the same tags previously set
Ignoring request to alter tags of bug #1093217 to the same tags previously set

-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
1093217: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093217
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Processed: Re: Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-01 Thread Debian Bug Tracking System
Processing control commands:

> tags -1 + moreinfo
Bug #1092624 [src:linux] linux-image-6.12.8-amd64: Numerous amdgpu correctable 
PCIe bus errors, not present in 6.10 or 6.11 series kernels
Bug #1093217 [src:linux] linux-image-6.12.9-amd64: dmesg is spammed full of 
"AER: Correctable error message received from :e3:00.0"
Added tag(s) moreinfo.
Added tag(s) moreinfo.

-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
1093217: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093217
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-05-01 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hi Mathias,

There is some progress in the upstream issue
https://gitlab.freedesktop.org/drm/amd/-/issues/3908 and the
discussion there suggests we should look at the commits for
https://gitlab.freedesktop.org/drm/amd/-/issues/4119 . Do you have the
resources to test a build with those patches to see if they fix the
issue? If so, that would be great.

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-02-09 Thread Salvatore Bonaccorso
Mathias has identified the breaking commit:
https://gitlab.freedesktop.org/drm/amd/-/issues/3908#note_2771164

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Salvatore Bonaccorso
Hi Mathias,

On Sun, Jan 19, 2025 at 07:51:44PM +, Mathias Gibbens wrote:
>   Ah, also just noticed #1093217 reporting the same issue. Looks like
> we've got the same motherboard but different GPUs.
> 
>   (I haven't tried to merge the two bugs, but I think it would make
> sense to do so.)

Thanks for spotting this! (It can easily be overlooked given the number
of bugs in src:linux.) I have just merged them.

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Mathias Gibbens
  Ah, also just noticed #1093217 reporting the same issue. Looks like
we've got the same motherboard but different GPUs.

  (I haven't tried to merge the two bugs, but I think it would make
sense to do so.)

Mathias



Bug#1092624:

2025-01-19 Thread Mathias Gibbens
Control: found -1 linux/6.12.10-1



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Mathias Gibbens
Control: found -1 src:linux/6.12.10-1
Control: forwarded -1 https://gitlab.freedesktop.org/drm/amd/-/issues/3908

  Confirmed that the issue is still present with the 6.12.10 kernel,
and created a report in the upstream gitlab instance.

Mathias



Processed (with 1 error): Re: Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Debian Bug Tracking System
Processing control commands:

> found -1 src:linux/6.12.10-1
Unknown command or malformed arguments to command.

> forwarded -1 https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Bug #1092624 [src:linux] linux-image-6.12.8-amd64: Numerous amdgpu correctable 
PCIe bus errors, not present in 6.10 or 6.11 series kernels
Set Bug forwarded-to-address to 
'https://gitlab.freedesktop.org/drm/amd/-/issues/3908'.

-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
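
The rejected command above used "src:linux/6.12.10-1" as the version argument, while the follow-up message in this thread uses "linux/6.12.10-1". A hypothetical pre-flight check for that mistake is sketched below in Python. The pattern is an assumption for illustration only (an optional plain package name, a slash, then a version that starts with a digit) and is not the BTS's actual grammar; FOUND_RE and check_found are made-up names.

```python
import re

# Hypothetical pattern, not the real BTS parser: "found", a bug number
# (or -1 for the current bug), then an optional "package/" prefix and a version.
# The package part deliberately excludes ":" so "src:linux/..." does not match.
FOUND_RE = re.compile(
    r"^found\s+(?P<bug>-?\d+)\s+"
    r"(?:(?P<package>[a-z0-9][a-z0-9+.\-]*)/)?"
    r"(?P<version>[0-9][A-Za-z0-9.+~:\-]*)$"
)


def check_found(command):
    """Return the parsed fields of a 'found' command, or None if it looks malformed."""
    m = FOUND_RE.match(command.strip())
    return m.groupdict() if m else None


for cmd in ("found -1 src:linux/6.12.10-1", "found -1 linux/6.12.10-1"):
    print(cmd, "->", "ok" if check_found(cmd) else "looks malformed")
```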



Processed: Re: Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Debian Bug Tracking System
Processing control commands:

> tags -1 + moreinfo
Bug #1092624 [src:linux] linux-image-6.12.8-amd64: Numerous amdgpu correctable 
PCIe bus errors, not present in 6.10 or 6.11 series kernels
Added tag(s) moreinfo.

-- 
1092624: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1092624
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-19 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hi Mathias,

On Fri, Jan 10, 2025 at 01:17:37AM +, Mathias Gibbens wrote:
> Package: src:linux
> Version: 6.12.8-1
> Severity: normal
> X-Debbugs-Cc: gib...@debian.org
> 
>   I have been running testing on this machine for the past four months,
> and the AMD GPU has worked just fine with the 6.10 and 6.11 series
> kernels. However, a couple of weeks ago when 6.12.5 migrated to
> testing, I began to notice numerous kernel errors:
> 
> > [ 2269.495240] amdgpu :e3:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> > [ 2269.495250] amdgpu :e3:00.0:   device [1002:744c] error status/mask=2000/
> > [ 2269.495256] amdgpu :e3:00.0:[13] NonFatalErr
> > [ 2269.495275] pcieport :e0:01.1: AER: Correctable error message received from :e3:00.0
> 
>   While the GPU still works, I have also observed over the past couple
> of weeks that sometimes when I turn on my monitor after a period of
> inactivity (such as overnight) the GPU fails to "wake up" and nothing
> is displayed. The system is still responsive (if I switch to a virtual
> terminal I can CTRL+ALT+DEL to reboot the system), but there's no
> graphical output.
> 
>   I did try downgrading firmware-amd-graphics to 20240909-2 just in
> case there might have been something caused by the latest firmware
> updates, but the issue persists.
> 
>   I am running the KDE Plasma 6 desktop with wayland, but have been
> doing so for a couple of months now without issue.

Assuming you can still reproduce the issue with the most recent
iteration in unstable, 6.12.10, can you please report the issue
upstream at https://gitlab.freedesktop.org/drm/amd (and report the
upstream issue back here, or set the forwarded address yourself)?

Thanks already,

Regards,
Salvatore



Bug#1092624: linux-image-6.12.8-amd64: Numerous amdgpu correctable PCIe bus errors, not present in 6.10 or 6.11 series kernels

2025-01-09 Thread Mathias Gibbens
Package: src:linux
Version: 6.12.8-1
Severity: normal
X-Debbugs-Cc: gib...@debian.org

  I have been running testing on this machine for the past four months,
and the AMD GPU has worked just fine with the 6.10 and 6.11 series
kernels. However, a couple of weeks ago when 6.12.5 migrated to
testing, I began to notice numerous kernel errors:

> [ 2269.495240] amdgpu :e3:00.0: PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> [ 2269.495250] amdgpu :e3:00.0:   device [1002:744c] error status/mask=2000/
> [ 2269.495256] amdgpu :e3:00.0:[13] NonFatalErr
> [ 2269.495275] pcieport :e0:01.1: AER: Correctable error message received from :e3:00.0

  While the GPU still works, I have also observed over the past couple
of weeks that sometimes when I turn on my monitor after a period of
inactivity (such as overnight) the GPU fails to "wake up" and nothing
is displayed. The system is still responsive (if I switch to a virtual
terminal I can CTRL+ALT+DEL to reboot the system), but there's no
graphical output.

  I did try downgrading firmware-amd-graphics to 20240909-2 just in
case there might have been something caused by the latest firmware
updates, but the issue persists.

  I am running the KDE Plasma 6 desktop with wayland, but have been
doing so for a couple of months now without issue.

Mathias

-- Package-specific info:
** Version:
Linux version 6.12.8-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-12) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.50.20241230) #1 SMP PREEMPT_DYNAMIC Debian 6.12.8-1 (2025-01-02)

** Command line:
BOOT_IMAGE=/vmlinuz-6.12.8-amd64 root=/dev/mapper/olorin--vg-root ro quiet nosplash

** Tainted: PO (4097)
 * proprietary module was loaded
 * externally-built ("out-of-tree") module was loaded

** Kernel log:
See attached files, 6.11.10-good and 6.12.8-bad.

** Model information
sys_vendor: ASUS
product_name: System Product Name
product_version: System Version
chassis_vendor: Default string
chassis_version: Default string
bios_vendor: American Megatrends Inc.
bios_version: 0803
board_vendor: ASUSTeK COMPUTER INC.
board_name: Pro WS WRX90E-SAGE SE
board_version: Rev 1.xx

** Loaded modules:
cfg80211
veth
snd_seq_dummy
snd_hrtimer
snd_seq
snd_seq_device
nft_masq
nft_ct
nft_reject_ipv4
nf_reject_ipv4
nft_reject
act_csum
cls_u32
sch_htb
nft_chain_nat
wireguard
nf_nat
libchacha20poly1305
nf_conntrack
chacha_x86_64
nf_defrag_ipv6
poly1305_x86_64
nf_defrag_ipv4
curve25519_x86_64
libcurve25519_generic
libchacha
ip6_udp_tunnel
nf_tables
udp_tunnel
libcrc32c
bridge
stp
llc
vhost_vsock
vmw_vsock_virtio_transport_common
vhost
vhost_iotlb
qrtr
vsock
sunrpc
binfmt_misc
nls_ascii
nls_cp437
vfat
fat
snd_hda_codec_realtek
snd_hda_codec_generic
eeepc_wmi
snd_hda_scodec_component
asus_wmi
snd_hda_codec_hdmi
amd_atl
intel_rapl_msr
sparse_keymap
snd_hda_intel
intel_rapl_common
platform_profile
snd_intel_dspcfg
amd64_edac
battery
snd_intel_sdw_acpi
edac_mce_amd
snd_hda_codec
snd_hda_core
kvm_amd
snd_hwdep
zfs(PO)
snd_pcm
kvm
snd_timer
spd5118
ucsi_acpi
snd
sp5100_tco
typec_ucsi
rfkill
rapl
wmi_bmof
pcspkr
soundcore
ccp
k10temp
watchdog
typec
joydev
spl(O)
roles
sg
evdev
msr
parport_pc
ppdev
lp
parport
efi_pstore
configfs
nfnetlink
efivarfs
ip_tables
x_tables
autofs4
ext4
mbcache
jbd2
crc32c_generic
dm_crypt
dm_mod
amdgpu
amdxcp
drm_exec
gpu_sched
drm_buddy
video
i2c_algo_bit
drm_suballoc_helper
drm_display_helper
sr_mod
cec
cdrom
sd_mod
crct10dif_pclmul
rc_core
crc32_pclmul
hid_generic
usbhid
hid
drm_ttm_helper
crc32c_intel
ttm
ahci
ghash_clmulni_intel
xhci_pci
libahci
drm_kms_helper
sha512_ssse3
nvme
xhci_hcd
libata
sha256_ssse3
drm
nvme_core
i40e
usbcore
i2c_piix4
scsi_mod
gpio_amdpt
thunderbolt
sha1_ssse3
crc16
nvme_auth
libie
usb_common
i2c_smbus
scsi_common
wmi
gpio_generic
button
aesni_intel
gf128mul
crypto_simd
cryptd

** PCI devices:
e3:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] (rev c8) (prog-if 00 [VGA controller])
    Subsystem: Sapphire Technology Limited NITRO+ RX 7900 XTX Vapor-X
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [64] Express (v2) Legacy Endpoint, IntMsgNum 0
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- TEE-IO-
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxRe