[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-15 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.19.0-45.46

---
linux (5.19.0-45.46) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-45.46 -proposed tracker (LP: #2023057)

  * Kinetic update: upstream stable patchset 2023-05-23 (LP: #2020599)
- wifi: cfg80211: Partial revert "wifi: cfg80211: Fix use after free for
  wext"

linux (5.19.0-44.45) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-44.45 -proposed tracker (LP: #2019827)

  * Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1
(LP: #2018470)
- drm/amdgpu: Fix for BO move issue

  * CVE-2023-32233
- netfilter: nf_tables: deactivate anonymous set from preparation phase

  * CVE-2023-2612
- SAUCE: shiftfs: prevent lock unbalance in shiftfs_create_object()

  * CVE-2023-31436
- net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg

  * CVE-2023-1380
- wifi: brcmfmac: slab-out-of-bounds read in brcmf_get_assoc_ies()

  * conntrack mark is not advertised via netlink (LP: #2016269)
- netfilter: ctnetlink: revert to dumping mark regardless of event type

  * 5.19 not reporting cgroups v1 blkio.throttle.io_serviced  (LP: #2016186)
- SAUCE: blk-throttle: Fix io statistics for cgroup v1

  * [SRU] Backport request for hpwdt from upstream 6.1 to Jammy (LP: #2008751)
- watchdog/hpwdt: Enable HP_WATCHDOG for ARM64 systems.
- watchdog/hpwdt: Include nmi.h only if CONFIG_HPWDT_NMI_DECODING
- [Config] Add arm64 option to CONFIG_HP_WATCHDOG

  * vmwgfx fails to reserve graphics buffer on aarch64 leading to blank display
(LP: #2007001)
- SAUCE: Revert "video/aperture: Disable and unregister sysfb devices via
  aperture helpers"

  * Ubuntu 22.04 raise abnormal NIC MSI-X requests with larger CPU cores (256)
(LP: #2012335)
- ice: Allow operation with reduced device MSI-X

  * Dell: Enable speaker mute hotkey LED indicator (LP: #2015972)
- platform/x86: dell-laptop: Register ctl-led for speaker-mute

  * [SRU]With "Performance per Watt (DAPC)" enabled in the BIOS, Bootup time is
taking longer than expected (LP: #2008527)
- cpufreq: ACPI: Defer setting boost MSRs

  * [SRU][Jammy] CONFIG_PCI_MESON is not enabled (LP: #2007745)
- [Config] arm64: Enable PCI_MESON module

  * Kinetic update: upstream stable patchset 2023-05-08 (LP: #2018948)
- HID: asus: use spinlock to protect concurrent accesses
- HID: asus: use spinlock to safely schedule workers
- powerpc/mm: Rearrange if-else block to avoid clang warning
- ARM: OMAP2+: Fix memory leak in realtime_counter_init()
- arm64: dts: qcom: qcs404: use symbol names for PCIe resets
- arm64: dts: qcom: msm8996-tone: Fix USB taking 6 minutes to wake up
- arm64: dts: qcom: sm8150-kumano: Panel framebuffer is 2.5k instead of 4k
- arm64: dts: qcom: sm6125: Reorder HSUSB PHY clocks to match bindings
- arm64: dts: imx8m: Align SoC unique ID node unit address
- ARM: zynq: Fix refcount leak in zynq_early_slcr_init
- arm64: dts: mediatek: mt8183: Fix systimer 13 MHz clock description
- arm64: dts: qcom: sdm845-db845c: fix audio codec interrupt pin name
- arm64: dts: qcom: sc7180: correct SPMI bus address cells
- arm64: dts: qcom: sc7280: correct SPMI bus address cells
- arm64: dts: meson-gx: Fix Ethernet MAC address unit name
- arm64: dts: meson-g12a: Fix internal Ethernet PHY unit name
- arm64: dts: meson-gx: Fix the SCPI DVFS node name and unit address
- arm64: dts: msm8992-bullhead: add memory hole region
- arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem size
- arm64: dts: qcom: msm8992-bullhead: Disable dfps_data_mem
- arm64: dts: qcom: ipq8074: correct USB3 QMP PHY-s clock output names
- arm64: dts: qcom: ipq8074: fix Gen3 PCIe QMP PHY
- arm64: dts: qcom: ipq8074: correct Gen2 PCIe ranges
- arm64: dts: qcom: ipq8074: fix Gen3 PCIe node
- arm64: dts: qcom: ipq8074: correct PCIe QMP PHY output clock names
- arm64: dts: meson: remove CPU opps below 1GHz for G12A boards
- ARM: OMAP1: call platform_device_put() in error case in
  omap1_dm_timer_init()
- ARM: bcm2835_defconfig: Enable the framebuffer
- ARM: s3c: fix s3c64xx_set_timer_source prototype
- arm64: dts: ti: k3-j7200: Fix wakeup pinmux range
- ARM: dts: exynos: correct wr-active property in Exynos3250 Rinato
- ARM: imx: Call ida_simple_remove() for ida_simple_get
- arm64: dts: amlogic: meson-gx: fix SCPI clock dvfs node name
- arm64: dts: amlogic: meson-axg: fix SCPI clock dvfs node name
- arm64: dts: amlogic: meson-gx: add missing SCPI sensors compatible
- arm64: dts: amlogic: meson-gxl-s905d-sml5442tw: drop invalid clock-names
  property
- arm64: dts: amlogic: meson-gx: add missing unit address to rng node name
- arm64: dts: amlogic: meson-gxl: add missing unit address to eth-phy-mux
  node name
- arm64: dts: amlogic: meson-gx-libretech-pc: fix update button name
- arm64: dts: 

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-13 Thread Thomas Debesse
I tested linux-nvidia 5.19.0-1014.14 from jammy-proposed on my kinetic
install. Both GCN1 and GCN2 displays work. I get the ASAN debug message
but everything works.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Fix Committed
Status in linux source package in Mantic:
  Confirmed

Bug description:
  [Impact]
  A regression caused by incomplete stable backports

  [Fix]

  commit 8273b4048664fff356fd10059033f0e2f5a422a1
  Author: Arunpravin Paneer Selvam 
  Date:   Tue Oct 18 07:08:38 2022 -0700

  drm/amdgpu: Fix for BO move issue

  [Test case]

  Install the update, check that display works again on amdgpu
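
One part of the test case above can be scripted. The following is a minimal sketch, assuming the usual Linux sysfs layout where each /sys/class/drm/cardN/device/driver is a symlink to the bound kernel driver. The function name bound_drivers is hypothetical. A card bound to amdgpu is only a necessary condition here: the display still has to be checked by eye, since the report describes a driver that loads and then stops working.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def bound_drivers(drm_root: str = "/sys/class/drm") -> Dict[str, Optional[str]]:
    """Map each DRM card (card0, card1, ...) to its bound kernel driver name.

    Assumes the sysfs layout where <card>/device/driver is a symlink to the
    driver directory. Cards without a bound driver map to None.
    """
    drivers: Dict[str, Optional[str]] = {}
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        if "-" in card.name:
            # Skip connector entries such as card0-DP-1.
            continue
        link = card / "device" / "driver"
        if link.exists():
            drivers[card.name] = os.path.basename(os.path.realpath(link))
        else:
            drivers[card.name] = None
    return drivers


if __name__ == "__main__":
    for card, driver in bound_drivers().items():
        print(f"{card}: {driver or 'no driver bound'}")
```

On the machine described in this report, one would expect two cards bound to amdgpu and one bound to the ASPEED driver, but that expectation is an assumption and not something the thread shows.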

  --

  When I updated from Ubuntu 22.04 to 22.10 some months ago, I had to
  stay on Linux 5.15 because 5.19 was not working with my computer. I
  spent the last two days looking for a way to run Linux 5.19, and found
  one working version: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 is working with that computer.

  There may be other versions that work I have not tested, but basically
  the breakages occurred after 5.19.0-23.

  I face two problems. The first one is the graphic one, still present
  in 5.19.0-42. It starts to occur with 5.19.0-31 (5.19.0-29 is not
  affected): the graphic breaks at the moment it should switch from low
  resolution display to high resolution display at the very beginning
  of startup. The computer is not completely broken, but the graphic is
  dead (X11 cannot start, trying to use the framebuffer, meaning the
  amdgpu driver is not functional).
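
The version boundaries reported in this thread can be written down as a small classifier. This is only a sketch under stated assumptions: 5.19.0-29 was reported unaffected, 5.19.0-31 and 5.19.0-42 affected, and the changelog lists the fix under 5.19.0-44.45. Versions in between were not all tested, so treating everything from 31 to 43 as affected and everything up to 29 as unaffected is an assumption. The function names are hypothetical, and flavour kernels such as linux-nvidia use a different numbering that this sketch does not cover.

```python
import re

# Assumed boundaries, taken from the report and the changelog in this thread.
LAST_GOOD_ABI = 29     # reported as not affected by the graphic bug
FIRST_BAD_ABI = 31     # first version reported as affected
FIRST_FIXED_ABI = 44   # changelog lists the amdgpu fix under 5.19.0-44.45


def parse_abi(release: str) -> int:
    """Extract the ABI number from '5.19.0-42', '5.19.0-42-generic' or '5.19.0-45.46'."""
    match = re.fullmatch(r"5\.19\.0-(\d+)(?:\.\d+|-generic)?", release)
    if match is None:
        raise ValueError(f"not a generic 5.19.0 Ubuntu kernel release: {release!r}")
    return int(match.group(1))


def graphic_bug_status(release: str) -> str:
    """Classify a kernel release against the boundaries assumed above."""
    abi = parse_abi(release)
    if abi >= FIRST_FIXED_ABI:
        return "fixed"
    if abi >= FIRST_BAD_ABI:
        return "affected"
    if abi <= LAST_GOOD_ABI:
        return "unaffected"
    # ABI 30 sits between the last good and first bad reports.
    return "unknown"


if __name__ == "__main__":
    for release in ("5.19.0-23", "5.19.0-29", "5.19.0-31", "5.19.0-42", "5.19.0-45.46"):
        print(release, graphic_bug_status(release))
```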

  The second bug is the one I get with the 5.19.0-29 version. Linux
  5.19.0-29 doesn't experience the graphic bug but has another issue
  that makes the computer unusable: some CPU gets locked, and some btrfs
  process runs at 100% CPU; syncing never ends, even preventing a
  reboot. This bug is less important because I can't reproduce it on
  version 5.19.0-42, so if 5.19.0-42 fixes the graphic all will be fine.

  I have not updated to Ubuntu 23.04 yet because I'm afraid that its
  newer kernels would leave my computer totally unusable. I have run
  Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until today because of
  that fear.

  It actually took me two work days to test various combinations to boot
  the computer so I'm sticking with 5.19.0-29 for now, and I have limited
  time to test other options. I also tried various BIOS options, and
  also upgraded the BIOS. Since that ThreadRipper PRO computer has a
  very slow booting BIOS, trying various configurations or software
  versions that require a reboot quickly eats up whole hours.

  The attached logs may have traces of dkms modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour just to uninstall amdgpu-pro and reboot only to
  file that ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8 PC4-25600
    KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small card integrated in the motherboard and
  part of the BMC; I cannot remove it. This may contribute to the
  trouble.

  When the graphic works (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the ASPEED
  graphic goes off and the display continues on the AMD cards.

  When the graphic doesn't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the AMD cards
  display garbage but the display continues on the ASPEED card. The
  ASPEED card is a very basic integrated card without hardware
  acceleration and featuring only one VGA output, so that's unusable. As
  additional information, I know X11 never starts on the ASPEED if
  there are discrete cards plugged in (tested last year).

  So right now that computer is sticking with Linux 5.19.0-23, which
  doesn't have the graphic and btrfs bugs.

  The last kernel to not feature the graphic bug is Linux 5.19.0-29.

  

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-13 Thread Thomas Debesse
** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy


[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-12 Thread Thomas Debesse
Will the patch be dropped if the Nvidia kernel is not tested on AMD
GPUs?

Rebooting the computer affected by the bug will cost me 1 or 2 hours of
work, so that's a very high cost for a patch I already marked as
verified.


[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-10 Thread Thomas Debesse
Do we really have to test that Nvidia kernel for a bug affecting AMD
GPUs, or else the already verified fix will be dropped?


[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-06 Thread Elmo Ramos
Will this fix be backported to the Ubuntu 22.04 kernel? I can't use the
amdgpu driver with my R7 260X; I had to update to kernel 6.3 to use the
driver until the bug is fixed.


[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-06 Thread Thomas Debesse
Why should we test an Nvidia kernel for a bug affecting AMD GPUs?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Fix Committed
Status in linux source package in Mantic:
  Confirmed

Bug description:
  [Impact]
  A regression caused by incomplete stable backports

  [Fix]

  commit 8273b4048664fff356fd10059033f0e2f5a422a1
  Author: Arunpravin Paneer Selvam 
  Date:   Tue Oct 18 07:08:38 2022 -0700

  drm/amdgpu: Fix for BO move issue

  [Test case]

  Install the update, check that display works again on amdgpu

  --

  The day I updated from Ubuntu 22.04 to 22.10 some months ago, I had to
  stick on Linux 5.15 because 5.19 was not working with my computer. The
  last two days I spent time to find a way to run Linux 5.19, and found
  one version working: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 is working with that computer.

  There may be other versions that work I have not tested, but basically
  the breakages occurred after 5.19.0-23.

  I face two problems. Let's talk about the first one, the graphic one,
  still present in 5.19.0-42. It starts to occur with 5.19.0-31
  (5.19.0-29 is not affected): graphic breaks at the moment it should
  switch from low resolution display to high resolution display at the
  very beginning of startup. The computer is not completely broken, but
  the graphic is dead (X11 cannot start, trying to use the framebuffer,
  meaning the amdgpu driver is not functional).

  The second bug is the one I get with the 5.19.0-29 version. Linux
  5.19.0-29 doesn't experience the graphic bug but has another issue
  that makes the computer unusable: some CPUs get locked, and some btrfs
  process runs at 100% CPU, syncing never ends, even preventing a
  reboot. This bug is less important because I don't reproduce it on
  version 5.19.0-42, so if 5.19.0-42 fixes the graphic all will be fine.
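
The per-version results given in this description can be tabulated to see where each problem first appears. This is a minimal sketch in Python: the dictionary keys and function names are hypothetical, and `None` marks a result the description does not state.

```python
# Results as reported in this description; None means "not stated".
# Key and function names are hypothetical, chosen for this sketch.
RESULTS = {
    "5.19.0-23": {"graphic_bug": False, "btrfs_bug": False},
    "5.19.0-29": {"graphic_bug": False, "btrfs_bug": True},
    "5.19.0-31": {"graphic_bug": True, "btrfs_bug": None},
    "5.19.0-42": {"graphic_bug": True, "btrfs_bug": False},
}


def abi(version):
    """Return the Ubuntu ABI number, e.g. 29 for '5.19.0-29'."""
    return int(version.rsplit("-", 1)[1])


def regression_window(results, bug):
    """Return (last tested version without the bug, first tested version with it)."""
    ordered = sorted(results, key=abi)
    bad = [v for v in ordered if results[v][bug] is True]
    first_bad = bad[0] if bad else None
    last_good = None
    for v in ordered:
        if results[v][bug] is not False:
            continue  # skip versions with the bug or with no stated result
        if first_bad is None or abi(v) < abi(first_bad):
            last_good = v
    return last_good, first_bad


print(regression_window(RESULTS, "graphic_bug"))
print(regression_window(RESULTS, "btrfs_bug"))
```

Only tested versions are considered, so the window can be wider than the real one: untested kernels between two entries may behave either way.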

  I have not updated to Ubuntu 23.04 yet because I'm afraid that newer
  kernels from it would leave my computer totally unusable. I have run
  Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until today because of
  that fear.

  It actually took me two work days to test various combinations to boot
  the computer so I'm sticking on 5.19.0-29 for now, and I have limited
  time to test other options. I also tried various BIOS options, and
  also upgraded the BIOS…, and since that ThreadRipper PRO computer has
  a very slow-booting BIOS, trying various configurations or software
  versions that require a reboot quickly eats up whole hours.

  The attached logs may have traces of DKMS modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour just to uninstall amdgpu-pro and reboot only to file
  that ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8 PC4-25600
    KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small card integrated in the motherboard and
  part of the BMC, so I cannot remove it. This may contribute to the
  trouble.

  When the graphic works (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphic output, then at the moment the
  graphic switches from low resolution to high resolution, the ASPEED
  graphic goes off and the display continues on AMD cards.

  When the graphic doesn't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphic output, then at the moment the
  graphic switches from low resolution to high resolution, the AMD cards
  display garbage but the display continues on the ASPEED card. The
  ASPEED card is a very basic integrated card without hardware
  acceleration and featuring only one VGA output so that's unusable. As
  additional information, I know X11 never starts on the ASPEED if
  there are discrete cards plugged in (tested last year).

  So right now that computer is sticking on Linux 5.19.0-23 which
  doesn't have the graphic and btrfs bugs.

  The last kernel to not feature the graphic bug is Linux 5.19.0-29.
  Linux 5.19.0-31 is the first 

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-06-06 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-nvidia-5.19/5.19.0-1014.14 kernel in -proposed solves the problem.
Please test the kernel and update this bug with the results. If the
problem is solved, change the tag 'verification-needed-jammy' to
'verification-done-jammy'. If the problem still exists, change the tag
'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-jammy-linux-nvidia-5.19 verification-needed-jammy

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-22 Thread Mario Limonciello
> - I still have an error message in dmesg:

In this issue you actually discovered two independent bugs.
The first was the regression, the second was the UBSAN issue.
* The first fix is what you tested.
* The second fix wasn't picked up yet.

This is the commit that has now landed upstream for the second one: 
https://github.com/torvalds/linux/commit/58d9b9a14b47c2a3da6effcbb01607ad7edc0275

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-22 Thread Thomas Debesse
** Tags removed: verification-needed-kinetic
** Tags added: verification-done-kinetic
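
The tag change above follows the convention from the verification request: `verification-needed-<series>` becomes `verification-done-<series>` on success or `verification-failed-<series>` otherwise. A minimal sketch of that rule in Python; it only models the tag list and does not talk to Launchpad, and the function name is hypothetical.

```python
def mark_verified(tags, series, passed):
    """Return a new tag list with the verification tag for `series` updated."""
    needed = f"verification-needed-{series}"
    outcome = f"verification-{'done' if passed else 'failed'}-{series}"
    if needed not in tags:
        # The bug is not awaiting verification for this series: change nothing.
        return list(tags)
    return [outcome if tag == needed else tag for tag in tags]


# Tags as set by the kernel bot for the Kinetic kernel.
before = ["kernel-spammed-kinetic-linux", "verification-needed-kinetic"]
print(mark_verified(before, "kinetic", passed=True))
```

Tags for other series, such as `verification-needed-jammy`, are left untouched, which matches how each series is verified separately.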

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-21 Thread Thomas Debesse
I now see the original message was edited, with these words added:

> [Test case]
> Install the update, check that display works again on amdgpu

I confirm the display works again on amdgpu.

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-21 Thread Thomas Debesse
I'm on 5.19.0-44.45 right now.

What is 5.19.0-44.45 expected to fix?

- The computer boots properly, I have both R7 240 and R9 390X displaying
something fine, so that error is fixed.

- I still have an error message in dmesg:

```
[7.609329] 

[7.610224] UBSAN: invalid-load in /build/linux-le9C0y/linux-5.19.0/drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm.c:1363:37
[7.611125] load of value 232 is not a valid value for type '_Bool'
[7.612025] CPU: 14 PID: 400 Comm: systemd-udevd Not tainted 5.19.0-44-generic #45-Ubuntu
[7.612928] Hardware name: Default string Default string/Default string, BIOS WRX80PRO-F1 08/04/2022
[7.613836] Call Trace:
[7.614736]  
[7.615633]  show_stack+0x4e/0x61
[7.616531]  dump_stack_lvl+0x4a/0x6f
[7.617429]  dump_stack+0x10/0x18
[7.618333]  ubsan_epilogue+0x9/0x3a
[7.619231]  __ubsan_handle_load_invalid_value.cold+0x42/0x47
[7.620124]  amdgpu_dpm_is_overdrive_supported.cold+0x12/0x45 [amdgpu]
[7.621402]  default_attr_update+0x332/0x500 [amdgpu]
[7.622641]  amdgpu_pm_sysfs_init+0x16f/0x1e0 [amdgpu]
[7.623871]  amdgpu_device_init.cold+0x3b7/0x80a [amdgpu]
[7.625107]  amdgpu_driver_load_kms+0x1c/0x170 [amdgpu]
[7.626277]  amdgpu_pci_probe+0x15f/0x3c0 [amdgpu]
[7.627419]  local_pci_probe+0x47/0x90
[7.628270]  pci_call_probe+0x55/0x190
[7.629107]  pci_device_probe+0x84/0x120
[7.629934]  really_probe+0x1df/0x3b0
[7.630767]  __driver_probe_device+0x12c/0x1b0
[7.631596]  driver_probe_device+0x24/0xd0
[7.632426]  __driver_attach+0x10b/0x210
[7.633255]  ? __device_attach_driver+0x170/0x170
[7.634087]  bus_for_each_dev+0x90/0xe0
[7.634917]  driver_attach+0x1e/0x30
[7.635740]  bus_add_driver+0x187/0x230
[7.636562]  driver_register+0x8f/0x100
[7.637379]  __pci_register_driver+0x62/0x70
[7.638203]  amdgpu_init+0x6a/0x1000 [amdgpu]
[7.639307]  ? 0xc05c
[7.640118]  do_one_initcall+0x5e/0x240
[7.640929]  do_init_module+0x50/0x210
[7.641736]  load_module+0xb7d/0xcd0
[7.642532]  __do_sys_finit_module+0xc4/0x140
[7.643316]  ? __do_sys_finit_module+0xc4/0x140
[7.644100]  __x64_sys_finit_module+0x18/0x30
[7.644879]  do_syscall_64+0x5b/0x90
[7.645661]  ? ksys_mmap_pgoff+0x11d/0x260
[7.646444]  ? exit_to_user_mode_prepare+0x30/0xb0
[7.647231]  ? syscall_exit_to_user_mode+0x29/0x50
[7.648016]  ? do_syscall_64+0x67/0x90
[7.648796]  ? do_syscall_64+0x67/0x90
[7.649566]  ? syscall_exit_to_user_mode+0x29/0x50
[7.650347]  ? do_syscall_64+0x67/0x90
[7.651120]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[7.651888] RIP: 0033:0x7f790b1eec4d
[7.652645] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 83 f1 0d 00 f7 d8 64 89 01 48
[7.653432] RSP: 002b:7fffa9b1d908 EFLAGS: 0246 ORIG_RAX: 0139
[7.654222] RAX: ffda RBX: 556a6508e430 RCX: 7f790b1eec4d
[7.655015] RDX:  RSI: 556a650933b0 RDI: 0015
[7.655815] RBP: 556a650933b0 R08:  R09: 7f790b2cec60
[7.656610] R10: 0015 R11: 0246 R12: 0002
[7.657414] R13: 556a650795d0 R14:  R15: 556a65079030
[7.658216]  
[7.659037] 

```
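
The two UBSAN lines at the top of this trace carry the useful facts: the kind of report, the source location, and the offending value and type. A minimal sketch in Python for pulling them out of a saved dmesg excerpt; the regular expressions and function name are assumptions written for this log, not part of any kernel tooling.

```python
import re

# Matches the report header; \s+ tolerates a mail-wrapped path on the next line.
UBSAN_RE = re.compile(
    r"UBSAN: (?P<kind>[\w-]+) in\s+(?P<file>\S+):(?P<line>\d+):(?P<col>\d+)"
)
LOAD_RE = re.compile(
    r"load of value (?P<value>\d+) is not a valid value for type '(?P<type>[^']+)'"
)


def parse_ubsan(text):
    """Return the report fields as a dict, or None if no report is found."""
    site = UBSAN_RE.search(text)
    load = LOAD_RE.search(text)
    if site is None or load is None:
        return None
    return {
        "kind": site.group("kind"),
        "file": site.group("file").rsplit("/", 1)[-1],
        "line": int(site.group("line")),
        "column": int(site.group("col")),
        "value": int(load.group("value")),
        "type": load.group("type"),
    }


SAMPLE = """\
[7.610224] UBSAN: invalid-load in
/build/linux-le9C0y/linux-5.19.0/drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm.c:1363:37
[7.611125] load of value 232 is not a valid value for type '_Bool'
"""

report = parse_ubsan(SAMPLE)
print(report)
# A C _Bool only holds 0 or 1, so a stored 232 is what triggers the report.
print(report["value"] in (0, 1))  # prints False
```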

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-17 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux/5.19.0-44.45 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-kinetic' to 'verification-done-kinetic'. If the
problem still exists, change the tag 'verification-needed-kinetic' to
'verification-failed-kinetic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!
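
Before reporting a verification result it helps to confirm that the machine actually booted the kernel under test. A minimal sketch in Python, assuming the running release string has the `5.19.0-44-generic` form seen in the reporter's dmesg; the function names and the restriction to the `generic` flavour are choices made for this sketch, since other flavours use different ABI numbering.

```python
import platform
import re

FIXED_ABI = 44  # linux 5.19.0-44.45, the kernel named in the request


def parse_release(release):
    """Split '5.19.0-44-generic' into ((5, 19, 0), 44, 'generic')."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)(?:-(.+))?", release)
    if m is None:
        return None
    base = tuple(int(m.group(i)) for i in (1, 2, 3))
    return base, int(m.group(4)), m.group(5)


def at_least_proposed_kernel(release):
    """True if `release` is a 5.19.0 generic kernel with ABI >= FIXED_ABI."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    base, abi, flavour = parsed
    return base == (5, 19, 0) and flavour == "generic" and abi >= FIXED_ABI


print(at_least_proposed_kernel("5.19.0-44-generic"))  # prints True
print(at_least_proposed_kernel("5.19.0-42-generic"))  # prints False
# Result for the machine running this script:
print(platform.release(), at_least_proposed_kernel(platform.release()))
```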


** Tags added: kernel-spammed-kinetic-linux verification-needed-kinetic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Fix Committed
Status in linux source package in Mantic:
  Confirmed

Bug description:
  [Impact]
  A regression caused by incomplete stable backports

  [Fix]

  commit 8273b4048664fff356fd10059033f0e2f5a422a1
  Author: Arunpravin Paneer Selvam 
  Date:   Tue Oct 18 07:08:38 2022 -0700

  drm/amdgpu: Fix for BO move issue

  [Test case]

  Install the update, check that display works again on amdgpu

  --

  The day I updated from Ubuntu 22.04 to 22.10 some months ago, I had to
  stick on Linux 5.15 because 5.19 was not working with my computer. The
  last two days I spent time to find a way to run Linux 5.19, and found
  one version working: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 is working with that computer.

  There may be other versions that work I have not tested, but basically
  the breakages occurred after 5.19.0-23.

  I face two problems, let's talk about the first one, the graphic one
  still present in 5.19.0-42. It starts to occurs with 5.19.0-31
  (5.19.0-29 is not affected): graphic breaks at the moment it should
  switch from low resolution display to high resolution display at the
  very beginning of startup. The computer is not completely broken, but
  the graphic is dead. X11 cannot start, trying to use the framebuffer,
  meaning the amdgpu driver is not functional).

  The second bug is the one I get with the 5.19.0-29 version. Linux
  5.19.0-29 doesn't experience the graphic bug but has another issue
  that makes the computer unusable: some CPU got locked, and some btrfs
  process runs at 100% CPU, syncing never ends, even preventing to
  reboot. This bug is less important because I don't reproduce it on
  version 5.19.0-42, so if 5.19.0-42 fixes the graphic all will be fine.

  I have not updated to Ubuntu 23.04 yet because I'm afraid of newer
  kernels from it would leave my computer totally unusable, I have run
  Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until today because of
  that fear.

  It actually took me two work days to test various combinations to boot
  the computer so I'm sticking with 5.19.0-29 for now, and I have limited
  time to test other options. I also tried various BIOS options, and
  also upgraded the BIOS…, and since that ThreadRipper PRO computer has
  a very slow booting BIOS, trying various configurations or software
  versions that require a reboot quickly eats up whole hours.

  The attached logs may have traces of dkms modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour just to uninstall amdgpu-pro and reboot only to
  file this ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8 PC4-25600
    KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small card integrated in the motherboard and
  part of the BMC, so I cannot remove it. This may contribute to the
  trouble.

  When the graphic works (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the ASPEED
  graphic goes off and the display continues on the AMD cards.

  When the graphic doesn't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphic output, then at 

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-16 Thread Stefan Bader
** Changed in: linux (Ubuntu Kinetic)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Kinetic)
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Fix Committed
Status in linux source package in Mantic:
  Confirmed

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-11 Thread Timo Aaltonen
I guess this bug should concentrate on the regression.

** Description changed:

+ [Impact]
+ A regression caused by incomplete stable backports
+ 
+ [Fix]
+ 
+ commit 8273b4048664fff356fd10059033f0e2f5a422a1
+ Author: Arunpravin Paneer Selvam 
+ Date:   Tue Oct 18 07:08:38 2022 -0700
+ 
+ drm/amdgpu: Fix for BO move issue
+ 
+ [Test case]
+ 
+ Install the update, check that display works again on amdgpu
+ 
+ --
+ 

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-11 Thread Mario Limonciello
OK here are the fixes for this identified upstream:

Mantic (6.3+):
 * UBSAN issue fixed by
   https://gitlab.freedesktop.org/agd5f/linux/-/commit/3a5fb036af0a18436209fbb16e331edd26a07b3d

Kinetic (5.19):
 * UBSAN issue fixed by
   https://gitlab.freedesktop.org/agd5f/linux/-/commit/3a5fb036af0a18436209fbb16e331edd26a07b3d
 * The hang issue was caused by Canonical backporting
   https://github.com/torvalds/linux/commit/312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
   into 5.19 kernel but not taking the fix
   https://github.com/torvalds/linux/commit/8273b4048664fff356fd10059033f0e2f5a422a1

Canonical team please review these.
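
The kind of gap described above (a change backported without its follow-up fix) can be looked for mechanically. The following is only a sketch in Python, under the assumption that the follow-up commit carries a `Fixes:` trailer naming the backported commit; this thread does not confirm that trailer for 8273b4048664, and the function name `missing_followups` and the `(sha, message)` input shape are hypothetical.

```python
import re

# Matches a "Fixes: <abbreviated or full sha>" trailer at the start of a line.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})", re.MULTILINE | re.IGNORECASE)


def missing_followups(upstream, backported):
    """Find upstream fixes for backported commits that were not backported.

    upstream:   list of (sha, commit_message) tuples from the upstream tree.
    backported: set of full upstream shas already present in the distro tree.
    Returns the shas of upstream commits that are absent from `backported`
    but whose Fixes: trailer points at a commit that is in `backported`.
    """
    missing = []
    for sha, message in upstream:
        if sha in backported:
            continue  # already taken, nothing to report
        for ref in FIXES_RE.findall(message):
            if any(full.startswith(ref.lower()) for full in backported):
                missing.append(sha)
                break
    return missing
```

Fed with the two hashes above, where only 312b4dc11d4f is in `backported` and the message of 8273b4048664 is assumed to contain `Fixes: 312b4dc11d4f`, the function would report 8273b4048664 as the missing follow-up.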

** Changed in: linux (Ubuntu Kinetic)
   Status: Incomplete => Confirmed

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Confirmed
Status in linux source package in Mantic:
  Confirmed

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-05 Thread Mario Limonciello
** Changed in: linux (Ubuntu Kinetic)
   Status: Invalid => Incomplete

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Incomplete
Status in linux source package in Mantic:
  Confirmed

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-05 Thread Mario Limonciello
> Tainted: G

In the two upstream bugs it was noted that there was an amdgpu dkms
package in place.  I believe that's where this issue likely was.  Commit
63a9ab264a8c came in 6.3-rc1 and the commit it fixes was also in 6.3-rc1
(b1a9557a7d00).
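
For readers unfamiliar with the `Tainted:` line in a kernel oops: the letters come from a bitmask that the kernel also exposes in `/proc/sys/kernel/tainted`. The sketch below, in Python, decodes only the three bits most relevant to an out-of-tree dkms driver. The helper names `decode_taint` and `read_taint` are made up for illustration, and the table is deliberately a subset of the kernel's taint flags. The leading `G` in the oops simply means the proprietary bit (`P`) is not set.

```python
# Subset of the kernel taint bits that matter for out-of-tree drivers.
# Keys are bit positions in /proc/sys/kernel/tainted.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value: int) -> list:
    """Return (letter, description) pairs for the known bits set in value."""
    return [TAINT_BITS[bit] for bit in sorted(TAINT_BITS) if value & (1 << bit)]


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint mask from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())
```

A mask with bit 12 set, as an amdgpu dkms build would be expected to produce, decodes to the `O` flag; `decode_taint(read_taint())` gives the same view on a running system.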

So at least one of the issues is probably invalid in Ubuntu's 5.19, but
there are valid upstream bugs, including in 6.3 as there is still
another patch to test for one of the problems.

I'll adjust the tasks accordingly, as I think this should still be
tracked to fix in mantic.

** Also affects: linux (Ubuntu Mantic)
   Importance: Undecided
   Status: Confirmed

** Also affects: linux (Ubuntu Kinetic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Kinetic)
   Status: New => Invalid

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Invalid
Status in linux source package in Mantic:
  Confirmed

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-05 Thread Ubuntu Foundations Team Bug Bot
** Tags added: patch

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The day I updated from Ubuntu 22.04 to 22.10 some months ago, I had to
  stay on Linux 5.15 because 5.19 was not working with my computer. Over
  the last two days I looked for a way to run Linux 5.19, and found one
  working version: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 is working with that computer.

  There may be other working versions that I have not tested, but
  basically the breakages occurred after 5.19.0-23.

  I face two problems. Let's talk about the first one, the graphic one,
  which is still present in 5.19.0-42. It starts to occur with 5.19.0-31
  (5.19.0-29 is not affected): the graphic breaks at the moment it should
  switch from low resolution display to high resolution display at the
  very beginning of startup. The computer is not completely broken, but
  the graphic is dead (X11 cannot start; it tries to fall back to the
  framebuffer, meaning the amdgpu driver is not functional).

  The second bug is the one I get with the 5.19.0-29 version. Linux
  5.19.0-29 doesn't experience the graphic bug but has another issue
  that makes the computer unusable: some CPU gets locked up, some btrfs
  process runs at 100% CPU, and syncing never ends, which even prevents
  rebooting. This bug is less important because I can't reproduce it on
  version 5.19.0-42, so if 5.19.0-42 gets the graphic fixed all will be
  fine.

  I have not updated to Ubuntu 23.04 yet because I'm afraid that its
  newer kernels would leave my computer totally unusable. I have run
  Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until today because of
  that fear.

  It actually took me two work days to test various combinations to boot
  the computer so I'm sticking with 5.19.0-29 for now, and I have limited
  time to test other options. I also tried various BIOS options, and
  also upgraded the BIOS…, and since that ThreadRipper PRO computer has
  a very slow booting BIOS, trying various configurations or software
  versions that require a reboot quickly eats up whole hours.

  The attached logs may have traces of dkms modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour just to uninstall amdgpu-pro and reboot only to
  file this ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8 PC4-25600
    KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small card integrated in the motherboard and
  part of the BMC, so I cannot remove it. This may contribute to the
  trouble.

  When the graphic works (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the ASPEED
  graphic goes off and the display continues on the AMD cards.

  When the graphic doesn't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the AMD cards
  display garbage but the display continues on the ASPEED card. The
  ASPEED card is a very basic integrated card without hardware
  acceleration and featuring only one VGA output, so that's unusable. As
  additional information, I know X11 never starts on the ASPEED if
  there are discrete cards plugged in (tested last year).

  So right now that computer is sticking with Linux 5.19.0-23, which
  doesn't have the graphic and btrfs bugs.

  The last kernel to not feature the graphic bug is Linux 5.19.0-29.
  Linux 5.19.0-31 is the first one reproducing the graphic bug (the
  repository doesn't provide 5.19.0-30 for me to test).
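
  The regression window stated above can be sketched as a short Python
  snippet. It is only an illustration built from the four graphics
  results given in this report; the names graphics_ok, abi_number and
  regression_window are hypothetical, and it assumes the bug does not
  disappear again once it has appeared.

```python
# Graphics results as stated in the report: True means the display worked.
graphics_ok = {
    "5.19.0-23": True,
    "5.19.0-29": True,
    "5.19.0-31": False,
    "5.19.0-42": False,
}


def abi_number(version: str) -> int:
    """Return the Ubuntu ABI number, e.g. 29 for '5.19.0-29'."""
    return int(version.rsplit("-", 1)[1])


def regression_window(results: dict[str, bool]) -> tuple[str, str]:
    """Return (last good, first bad) among the tested versions.

    Assumption: results are monotonic, i.e. every version after the
    first bad one is bad as well.
    """
    last_good = max((v for v in results if results[v]), key=abi_number)
    first_bad = min((v for v in results if not results[v]), key=abi_number)
    return last_good, first_bad


print(regression_window(graphics_ok))
```

  Any untested build between the two returned versions (here only
  5.19.0-30, which the repository does not provide) is where the
  regression would have been introduced.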

  I also have reproduced the graphic bug when using the radeon driver
  instead of the amdgpu one.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.10
  Package: linux-image-generic 5.19.0.42.38
  ProcVersionSignature: Ubuntu 5.19.0-23.24-generic 5.19.7
  Uname: Linux 5.19.0-23-generic x86_64
  ApportVersion: 2.23.1-0ubuntu3.3
  Architecture: amd64
  CasperMD5CheckResult: unknown
  

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-04 Thread Thomas Debesse
This is the patch by Alex Deucher that is believed to fix the NULL
pointer dereference. I have not tested it but the issue looks very
close. It is needed anyway.

> Guchun Chen
> Regarding the NULL pointer access, it should be duplicated of #2388. And the 
> fix is "63a9ab264a8c drm/amd/pm/smu7: move variables to where they are used" .

** Patch added: 
"0001-drm-amd-pm-smu7-move-variables-to-where-they-are-use.patch"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2018470/+attachment/5671131/+files/0001-drm-amd-pm-smu7-move-variables-to-where-they-are-use.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The day I updated from Ubuntu 22.04 to 22.10 some months ago, I had to
  stick to Linux 5.15 because 5.19 was not working with my computer. Over
  the last two days I spent time finding a way to run Linux 5.19, and
  found one working version: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 is working with that computer.

  There may be other working versions that I have not tested, but
  basically the breakages occurred after 5.19.0-23.

  I face two problems. Let's talk about the first one, the graphic one,
  still present in 5.19.0-42. It starts to occur with 5.19.0-31
  (5.19.0-29 is not affected): the graphic breaks at the moment it should
  switch from low resolution display to high resolution display at the
  very beginning of startup. The computer is not completely broken, but
  the graphic is dead (X11 cannot start, trying to use the framebuffer,
  meaning the amdgpu driver is not functional).

  The second bug is the one I get with the 5.19.0-29 version. Linux
  5.19.0-29 doesn't experience the graphic bug but has another issue
  that makes the computer unusable: some CPUs get locked up, and some
  btrfs process runs at 100% CPU; syncing never ends, even preventing a
  reboot. This bug is less important because I don't reproduce it on
  version 5.19.0-42, so if the graphic is fixed on 5.19.0-42 all will be
  fine.

  I have not updated to Ubuntu 23.04 yet because I'm afraid its newer
  kernels would leave my computer totally unusable. I have run
  Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until today because of
  that fear.

  It actually took me two work days to test various combinations to boot
  the computer, so I'm sticking to 5.19.0-23 for now, and I have limited
  time to test other options. I also tried various BIOS options, and
  also upgraded the BIOS. Since that Threadripper PRO computer has a
  very slow-booting BIOS, trying various configurations or software
  versions that require a reboot quickly eats up whole hours.

  The attached logs may have traces of DKMS modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour uninstalling amdgpu-pro and rebooting just to file
  this ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8 PC4-25600
    KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small adapter integrated in the motherboard as
  part of the BMC, so I cannot remove it. It may contribute to the
  trouble.

  When the graphic works (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the ASPEED
  output goes off and the display continues on the AMD cards.

  When the graphic doesn't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphic outputs, then at the moment the
  graphic switches from low resolution to high resolution, the AMD cards
  display garbage but the display continues on the ASPEED card. The
  ASPEED card is a very basic integrated card without hardware
  acceleration and with only one VGA output, so that's unusable. As
  additional information, I know X11 never starts on the ASPEED if
  there are discrete cards plugged in (tested last year).

  So right now that computer is sticking to Linux 5.19.0-23, which
  doesn't have the graphic and btrfs bugs.

  The last kernel to not 

[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-04 Thread Thomas Debesse
I've reported the issues upstream on drm side:

- https://gitlab.freedesktop.org/drm/amd/-/issues/2540
  Linux 5.19 amdgpu: NULL pointer on GCN2 (R9 390X Hawaii/Grenada)

- https://gitlab.freedesktop.org/drm/amd/-/issues/2541
  Linux 5.19 amdgpu: invalid load on GCN1 (R7 240 Oland)

For the NULL pointer dereference, it looks like there is a patch there:

- https://gitlab.freedesktop.org/drm/amd/-/issues/2388
  
https://gitlab.freedesktop.org/drm/amd/uploads/a004996ac0c868dfb032af3c35f7b2c6/0001-drm-amd-pm-smu7-move-variables-to-where-they-are-use.patch

I have not tested the patch myself, but the bug this patch fixes looks
very similar.

** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #2540
   https://gitlab.freedesktop.org/drm/amd/-/issues/2540

** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #2541
   https://gitlab.freedesktop.org/drm/amd/-/issues/2541

** Bug watch added: gitlab.freedesktop.org/drm/amd/-/issues #2388
   https://gitlab.freedesktop.org/drm/amd/-/issues/2388


[Kernel-packages] [Bug 2018470] Re: Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

2023-05-04 Thread Thomas Debesse
** Summary changed:

- Linux 5.19 amdgpu: NULL pointer on GCN1 and invalid load on GCN2
+ Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1
