Public bug reported:

Summary
-------
The Lenovo ThinkPad P1 Gen 8 with NVIDIA RTX PRO 2000 (GB206GLM, 10de:2d39) is 
Ubuntu-certified (certificate 202510-37972, certified for 24.04 LTS + OEM 
kernel 6.14.0-1015-oem). On this same hardware running Ubuntu 26.04 / kernel 
7.0.0-22-generic, the discrete GPU's GSP firmware crashes reproducibly 4-18 
minutes after boot, leading to a continuous high-power state and hard power-off 
on battery.

Scope disclosure
----------------
I am running 26.04 + kernel 7.0, i.e. OUTSIDE the certified 24.04/6.14 
combination. Reporting anyway because (a) it is the same certified hardware + 
GPU, (b) the root cause is an NVIDIA GSP-firmware regression that spans driver 
branches and is tracked upstream, and (c) Canonical/Lenovo OEM should be aware 
for the 26.04 cycle and for any BIOS/VBIOS mitigation.

Hardware
--------
ThinkPad P1 Gen 8, machine type 21Q9, BIOS N4EET22W (1.08).
GPU: NVIDIA RTX PRO 2000 Blackwell Laptop (GB206GLM, 10de:2d39). The Intel 
Arrow Lake-P iGPU drives the display.

Symptom / kernel signature
--------------------------
Each crash follows the same sequence: Xid 62 or Xid 120 "GSP task exception" 
(the RISC-V cause varies between boots) -> Xid 154 "GPU recovery action ... GPU 
Reset Required" -> an endless loop of "RmCheckForGcxSupportOnCurrentState: 
Failed to get GCx pre-requisite, status=0x62", logged every 5 s (up to 16879 
times per boot). The GPU never returns to a low-power GCx state and draws power 
continuously; on battery the laptop hard-powers-off. nvidia-smi then reports 
"GPU requires reset" until reboot.

Both NVIDIA driver branches are affected (6 crash boots): 
nvidia-driver-595-open 595.71.05 and nvidia-driver-580-open 580.159.03. The 
system is clean only when the nvidia module is not loaded (prime-select intel).

Example (580.159.03):
  NVRM: Xid (PCI:0000:01:00): 120, GSP kernel exception: load access page fault (cause:0xd) @ pc:0xffffffff93008780, partition:4#0
  NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
  NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x62
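
For triage, the signature above can be counted mechanically. The following is 
a minimal Python sketch, assuming one boot's kernel log is available as plain 
text with one message per line (for example saved from journalctl -k). The 
function name summarize_nvrm and the regular expression are illustrative 
assumptions based only on the three example lines above; they are not part of 
any NVIDIA or Ubuntu tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 120, GSP kernel exception: ...
# The pattern is an assumption derived from the example lines in this report.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")
GCX_MARKER = "Failed to get GCx pre-requisite"


def summarize_nvrm(log_text):
    """Return (xid_counts, gcx_failures) for one boot's kernel log text."""
    xid_counts = Counter()
    gcx_failures = 0
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            xid_counts[int(match.group(2))] += 1
        elif GCX_MARKER in line:
            gcx_failures += 1
    return xid_counts, gcx_failures


# The three example lines from this report, one message per line.
SAMPLE = """\
NVRM: Xid (PCI:0000:01:00): 120, GSP kernel exception: load access page fault (cause:0xd) @ pc:0xffffffff93008780, partition:4#0
NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x62
"""

if __name__ == "__main__":
    xids, gcx = summarize_nvrm(SAMPLE)
    print("Xid counts:", dict(xids))
    print("GCx pre-requisite failures:", gcx)
```

On a crash boot one would expect a single Xid 62 or 120, a single Xid 154, 
and a large GCx failure count; a boot with the nvidia module unloaded should 
yield no matches at all.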

Upstream / NVIDIA
-----------------
Tracked in NVIDIA open-gpu-kernel-modules issue #1045 (NVIDIA internal bug 
5953411). My data point there:
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/1045#issuecomment-4695950002

Workaround in place
-------------------
prime-select intel disables the dGPU (module not loaded -> no GSP -> no crash); 
display runs on the Intel iGPU.
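
To confirm that the workaround is actually in effect after a reboot, it is 
enough to check that no nvidia kernel module is loaded. A minimal Python 
sketch, assuming the standard /proc/modules format (module name in the first 
column); the helper name nvidia_modules_loaded is hypothetical:

```python
import os


def nvidia_modules_loaded(proc_modules_text):
    """Return sorted names of loaded modules named 'nvidia' or 'nvidia_*'."""
    names = [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]
    return sorted(
        name for name in names
        if name == "nvidia" or name.startswith("nvidia_")
    )


if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):
        with open(path) as handle:
            loaded = nvidia_modules_loaded(handle.read())
        print("nvidia modules loaded:", loaded or "none")
    else:
        print(path, "not available on this system")
```

An empty result corresponds to the "module not loaded -> no GSP -> no crash" 
state described above.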

Asks to Canonical OEM / Lenovo
------------------------------
1. Confirm whether this reproduces on the certified 24.04 + 6.14-oem combo.
2. Is a BIOS/VBIOS/GSP-firmware update planned for the 21Q9 that addresses 
Blackwell GSP stability under Linux?
3. Track for the 26.04 certification cycle.

ProblemType: Bug
DistroRelease: Ubuntu 26.04
Package: linux 7.0.0-22-generic

A sanitized nvidia-bug-report.log.gz can be attached on request.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2156637

Title:
  [ThinkPad P1 Gen 8 / RTX PRO 2000 Blackwell] Reproducible NVIDIA GSP
  crash (Xid 120/154 + GCx loop), hard power-off on battery — certified
  hw 202510-37972

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2156637/+subscriptions