I had the same super annoying issue.

Hardware:
- 21HF0021US (ThinkPad P14s Gen 4)
- BIOS N3QET51W (1.51)

When I got the machine, I wiped it and installed Ubuntu 24.04 LTS, then
later upgraded to Ubuntu 25.10. After that, I started noticing that the
machine was rebooting randomly.

There were no logs or kernel messages at all. It looked exactly like
someone pulled the power plug from the device.

At first, I thought it might be a kernel issue. So I tried Fedora 43,
and surprisingly the resets dropped significantly, maybe once every two
weeks.

Then I switched to Arch Linux (rolling). I expected the latest kernel
would fix the issue, but sadly the resets came back with a vengeance,
happening more than 6 times a day.

The battery is healthy (95%), and I am using a 100W charger (I upgraded
from 65W hoping it would help), but the resets still happened.

Then I checked the throttling status and found that the CPU was getting
throttled. I replaced the thermal paste with PTM7950, and the throttling
issue is now completely gone. I can stress the CPU for more than 30
minutes with no throttle events.
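
For anyone who wants to repeat that check: the post does not say which tool reported the throttling, so the following is only a minimal sketch of one way to do it. It assumes an Intel CPU where the kernel exposes `thermal_throttle/core_throttle_count` under `/sys/devices/system/cpu/cpuN/`. The function name and its optional directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Sum the per-CPU thermal throttle counters exposed in sysfs.
# Assumption: counters live under <root>/cpuN/thermal_throttle/core_throttle_count.
# The optional argument (an alternative root directory) is a hypothetical
# hook so the function can be tried against a fake tree.
sum_throttle_counts() {
    local root="${1:-/sys/devices/system/cpu}"
    local total=0 f n
    for f in "$root"/cpu[0-9]*/thermal_throttle/core_throttle_count; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        n="$(cat "$f")"
        total=$((total + n))
    done
    # Hyperthread siblings report the same core counter, so treat this as a
    # rough indicator: what matters is whether it grows during a stress run.
    echo "$total"
}

# Usage (commented out so that sourcing this file has no side effects):
# before="$(sum_throttle_counts)"
# ... run the stress test ...
# after="$(sum_throttle_counts)"
# echo "new throttle events: $((after - before))"
```

If the number is the same before and after a 30 minute stress run, that matches the "no throttle events" result described above.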

The weird part is that the resets never happen during heavy compiling or
other CPU-intensive tasks. They usually happen while watching YouTube.

I thought it might be the browser. I switched from Firefox to Zen (same
engine), then tried Chrome, but the resets still happened.

Then I tried the LTS kernel (6.12.73-1-lts) and noticed these logs:
```
10:14:37 PM kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: CT: Failed to process CT message (-ENOKEY) 01 00 36 9d 00 01 00 e0
10:14:37 PM kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: CT: Failed to process CT message (-ENOKEY) 01 00 36 9d 00 01 00 e0
10:14:37 PM kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: CT: Failed to handle HXG message (-ENOKEY) 00 01 00 e0
10:14:37 PM kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: CT: Unsolicited response message: len 1, data 0xe0000100 (fence 40246, last 40249)
```

I had never seen these before because the latest/zen kernels (6.18.9)
would just reboot instantly with no time to write logs to disk. The LTS
kernel (6.12.73) luckily left the logs without crashing.
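
To see whether your own machine logs the same messages, something like the sketch below can help. It is only an illustration: the function name is made up, the pattern is modeled on the four log lines quoted above, and reading a previous boot with `journalctl -b -1` assumes the journal is stored persistently.

```shell
#!/usr/bin/env bash
# Count i915 GuC CT error lines in kernel log text read from stdin.
# The pattern is based on the messages quoted in this post; other kernel
# versions may word them differently.
count_guc_ct_errors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c -E 'i915 .*\*ERROR\* .*GUC: CT:' || true
}

# Usage (commented out so that sourcing this file has no side effects):
# journalctl -k -b -1 --no-pager | count_guc_ct_errors   # previous boot
# journalctl -k -b 0 --no-pager | count_guc_ct_errors    # current boot
```

A non-zero count from the boot that ended in a reset would suggest you are hitting the same GuC problem rather than something unrelated.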

After searching the internet and asking an AI assistant, it appears to
be a synchronization failure between the i915 driver and the Intel GuC
firmware (a CT protocol mismatch or lost state).

After that, I tried disabling GuC submission by adding
`i915.enable_guc=2` (HuC firmware loading only, no GuC submission) to my
systemd-boot options for the latest kernel.

However, the reboots still happened.

Then I found that PSR power saving can trigger the same race condition,
so I also added `i915.enable_psr=0`.

After rebooting, I had zero resets for a while using the latest kernel.
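
When experimenting with these options, it is worth confirming after each reboot that they really reached the kernel. The sketch below reads a parameter from the kernel command line and decodes an `enable_guc` value, which is a bitmask (bit 0 is GuC submission, bit 1 is HuC loading, `-1` is the driver default). Both function names are invented for this example.

```shell
#!/usr/bin/env bash
# Print the value of one kernel command-line parameter.
# $1 = parameter name, $2 = file to read (defaults to /proc/cmdline; the
# second argument is a hypothetical hook for trying this on a sample file).
cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -v k="$1=" 'index($0, k) == 1 { print substr($0, length(k) + 1); exit }'
}

# Decode an i915.enable_guc value into a readable summary.
decode_enable_guc() {
    local v="$1" guc=off huc=off
    if [ "$v" -lt 0 ]; then
        echo "auto (driver default)"
        return 0
    fi
    if [ $((v & 1)) -ne 0 ]; then guc=on; fi   # bit 0: GuC submission
    if [ $((v & 2)) -ne 0 ]; then huc=on; fi   # bit 1: HuC load
    echo "GuC submission: $guc, HuC load: $huc"
}

# Usage (commented out so that sourcing this file has no side effects):
# decode_enable_guc "$(cmdline_param i915.enable_guc)"
# cmdline_param i915.enable_psr
```

With the options from this post, the first call should report GuC submission off and HuC load on, and the second should print `0`.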

However, after about a week of use, new releases of `linux-firmware`
and the LTS kernel were installed (`20260221-1` and `6.18.16`, respectively),
and the reboots came back.

Since all the issues seemed related to the `i915` driver, I thought that maybe 
switching to the `xe` driver would help. So I blocked `i915` and forced `xe` to 
take the device by adding:
`i915.force_probe=!7d55 xe.force_probe=0xa7a0`

And it worked. No resets for more than a week now.
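
If you try the same switch, check which driver actually claimed the GPU and what its PCI device ID is. The two `force_probe` values above use different IDs (`7d55` and `a7a0`), while normally both parameters name the ID of the GPU in your own machine. The sketch below reads both facts from sysfs; the PCI address `0000:00:02.0` comes from the log lines in this post, and the function name and its optional argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Print "<pci-device-id> <bound-driver>" for the integrated GPU.
# $1 = sysfs device directory (defaults to the address seen in the logs
# above; pass another path if your GPU sits elsewhere).
gpu_binding() {
    local dev="${1:-/sys/bus/pci/devices/0000:00:02.0}"
    local id driver="none"
    id="$(cat "$dev/device")"
    # The "driver" entry is a symlink to the bound driver's directory;
    # it is absent when no driver has claimed the device.
    if [ -L "$dev/driver" ]; then
        driver="$(basename "$(readlink "$dev/driver")")"
    fi
    echo "$id $driver"
}

# Usage (commented out so that sourcing this file has no side effects):
# gpu_binding
```

After a successful switch the second field should read `xe` rather than `i915`, and the first field is the ID to use in the `force_probe` options.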

After that, I completely blocked `i915` by adding it to 
`/etc/modprobe.d/blacklist.conf`:
```
blacklist nouveau
blacklist i915
install i915 /bin/false
```

Then I regenerated the initramfs with `sudo mkinitcpio -P` and rebooted.

I hope this helps anyone experiencing the same issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2084190

Title:
  Ubuntu 24.04 crashes occasionally on Thinkpad P14s
