Public bug reported:
Two kernel crashes occurred on the same machine (Dell Pro Max Tower T2
FCT2250, RTX 4090, Ubuntu 24.04.4 LTS) running kernel 6.17.0-1023-oem
with the NVIDIA 595.71.05 open kernel module.
=== Crash 1 — 2026-05-30 ===
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
__schedule+0x763/0x7a0
CPU: 21 UID: 1001 PID: 104912 Comm: HeapHelper
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
The crashing process "HeapHelper" is a CUDA Unified Memory (UVM)
management helper thread. The kernel stack was corrupted in the
scheduler, detected by the stack-protector canary check.
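For triage, the "symbol+offset/size" notation in the panic line can be split into its parts. The following is a minimal sketch, not part of the original report; the helper name parse_symbol_offset is hypothetical, and the regular expression only assumes the frame format shown in the two traces above.

```python
import re

# Matches kernel trace frames such as "__schedule+0x763/0x7a0",
# optionally prefixed with "? " (an unreliable frame in a call trace).
FRAME_RE = re.compile(r"\??\s*([\w.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)")


def parse_symbol_offset(frame: str):
    """Return (symbol, offset, size) for a 'symbol+0xOFF/0xSIZE' frame."""
    m = FRAME_RE.fullmatch(frame.strip())
    if m is None:
        raise ValueError(f"unrecognized frame: {frame!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


symbol, offset, size = parse_symbol_offset("__schedule+0x763/0x7a0")
print(f"{symbol}: offset {offset} of {size} bytes "
      f"({size - offset} bytes before the end of the function)")
```

An offset close to the end of the function is what one would expect when the failing check sits in the function epilogue, where compilers typically place the canary comparison. It does not by itself identify what overwrote the stack.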
=== Crash 2 — 2026-06-04 ===
kernel tried to execute NX-protected page - exploit attempt? (uid: 1001)
BUG: unable to handle page fault for address: ffff8e4081580000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0011) - permissions violation
CPU: 21 UID: 1001 PID: 365452 Comm: iou-sqp-365444
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
RIP: 0010:0xffff8e4081580000
Call Trace:
? io_sq_thread+0x3cc/0x810
? ret_from_fork+0x121/0x140
The crashing thread "iou-sqp-365444" is an io_uring SQPOLL kernel
thread. RIP equals the faulting address (ffff8e4081580000), so the CPU
jumped into a non-executable (NX) page, reported to contain all zeros,
instead of kernel text. This indicates a corrupted function pointer.
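The error code 0x0011 in the #PF lines can be decoded bit by bit. The sketch below is an illustration added for readers, assuming the x86 page-fault error code layout used by the Linux kernel (X86_PF_PROT, X86_PF_WRITE, X86_PF_USER, X86_PF_RSVD, X86_PF_INSTR, X86_PF_PK); the function name decode_pf_error_code is hypothetical.

```python
# Bit positions of the x86 page-fault error code, lowest bit first.
PF_BITS = {
    0: "PROT",   # set: protection violation; clear: page not present
    1: "WRITE",  # set: write access; clear: read access
    2: "USER",   # set: fault in user mode; clear: supervisor mode
    3: "RSVD",   # set: reserved bit set in a page-table entry
    4: "INSTR",  # set: fault on instruction fetch
    5: "PK",     # set: protection-key violation
}


def decode_pf_error_code(code: int):
    """Return the names of the flags set in a page-fault error code."""
    return [name for bit, name in PF_BITS.items() if code & (1 << bit)]


flags = decode_pf_error_code(0x0011)
print(flags)
```

For 0x0011 only PROT and INSTR are set, and USER is clear. That matches the text of the trace: a supervisor-mode instruction fetch that failed on permissions, from a page that was present but not executable.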
=== Common Characteristics ===
Both crashes share:
- Same physical CPU core: CPU 21
- Same user: UID 1001 (running CUDA + io_uring workloads)
- Same kernel taint: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE (nvidia open kernel
module)
- Same kernel version: 6.17.0-1023-oem
- Both are consistent with kernel memory corruption: a smashed stack
  canary in crash 1 and a corrupted function pointer in crash 2
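The comparison above can be reproduced mechanically from the two oops header lines. This is a minimal sketch, assuming the "CPU: ... UID: ... PID: ... Comm: ..." layout quoted in this report; parse_header and shared_fields are hypothetical helper names.

```python
import re

HEADER_RE = re.compile(r"CPU: (\d+) UID: (\d+) PID: (\d+) Comm: (\S+)")


def parse_header(line: str):
    """Extract CPU, UID, PID and task name from an oops header line."""
    m = HEADER_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized header: {line!r}")
    cpu, uid, pid, comm = m.groups()
    return {"cpu": int(cpu), "uid": int(uid), "pid": int(pid), "comm": comm}


def shared_fields(a: dict, b: dict):
    """Return the fields that have the same value in both headers."""
    return {key: value for key, value in a.items() if b.get(key) == value}


crash1 = parse_header("CPU: 21 UID: 1001 PID: 104912 Comm: HeapHelper")
crash2 = parse_header("CPU: 21 UID: 1001 PID: 365452 Comm: iou-sqp-365444")
print(shared_fields(crash1, crash2))
```

Only the CPU and UID fields match between the two headers. The PID and task name differ, so the crashes came from different threads.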
=== System Information ===
Machine: Dell Pro Max Tower T2 FCT2250
BIOS: 1.14.1 (2026-04-02)
CPU: Intel Core Ultra 9 285K (24 cores)
GPU: NVIDIA GeForce RTX 4090
Memory: 128 GB
OS: Ubuntu 24.04.4 LTS
Kernel (crashed): 6.17.0-1023-oem
Kernel (current): 6.17.0-1024-oem
NVIDIA driver: 595.71.05 (open kernel module, DKMS)
=== Hypothesis ===
The nvidia_uvm driver (open kernel module) may corrupt kernel memory when
managing GPU page tables for processes that simultaneously use CUDA Unified
Memory and io_uring asynchronous I/O. The corrupted memory is then hit later
in unrelated kernel code paths (io_sq_thread, __schedule).
The fact that both crashes hit CPU 21 suggests a possible per-CPU data
structure issue.
=== Mitigation ===
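Setting kernel.io_uring_disabled=1 has stabilized the system. This value
blocks creation of new io_uring instances for processes that lack
CAP_SYS_ADMIN and are not in kernel.io_uring_group; it is not limited to
SQPOLL mode.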
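To confirm which setting is active on a given machine, the sysctl can be read from procfs. The sketch below assumes the value meanings given in the kernel sysctl documentation (0, 1, 2) and the path /proc/sys/kernel/io_uring_disabled, which is only present on kernels that support this sysctl; describe_io_uring_disabled and read_io_uring_disabled are hypothetical helper names.

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/kernel/io_uring_disabled")

MEANINGS = {
    0: "io_uring enabled for all processes",
    1: "io_uring creation disabled for unprivileged processes "
       "outside kernel.io_uring_group",
    2: "io_uring creation disabled for all processes",
}


def describe_io_uring_disabled(value: int) -> str:
    """Map a kernel.io_uring_disabled value to a short description."""
    return MEANINGS.get(value, f"unknown value {value}")


def read_io_uring_disabled(path: Path = SYSCTL_PATH):
    """Return the current sysctl value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


current = read_io_uring_disabled()
if current is None:
    print("kernel.io_uring_disabled is not available on this system")
else:
    print(f"kernel.io_uring_disabled={current}: "
          f"{describe_io_uring_disabled(current)}")
```

With the value 1, workloads running as root or in the configured group can still create io_uring instances, so that setting does not rule out io_uring involvement for privileged processes.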
=== Attachments ===
Crash dump files available on the affected machine:
- /var/crash/linux-image-6.17.0-1024-oem-202606041723.crash (June 4)
- /var/crash/linux-image-6.17.0-1023-oem-202605301822.crash (May 30)
Full vmcore dmesg extracts are also available upon request.
** Affects: linux-oem-6.17 (Ubuntu)
Importance: Undecided
Status: New
** Tags: kernel-crash memory-corruption nvidia oem-kernel regression
** Attachment added: "linux-image-6.17.0-1024-oem-crash.zip"
https://bugs.launchpad.net/bugs/2155623/+attachment/5975676/+files/linux-image-6.17.0-1024-oem-crash.zip
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2155623
Title:
Kernel memory corruption: io_sq_thread executes NX page on CPU 21 —
likely nvidia_uvm interaction (6.17.0-1023-oem)