** Description changed:

  ############ Summary ############
  
  Running modern Windows games through Steam Proton (DXVK/vkd3d) on Ubuntu
  24.04–based Linux Mint consistently triggers kernel oopses in:
  
  syscall_exit_to_user_mode (NULL pointer dereference)
  
  fpregs_assert_state_consistent (FPU state inconsistency)
  
  and previously, on a mainline kernel, amdgpu_hmm_range_get_pages (bad
  page map)
  
  These occur across multiple games, not tied to a specific title, and
  only under Proton workloads. Hardware stress tests show no instability,
  suggesting a kernel regression. The same hardware, booted from a separate
  hard drive running Windows, shows no issues.
  
  ############ System Information ############
  
  Distro: Linux Mint 22.x (Ubuntu 24.04 base)
  
  Kernel (affected): 6.14.0-37-generic #37~24.04.1-Ubuntu
  
  Also tested: 6.18.1-061801-generic (mainline, not Ubuntu-supported),
  6.6.0-060600, 6.7.0-060700, 6.8.0-90
  
  Motherboard: Gigabyte Z790 UD AC
  
  BIOS: F14 (09/18/2025)
  
  CPU: [Insert output of lscpu | grep 'Model name']
  
  GPU: AMD Radeon RX 7900 XTX
  
  Driver: in‑kernel amdgpu (no proprietary modules)
  
  Taint flags: G D W after oops (no out‑of‑tree modules)
  
  I can attach full outputs of uname -a, lspci -nn, and lsmod if needed.
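
The outputs offered above could be gathered into a single attachment with a short script. This is only a minimal sketch, assuming Python 3 is available on the affected system; the `collect` helper and the section layout are hypothetical, and only the four commands already named in this report are run.

```python
import shutil
import subprocess

# Commands named in this report; lscpu is the one the CPU field asks for.
COMMANDS = {
    "uname": ["uname", "-a"],
    "lspci": ["lspci", "-nn"],
    "lsmod": ["lsmod"],
    "lscpu": ["lscpu"],
}


def collect(commands=COMMANDS):
    """Run each command and return one text report.

    A missing tool is noted in the report instead of raising an error.
    """
    sections = []
    for name, argv in commands.items():
        if shutil.which(argv[0]) is None:
            body = f"({argv[0]} not found on PATH)"
        else:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=False
            )
            # Fall back to stderr so a failing command still leaves a trace.
            body = result.stdout.strip() or result.stderr.strip()
        sections.append(f"===== {name}: {' '.join(argv)} =====\n{body}")
    return "\n\n".join(sections) + "\n"
```

Writing the result of `collect()` to a text file would give one file to attach to the bug.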
  
  ############ Description of the Problem ############
  
  When running Proton-based games (DX12/DXVK/vkd3d), the kernel
  intermittently logs:
  
  1. NULL pointer dereference in syscall_exit_to_user_mode
  Example from Europa Universalis V:
  
  #LOG#
  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  Oops: 0000 [#2] PREEMPT SMP NOPTI
  CPU: 5 PID: 15370 Comm: Task_P_BN 1 Tainted: G      D W          6.14.0-37-generic
  RIP: 0010:syscall_exit_to_user_mode+0x74/0x1d0
  ...
  note: Task_P_BN 1 exited with irqs disabled
  #ENDLOG#
  
  This same oops occurred earlier with a different Proton thread
  (ThreadPoolForeg), same kernel, same RIP offset.
  
  2. FPU state inconsistency warning in DXVK shader worker
  #LOG#
  WARNING: CPU: 4 PID: 8329 at arch/x86/kernel/fpu/core.c:822 fpregs_assert_state_consistent+0x2d/0x50
  Comm: dxvk-shader-n Tainted: G      D            6.14.0-37-generic
  Call Trace:
-   arch_exit_to_user_mode_prepare
-   irqentry_exit_to_user_mode
-   sysvec_reschedule_ipi
+   arch_exit_to_user_mode_prepare
+   irqentry_exit_to_user_mode
+   sysvec_reschedule_ipi
  #ENDLOG#
  
  This appears during shader compilation under DXVK.
  
  3. On mainline kernel 6.18.1: amdgpu HMM page map corruption
  #LOG#
  BUG: Bad page map in process vkd3d_queue
  amdgpu_hmm_range_get_pages
  amdgpu_ttm_tt_get_user_pages
  amdgpu_cs_ioctl
  #ENDLOG#
  
  To me this might suggest a second failure mode in GPU memory management
  under vkd3d.
- 
  
  ############ Workloads That Trigger the Bug ############
  
  Arc Raiders (Unreal Engine 5, DX12 via Proton)
  
  Europa Universalis V (DX11/DX12 via Proton)
  
  Both games trigger oopses within minutes.
  
  Multiple other games trigger it as well; the issue is not game-specific.
  
  ############ Steps to Reproduce ############
  
  Boot into kernel 6.14.0-37-generic.
  
  Start dmesg -w in a terminal.
  
  Launch Steam → run a Proton game (Arc Raiders or EU5).
  
  Play or idle for a few minutes.
  
  Observe kernel oopses in syscall_exit_to_user_mode or FPU warnings.
  
  Repro rate: high (multiple times per session).
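
To put a number on the repro rate, a saved kernel log could be scanned for the three signatures quoted in this report. The following is a minimal sketch under stated assumptions: the function name, the signature labels, and the sample text are hypothetical, and it counts matching lines rather than distinct oops events (a single oops can print the same symbol more than once).

```python
import re
from collections import Counter

# Patterns built from the log excerpts in this report.
SIGNATURES = {
    "null_deref_syscall_exit": re.compile(r"RIP: .*syscall_exit_to_user_mode"),
    "fpu_state_warning": re.compile(r"WARNING: .*fpregs_assert_state_consistent"),
    "amdgpu_hmm_bad_page": re.compile(r"amdgpu_hmm_range_get_pages"),
}


def count_signatures(log_text):
    """Count log lines matching each known signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample assembled from lines quoted in this report.
SAMPLE = """\
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:syscall_exit_to_user_mode+0x74/0x1d0
WARNING: CPU: 4 PID: 8329 at arch/x86/kernel/fpu/core.c:822 fpregs_assert_state_consistent+0x2d/0x50
"""
```

The input could come from `journalctl -k -b > kernel.log` after a session, with the file contents passed to `count_signatures`.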
  
  ############ Hardware Stability Verification ############
  
- 
  To rule out hardware faults:
- I tested on windows which was installed on a separate drive completely, and 
had no issues playing games or running benchmarks and no pixel artifact
+ I tested on Windows, installed on a completely separate drive, and had no issues playing games or running benchmarks, and saw no pixel artifacts. I also reseated the GPU and confirmed it uses three independent power cables to the PSU.
  
  I ran memtester (4 GB locked, multiple loops). All tests passed (stuck
  address, random value, bit fade, etc.).
  
  I also ran stress-ng for 30 minutes (heavy CPU/FPU/AVX/SIMD load) with:
  
  stress-ng --cpu 8 --matrix 4 --vecmath 4 --timeout 30m --metrics-brief
  
  No errors or kernel warnings, and no thermal throttling or instability in
  dmesg.
  
  To me this suggests the kernel oopses are not hardware-related.
  
  ############ Expected Behavior ############
  
  Proton games should run without triggering kernel oopses in:
  
  syscall exit path
  
  FPU state handling
  
  amdgpu HMM
  
  ############ Actual Behavior ############
  
  Kernel logs fatal oopses
  
  Threads exit with IRQs disabled
  
  Kernel becomes tainted
  
  Games may crash or behave unpredictably
  
  System remains running but in an unsafe state; I have run into two kernel
  panics and was not able to grab logs.
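
The "Kernel becomes tainted" point can be cross-checked against the numeric value in /proc/sys/kernel/tainted. Below is a minimal sketch, assuming the bit positions from the kernel's tainted-kernels documentation; only the flags relevant to this report are listed, and the `decode_taint` helper is hypothetical.

```python
# Subset of taint bits from the kernel's tainted-kernels documentation.
# Bit 0 unset is shown as "G" in the oops header, set as "P".
TAINT_BITS = {
    0: ("P", "proprietary module loaded"),
    5: ("B", "bad page referenced"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
}


def decode_taint(value):
    """Return (letter, description) pairs for the known bits set in value."""
    return [
        (letter, description)
        for bit, (letter, description) in sorted(TAINT_BITS.items())
        if value & (1 << bit)
    ]
```

A value of 640 (128 + 512) decodes to D and W, matching the "G D W" header quoted earlier; the P and O bits staying clear is consistent with no proprietary or out-of-tree modules being loaded.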
  
  ############ Additional Notes ############
-  
+ 
  No proprietary modules loaded
  
  No overclocking; system at stock settings
  
  Issue reproduced across multiple Proton versions
  
  Issue reproduced across multiple games
  
  Issue reproduced across multiple kernels (Ubuntu + mainline)
  
  No errors under synthetic stress tests
  
  ############ Conclusion ############
  
  I've been racking my brain over this for about a week now on a fresh Mint
  install. I wasn't sure where to submit this bug report, but thought it
  might be a good idea to run it by you to see whether this is the right
  place to report it.
  
  From my (limited) understanding, this appears to be a kernel regression
  affecting:
  
  x86 syscall exit path
  
  FPU state tracking
  
  amdgpu HMM under vkd3d
  
  Proton/DXVK workloads on RDNA3 hardware
  
  Given the reproducibility and the clean hardware tests, this seems like
  it could be kernel-related rather than a hardware issue.
  
  I'm happy to provide any additional dmesg/journalctl logging or
  Proton logs, or to test on other kernels or debug kernels, if
  needed/requested.
  
  Please advise on next steps or additional diagnostics.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2136907

Title:
  [REGRESSION?] NULL deref in syscall_exit_to_user_mode and FPU state
  warning under Proton/DXVK/vkd3d on Z790 + RX 7900 XTX

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2136907/+subscriptions

