[Bug 2158539] Re: 7.0.0-27-generic hard freezes with NVIDIA 595 open module; __warn_thunk warning in nvidia_init_module

Thomas B Sun, 28 Jun 2026 16:45:32 -0700

Update after additional testing and off-host netconsole capture.

  The original report was filed while the freezes were silent in the local
  journal, and the best visible clue was the NVIDIA `__warn_thunk` warning
  during `nvidia_init_module`. I have since reproduced the hard-freeze symptom
  on both `7.0.0-22-generic` and `7.0.0-27-generic`, and captured the final
  kernel messages from two later freezes using netconsole to a second host.
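
For anyone who wants to reproduce this kind of off-host capture, here is a minimal sketch of a receiver for the second host. It is not necessarily the tool used for the captures in this report. It assumes the sending kernel's netconsole is pointed at this machine over UDP on port 6666 (netconsole's default target port), and the names `open_listener` and `receive_messages` are hypothetical.

```python
import socket
from datetime import datetime


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams.

    Port 6666 is netconsole's default target port; adjust it if the
    sender was configured differently.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=None):
    """Collect `count` datagrams as (receive_time, text) tuples.

    The timestamp is taken on the receiving host, so it is still
    available when the sending host has frozen.
    """
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        data, _addr = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        stamp = datetime.now().astimezone().isoformat(timespec="seconds")
        messages.append((stamp, text))
    return messages
```

A long-running capture would call `receive_messages` in a loop and append each line to a file, flushing after every write so the last messages before a freeze are not lost.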


  New findings:

  - The system also hard-froze on `7.0.0-22-generic`, so `7.0.0-22` is not a
    clean workaround.
  - The NVIDIA warning from the original report remains unexplained:
    `Unpatched return thunk in use. This should not happen!`
  - However, the final messages captured immediately before two later freezes
    are xHCI resume failures on AMD USB controllers, not NVIDIA NVRM/Xid
    messages.

  Crash 1:

  - Failed boot: 2026-06-28 15:12:35 to 2026-06-28 16:37:45
  - Kernel: `7.0.0-22-generic`
  - Local journal stopped at 16:37:45.
  - Netconsole captured these final lines at 16:37:46:

  ```text
  xhci_hcd 0000:12:00.3: Controller not ready at resume -19
  xhci_hcd 0000:12:00.3: PCI post-resume error -19!
  xhci_hcd 0000:12:00.3: HC died; cleaning up
  ```

  Crash 2:

  - Failed boot: 2026-06-28 16:41:15 to 2026-06-28 18:51:33
  - Kernel: `7.0.0-27-generic`
  - Local journal stopped at 18:51:33.
  - Netconsole captured these final lines at 18:52:54:

  2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: Controller not ready at resume -19
  2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: PCI post-resume error -19!
  2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: HC died; cleaning up
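
When comparing captures from several freezes, it can help to pull the PCI address and error code out of each line. The sketch below is one way to do that; the function name `parse_xhci_line` is hypothetical. It relies on the kernel reporting negated errno values, so `-19` corresponds to `ENODEV` ("No such device"), and it assumes the script runs on Linux, where Python's `errno` table uses the same numbering.

```python
import errno
import re


def parse_xhci_line(line):
    """Split an xhci_hcd log line into (device, message, errno_name).

    Any leading timestamp is ignored. errno_name is None when the
    message does not end in a negative error code.
    """
    match = re.search(r"xhci_hcd (\S+?): (.*)$", line.strip())
    if match is None:
        return None  # not an xhci_hcd message
    device, message = match.groups()
    # Kernel error codes are negated errno values, e.g. -19 is -ENODEV.
    code = re.search(r"(-\d+)!?$", message)
    name = errno.errorcode.get(-int(code.group(1))) if code else None
    return device, message, name
```

Applied to the six lines captured above, this gives `ENODEV` for both controllers, `0000:12:00.3` and `0000:12:00.4`, on the "Controller not ready" and "PCI post-resume error" lines.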

  Relevant PCI devices:

  12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b6]
  12:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b7]
  0e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller [1022:43f7] (rev 01)

  Current interpretation:

  - The strongest final-event evidence now points to an AMD CPU-side xHCI
    runtime-resume failure, possibly involving runtime PM, firmware/AGESA, or
    platform PCIe power management.

  - I am not claiming the NVIDIA `__warn_thunk` warning is unrelated; it remains
    unresolved. But netconsole did not capture NVIDIA NVRM/Xid output at the
    freeze points it caught.

  - Because the same symptom reproduced on both `7.0.0-22-generic` and
    `7.0.0-27-generic`, this may not be specific to `7.0.0-27` despite the
    original title.

  Mitigation currently under test:

  I disabled runtime PM for all PCI xhci_hcd controllers for the current boot by
  setting each controller's `power/control` to `on`.

  Observed state after applying that mitigation:

  0000:0e:00.0 control=on runtime=active
  0000:10:00.0 control=on runtime=active
  0000:12:00.3 control=on runtime=active
  0000:12:00.4 control=on runtime=active
  0000:13:00.0 control=on runtime=active
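
For reference, a mitigation of this kind and a state listing in the format above could be scripted roughly as follows. This is a sketch, not the exact commands used for this report. It assumes the controllers are bound to the PCI driver named `xhci_hcd` (as the log lines suggest) and uses the standard sysfs attributes `power/control` and `power/runtime_status`; the function name `disable_xhci_runtime_pm` is hypothetical.

```python
import os


def disable_xhci_runtime_pm(driver_dir="/sys/bus/pci/drivers/xhci_hcd"):
    """Write "on" to power/control for every device bound to the driver.

    Returns one "<address> control=<value> runtime=<status>" line per
    device. Writing under /sys requires root, and the setting lasts
    only until the next reboot.
    """
    lines = []
    for name in sorted(os.listdir(driver_dir)):
        power_dir = os.path.join(driver_dir, name, "power")
        control_path = os.path.join(power_dir, "control")
        if not os.path.isfile(control_path):
            continue  # skip driver entries such as bind, unbind, new_id
        with open(control_path, "w") as handle:
            handle.write("on")  # "on" disables runtime suspend for the device
        with open(control_path) as handle:
            control = handle.read().strip()
        with open(os.path.join(power_dir, "runtime_status")) as handle:
            status = handle.read().strip()
        lines.append(f"{name} control={control} runtime={status}")
    return lines
```

Run as root, `print("\n".join(disable_xhci_runtime_pm()))` prints one line per controller. Writing `auto` instead of `on` would re-enable runtime PM.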

  If the system remains stable with xHCI runtime PM disabled, that will further
  support the xHCI runtime-resume hypothesis. If it freezes again, netconsole is
  still running and should capture whether the final failure remains
  xHCI-related or moves to another subsystem.


I would welcome any other theories, or recommendations for logging or testing.
The freeze seems hard to reproduce on demand, but it occurs often enough to
cause frequent problems.


** Summary changed:

- 7.0.0-27-generic hard freezes with NVIDIA 595 open module; __warn_thunk warning in nvidia_init_module
+ Hard freezes on 7.0.0-22 and -27 with X670E/Ryzen; AMD xHCI controller not ready at resume (-19)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2158539

Title:
  Hard freezes on 7.0.0-22 and -27 with X670E/Ryzen; AMD xHCI controller
  not ready at resume (-19)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2158539/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
