Update after additional testing and off-host netconsole capture.
The original report was filed while the freezes left nothing in the local journal, and the best visible clue was the NVIDIA `__warn_thunk` warning during `nvidia_init_module`. I have since reproduced the hard-freeze symptom on both kernel versions and captured the final kernel messages from two later freezes using netconsole to a second host.
New findings:
- The system also hard-froze on `7.0.0-22-generic`, so `7.0.0-22` is not a clean workaround.
- The NVIDIA warning from the original report remains unexplained: `Unpatched return thunk in use. This should not happen!`
- However, the final messages captured immediately before two later freezes are xHCI resume failures on AMD USB controllers, not NVIDIA NVRM/Xid messages.
Crash 1:
- Failed boot: 2026-06-28 15:12:35 to 2026-06-28 16:37:45
- Kernel: `7.0.0-22-generic`
- Local journal stopped at 16:37:45.
- Netconsole captured these final lines at 16:37:46:
```text
xhci_hcd 0000:12:00.3: Controller not ready at resume -19
xhci_hcd 0000:12:00.3: PCI post-resume error -19!
xhci_hcd 0000:12:00.3: HC died; cleaning up
```
Crash 2:
- Failed boot: 2026-06-28 16:41:15 to 2026-06-28 18:51:33
- Kernel: `7.0.0-27-generic`
- Local journal stopped at 18:51:33.
- Netconsole captured these final lines at 18:52:54:
```text
2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: Controller not ready at resume -19
2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: PCI post-resume error -19!
2026-06-28T18:52:54-0400 xhci_hcd 0000:12:00.4: HC died; cleaning up
```
Relevant PCI devices:
```text
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b6]
12:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b7]
0e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller [1022:43f7] (rev 01)
```
Current interpretation:
- The strongest final-event evidence now points to a runtime-resume failure of the AMD CPU-integrated (Raphael/Granite Ridge) xHCI controllers, possibly involving runtime PM, firmware/AGESA, or platform PCIe power management.
- I am not claiming the NVIDIA `__warn_thunk` warning is unrelated; it remains unresolved. However, netconsole did not capture any NVIDIA NVRM/Xid output at the freeze points it caught.
- Because the same symptom reproduced on both `7.0.0-22-generic` and `7.0.0-27-generic`, this may not be specific to 7.0.0-27, despite the original title.
Mitigation currently under test:
I disabled runtime PM for all PCI `xhci_hcd` controllers for the current boot by setting each controller's `power/control` to `on`.
Observed state after applying that mitigation:
```text
0000:0e:00.0 control=on runtime=active
0000:10:00.0 control=on runtime=active
0000:12:00.3 control=on runtime=active
0000:12:00.4 control=on runtime=active
0000:13:00.0 control=on runtime=active
```
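Here is a minimal Python sketch of that mitigation and of the state summary above. It assumes the controllers are bound to the `xhci_hcd` PCI driver and appear under `/sys/bus/pci/drivers/xhci_hcd/`, which is the standard sysfs layout. The function names are hypothetical. Writing to `power/control` requires root, and the setting lasts only until reboot.

```python
from pathlib import Path

# Standard sysfs directory listing the PCI devices bound to xhci_hcd.
XHCI_DRIVER_DIR = Path("/sys/bus/pci/drivers/xhci_hcd")


def xhci_devices(driver_dir: Path = XHCI_DRIVER_DIR) -> list[Path]:
    """Return the bound device entries (named like 0000:12:00.3), skipping
    driver attributes such as bind, unbind, new_id and uevent."""
    return sorted(p for p in driver_dir.iterdir() if p.name.count(":") == 2)


def disable_runtime_pm(driver_dir: Path = XHCI_DRIVER_DIR) -> None:
    """Write 'on' to power/control of each controller, which keeps it out of
    runtime suspend. Needs root; not persistent across reboots."""
    for dev in xhci_devices(driver_dir):
        (dev / "power" / "control").write_text("on")


def report(driver_dir: Path = XHCI_DRIVER_DIR) -> list[str]:
    """Build one '<address> control=<value> runtime=<status>' line per controller."""
    lines = []
    for dev in xhci_devices(driver_dir):
        control = (dev / "power" / "control").read_text().strip()
        runtime = (dev / "power" / "runtime_status").read_text().strip()
        lines.append(f"{dev.name} control={control} runtime={runtime}")
    return lines
```

Usage: as root, call `disable_runtime_pm()` and then `print("\n".join(report()))`. The directory is a parameter, so the logic can be exercised against a scratch tree instead of the live `/sys`.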
If the system remains stable with xHCI runtime PM disabled, that will further support the xHCI runtime-resume hypothesis. If it freezes again, netconsole is still running and should capture whether the final failure remains xHCI-related or moves to another subsystem.
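Here is a small Python sketch for triaging the next capture, assuming the netconsole output is saved as text lines. It reports whether the last few lines match the xHCI resume-failure messages seen so far or NVIDIA NVRM/Xid output. The patterns, the 10-line window, and the function name are arbitrary, hypothetical choices, not established criteria. An empty result means the final lines matched neither, which would suggest the failure moved to another subsystem.

```python
import re

# Hypothetical patterns, based only on the messages quoted in this report.
PATTERNS = {
    "xhci": re.compile(r"xhci_hcd .*(not ready at resume|post-resume error|HC died)"),
    "nvidia": re.compile(r"NVRM:|\bXid\b"),
}


def classify_tail(lines: list[str], tail: int = 10) -> set[str]:
    """Return the names of the patterns that match any of the last `tail`
    non-empty lines of a capture."""
    recent = [line for line in lines if line.strip()][-tail:]
    return {name for name, pat in PATTERNS.items() for line in recent if pat.search(line)}
```

For example, pass it the lines of the saved capture file after a freeze. Only the final lines are examined, because earlier noise in a long capture is not evidence about the freeze point.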
I would welcome any other theories or logging and testing recommendations. This seems hard to reproduce on demand, but it is causing frequent freezes.
** Summary changed:
- 7.0.0-27-generic hard freezes with NVIDIA 595 open module; __warn_thunk warning in nvidia_init_module
+ Hard freezes on 7.0.0-22 and -27 with X670E/Ryzen; AMD xHCI controller not ready at resume (-19)
--
https://bugs.launchpad.net/bugs/2158539
Title:
  Hard freezes on 7.0.0-22 and -27 with X670E/Ryzen; AMD xHCI controller not ready at resume (-19)