On Tue, Apr 07, 2026 at 01:51:32PM -0400, Tony Camuso wrote:
> When the BMC resets while the IPMI watchdog is active, the driver has
> three failure modes depending on timing:
>
> 1. list_add double add panic -- the watchdog daemon retries while the
>    static smi_msg/recv_msg structures are still queued in the IPMI
>    layer from the previous (unanswered) request.

I'm trying to make sense of this.  Are you sure this didn't start
happening after you added a timeout on the wait_for_completion()?
Otherwise it would never return, the mutex would be held, and no new
message could be added.

Just timing out in wait_for_completion() there could cause all kinds of
bad things to happen.

>
> 2. D-state hang -- wait_for_completion() blocks indefinitely because
>    the BMC never delivers a response.

This is an issue.  The lower level driver is *always* supposed to return
a failure.  Something else needs to be fixed.

I have seen several creative ways in which BMCs "fail to respond" that
have confused the lower level drivers.  If my guess is correct, there's
a bug in the low level driver that's causing it to not time out the
message.  If we don't fix this, it will cause other issues outside the
watchdog.

>
> 3. Silent loss of watchdog protection -- the BMC returns a non-zero
>    completion code, the driver's internal state becomes inconsistent,
>    writes to /dev/watchdog return -EINVAL, and the daemon gives up.
>    The system continues running without hardware watchdog coverage.

Again, are you sure this didn't start happening after you added the
timeout?

>
> All three stem from the same root cause: the static message structures
> and unbounded completion waits were never designed for a BMC that
> disappears mid-transaction.

All that is supposed to be protected by a mutex.  That mutex is claimed
on all IPMI watchdog operations, and it shouldn't be released until all
resources have been freed.  Anything that violates that is asking for
trouble.

You don't mention the lower level interface (KCS, BT, SMIC, SSIF) but I
think we need to start looking there.

It may be that the timeouts on the watchdog messages need to be
adjusted.  The whole IPMI driver was designed on the presumption that
the BMC would go away for only a short period of time (5-10 seconds)
and not permanently.
That has slowly been fixed over time, but things might need to be
adjusted in the watchdog.

-corey

>
> This has been independently reported by Kenta Akagi on a Dell PowerEdge
> R640 running 6.18.7, also triggered by a BMC reset with the watchdog
> active:
>
>   https://sourceforge.net/p/openipmi/mailman/message/59292850/
>
> The fix takes a simple, deterministic approach: detect the failure via
> BMC error response, guard against structure reuse (msg_in_flight) and
> indefinite waits (completion timeout), then initiate orderly_reboot()
> when the watchdog is active. This produces the same outcome the
> hardware watchdog would have -- a system reset -- but through a
> controlled path with clear logging and no panics or hangs.
>
> If the watchdog is stopped when the BMC resets, no reboot occurs and
> the system continues normally.
>
> Tested on Dell PowerEdge R640 with kernel 5.14 (RHEL 9) and verified
> against mainline (both patches apply cleanly).
>
> Corey Minyard's recent fix for list corruption in smi_work()
> (ipmi_msghandler.c) addresses a related but separate code path. The
> watchdog driver's own static structure reuse requires this fix.
>
> Tony Camuso (2):
>   ipmi:watchdog: Reboot cleanly on BMC reset
>   Documentation: ipmi: Update BMC reset behavior for watchdog
>
>  Documentation/driver-api/ipmi.rst |  61 ++++++++++++++++++
>  drivers/char/ipmi/ipmi_watchdog.c | 101 ++++++++++++++++++++++++------
>  2 files changed, 144 insertions(+), 18 deletions(-)
>
> --
> 2.53.0
