Re: Automatic reboot on kernel crash in Debian 12 - how?
On 16/04/2024 16:17, Michael Kjörling wrote:
> I have a handful of Debian 12 systems that I want to configure such
> that they reboot automatically in case of a problem.
[...]
> That leaves kernel-level issues.

I have not tried it, but I have seen some systemd options for configuring the hardware watchdog; see systemd.directives(7) and systemd-system.conf(5).
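A rough, untested illustration of the systemd route: systemd-system.conf(5) documents RuntimeWatchdogSec=, which makes PID 1 keep feeding the hardware watchdog so that the hardware resets the machine when the feeding stops. The sketch below only prints a drop-in file. The helper name emit_watchdog_dropin, the 30s default and the file name used in the usage note are assumptions made for illustration, not values taken from this thread.

```shell
#!/bin/sh
# Print a systemd manager drop-in that enables the runtime hardware watchdog.
# The timeout argument is optional; 30s is an arbitrary illustrative default.
emit_watchdog_dropin() {
    timeout="${1:-30s}"
    printf '[Manager]\n'
    printf 'RuntimeWatchdogSec=%s\n' "$timeout"
}
```

As root, something like `emit_watchdog_dropin 30s > /etc/systemd/system.conf.d/watchdog.conf` followed by a reboot should apply it. It can only help if the system actually exposes a watchdog device (systemd uses /dev/watchdog0 by default).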
Re: Automatic reboot on kernel crash in Debian 12 - how?
On 16/04/24 at 11:17, Michael Kjörling wrote:
> Do I need to set some more settings to ensure that the system will
> automatically reboot on a panic? If so, what?

Hi,

The Linux kernel source has two build-time options that turn lockups into panics (which, combined with panic_timeout, lead to a reboot):

    config BOOTPARAM_SOFTLOCKUP_PANIC
        bool "Panic (Reboot) On Soft Lockups"
        depends on SOFTLOCKUP_DETECTOR
        help
          Say Y here to enable the kernel to panic on "soft lockups",
          which are bugs that cause the kernel to loop in kernel
          mode for more than 20 seconds (configurable using the
          watchdog_thresh sysctl), without giving other tasks a
          chance to run.

          The panic can be used in combination with panic_timeout,
          to cause the system to reboot automatically after a
          lockup has been detected. This feature is useful for
          high-availability systems that have uptime guarantees and
          where a lockup must be resolved ASAP.

          Say N if unsure.

and:

    config BOOTPARAM_HARDLOCKUP_PANIC
        bool "Panic (Reboot) On Hard Lockups"
        depends on HARDLOCKUP_DETECTOR
        help
          Say Y here to enable the kernel to panic on "hard lockups",
          which are bugs that cause the kernel to loop in kernel
          mode with interrupts disabled for more than 10 seconds
          (configurable using the watchdog_thresh sysctl).

          Say N if unsure.

According to Documentation/admin-guide/kernel-parameters.txt, you can set this as a kernel parameter or via sysctl:

    softlockup_panic=
            [KNL] Should the soft-lockup detector generate panics.
            Format: 0 | 1

            A value of 1 instructs the soft-lockup detector
            to panic the machine when a soft-lockup occurs. It is
            also controlled by the kernel.softlockup_panic sysctl
            and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
            respective build-time switch to that functionality.
The same applies to "kernel.hardlockup_panic", which does not seem to have its own entry in the documentation file. I found it described under nmi_watchdog:

    nmi_watchdog=
            [KNL,BUGS=X86] Debugging features for SMP kernels
            Format: [panic,][nopanic,][num]
            Valid num: 0 or 1
            0 - turn hardlockup detector in nmi_watchdog off
            1 - turn hardlockup detector in nmi_watchdog on
            When panic is specified, panic when an NMI watchdog
            timeout occurs (or 'nopanic' to not panic on an NMI
            watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
            To disable both hard and soft lockup detectors,
            please see 'nowatchdog'.
            This is useful when you use a panic=... timeout and
            need the box quickly up again.

            These settings can be accessed at runtime via
            the nmi_watchdog and hardlockup_panic sysctls.

To learn more, I suggest installing the "linux-source-6.1" package and looking at the "Watchdog" option, which is under "Device Drivers". The BOOTPARAM_SOFTLOCKUP_PANIC and BOOTPARAM_HARDLOCKUP_PANIC options are under "Kernel hacking" → "Debug Oops, Lockups and Hangs".

Cheers
--
Franco Martelli
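A minimal sketch of the sysctl route described in this message. It assumes the two sysctls appear under /proc/sys/kernel when the corresponding detector is built into the running kernel. The helper name emit_lockup_sysctl is made up for illustration; it only prints lines and changes nothing by itself.

```shell
#!/bin/sh
# Print sysctl.d lines that make detected lockups panic the kernel.
# A key that the running kernel does not expose is emitted as a comment,
# so that applying the file does not fail on it.
# The optional argument overrides the sysctl root (default: /proc/sys).
emit_lockup_sysctl() {
    proc_root="${1:-/proc/sys}"
    for key in softlockup_panic hardlockup_panic; do
        if [ -e "$proc_root/kernel/$key" ]; then
            printf 'kernel.%s = 1\n' "$key"
        else
            printf '# kernel.%s not available on this kernel\n' "$key"
        fi
    done
}
```

One possible use, as root: `emit_lockup_sysctl >> /etc/sysctl.d/local.conf && sysctl --system`. The panic only leads to a reboot if kernel.panic (panic_timeout) is also set to a positive value, and the hard lockup detector may well be missing inside a VM.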
Re: Automatic reboot on kernel crash in Debian 12 - how?
On 16 Apr 2024 11:42 +0200, from geo...@nsup.org (Nicolas George):
>> Are you saying that the settings themselves are reasonable for the
>> purpose, and that this particular crash just happened to be such a one
>> that no software running on the system in question can reasonably help
>> with that scenario?
>
> No, unfortunately I do not have the gift of divination, it would be
> convenient. I am saying that you cannot use software to protect yourself
> entirely from software bugs.

Well, naturally. But if there is some setting which I _could_ set that would get me closer to my desired state, I would still like to know which one and perhaps even what might be an appropriate value for it.

--
Michael Kjörling  https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
Re: Automatic reboot on kernel crash in Debian 12 - how?
Michael Kjörling (12024-04-16):
> Are you saying that the settings themselves are reasonable for the
> purpose, and that this particular crash just happened to be such a one
> that no software running on the system in question can reasonably help
> with that scenario?

No, unfortunately I do not have the gift of divination, it would be convenient. I am saying that you cannot use software to protect yourself entirely from software bugs.

> This happened on a VM that I can't directly influence the hardware
> configuration of (a commercially provided VPS), but I should be able
> to jury-rig something using the provider's API if necessary.

You probably can. But first check if your VM has an emulated hardware watchdog.

Regards,

--
Nicolas George
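One way to do that check, as a hedged sketch: the kernel registers watchdog devices under /sys/class/watchdog, and each may expose an identity attribute if the kernel was built with watchdog sysfs support. The function name list_watchdogs is an assumption made for illustration.

```shell
#!/bin/sh
# List registered watchdog devices with their identity strings.
# Returns 0 if at least one device was found, 1 otherwise.
# The optional argument overrides the sysfs class directory.
list_watchdogs() {
    sys_root="${1:-/sys/class/watchdog}"
    found=1
    for dev in "$sys_root"/watchdog*; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$dev" ] || continue
        identity=unknown
        if [ -r "$dev/identity" ]; then
            identity=$(cat "$dev/identity")
        fi
        printf '%s: %s\n' "${dev##*/}" "$identity"
        found=0
    done
    return "$found"
}
```

Calling `list_watchdogs || echo "no watchdog device found"` gives a quick answer. The wdctl tool from util-linux reports similar information where it is installed.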
Re: Automatic reboot on kernel crash in Debian 12 - how?
On 16 Apr 2024 11:22 +0200, from geo...@nsup.org (Nicolas George):
>> Do I need to set some more settings to ensure that the system will
>> automatically reboot on a panic? If so, what?
>
> If the crash was bad enough to freeze the kernel before it could
> trigger the reboot, there is nothing the software can do.
>
> You need a hardware watchdog.

Are you saying that the settings themselves are reasonable for the purpose, and that this particular crash just happened to be such a one that no software running on the system in question can reasonably help with that scenario?

This happened on a VM that I can't directly influence the hardware configuration of (a commercially provided VPS), but I should be able to jury-rig something using the provider's API if necessary.

--
Michael Kjörling  https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
Re: Automatic reboot on kernel crash in Debian 12 - how?
Michael Kjörling (12024-04-16):
> However, this morning I woke up to one of those systems showing a
> kernel crash dump and being frozen. Unfortunately the first part of
> the crash dump had scrolled past so I couldn't tell what class of
> problem caused the crash.
>
> Do I need to set some more settings to ensure that the system will
> automatically reboot on a panic? If so, what?

If the crash was bad enough to freeze the kernel before it could trigger the reboot, there is nothing the software can do.

You need a hardware watchdog. If your motherboard has one, just install and enable the corresponding daemon, and check it works by SIGSTOPing it. If your motherboard does not have one, you can probably DIY one from a RPi or an Arduino.

Regards,

--
Nicolas George
Automatic reboot on kernel crash in Debian 12 - how?
I have a handful of Debian 12 systems that I want to configure such that they reboot automatically in case of a problem.

I have set them up with userspace scripts (executed through cron) to reboot if something goes wrong there; that appears to work as expected if I induce an issue that those scripts check for.

That leaves kernel-level issues.

To try to configure this, I have created a file /etc/sysctl.d/local.conf (owned by root:root, mode 0644).

    # cat /etc/sysctl.d/local.conf
    kernel.panic = 120
    kernel.panic_on_oops = 1
    kernel.panic_on_stackoverflow = 1
    kernel.panic_on_io_nmi = 1
    #

With the exception of panic_on_stackoverflow, as far as I can tell these are in effect after a reboot:

    # sysctl kernel.panic kernel.panic_on_oops kernel.panic_on_stackoverflow kernel.panic_on_io_nmi
    kernel.panic = 120
    kernel.panic_on_oops = 1
    sysctl: cannot stat /proc/sys/kernel/panic_on_stackoverflow: No such file or directory
    kernel.panic_on_io_nmi = 1
    #

However, this morning I woke up to one of those systems showing a kernel crash dump and being frozen. Unfortunately the first part of the crash dump had scrolled past so I couldn't tell what class of problem caused the crash.

Do I need to set some more settings to ensure that the system will automatically reboot on a panic? If so, what?

I know that best is to not crash; this is _in case of_.

--
Michael Kjörling  https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
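The manual check above (comparing the file against what sysctl reports) can be scripted. This is a minimal sketch under simplifying assumptions: it only handles single-token values in plain "key = value" lines, and the function name verify_sysctl_file is invented for illustration.

```shell
#!/bin/sh
# Compare "key = value" lines in a sysctl.d file with the live values.
# Prints OK / DIFFERS / MISSING per key; returns 1 if any key is not OK.
# The optional second argument overrides the sysctl root (default: /proc/sys).
verify_sysctl_file() {
    conf="$1"
    proc_root="${2:-/proc/sys}"
    status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | tr -d '[:space:]')
        # Skip blank lines and comments.
        case "$key" in ''|'#'*|';'*) continue ;; esac
        # kernel.panic -> <root>/kernel/panic
        path="$proc_root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            printf 'MISSING  %s\n' "$key"
            status=1
        elif [ "$(tr -d '[:space:]' < "$path")" != "$value" ]; then
            printf 'DIFFERS  %s (wanted %s)\n' "$key" "$value"
            status=1
        else
            printf 'OK       %s\n' "$key"
        fi
    done < "$conf"
    return "$status"
}
```

Run as `verify_sysctl_file /etc/sysctl.d/local.conf`. With the file shown above, it would be expected to flag kernel.panic_on_stackoverflow as MISSING, matching the "cannot stat" message from sysctl.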