For best results I use an out-of-band network device to cut power to devices
and reboot them when they fail the watchdog criteria.
Normally that means they stop answering pings, or a service is still not
responding after a Nagios plugin has attempted to restart it.
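Here is a minimal Python sketch of the decision described above. It assumes the checks are plain callables: a host is flagged for a power cycle if it does not answer ping, or if its service is still down after one restart attempt. The helper names (`host_answers_ping`, `needs_power_cycle`) are hypothetical. The actual outlet toggling on the power switch is deliberately left out rather than guessing at that device's API.

```python
import subprocess
from typing import Callable


def host_answers_ping(host: str) -> bool:
    """Send one ICMP echo (Linux iputils ping, 2 s reply timeout)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def needs_power_cycle(
    pings: Callable[[], bool],
    service_ok: Callable[[], bool],
    restart_service: Callable[[], None],
) -> bool:
    """Return True when the host should be power cycled out of band."""
    if not pings():
        # Host is unreachable: a software restart cannot help.
        return True
    if service_ok():
        return False
    # Give the service one restart attempt (what the Nagios plugin does)
    # before escalating to a hard power cycle.
    restart_service()
    return not service_ok()
```

Usage would look like `needs_power_cycle(lambda: host_answers_ping("rpi3"), check_k3s, restart_k3s)`, where `check_k3s` and `restart_k3s` are your own (hypothetical) checks. Keeping the checks injectable makes the escalation policy easy to test without touching real hosts.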
I would have a look at webpowerswitch.com
I use this with PCS and
Thanks, Peter, for your fast answer!
I was thinking about the watchdog stack itself for the software part. I had
no idea that it was able to manage the RPi4's HW watchdog :)
The watchdog stack is a little confusing because the documentation is so
sparse, especially when using the HW module...
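For the HW module part, here is a minimal Python sketch of how a userspace daemon drives the Linux watchdog device. It assumes the RPi4's watchdog driver (bcm2835_wdt) exposes `/dev/watchdog`. Each write resets the hardware timer. If the writes stop, the board resets once the timeout expires. Writing the magic character `V` just before closing disarms the timer, but only on drivers that support magic close and only if the kernel was not built with nowayout. The function name `feed_watchdog` and the `is_healthy` callback are hypothetical. The real watchdog daemon does the same job with its own configurable checks.

```python
import time
from typing import BinaryIO, Callable, Optional


def feed_watchdog(
    dev: BinaryIO,
    is_healthy: Callable[[], bool],
    interval: float = 5.0,
    max_pings: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep the watchdog fed while is_healthy() returns True.

    Returns the number of keepalive writes performed.
    """
    pings = 0
    while max_pings is None or pings < max_pings:
        if not is_healthy():
            # Stop feeding without the magic close: the hardware timer keeps
            # running and resets the board when it expires.
            return pings
        dev.write(b"\0")  # any write counts as a keepalive
        dev.flush()
        pings += 1
        sleep(interval)
    # Planned shutdown: 'V' just before close disarms the timer on drivers
    # that support magic close. The caller closes the device afterwards.
    dev.write(b"V")
    dev.flush()
    return pings
```

On the Pi you would pass `open("/dev/watchdog", "wb", buffering=0)` as `dev` and run the script as root. Note that opening the device already arms the timer, so keep `interval` well below the driver's timeout.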
Hi Pierre-Francois,
> I am running 6 RPi4s with Fedora 37. K3S is powering this cluster and it
> is working well :)
>
> But from time to time, one RPi randomly hangs.
>
> I am thinking about implementing a watchdog:
>
> - software based, using the embedded Linux kernel
If the RPi itself is