Re: [OpenWrt-Devel] /dev/watchdog from shell script
> I did this on my boxes, but it does not help.
> Again a device is _pingable_, but all daemons are
> not responding anymore:

So either:
- watchdog was killed and this just disabled the watchdog timer
  altogether.
- watchdog was not killed for some reason (e.g. because the kernel
  considered that it holds on to some important resource).

Stefan
___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [OpenWrt-Devel] /dev/watchdog from shell script
On Thu, 05 Jan 2012 10:42:53 + Bastian Bittorf wrote:

> > > # call this in cron.minutely
> > > watchdogger -d /dev/watchdog --kick
> > >
> > > # do all checks with cron-called scripts
> > >
> > > if cron fails, the watchdog will reboot the device.
> > > if you are more conservative, use timeout 900
> >
> > What about just using panic_on_oom?
>
> then we are busted 8-) seriously:
> per default we have
>
> kernel.panic=3
>
> in "/etc/sysctl.conf" - so this should do the reboot, shouldn't it?

I think he was talking about vm.panic_on_oom

--
Greetings, Michael.
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> kernel.panic = 3 means that the kernel will reboot 3 seconds after
> getting a panic. OOM is not a condition to trigger a panic by default,
> unless you set panic_on_oom=1. So if you get the following situation:
>
> OOM will cause a panic
> panic will cause a reboot in 3 seconds.
>
> kernel.panic=3 + panic_on_oom is enough for auto-reboot

thanks - this is in my case a good idea.
Question: shouldn't this be enabled by default
and only the devs should switch it off?

bye, bastian
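The combination discussed above (kernel.panic=3 plus vm.panic_on_oom=1) could be persisted roughly like this. It is only a sketch: the helper name set_sysctl_line is made up for illustration, and it assumes a plain "key=value" style /etc/sysctl.conf.

```shell
#!/bin/sh
# Sketch only. set_sysctl_line is a hypothetical helper, not an OpenWrt tool.

# set_sysctl_line FILE KEY VALUE
# Drop any existing line for KEY from FILE, then append "KEY=VALUE".
set_sysctl_line() {
	conf="$1"
	key="$2"
	value="$3"
	touch "$conf"
	while IFS= read -r line; do
		case "$line" in
			"$key"=*|"$key "*) ;;          # old setting for this key: skip it
			*) printf '%s\n' "$line" ;;    # anything else: keep unchanged
		esac
	done <"$conf" >"$conf.tmp"
	printf '%s=%s\n' "$key" "$value" >>"$conf.tmp"
	mv "$conf.tmp" "$conf"
}

# On a device (as root) this would be used as follows; not run here:
#   set_sysctl_line /etc/sysctl.conf kernel.panic 3
#   set_sysctl_line /etc/sysctl.conf vm.panic_on_oom 1
#   sysctl -p /etc/sysctl.conf
```

With both keys set, an OOM condition panics the kernel and the panic reboots the box after 3 seconds, as described in the quoted text.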
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> An OOM is not considered to be a panic by default afaik.

yes, it's only a seldom behaviour

> Regarding your cronjob idea; won't work imo. Many (most?) watchdog
> drivers do not support >= 60 second intervals.

that's the smallest problem:

#!/bin/sh
watchdogger -d /dev/watchdog --kick
sleep 30
watchdogger -d /dev/watchdog --kick

and call this each minute...

bye, Bastian
Re: [OpenWrt-Devel] /dev/watchdog from shell script
An OOM is not considered to be a panic by default afaik.

Regarding your cronjob idea; won't work imo. Many (most?) watchdog
drivers do not support >= 60 second intervals.

~ Jow
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> > # call this in cron.minutely
> > watchdogger -d /dev/watchdog --kick
> >
> > # do all checks with cron-called scripts
> >
> > if cron fails, the watchdog will reboot the device.
> > if you are more conservative, use timeout 900
>
> What about just using panic_on_oom?

then we are busted 8-) seriously:
per default we have

kernel.panic=3

in "/etc/sysctl.conf" - so this should do the reboot, shouldn't it?

bye, Bastian
Re: [OpenWrt-Devel] /dev/watchdog from shell script
Hello,

On 01/05/12 10:02, Bastian Bittorf wrote:
> > > wouldn't it be sensible to adjust "START=01" to
> > > /etc/init.d/watchdog and place something like this?
> > >
> > > pid="$( pidof watchdog )"
> > > echo "1000" >/proc/$pid/oom_score_adj
> >
> > yeah, could do this.
>
> I did this on my boxes, but it does not help.
> Again a device is _pingable_, but all daemons are
> not responding anymore:
>
> dropbear, uhttpd, netperf, dnsmasq, (crond)
>
> i have no idea why the above oom_score-thingy did not
> work - that's all i can say. it's time for a more robust
> solution:
>
> # call it once at startup
> watchdogger -d /dev/watchdog --timeout 90
>
> # call this in cron.minutely
> watchdogger -d /dev/watchdog --kick
>
> # do all checks with cron-called scripts
>
> if cron fails, the watchdog will reboot the device.
> if you are more conservative, use timeout 900

What about just using panic_on_oom?

> bye, Bastian.
>
> PS: someone has time to make a "watchdogger.c" ?
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> > wouldn't it be sensible to adjust "START=01" to
> > /etc/init.d/watchdog and place something like this?
> >
> > pid="$( pidof watchdog )"
> > echo "1000" >/proc/$pid/oom_score_adj
>
> yeah, could do this.

I did this on my boxes, but it does not help.
Again a device is _pingable_, but all daemons are
not responding anymore:

dropbear, uhttpd, netperf, dnsmasq, (crond)

i have no idea why the above oom_score-thingy did not
work - that's all i can say. it's time for a more robust
solution:

# call it once at startup
watchdogger -d /dev/watchdog --timeout 90

# call this in cron.minutely
watchdogger -d /dev/watchdog --kick

# do all checks with cron-called scripts

if cron fails, the watchdog will reboot the device.
if you are more conservative, use timeout 900

bye, Bastian.

PS: someone has time to make a "watchdogger.c" ?
Re: [OpenWrt-Devel] /dev/watchdog from shell script
On Thu, 29 Dec 2011 06:52:18 + Bastian Bittorf wrote:

> > > A better way would be IMHO to use a cron.minutely which fires
> > > an ioctl to /dev/watchdog. if crond is removed, the device should
> > > reboot. so i need a way to invoke an ioctl from shellscript.
> >
> > I think this doesn't work.
>
> in our special case it would work, because all "daemon-checking"
> is done via cron. so if cron fails everything is not working anymore.
>
> > What you could try is increasing the likeliness of the watchdog
> > process to get killed on OOM.
> > Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill
> > the watchdog on any oom condition, as far as I can see.
>
> That seems like an interesting idea.
> As far as i can read oom_kill.c this will add 100% to the oom_badness()
> of the watchdog-kicker, but the value is calculated based on
> memory-consumption. IMHO this is not enough, because even our mini-cron
> consumes more memory than the watchdog-kicker.

No, this is not true.

	/*
	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
	 * either completely disable oom killing or always prefer a certain
	 * task.
	 */
	points += p->signal->oom_score_adj;

	if (points <= 0)
		return 1;
	return (points < 1000) ? points : 1000;

This is done in a loop for every process and then the process with the
highest score, which is max 1000 and watchdog has 1000, is selected.
It is unlikely that there is another process which also has 1000 due to
its memory consumption only, that is selected first.
And it's even unlikelier that cron consumes 1000 points of badness.
So cron will never be selected in favor of watchdog.

> beside that:
> wouldn't it be sensible to adjust "START=01" to /etc/init.d/watchdog
> and place something like this?
>
> pid="$( pidof watchdog )"
> echo "1000" >/proc/$pid/oom_score_adj

yeah, could do this.

--
Greetings, Michael.
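To make the arithmetic in the quoted oom_kill.c fragment concrete, here is a small shell sketch. It mirrors only the three quoted lines (add the adjustment, then clamp to 1..1000); the real kernel function does more before that point. The function names oom_final_score and mark_oom_victim are invented for illustration.

```shell
#!/bin/sh
# Sketch only: reproduces just the quoted clamp, nothing else from oom_kill.c.

# oom_final_score BASE ADJ
# BASE is the memory-derived badness, ADJ the oom_score_adj value (-1000..1000).
oom_final_score() {
	points=$(( $1 + $2 ))
	if [ "$points" -le 0 ]; then
		echo 1
	elif [ "$points" -lt 1000 ]; then
		echo "$points"
	else
		echo 1000
	fi
}

# mark_oom_victim PID [PROCROOT]
# Write the maximum adjustment for PID. PROCROOT defaults to /proc and is
# only a parameter so the function can be tried against a scratch directory.
mark_oom_victim() {
	echo 1000 >"${2:-/proc}/$1/oom_score_adj"
}

# Example with made-up base scores:
#   oom_final_score 2 1000   -> 1000  (tiny watchdog, adj 1000)
#   oom_final_score 5 0      -> 5     (crond, no adjustment)
```

On a router this would be called as `mark_oom_victim "$(pidof watchdog)"`, which is the same write as the two-line snippet quoted above.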
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> > A better way would be IMHO to use a cron.minutely which fires
> > an ioctl to /dev/watchdog. if crond is removed, the device should
> > reboot. so i need a way to invoke an ioctl from shellscript.
>
> I think this doesn't work.

in our special case it would work, because all "daemon-checking"
is done via cron. so if cron fails everything is not working anymore.

> What you could try is increasing the likeliness of the watchdog
> process to get killed on OOM.
> Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill
> the watchdog on any oom condition, as far as I can see.

That seems like an interesting idea.
As far as i can read oom_kill.c this will add 100% to the oom_badness()
of the watchdog-kicker, but the value is calculated based on
memory-consumption. IMHO this is not enough, because even our mini-cron
consumes more memory than the watchdog-kicker.

Is there a switch: "kill this pid first"?
https://github.com/mirrors/linux/blob/master/mm/oom_kill.c

beside that:
wouldn't it be sensible to adjust "START=01" to /etc/init.d/watchdog
and place something like this?

pid="$( pidof watchdog )"
echo "1000" >/proc/$pid/oom_score_adj

(it wouldn't make sense on a desktop-system, but on a router...)

bye, Bastian
Re: [OpenWrt-Devel] /dev/watchdog from shell script
> I think jow wrote something like this already, see:
> http://luci.subsignal.org/trac/browser/luci/trunk/contrib/package/freifunk-watchdog

Interesting, but has the same design issue as already mentioned:
if the oom-killer is working it will likely kill the freifunk-watchdog
and crond, so to check the pidof crond from userspace does not help.

what we need is a simple tool like this:

watchdogger -d /dev/watchdog --timeout 90
watchdogger -d /dev/watchdog --kick

this does not run as a daemon, but will be called from a
cron-script. if crond is not running anymore ("oom_kill"), the
device will reboot.

# just for saying it:
# the workaround this time is: (but badly spams our log)

/etc/init.d/watchdog stop

AND having a file which will be called from crond each minute:

#!/bin/sh
I=0
while [ $I -lt 60 ]; do
	I=$(( $I + 5 ))
	echo >/dev/watchdog
	sleep 5
done

bye, Bastian
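A possible variation on the workaround above, offered as a sketch rather than a tested fix: open the device once per cron run and keep it open while writing, so it is closed once per minute instead of once every 5 seconds. This assumes the log spam comes from the driver complaining about each close, which the thread does not confirm. The function name pet_watchdog and its parameters are made up for illustration.

```shell
#!/bin/sh
# Sketch only. pet_watchdog is a hypothetical helper.

# pet_watchdog DEV ROUNDS INTERVAL
# Open DEV once on fd 3, write a newline to it ROUNDS times and sleep
# INTERVAL seconds after each write. The device is closed when the loop ends.
pet_watchdog() {
	dev="$1"
	rounds="$2"
	interval="$3"
	i=0
	while [ "$i" -lt "$rounds" ]; do
		echo >&3
		i=$(( i + 1 ))
		sleep "$interval"
	done 3>"$dev"
}

# From crond each minute, not run here:
#   pet_watchdog /dev/watchdog 11 5
```

The example uses 11 rounds of 5 seconds (55 s) instead of 60 s so that one run has closed the device before the next cron run opens it; watchdog devices generally accept only one opener at a time. Whether the timer stays armed after the close depends on the driver and its nowayout setting, so that needs checking on the actual hardware.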
Re: [OpenWrt-Devel] /dev/watchdog from shell script
On Wed, 28 Dec 2011 10:39:33 + Bastian Bittorf wrote:

> for having a better way not to lose a router i like to
> use /dev/watchdog from a shell script. the reason is this:
>
> Sometimes the oom-killer removes important tasks like
> ssh + httpd + routing + cron but leaves the watchdog-petting on,
> so the device is running, but in fact lost.
>
> A better way would be IMHO to use a cron.minutely which fires
> an ioctl to /dev/watchdog. if crond is removed, the device should
> reboot. so i need a way to invoke an ioctl from shellscript.

I think this doesn't work.
What you could try is increasing the likeliness of the watchdog process
to get killed on OOM.
Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill the
watchdog on any oom condition, as far as I can see.

--
Greetings, Michael.
Re: [OpenWrt-Devel] /dev/watchdog from shell script
On 28.12.2011 12:21, Florian Fainelli wrote:
>
> I do not think this will work better. If crond is not killed by the OOM
> killer, then the watchdog keeps being kept alive, and you end up in the
> same situation. Rather I think we need some kind of software monitoring
> by a daemon like upstart which makes sure essential software is
> restarted once killed.
> --
> Florian

I think jow wrote something like this already, see:
http://luci.subsignal.org/trac/browser/luci/trunk/contrib/package/freifunk-watchdog

regards,
soma
Re: [OpenWrt-Devel] /dev/watchdog from shell script
Hello Bastian,

On 12/28/11 11:39, Bastian Bittorf wrote:
> hi devs,
>
> for having a better way not to lose a router i like to
> use /dev/watchdog from a shell script. the reason is this:
>
> Sometimes the oom-killer removes important tasks like
> ssh + httpd + routing + cron but leaves the watchdog-petting on,
> so the device is running, but in fact lost.

Once started, the watchdog daemon no longer allocates big chunks of
memory (if any at all), so this is kind of expected.

> A better way would be IMHO to use a cron.minutely which fires
> an ioctl to /dev/watchdog. if crond is removed, the device should
> reboot. so i need a way to invoke an ioctl from shellscript.
>
> what's the best way, maybe with onboard-tools?

I do not think this will work better. If crond is not killed by the OOM
killer, then the watchdog keeps being kept alive, and you end up in the
same situation. Rather I think we need some kind of software monitoring
by a daemon like upstart which makes sure essential software is
restarted once killed.
--
Florian
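The "restart essential software once killed" idea could be approximated in shell roughly as below. This is a sketch under assumptions: the function names are invented, and it assumes each daemon's init script has the same name as its process, which is not true for every package. It also has to be driven by something that itself survives the OOM killer, which is the weak point raised elsewhere in the thread.

```shell
#!/bin/sh
# Sketch only. is_running and restart_missing are hypothetical helpers.

# is_running NAME: succeeds if a process called NAME exists.
is_running() {
	pidof "$1" >/dev/null 2>&1
}

# restart_missing NAME...
# For every NAME that is not running, call "<INITD>/NAME start".
# INITD defaults to /etc/init.d and can be overridden for experiments.
restart_missing() {
	for name in "$@"; do
		is_running "$name" && continue
		echo "restarting $name"
		"${INITD:-/etc/init.d}/$name" start
	done
}

# Possible use on a router, not run here:
#   restart_missing dropbear uhttpd dnsmasq
```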
[OpenWrt-Devel] /dev/watchdog from shell script
hi devs,

for having a better way not to lose a router i like to
use /dev/watchdog from a shell script. the reason is this:

Sometimes the oom-killer removes important tasks like
ssh + httpd + routing + cron but leaves the watchdog-petting on,
so the device is running, but in fact lost.

A better way would be IMHO to use a cron.minutely which fires
an ioctl to /dev/watchdog. if crond is removed, the device should
reboot. so i need a way to invoke an ioctl from shellscript.

what's the best way, maybe with onboard-tools?

bye, Bastian