Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-06 Thread Stefan Monnier
> I did this on my boxes, but it does not help.
> Again a device is _pingable_, but all daemons are
> not responding anymore:

So either:
- watchdog was killed and this just disabled the watchdog timer altogether.
- watchdog was not killed for some reason (e.g. because the kernel
  considered that it holds on to some important resource).


Stefan

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Michael Büsch
On Thu, 05 Jan 2012 10:42:53 +
Bastian Bittorf  wrote:

> > > # call this in cron.minutely
> > > watchdogger -d /dev/watchdog --kick
> > >
> > > # do all checks with cron-called scripts
> > >
> > > if cron fails, the watchdog will reboot the device.
> > > if you are more conservative, use timeout 900
> > 
> > What about just using panic_on_oom?
> 
> then we are busted 8-) seriously:
> per default we have
> 
> kernel.panic=3
> 
> in "/etc/sysctl.conf" - so this should do the reboot, shouldn't it?

I think he was talking about vm.panic_on_oom
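For reference, whether an OOM ends in a reboot depends on two sysctls under /proc/sys:

- vm.panic_on_oom: 0 means kill a process. 1 means panic, unless the OOM is confined to a cpuset, mempolicy or memcg. 2 means always panic.
- kernel.panic: 0 means hang after a panic. N > 0 means reboot after N seconds. N < 0 means reboot immediately.

The sketch below checks both values from shell. The function name oom_reboots is hypothetical. So is the SYSCTL_ROOT override, which exists only so the logic can be tested against a fake tree.

```sh
#!/bin/sh
# Hypothetical check: succeed (status 0) if an OOM would panic the kernel
# AND a panic would reboot the box; status 1 if not; status 2 if the
# sysctl files cannot be read.
#   vm.panic_on_oom : 0 = OOM-kill a process, 1/2 = panic
#   kernel.panic    : 0 = hang, >0 = reboot after N s, <0 = reboot now
oom_reboots() {
	root="${SYSCTL_ROOT:-/proc/sys}"   # override only for testing
	oom=$(cat "$root/vm/panic_on_oom" 2>/dev/null) || return 2
	panic=$(cat "$root/kernel/panic" 2>/dev/null) || return 2
	[ "$oom" -ne 0 ] && [ "$panic" -ne 0 ]
}
```

There are two ways to turn this on:

- `sysctl -w vm.panic_on_oom=1` applies it at runtime.
- A `vm.panic_on_oom=1` line in /etc/sysctl.conf makes it persistent.

Whether rebooting the whole box is better than losing one daemon is a per-deployment choice.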

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Bastian Bittorf
> kernel.panic = 3 means that the kernel will reboot 3 seconds after
> getting a panic. OOM is not a condition to trigger a panic by
> default, unless you set panic_on_oom=1. So you get the following
> situation:
> 
> OOM will cause a panic
> panic will cause a reboot in 3 seconds.
> 
> kernel.panic=3 + panic_on_oom is enough for auto-reboot

thanks - in my case this is a good idea.

Question: shouldn't this be enabled by default,
with only the devs switching it off?

bye, bastian



Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Bastian Bittorf
> An OOM is not considered to be a panic by default afaik.

yes, it's only a rare behaviour

> Regarding your cronjob idea; won't work imo. Many (most?) watchdog
> drivers do not support >= 60 second intervals.

that's the smallest problem:

#!/bin/sh
watchdogger -d /dev/watchdog --kick
sleep 30
watchdogger -d /dev/watchdog --kick

and call this each minute...
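The two-kick script only works if the hardware timeout is comfortably above 30 seconds. Per Jow's remark, many drivers cannot do that.

The hypothetical helper below (kick_plan is not an existing tool) derives two numbers from the driver's timeout: the kick interval and the number of kicks per cron run. It assumes a rule of thumb of kicking at half the timeout, so that one late cron run does not trigger a reboot.

```sh
#!/bin/sh
# Hypothetical helper: print "<interval> <kicks>" for a kick script that
# cron starts once per PERIOD seconds (default 60), given the hardware
# watchdog timeout in seconds. Assumed rule of thumb: kick at half the
# timeout (never more often than once per second).
kick_plan() {
	timeout=$1
	period=${2:-60}
	interval=$((timeout / 2))
	if [ "$interval" -lt 1 ]; then
		interval=1
	fi
	if [ "$interval" -gt "$period" ]; then
		interval=$period
	fi
	# ceil(period / interval) kicks, spaced "interval" apart, cover the
	# whole period until cron starts the next run
	kicks=$(( (period + interval - 1) / interval ))
	echo "$interval $kicks"
}
```

With a 60-second timeout it reproduces the schedule above (`30 2`). With a 20-second timeout, each run would need six kicks 10 seconds apart.

The timeout itself has to be known from the driver. Querying it normally takes an ioctl (WDIOC_GETTIMEOUT), not something plain shell can issue.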

bye, Bastian


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Jo-Philipp Wich
An OOM is not considered to be a panic by default afaik.
Regarding your cronjob idea; won't work imo. Many (most?) watchdog
drivers do not support >= 60 second intervals.

~ Jow


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Bastian Bittorf
> > # call this in cron.minutely
> > watchdogger -d /dev/watchdog --kick
> >
> > # do all checks with cron-called scripts
> >
> > if cron fails, the watchdog will reboot the device.
> > if you are more conservative, use timeout 900
> 
> What about just using panic_on_oom?

then we are busted 8-) seriously:
per default we have

kernel.panic=3

in "/etc/sysctl.conf" - so this should do the reboot, shouldn't it?

bye, Bastian


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Florian Fainelli

Hello,

On 01/05/12 10:02, Bastian Bittorf wrote:
> > > wouldn't it make sense to set "START=01" in
> > > /etc/init.d/watchdog
> > > and place something like this?
> > >
> > > pid="$( pidof watchdog )"
> > > echo "1000" >/proc/$pid/oom_score_adj
> >
> > yeah, could do this.
>
> I did this on my boxes, but it does not help.
> Again a device is _pingable_, but all daemons are
> not responding anymore:
>
> dropbear, uhttpd, netperf, dnsmasq, (crond)
>
> i have no idea why the above oom_score-thingy
> did not work - that's all i can say. it's time for a more
> robust solution:
>
> # call it once at startup
> watchdogger -d /dev/watchdog --timeout 90
>
> # call this in cron.minutely
> watchdogger -d /dev/watchdog --kick
>
> # do all checks with cron-called scripts
>
> if cron fails, the watchdog will reboot the device.
> if you are more conservative, use timeout 900

What about just using panic_on_oom?

> bye, Bastian.
>
> PS: does someone have time to write a "watchdogger.c"?




Re: [OpenWrt-Devel] /dev/watchdog from shell script

2012-01-05 Thread Bastian Bittorf
> > wouldn't it make sense to set "START=01" in
> > /etc/init.d/watchdog
> > and place something like this?
> >
> > pid="$( pidof watchdog )"
> > echo "1000" >/proc/$pid/oom_score_adj
>
> yeah, could do this.

I did this on my boxes, but it does not help.
Again a device is _pingable_, but all daemons are
not responding anymore:

dropbear, uhttpd, netperf, dnsmasq, (crond)

i have no idea why the above oom_score-thingy
did not work - that's all i can say. it's time for a more
robust solution:

# call it once at startup
watchdogger -d /dev/watchdog --timeout 90

# call this in cron.minutely
watchdogger -d /dev/watchdog --kick

# do all checks with cron-called scripts

if cron fails, the watchdog will reboot the device.
if you are more conservative, use timeout 900

bye, Bastian.

PS: does someone have time to write a "watchdogger.c"?



Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-29 Thread Michael Büsch
On Thu, 29 Dec 2011 06:52:18 +
Bastian Bittorf  wrote:

> > > A better way would be IMHO to use a cron.minutely which fires
> > > an ioctl to /dev/watchdog. if crond is removed, the device should
> > > reboot. so i need a way to invoke an ioctl from shellscript.
> >
> > I think this doesn't work.
>
> in our special case it would work, because all "daemon-checking"
> is done via cron. so if cron fails everything is not working anymore.
>
> > What you could try is increasing the likeliness of the watchdog
> > process to get killed on OOM.
> > Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill
> > the watchdog on any oom condition, as far as I can see.
>
> That seems like an interesting idea.
> As far as I can read oom_kill.c, this will add 100% to the oom_badness() of
> the watchdog-kicker, but the value is calculated based on memory consumption.
> IMHO this is not enough, because even our mini-cron
> consumes more memory than the watchdog-kicker.

No, this is not true.


/*
 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
 * either completely disable oom killing or always prefer a certain
 * task.
 */
points += p->signal->oom_score_adj;

if (points <= 0)
	return 1;
return (points < 1000) ? points : 1000;


This is done in a loop over every process, and the process with the highest
score is selected. The score is capped at 1000, and the watchdog has 1000.
It is unlikely that another process also reaches 1000 through its memory
consumption alone and gets selected first. And it's even less likely that cron
accumulates 1000 points of badness. So cron will never be selected in favor of
the watchdog.
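To make the argument concrete, here is a toy shell model of two things: the clamp quoted above, and the "highest score wins" selection. It is not kernel code.

The base scores are made-up numbers standing in for the kernel's memory-based badness. oom_points and pick_victim are hypothetical names. Breaking ties in favour of the earlier candidate is a choice of this sketch only.

```sh
#!/bin/sh
# Toy model (not kernel code) of the clamp quoted above: add
# oom_score_adj to the memory-based base score, then clamp to 1..1000.
oom_points() {
	points=$(( $1 + $2 ))   # $1 = base score, $2 = oom_score_adj
	if [ "$points" -le 0 ]; then
		echo 1
	elif [ "$points" -lt 1000 ]; then
		echo "$points"
	else
		echo 1000
	fi
}

# Pick the victim from "name:base:adj" arguments: the highest score
# wins; in this sketch a tie keeps the earlier candidate.
pick_victim() {
	best=""
	best_points=0
	for entry in "$@"; do
		name=${entry%%:*}
		rest=${entry#*:}
		base=${rest%%:*}
		adj=${rest#*:}
		p=$(oom_points "$base" "$adj")
		if [ "$p" -gt "$best_points" ]; then
			best=$name
			best_points=$p
		fi
	done
	echo "$best"
}
```

For example, `pick_victim crond:60:0 watchdog:5:1000 dnsmasq:120:0` prints `watchdog`. The adjustment pushes it to the 1000 cap however little memory it uses. On a live system, the kernel's actual value for a process can be read from /proc/PID/oom_score.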

> beside that:
> wouldn't it make sense to set "START=01" in /etc/init.d/watchdog
> and place something like this?
> 
> pid="$( pidof watchdog )"
> echo "1000" >/proc/$pid/oom_score_adj  

yeah, could do this.

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Bastian Bittorf
> > A better way would be IMHO to use a cron.minutely which fires
> > an ioctl to /dev/watchdog. if crond is removed, the device should
> > reboot. so i need a way to invoke an ioctl from shellscript.
> 
> I think this doesn't work.

in our special case it would work, because all "daemon-checking"
is done via cron. so if cron fails everything is not working anymore.
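A minimal sketch of such a cron-driven daemon check follows. It assumes each daemon has an init script under /etc/init.d whose name matches the process name, which holds for dropbear, uhttpd and dnsmasq. The function name check_daemons and the INITD_DIR override are hypothetical.

```sh
#!/bin/sh
# Hypothetical cron-driven check: restart every listed service whose
# process has disappeared. Assumes the process name matches the init
# script name (true for dropbear, uhttpd, dnsmasq).
# INITD_DIR is an override used only for testing.
INITD_DIR=${INITD_DIR:-/etc/init.d}

check_daemons() {
	for svc in "$@"; do
		if ! pidof "$svc" >/dev/null 2>&1; then
			echo "check_daemons: $svc not running, restarting" >&2
			"$INITD_DIR/$svc" restart || echo "check_daemons: restart of $svc failed" >&2
		fi
	done
}

# Example (run from cron every minute):
# check_daemons dropbear uhttpd dnsmasq
```

As the thread points out, such a check only runs while crond itself survives.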

> What you could try is increasing the likeliness of the watchdog
> process to get killed on OOM.
> Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill
> the watchdog on any oom condition, as far as I can see.

That seems like an interesting idea.
As far as I can read oom_kill.c, this will add 100% to the oom_badness() of
the watchdog-kicker, but the value is calculated based on memory consumption.
IMHO this is not enough, because even our mini-cron
consumes more memory than the watchdog-kicker.
Is there a switch: "kill this pid at first"?

https://github.com/mirrors/linux/blob/master/mm/oom_kill.c

beside that:
wouldn't it make sense to set "START=01" in /etc/init.d/watchdog
and place something like this?

pid="$( pidof watchdog )"
echo "1000" >/proc/$pid/oom_score_adj

(it wouldn't make sense on a desktop system, but on a router...)
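Here is a slightly more defensive sketch of that init-script fragment. It handles two cases the original misses:

- If the daemon is not running yet, `pidof` prints nothing and fails. The original snippet would then write to /proc//oom_score_adj.
- `pidof` may print several PIDs.

The function name prefer_oom_kill and the PROC_ROOT override (for testing only) are hypothetical. Kernels older than 2.6.36 only provide the older /proc/PID/oom_adj file, so this sketch assumes a newer kernel.

```sh
#!/bin/sh
# Hypothetical helper for an init script: write SCORE (default 1000)
# to oom_score_adj of every process named NAME, so the OOM killer
# prefers it. PROC_ROOT is an override used only for testing.
prefer_oom_kill() {
	name=$1
	score=${2:-1000}
	# pidof fails and prints nothing if NAME is not running (yet)
	pids=$(pidof "$name") || return 1
	for pid in $pids; do
		echo "$score" > "${PROC_ROOT:-/proc}/$pid/oom_score_adj" || return 1
	done
}
```

In /etc/init.d/watchdog it would be called right after the daemon has been started, e.g. `prefer_oom_kill watchdog 1000`.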

bye, Bastian



Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Bastian Bittorf
> I think jow wrote something like this already, see:
> http://luci.subsignal.org/trac/browser/luci/trunk/contrib/package/freifunk-watchdog
> 

Interesting, but it has the same design issue as already mentioned:
if the oom-killer is at work, it will likely kill the freifunk-watchdog and
crond, so checking the pidof crond from userspace does not help.

what we need is a simple tool like this:

watchdogger -d /dev/watchdog --timeout 90
watchdogger -d /dev/watchdog --kick

this does not run as a daemon, but will be called from a cron-script.
if crond is not running anymore ("oom_kill"), the device will reboot.
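Whatever tool ends up doing the kick, the cron side is just a crontab entry. On OpenWrt, busybox crond reads /etc/crontabs/root. The small sketch below adds an entry idempotently. The function name ensure_cron_line and the kick-script path are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: append LINE to crontab FILE unless an identical
# line is already present, so it can be run on every boot.
ensure_cron_line() {
	file=$1
	line=$2
	touch "$file" || return 1
	if ! grep -qxF -- "$line" "$file"; then
		printf '%s\n' "$line" >> "$file"
	fi
}

# Example (script path is hypothetical):
# ensure_cron_line /etc/crontabs/root '* * * * * /usr/sbin/wdt-kick.sh'
```

After editing the file directly, restarting cron (`/etc/init.d/cron restart`) is the safe way to make crond pick up the new entry.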

# just for the record:
# the current workaround is (but it badly spams our log):

/etc/init.d/watchdog stop

AND having a file which will be called from crond each minute:

#!/bin/sh
I=0
while [ $I -lt 60 ]; do
  I=$(( $I + 5 ))
  echo >/dev/watchdog
  sleep 5
done
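Here is the same loop as a parameterised function (kick_for is a hypothetical name). Per the kernel's watchdog API documentation, any write to /dev/watchdog counts as a keepalive.

Each `echo` also closes the device again, and what that does depends on the driver:

- On drivers with "Magic Close" support, closing without first writing 'V' leaves the timer running. Many such drivers log an "unexpected close" warning each time, which may be where the log spam comes from.
- On drivers without Magic Close (and without CONFIG_WATCHDOG_NOWAYOUT), a close stops the timer. There, this approach would not reboot at all once cron dies.

The sketch stops at the first failed write. That happens, for example, while /etc/init.d/watchdog is still running and holds the device open.

```sh
#!/bin/sh
# Hypothetical version of the cron-started kicker: for DURATION seconds
# (default 60), write one byte to DEV every STEP seconds (default 5).
# Each iteration opens, writes and closes the device.
kick_for() {
	dev=${1:-/dev/watchdog}
	duration=${2:-60}
	step=${3:-5}
	elapsed=0
	while [ "$elapsed" -lt "$duration" ]; do
		# Stop on the first failure (e.g. EBUSY while another process
		# still holds the device open) instead of retrying every step.
		if ! printf '\n' > "$dev"; then
			echo "kick_for: cannot write to $dev" >&2
			return 1
		fi
		elapsed=$((elapsed + step))
		sleep "$step"
	done
}
```

With the defaults it performs the same 12 kicks per minute as the loop above.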

bye, Bastian



Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Michael Büsch
On Wed, 28 Dec 2011 10:39:33 +
Bastian Bittorf  wrote:

> to have a better way of not losing a router, i'd like to
> use /dev/watchdog from a shell script. the reason is this:
> 
> Sometimes the oom-killer removes important tasks like
> ssh + httpd + routing + cron but leaves the watchdog-petting on,
> so the device is running, but in fact lost.
> 
> A better way would be IMHO to use a cron.minutely which fires
> an ioctl to /dev/watchdog. if crond is removed, the device should
> reboot. so i need a way to invoke an ioctl from shellscript.

I think this doesn't work.

What you could try is increasing the likeliness of the watchdog process
to get killed on OOM.
Setting /proc/WATCHDOGPID/oom_score_adj to 1000 will always kill the watchdog
on any oom condition, as far as I can see.

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Manuel Munz
On 28.12.2011 12:21, Florian Fainelli wrote:
> 
> I do not think this will work better. If crond is not killed by the OOM
> killer, then the watchdog keeps getting kicked, and you end up in the
> same situation. Rather, I think we need some kind of software monitoring
> by a daemon like upstart which makes sure essential software is
> restarted once killed.
> -- 
> Florian

I think jow wrote something like this already, see:
http://luci.subsignal.org/trac/browser/luci/trunk/contrib/package/freifunk-watchdog

regards, soma





Re: [OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Florian Fainelli

Hello Bastian,

On 12/28/11 11:39, Bastian Bittorf wrote:

> hi devs,
>
> to have a better way of not losing a router, i'd like to
> use /dev/watchdog from a shell script. the reason is this:
>
> Sometimes the oom-killer removes important tasks like
> ssh + httpd + routing + cron but leaves the watchdog-petting on,
> so the device is running, but in fact lost.

Once started, the watchdog daemon no longer allocates big chunks of
memory (if any at all), so this is kind of expected.

> A better way would be IMHO to use a cron.minutely which fires
> an ioctl to /dev/watchdog. if crond is removed, the device should
> reboot. so i need a way to invoke an ioctl from shellscript.
>
> what's the best way, maybe with on-board tools?


I do not think this will work better. If crond is not killed by the OOM
killer, then the watchdog keeps getting kicked, and you end up in the
same situation. Rather, I think we need some kind of software monitoring
by a daemon like upstart which makes sure essential software is
restarted once killed.
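To illustrate the respawn idea, here is a minimal shell supervisor that restarts a command whenever it exits. It is not upstart and not an existing OpenWrt tool.

Two caveats apply:

- It is bounded by a restart count so that it can be tested.
- As noted for the freifunk-watchdog, a supervisor is itself a process the OOM killer can take out.

The names supervise and RESPAWN_DELAY, and the daemon path in the example, are hypothetical.

```sh
#!/bin/sh
# Hypothetical respawn loop: run "$@" up to MAX times, restarting it
# after each exit. RESPAWN_DELAY (seconds) defaults to 1.
supervise() {
	max=$1
	shift
	runs=0
	while [ "$runs" -lt "$max" ]; do
		rc=0
		"$@" || rc=$?
		runs=$((runs + 1))
		echo "supervise: '$1' exited with status $rc (run $runs of $max)" >&2
		sleep "${RESPAWN_DELAY:-1}"
	done
}

# Example: the (hypothetical) daemon must stay in the foreground
# supervise 1000 /usr/sbin/mydaemon
```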

--
Florian


[OpenWrt-Devel] /dev/watchdog from shell script

2011-12-28 Thread Bastian Bittorf
hi devs,

to have a better way of not losing a router, i'd like to
use /dev/watchdog from a shell script. the reason is this:

Sometimes the oom-killer removes important tasks like
ssh + httpd + routing + cron but leaves the watchdog-petting on,
so the device is running, but in fact lost.

A better way would be IMHO to use a cron.minutely which fires
an ioctl to /dev/watchdog. if crond is removed, the device should
reboot. so i need a way to invoke an ioctl from shellscript.

what's the best way, maybe with on-board tools?

bye, Bastian
