On Fri, Oct 07, 2011 at 10:15:19AM +0200, Pascal Brugier wrote:
> Hello,
>
> Do you know if a 1-wire bus monitoring Nagios plugin exists?
> I searched the internet but only found temperature sensor monitoring with
> Nagios.
>
> The idea is to detect bus errors and disconnects, if that's relevant.
> If it's not such a bad idea, can you tell me which files have to be monitored
> in the owfs dir structure or, better, what exactly is the meaning

It is a bus, so often when something is wrong, you won't be able to read
anything from it (from any device, that is).
I personally dump all my temp sensors to a file every minute and look for
read errors. Any read error is obviously a problem.

I would ignore messages about bus resets and short circuits in syslog. I've
had those for as long as I've had 1-wire, and I'm pretty damn confident I
don't have a short circuit that happens randomly every other day or so,
whether it's dry or wet (it might be more useful if it told you which branch
of your hub allegedly had the short circuit, but without that info, go
figure).

I personally power cycle the hub and restart owfs each time owfs goes south
(sometimes owserver also dies, even if that's rare).
Either way, you don't need to monitor so much as detect the failure and
restart as soon as it happens.

This is what my crontab looks like, if it helps:

* * * * * root read_owfs | sort > /var/log/temperatures.owfs; cat /var/log/temperatures.owfs >> /var/log/temperatures; ! test -s /var/log/temperatures.owfs && echo "temperatures.owfs empty -> owfs bus has problems: rebooting owfs hub and restarting owfs." && /etc/init.d/owfs force-restart
#
# This is kind of redundant with the owfs empty test above:
* * * * * root sleep 45; pgrep -x owserver >/dev/null || bash -c "echo 'owserver was found not running anymore'; /etc/init.d/owfs restart"
* * * * * root sleep 50; pgrep -x owfs >/dev/null || bash -c "echo 'owfs was found not running anymore'; /etc/init.d/owfs restart"
*/10 * * * * root if [ `tail -1000 /var/log/temperatures | grep '35 Front_Lawn' | sed 's/.* 35 //' | sort -u | wc -l` -le 1 ]; then echo "Moisture Sensor Got Stuck on `tail -1000 /var/log/temperatures | grep '35 Front_Lawn' | sed 's/.* 35 //' | sort -u`"; /etc/init.d/owfs force-restart; fi

force-restart uses a powered hub to reset the interface, and X10 to power
cycle the hub:

force-restart)
    usbpower 2 off
    $0 stop
    brecho D10 off "owfs hub off"; sleep 1; brecho D10 on "owfs hub on"
    sleep 1
    usbpower 2 on
    sleep 3
    $0 stop
    $0 start

Hope this helps,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

_______________________________________________
Owfs-developers mailing list
Owfs-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/owfs-developers
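
For the Nagios side of the original question, the empty-file test from the
crontab above could be wrapped in a check that follows the usual Nagios
plugin convention (exit 0 for OK, 2 for CRITICAL). This is only a sketch
under assumptions: the function name check_owfs_snapshot is made up for
illustration, and the snapshot is assumed to be the per-minute
/var/log/temperatures.owfs file written by the crontab entry.

```shell
#!/bin/sh
# Nagios-style check (sketch): CRITICAL when the owfs snapshot file is
# missing or empty, OK otherwise. check_owfs_snapshot is a hypothetical
# name, not part of owfs or Nagios.

check_owfs_snapshot() {
    snapshot="$1"
    if [ ! -e "$snapshot" ]; then
        echo "CRITICAL: $snapshot does not exist"
        return 2
    fi
    # Same condition as the "! test -s" test in the crontab entry.
    if [ ! -s "$snapshot" ]; then
        echo "CRITICAL: $snapshot is empty, owfs bus probably has problems"
        return 2
    fi
    lines=$(wc -l < "$snapshot" | tr -d ' ')
    echo "OK: $snapshot has $lines line(s)"
    return 0
}

# Run the check only when a file is given on the command line, e.g.
#   sh check_owfs_snapshot.sh /var/log/temperatures.owfs
if [ "$#" -ge 1 ]; then
    check_owfs_snapshot "$1"
    exit $?
fi
```

Unlike the cron job, this only reports the state; restarting owfs would
still be left to the force-restart entry or to a Nagios event handler.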