Re: [Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-03 Thread Roger Price

On Mon, 3 Apr 2017, Spike wrote:

> I'll see if I can implement it some time soon.


Hi Spike, I tested the heartbeat proposal on openSUSE 13.1 and 42.2, and 
made some changes so that it would work.  I wrote out some documentation 
which includes the required changes, which you will find at 
http://rogerprice.org/NUT.html#HEARTBEAT


Roger

___
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsuser


Re: [Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-03 Thread Spike
thank you all for your input. Roger, I'm a nut noob and only marginally
understand the implementation (from your other email), but I really like
the idea of a heartbeat and design wise it makes a lot of sense. I'll see
if I can implement it some time soon.

thank you,

Spike

On Sat, Apr 1, 2017 at 1:54 PM Roger Price wrote:

> On Sat, 1 Apr 2017, Stuart Gathman wrote:
>
> > On 04/01/2017 03:14 PM, Dan Craciun wrote:
> >> On my Nagios monitoring system I use check_nut_plus (that in turn
> >> calls upsc) to monitor the status (ups.status), load (ups.load),
> >> battery charge (battery.charge) and runtime (battery.runtime).
> >>
> >> If these return "unknown", it means upsd is no longer monitoring the
> >> UPS. As long as you get data, upsd is working.
> >>
> > That's great, but Spike wants to know whether *upsmon* is working.  He
> > already has a way to check that upsd is working.
>
> How about using a dummy ups to set up a regular end-to-end heart beat?
> As long as the heart beats, there is no news, but if it stops,
> upssched-cmd sends out an e-mail or other warning.
>
> In ups.conf, add
>
> [heartbeat]
>  driver = dummy-ups
>  port = heartbeat.dev
>  desc = "Dummy ups sends heart beat to upssched-cmd"
>
> In heartbeat.dev, write
>
> ups.status: REPLBATT
> TIMER 300
>
> In upsmon.conf, write
>
> NOTIFYFLAG REPLBATT SYSLOG+EXEC
>
> In upssched.conf, add
>
> # Heartbeat from dummy ups every 5 minutes, re-start 6 minute timer
> AT REPLBATT heartbeat CANCEL-TIMER heartbeat-timer
> AT REPLBATT heartbeat START-TIMER  heartbeat-timer 360
>
> In upssched-cmd, if heartbeat-timer completes, then send "UPS heartbeat
> failure" message to sysadmin.
>
> If this works, let me know, and I will use it myself :-)
> It would be nice to have a HEARTBEAT status instead of using REPLBATT.
>
> Roger

Re: [Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-01 Thread Roger Price

On Sat, 1 Apr 2017, Stuart Gathman wrote:

> On 04/01/2017 03:14 PM, Dan Craciun wrote:
>> On my Nagios monitoring system I use check_nut_plus (that in turn
>> calls upsc) to monitor the status (ups.status), load (ups.load),
>> battery charge (battery.charge) and runtime (battery.runtime).
>>
>> If these return "unknown", it means upsd is no longer monitoring the
>> UPS. As long as you get data, upsd is working.
>
> That's great, but Spike wants to know whether *upsmon* is working.  He
> already has a way to check that upsd is working.


How about using a dummy ups to set up a regular end-to-end heart beat?
As long as the heart beats, there is no news, but if it stops, 
upssched-cmd sends out an e-mail or other warning.


In ups.conf, add

[heartbeat]
driver = dummy-ups
port = heartbeat.dev
desc = "Dummy ups sends heart beat to upssched-cmd"

In heartbeat.dev, write

ups.status: REPLBATT
TIMER 300

In upsmon.conf, write

NOTIFYFLAG REPLBATT SYSLOG+EXEC

In upssched.conf, add

# Heartbeat from dummy ups every 5 minutes, re-start 6 minute timer
AT REPLBATT heartbeat CANCEL-TIMER heartbeat-timer
AT REPLBATT heartbeat START-TIMER  heartbeat-timer 360

In upssched-cmd, if heartbeat-timer completes, then send "UPS heartbeat
failure" message to sysadmin.
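
A minimal sketch of the upssched-cmd side of this proposal, written in Python
rather than the usual shell script. It relies on upssched calling the
CMDSCRIPT with the name of the expired timer as its only argument. The
constant HEARTBEAT_TIMER, the functions message_for and main, and the use of
syslog for the warning are illustrative choices, not part of NUT. Adjust
HEARTBEAT_TIMER to the exact spelling used in upssched.conf, and swap the
syslog call for e-mail or whatever warning suits the site.

```python
#!/usr/bin/env python3
"""Hypothetical upssched-cmd: warn when the heartbeat timer expires."""
import sys
import syslog

# Assumption: must match the timer name used in upssched.conf exactly.
HEARTBEAT_TIMER = "heartbeat-timer"


def message_for(timer_name):
    """Return the warning text for an expired timer, or None if unknown."""
    if timer_name == HEARTBEAT_TIMER:
        # The timer ran out, so no heartbeat event restarted it in time.
        return "UPS heartbeat failure"
    return None


def main(argv):
    """Handle one upssched callback; argv[1] is the expired timer's name."""
    if len(argv) != 2:
        return 1
    text = message_for(argv[1])
    if text is None:
        syslog.syslog(syslog.LOG_NOTICE, "upssched-cmd: unknown timer " + argv[1])
        return 0
    syslog.syslog(syslog.LOG_ERR, text)
    return 0


# Only act when upssched (or a person) passed a timer name.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The 360 second timer is deliberately longer than the 300 second heartbeat, so
one beat arriving on time always cancels the timer before it can fire.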


If this works, let me know, and I will use it myself :-)
It would be nice to have a HEARTBEAT status instead of using REPLBATT.

Roger



Re: [Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-01 Thread Stuart Gathman
On 04/01/2017 03:14 PM, Dan Craciun wrote:
> On my Nagios monitoring system I use check_nut_plus (that in turn
> calls upsc) to monitor the status (ups.status), load (ups.load),
> battery charge (battery.charge) and runtime (battery.runtime).
>
> If these return "unknown", it means upsd is no longer monitoring the
> UPS. As long as you get data, upsd is working.
>
> PS: as an example, this is my check for the status:
> /usr/bin/perl -w $USER$/check_nut_plus -d $ARG1$@$HOSTADDRESS$ -v
> 'ups.status=c!~^OL'
That's great, but Spike wants to know whether *upsmon* is working.  He
already has a way to check that upsd is working.

I don't have a complete solution, but I use NOTIFYCMD in upsmon.conf to
run upssched.  As part of upssched.conf, I append NOCOMM (and COMMOK)
events to a log file.  If NOCOMM in ups.log is not followed by COMMOK,
then upsmon will not shut down the system.  NOPARENT should probably be
logged also, as that makes upsmon unable to shut down the system.
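
One way to turn such a log into a check, as a rough Python sketch. The log
format is an assumption: each line is taken to contain the notify type
(NOCOMM, COMMOK or NOPARENT) as a separate word. The function name
upsmon_can_shut_down is made up for this example, and treating NOPARENT as
sticky until the log is rotated is a guess at sensible behaviour, not
something NUT prescribes.

```python
def upsmon_can_shut_down(lines):
    """Return False if communications are lost or upsmon's parent died.

    `lines` is any iterable of log lines, for example an open file.
    """
    comm_ok = True     # no news is good news
    parent_ok = True
    for line in lines:
        words = line.split()
        if "NOCOMM" in words:
            comm_ok = False
        elif "COMMOK" in words:
            comm_ok = True   # a later COMMOK clears an earlier NOCOMM
        if "NOPARENT" in words:
            parent_ok = False  # assumption: stays bad until the log is rotated
    return comm_ok and parent_ok
```

A Nagios wrapper would open the log file, pass it to the function, and exit
with 0 when it returns True or 2 when it returns False.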

I agree that this "no news is good news" policy is not ideal - but I've
found it much more effective than no monitoring.

Note this also - if upsmon can't check UPS status, then nagios almost
certainly can't either.

To test, set up upsmon on a remote machine, and block 3493/tcp (nut) in
the firewall on the machine running upsd.  Nagios should scream.
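
While running that test, a plain TCP connect from the remote machine is a
quick way to confirm that the port really is blocked. This Python sketch only
shows whether something accepts connections on 3493/tcp; it says nothing
about upsmon itself. The function name port_open and the 3 second timeout are
arbitrary choices.

```python
import socket

NUT_PORT = 3493  # the port named above


def port_open(host, port=NUT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out or unresolvable: treat all as closed.
        return False
```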




Re: [Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-01 Thread Dan Craciun
On my Nagios monitoring system I use check_nut_plus (that in turn calls
upsc) to monitor the status (ups.status), load (ups.load), battery
charge (battery.charge) and runtime (battery.runtime).

If these return "unknown", it means upsd is no longer monitoring the
UPS. As long as you get data, upsd is working.

PS: as an example, this is my check for the status:
/usr/bin/perl -w $USER$/check_nut_plus -d $ARG1$@$HOSTADDRESS$ -v
'ups.status=c!~^OL'
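
The same test can be sketched without check_nut_plus by calling upsc
directly. This Python sketch assumes upsc is on the PATH and is invoked as
"upsc ups@host ups.status", which prints the value. The function names and
the mapping to Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) are choices
made for this example. Like the check above, it only shows that upsd is
answering.

```python
import subprocess


def classify_status(status):
    """Map an ups.status string to a Nagios exit code."""
    status = status.strip()
    if not status:
        return 3  # UNKNOWN: upsc gave no data
    if status.startswith("OL"):
        return 0  # OK: on line power, mirrors the ^OL test
    return 2      # CRITICAL: anything else, e.g. "OB" or "OB LB"


def read_status(ups, host):
    """Return the ups.status text from upsc, or "" if the call fails."""
    try:
        result = subprocess.run(
            ["upsc", f"{ups}@{host}", "ups.status"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""
```

A check script would then end with something like
sys.exit(classify_status(read_status("eaton5s", "127.0.0.1"))).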

HTH,
Dan Craciun

On 4/1/2017 9:00 PM, Spike wrote:
> Dear all,
>
> I got nut going on one machine as standalone and on another 2 as
> master/slave and would like to add some checks to nagios to make sure
> that things are in order.
>
> Most of the checks I've seen out there use upsc to check the ups. This
> is a step forward compared to no monitoring, however as far as I can
> tell it doesn't really address what I think is a critical point:
> upsmon is actually monitoring the ups [and will shut down the box if
> needed].
>
> I looked at the upsd and upsmon man pages, but can't see anything like
> a "status" command that will show me if the connection is healthy (I
> noticed that when I restart the daemons I get a log line saying
> "Communications with UPS eaton5s@127.0.0.1 
> established", but I can't seem to find a place to access that). I
> could in theory check if the port is in use/ESTABLISHED, lsof -i:3493,
> but it's not great.
>
> Is there any command I can run that will confirm if upsmon is
> correctly connected?
>
> thanks,
>
> Spike

[Nut-upsuser] how do you test (nagios) that upsmon is connected?

2017-04-01 Thread Spike
Dear all,

I got nut going on one machine as standalone and on another 2 as
master/slave and would like to add some checks to nagios to make sure that
things are in order.

Most of the checks I've seen out there use upsc to check the ups. This is a
step forward compared to no monitoring; however, as far as I can tell it
doesn't really address what I think is a critical point: whether upsmon is
actually monitoring the ups [and will shut down the box if needed].

I looked at the upsd and upsmon man pages, but can't see anything like a
"status" command that will show me if the connection is healthy (I noticed
that when I restart the daemons I get a log line saying "Communications
with UPS eaton5s@127.0.0.1 established", but I can't seem to find a place
to access that). I could in theory check if the port is in use/ESTABLISHED,
lsof -i:3493, but it's not great.

Is there any command I can run that will confirm if upsmon is correctly
connected?

thanks,

Spike