Re: [Simple-evcorr-users] Trying to report extended NFS problems along with an OK.

Risto Vaarandi Tue, 18 Feb 2014 15:17:25 -0800

hi Douglas,
you could create a context when the alarm is sent, and issue the
AllClear message for "hostA kernel: nfs server hostb:/filesystem: is alive
again" only if that context exists. Here is an example:


type=PairWithWindow
ptype=RegExp
pattern=(\S+) kernel: nfs server (\S+): not responding
desc=$1: remote fs $2 not responding
action=write - %s; create ALERT_SENT_NODE_$1_FS_$2
ptype2=SubStr
pattern2=$1 kernel: nfs server $2: is alive again
continue2=TakeNext
desc2=$1: remote fs $2 alive again
action2=none
window=60

type=Single
ptype=RegExp
pattern=(\S+) kernel: nfs server (\S+): is alive again
context=ALERT_SENT_NODE_$1_FS_$2
desc=$1: remote fs $2 alive again
action=write - %s; delete ALERT_SENT_NODE_$1_FS_$2

If an error condition is detected at some host for some remote file system
and it is not cleared within 60 seconds, the operation started by the first
rule writes a warning string to standard output and creates the context
ALERT_SENT_NODE_<host>_FS_<filesystem>. The second rule is a Single rule
which matches only if that context exists, so it sends an AllClear message
for a host and a file system only if we have previously issued a warning for
this particular host-filesystem combination. It then deletes the context.
Also, if your actual messages don't start with the host name but have a
preceding timestamp, it is wise to write the pattern2 field of the first
rule as

pattern2=\s$1 kernel: nfs server $2: is alive again

(in a SubStr pattern, \s denotes a space character) in order to avoid an
accidental match by a longer host name which has the same ending but extra
leading characters (e.g., AhostA or BBhostA).
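
Since the original question was about sending email rather than writing to
standard output, the two rules above could be adapted roughly as follows.
This is only a sketch: it swaps the write action for SEC's pipe action, which
feeds the event description to the standard input of a command. The
recipient "rand" and the "NFS all-clear" subject are taken from the question
below, while the "NFS alert" subject and the bare "mail" command are
assumptions that need adjusting for your system.

```
# Rule 1: if "not responding" is not followed by "is alive again" for the
# same host and file system within 60 seconds, mail an alert and create
# a context which remembers that the alert was sent.
type=PairWithWindow
ptype=RegExp
pattern=(\S+) kernel: nfs server (\S+): not responding
desc=$1: remote fs $2 not responding
action=pipe '%s' mail -s "NFS alert" rand; create ALERT_SENT_NODE_$1_FS_$2
ptype2=SubStr
pattern2=$1 kernel: nfs server $2: is alive again
continue2=TakeNext
desc2=$1: remote fs $2 alive again
action2=none
window=60

# Rule 2: mail an all-clear only if the alert context exists for this
# host and file system, then delete the context.
type=Single
ptype=RegExp
pattern=(\S+) kernel: nfs server (\S+): is alive again
context=ALERT_SENT_NODE_$1_FS_$2
desc=$1: remote fs $2 alive again
action=pipe '%s' mail -s "NFS all-clear" rand; delete ALERT_SENT_NODE_$1_FS_$2
```

The mail subjects are kept static on purpose, so that no part of the matched
log line ends up on the command line; the host and file system names reach
the mail body through %s instead.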

hope this helps,
risto



2014-02-19 0:31 GMT+02:00 Douglas K. Rand <r...@iteris.com>:

> With the usual BSD syslog messages related to NFS problems:
>
> hostA kernel: nfs server hostb:/filesystem: not responding
> ...
> hostA kernel: nfs server hostb:/filesystem: is alive again
>
> What I'm trying to do is generate an alert email if we see a "not
> responding" message without a corresponding "is alive again" within 60
> seconds. But if we sent out the alert email I also want to send an
> all-clear email when we do eventually get the "alive again" message,
> perhaps even hours later.
>
> I've figured out how to do one or the other: I can generate the alert
> email if a "not responding" message is not followed by an "alive again"
> message within 60 seconds.
>
> And I can generate the all-clear message for each and every "alive
> again" message.
>
> But putting them together is stumping me. I only want to send the
> all-clear message if we already have sent out the alert email; but if we
> don't send out the alert there is no reason to send out the all-clear.
>
> I was thinking that when the "alive again" message comes in, I could do
> something like:
>
> if ($age > 60) report context mail -s "NFS all-clear" rand;
> delete context
>
> I'm not even sure that is the right approach, seems to not fit into SEC,
> even if I could figure out how to do it.
>
> Advice anybody?
>
>
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
>

