hi,
if you would like to keep track of the PIDs of all hanging processes,
generate an error when the first hanging process appears, and
generate an OK when the last hanging process disappears, you
could use this ruleset:

# Matches only when %proc grows from zero entries to one (first hanging process).
type=single
ptype=regexp
pattern=Another process is running: PID \[(\d+)\]
context=$1 -> ( sub { my($n) = scalar(keys %proc); $proc{$_[0]} = time(); \
                      return ($n == 0 && scalar(keys %proc) == 1); } )
desc=There is one hanging process ($1)
action=logonly

# Matches only when %proc shrinks from one entry to zero (last hanging process gone).
type=single
ptype=regexp
pattern=Finished process: \[(\d+)\]
context=$1 -> ( sub { my($n) = scalar(keys %proc); delete $proc{$_[0]}; \
                      return ($n == 1 && scalar(keys %proc) == 0); } )
desc=No processes hanging
action=logonly

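In this ruleset, the %proc hash stores the IDs of hanging processes
(together with the time each one was last reported), and the context
expressions detect the first and the last hanging process: the first
rule matches only when the hash grows from zero entries to one, and the
second only when it shrinks from one entry to zero. In case some
processes do not generate finish messages, it is probably wise to
delete older entries from the %proc hash from time to time: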

type=Calendar
time=* * * * *
desc=drop info for processes hanging for more than 1 hour
action=lcall %o -> ( sub { foreach my $p (keys %proc) { \
                     if ($proc{$p} < time() - 3600) { delete $proc{$p}; } } } )
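
Note that when stale entries are dropped this way, the last hanging
process can disappear from %proc without a finish message, and no OK is
generated for it. A possible variant of the Calendar rule is sketched
below. It is only a sketch: it assumes a SEC version that supports the
"if" action, and the text of the OK message is made up for
illustration. The function returns 1 only when the cleanup empties a
hash that was not empty before:

type=Calendar
time=* * * * *
desc=drop info for processes hanging for more than 1 hour
action=lcall %o -> ( sub { my($n) = scalar(keys %proc); \
                     foreach my $p (keys %proc) { \
                     if ($proc{$p} < time() - 3600) { delete $proc{$p}; } } \
                     return (($n > 0 && scalar(keys %proc) == 0) ? 1 : 0); } ); \
       if %o ( logonly No processes hanging, stale entries expired )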

Hope this helps,
risto

2012/1/20 Simone Martina <smart...@noc.skylogicnet.com>:
> Hi all,
> I have a problem and I don't know how to find a suitable solution.
> The problem is this: I have a cron script that launches a process and
> writes to a log when it starts and ends.
> When one of these processes takes too long to run, the script writes
> a line like this to the log:
> Another process is running: PID [12345]
> When process 12345 ends, it writes:
> Finished process: [12345]
>
> The trouble is that multiple processes are started, so I can have many
> processes hanging, and in the error log I find:
> Another process is running: PID [1]
> Another process is running: PID [2]
> Another process is running: PID [3]
>
> With a Single rule I could send a Nagios alert whenever it finds
> "Another process is running: PID [DETERMINATED_PID]", and with another
> Single rule I could send a Nagios OK on matching "Finished process:
> [DETERMINATED_PID]", but Nagios isn't able to tell whether there are
> more hanging processes.
>
> So I would like to tell SEC to count the error state, something like this:
> error=0
> Another process is running: PID [1] -> error++
> Another process is running: PID [2] -> error++
> Another process is running: PID [3] -> error++
> so I get error=3,
> and then a way to decrement the counter:
> Finished process: [1] -> error--
> Finished process: [2] -> error--
> Finished process: [3] -> error--
> So when error==0 SEC could send a Nagios OK.
>
> Any suggestions will be appreciated. Thanks for your attention, and
> sorry for my poor English.
>
> Simone
>
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
