Hi, if you would like to keep track of the PIDs of all hanging processes, generate an error when the first hanging process appears, and generate an OK when the last hanging process disappears, you could use this ruleset:

type=single
ptype=regexp
pattern=Another process is running: PID \[(\d+)\]
context=$1 -> ( sub { my($n) = scalar(keys %proc); $proc{$_[0]} = time(); \
          return ($n == 0 && scalar(keys %proc) == 1); } )
desc=There is one hanging process ($1)
action=logonly

type=single
ptype=regexp
pattern=Finished process: \[(\d+)\]
context=$1 -> ( sub { my($n) = scalar(keys %proc); delete $proc{$_[0]}; \
          return ($n == 1 && scalar(keys %proc) == 0); } )
desc=No processes hanging
action=logonly

In this ruleset, the %proc hash stores the process IDs (each with the time it was seen), and the context expressions detect the first and the last hanging process. In case some processes never generate a finish message, it is probably wise to delete older entries from the %proc hash from time to time:

type=Calendar
time=* * * * *
desc=drop info for processes hanging for more than 1 hour
action=lcall %o -> ( sub { foreach my $p (keys %proc) { \
         if ($proc{$p} < time() - 3600) { delete $proc{$p}; } } } )

Hope this helps,
risto

2012/1/20 Simone Martina <smart...@noc.skylogicnet.com>:
> Hi all,
> I have a problem but I don't know how to find a suitable solution.
> The problem is this: I have a cron script that launches a process and writes
> into a log when it starts and ends.
> When one of these processes takes too long to run, it writes into the
> log a line like this:
> Another process is running: PID [12345]
> When process 12345 ends it writes:
> Finished process: [12345]
>
> The trouble is that multiple processes are started, so I can have many
> processes hanging, and in the error log I find:
> Another process is running: PID [1]
> Another process is running: PID [2]
> Another process is running: PID [3]
>
> With a Single match I could send a Nagios alert when it finds any
> "Another process is running: PID [DETERMINATED_PID]" and with another
> Single I could send a Nagios OK matching a "Finished process:
> [DETERMINATED_PID]", but Nagios isn't able to count whether there are more
> hanging processes.
>
> So, I would like to tell SEC to count error status in a way like this:
> error=0
> Another process is running: PID [1] -> error++
> Another process is running: PID [2] -> error++
> Another process is running: PID [3] -> error++
> so I get error=3,
> then a way to decrement the counter:
> Finished process: [1] -> error--
> Finished process: [2] -> error--
> Finished process: [3] -> error--
> So when error==0 SEC could send a Nagios OK.
>
> Any suggestion will be appreciated, thanks for your attention and sorry for
> my poor English.
>
> Simone
>
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
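
For anyone who wants to sanity-check the first/last detection logic outside SEC, here is a small Python model of the two context functions and the hourly cleanup from the reply. It is only a sketch and not SEC code: the function names (on_hanging, on_finished, expire) and the now parameter are made up for illustration, and the proc dictionary stands in for the %proc hash.

```python
import time

proc = {}  # PID -> time the "hanging" message was seen (stands in for %proc)


def on_hanging(pid, now=None):
    """Record pid; return True only when it is the first hanging process."""
    before = len(proc)
    proc[pid] = time.time() if now is None else now
    return before == 0 and len(proc) == 1


def on_finished(pid):
    """Forget pid; return True only when the last hanging process is gone."""
    before = len(proc)
    proc.pop(pid, None)  # like Perl's delete, harmless if pid is unknown
    return before == 1 and len(proc) == 0


def expire(max_age=3600, now=None):
    """Drop entries older than max_age seconds (what the Calendar rule does)."""
    now = time.time() if now is None else now
    for pid in [p for p, seen in proc.items() if seen < now - max_age]:
        del proc[pid]


if __name__ == "__main__":
    # Replay the sequence from the original question.
    events = [("hang", "1"), ("hang", "2"), ("hang", "3"),
              ("done", "1"), ("done", "2"), ("done", "3")]
    for kind, pid in events:
        fired = on_hanging(pid) if kind == "hang" else on_finished(pid)
        print(kind, pid, "-> rule matches" if fired else "-> suppressed")
```

In this model only the first "hang" and the last "done" make a rule match, which corresponds to the single error and single OK asked for. Note that, as in the Calendar rule, expiry removes entries silently, so no "No processes hanging" event is produced if the last entry simply ages out.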