In message <[EMAIL PROTECTED]>,
Tim Peiffer writes:
>John P. Rouillard wrote:
>> In message <[EMAIL PROTECTED]>,
>> Tim Peiffer writes:
>>> [...]
>>> Consider the following log trace:
>>> 61 msec sz 212456 rss 210232 sock ovfl 244477
>>>[...]
>>> Given that the log lines have timings, process sizes and count of UDP 
>>> socket overflows, I wish to compare say the last 5 traces, and if all 
>>> have timings that exceed X msec, I wish to restart the service.
>
>The sketch you offered should accomplish most of what I want. The one 
>issue remaining is that the overflow is a counter rather than a gauge.

Ahh. Yeah that screws things up.

>I am capturing socket overflows, dropped full socket buffers, 
>udpInOverflows and packet receive errors from netstat -s.  I wrote the 
>monitor a number of years ago.. I suppose I could just keep a local file 
>to cache the last counter, but I was hoping I wouldn't need to do that.  
>Is there a way to cache values from the previous run in a context?

Using a mini-program, yes. My earlier example:

>> If you need to count the total number of socket resets over the prior
>> 5 events and only if the sum is > Y do you restart, a context like:
>>
>>   =(unshift @overflow, $4; $#overflow=4; $sum=0; map {$sum+=$_} @overflow; return $sum > Y)

Note that I reset $sum every time through, because variables in a
mini-program persist between evaluations. So you could use a
mini-program in the context (line split for readability):

  =( $difference = $4 - $prevval; $prevval = $4; 
     if ($difference >= 0) {
        return $difference > Y;
     } else {
        return ($maxint + 1 + $difference) > Y;
     } )

where $4 is the overflow value and $maxint is the max value of the counter
before it rolls over.
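
To check the rollover arithmetic outside of SEC, here is a minimal
standalone sketch. The names counter_delta and rolling_sum are
hypothetical helpers, not part of SEC, and the 32-bit $max is an
assumption; use whatever counter width your netstat actually reports.
Unlike the context above, it treats the very first sample as a delta
of 0 rather than comparing against an undefined previous value, and it
sums deltas (not raw counter values) over the last five samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: increase of a counter between two samples,
# allowing for a single rollover. $max is the largest value the
# counter can hold before it wraps back to 0.
sub counter_delta {
    my ($prev, $cur, $max) = @_;
    return 0 unless defined $prev;    # first sample: nothing to compare
    my $diff = $cur - $prev;
    # A negative difference means the counter wrapped: count the steps
    # up to $max, one more step to reach 0, then the steps up to $cur.
    return $diff >= 0 ? $diff : $max + 1 + $diff;
}

# Hypothetical helper: remember the last five deltas, return their sum.
my @deltas;
sub rolling_sum {
    my ($delta) = @_;
    unshift @deltas, $delta;
    $#deltas = 4 if @deltas > 5;      # keep only the five newest entries
    my $sum = 0;
    $sum += $_ for @deltas;
    return $sum;
}

my $max = 2**32 - 1;                  # assumed 32-bit counter
my $prev;
for my $cur (244477, 244480, 4294967290, 5) {
    my $delta = counter_delta($prev, $cur, $max);
    printf "counter=%s delta=%s sum=%s\n", $cur, $delta, rolling_sum($delta);
    $prev = $cur;
}
```

The last sample in the loop wraps past $max, so its delta comes from
the else branch. In an SEC context you would compare the delta (or the
rolling sum) against Y instead of printing it.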

>The log traces are run every 5 minutes off of cron, and are a self 
>measure of performance (or lack thereof).

Ok, then that simplifies the ruleset because you can swap an event
count for a time window. Using my example, you drop rules 3 and 5 (the
Single rules) and set the window on rules 2 and 4 (the
SingleWithThreshold rules) to 1500 seconds (5 min/event * 60
seconds/min * 5 failing events) rather than to 0.

--
                                -- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

_______________________________________________
Simple-evcorr-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
