Thanks for the *awesome* response John!

Here's what I've set up.
If I run the rules below using
'sec -conf /etc/sec.conf -debug 10 -input=/var/log/syslog'
I get:
SEC (Simple Event Correlator) 2.4.2
Reading configuration from /etc/sec.conf
2 rules loaded from /etc/sec.conf
Creating context 'alert_10.48.36.42_4087'
Deleting context 'alert_10.48.36.42_4087'
Context 'alert_10.48.36.42_4087' deleted
Feeding event 'Jun  3 16:00:05 10.48.36.42 187792: 188237: Jun  3 16:00:04.214 BST: %RTT-3-IPSLATHRESHOLD: IP SLAs(4087): Threshold Cleared for timeout' to shell command '/usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com'
Child 26406 created for command '/usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com'

So, the question is:
Why does it trigger on a non-matching message?
See notes in the config below for what I was trying to accomplish, hopefully
a bit more succinct this time :-)




# Match IP SLA events
# Jun  3 11:39:08 10.48.36.39 334031: 334645: Jun  3 11:39:08.208 BST: %RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3550
# Jun  3 11:39:33 10.48.36.39 334037: 334651: Jun  3 11:39:33.232 BST: %RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3550
#
# The two rules below should watch for a "condition occurred" event and,
# if no matching "condition cleared" event comes in for the same IP with the
# same probe number BEFORE a new "condition occurred" event comes in for that
# device/probe #, then we need to trigger an alert
# (note that probe # is the "entry number =" in the example above)
type = single
desc = email when an alert is seen while one is asserted
continue = takenext
context = alert_$1_$2
ptype = regexp
rem = $1 is ip address, $2 is entry number
pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*condition occurred, entry number = (\d+)
action = pipe '$0' /usr/bin/mail -v "Dropped syslog entry. Found alert while it was pending" cdu...@cisco.com; delete alert_$1_$2; reset +1 match the alert for host $1 and event $2

type = pair
desc = match the alert for host $1 and event $2
ptype = regexp
rem = same pattern as in single rule
pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*condition cleared, entry number = (\d+)
action = create alert_$1_$2
desc2 = match the clear for host $1 and event $2
ptype2 = regexp
pattern2 = $1.*SLAs\($2\)
rem = %1 and %2 are used to reference $1 and $2 from the first pattern
rem = since this is run after pattern2 is matched, $1 and $2 are
rem = reassigned by the pattern2 match
action2 = delete alert_%1_%2; pipe '$0' /usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com
time = 0
#; shellcmd do something else useful
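
To make the intended behaviour concrete outside of SEC, here is a minimal
Python sketch of the logic described in the comments above: remember every
"condition occurred" per device/probe, forget it on "condition cleared", and
flag a second "occurred" that arrives while one is still open. It is only an
illustration under assumptions, not a replacement for the rules: the names
EVENT_RE and find_lost_clears are hypothetical, and the last two sample lines
are made up to simulate a lost clear.

```python
import re

# Hypothetical names; the regex follows the shape of the patterns in the rules above.
EVENT_RE = re.compile(
    r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*"
    r"condition (occurred|cleared), entry number = (\d+)"
)


def find_lost_clears(lines):
    """Return (ip, probe) for each 'occurred' seen while one was still open."""
    open_alerts = set()  # (ip, probe) pairs still waiting for a clear
    lost = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # other messages (e.g. %RTT-3-IPSLATHRESHOLD) are skipped
        ip, state, probe = m.groups()
        key = (ip, probe)
        if state == "occurred":
            if key in open_alerts:
                lost.append(key)  # repeat 'occurred' with no clear in between
            open_alerts.add(key)
        else:
            open_alerts.discard(key)
    return lost


sample = [
    # First two lines are the examples from the config comments above.
    "Jun  3 11:39:08 10.48.36.39 334031: 334645: Jun  3 11:39:08.208 BST: "
    "%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3550",
    "Jun  3 11:39:33 10.48.36.39 334037: 334651: Jun  3 11:39:33.232 BST: "
    "%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3550",
    # Made-up lines: two 'occurred' events in a row, i.e. a lost clear.
    "Jun  3 11:40:00 10.48.36.39 334040: 334654: Jun  3 11:40:00.000 BST: "
    "%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3550",
    "Jun  3 11:41:00 10.48.36.39 334045: 334659: Jun  3 11:41:00.000 BST: "
    "%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3550",
]

print(find_lost_clears(sample))
```

The set of open (ip, probe) pairs plays the role of the alert_<ip>_<probe>
contexts: a key is added on "occurred", dropped on "cleared", and a repeat
"occurred" for a key that is still present is what should trigger the email.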





On Wed, Jun 2, 2010 at 11:54 PM, John P. Rouillard <rou...@cs.umb.edu> wrote:

>
> In message <aanlktimeb0zy73gsrorp4ivs_ws-cxpkw7ybpn3h1...@mail.gmail.com>,
> Clayton Dukes writes:
> >I'm trying to come up with a way to match Cisco IP SLA syslog events.
> >
> >Messages come in from 10 different IP SLA shadow routers that look similar
> >to this:
> >Jun  3 03:13:53 10.48.36.33 394491: 379185: Jun  3 03:13:52.982 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3419): Threshold Occurred for connectionLoss
> >Jun  3 03:13:54 10.48.36.37 330506: 331273: Jun  3 03:13:53.592 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3242): Threshold Occurred for timeout
> >Jun  3 03:13:54 10.48.36.37 330507: 331274: Jun  3 03:13:54.688 BST:
> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3364
> >Jun  3 03:13:54 10.48.36.39 331498: 332112: Jun  3 03:13:53.916 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3263): Threshold Occurred for timeout
> >Jun  3 03:13:55 10.48.36.37 330508: 331275: Jun  3 03:13:54.704 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3364): Threshold Cleared for timeout
> >Jun  3 03:13:56 10.48.36.39 331499: 332113: Jun  3 03:13:56.816 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1398): Threshold exceeded for packetLossDS
> >Jun  3 03:13:56 10.48.36.39 331500: 332114: Jun  3 03:13:56.916 BST:
> >%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3321
> >Jun  3 03:13:57 10.48.36.39 331501: 332115: Jun  3 03:13:56.932 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3321): Threshold Occurred for timeout
> >Jun  3 03:13:57 10.48.36.39 331502: 332116: Jun  3 03:13:57.812 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1402): Threshold exceeded for packetLossDS
> >Jun  3 03:14:01 10.48.36.42 184927: 185372: Jun  3 03:14:01.167 BST:
> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 4030
> >Jun  3 03:14:02 10.48.36.42 184928: 185373: Jun  3 03:14:01.179 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(4030): Threshold Cleared for timeout
> >Jun  3 03:14:07 10.48.36.37 330510: 331277: Jun  3 03:14:06.005 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1442): Threshold below for packetLossDS
> >Jun  3 03:14:07 10.48.36.42 184929: 185374: Jun  3 03:14:07.936 BST:
> >%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 4096
> >Jun  3 03:14:08 10.48.36.33 394492: 379186: Jun  3 03:14:07.942 BST:
> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418
> >Jun  3 03:14:08 10.48.36.33 394493: 379187: Jun  3 03:14:07.954 BST:
> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout
> >
> >For example:
> >I need to find a "condition occurred, entry number = 4030" and match it to
> >"condition cleared, entry number = 4030"
> >The time frame doesn't matter - what does matter is that if I receive
> >another "condition occurred, entry number = 4030" before I receive a clear
> >for that probe number (4030 in this case), then that means I lost a syslog
> >message somewhere (since it's impossible to get a new condition without a
> >clear). In this case, I need to trigger an email.
> >
> >I need to do this for every device and unique probe # (there are thousands).
>
> Where are the probe number and device identifier in the above syslog
> messages? Once you have that, a pair rule and a single rule will work
> fine.
>
> Let's assume the alert and clear are these two messages:
>
> alert:
> >Jun  3 03:14:08 10.48.36.33 394492: 379186: Jun  3 03:14:07.942 BST:
>   %RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418
>
> clear:
> >Jun  3 03:14:08 10.48.36.33 394493: 379187: Jun  3 03:14:07.954 BST:
>   %RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout
>
> They match because of the occurrence of 10.48.36.33 as the device
> identifier and the entry/probe number 3418 occurring in both messages.
>
> First create a rule that emails if you see an alert while you were
> still looking for a clear for that alert.
>
>  type = single
>  desc = email when an alert is seen while one is asserted
>  continue = takenext
>  context = alert_$1_$2
>  ptype = regexp
>  rem = $1 is ip address, $2 is entry number
>  pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
>  action = pipe '$0' /bin/mail -v "Dropped syslog entry. Found alert while it was pending" root; delete alert_$1_$2; reset +1 match the alert for host $1 and event $2
>
> This parses a line that matches the alert line above. It captures the
> ip address and the entry number. It then uses that info to see if a
> context called "alert_10.48.36.33_3418" (using the alert example
> above) exists. If the context doesn't exist the rule is skipped and no
> email is sent. If the context does exist:
>
>   send email
>   delete the context
>   cancel the correlation started by the next rule (+1 is the next rule
>       in the file) that has a key (desc field) that matches "match the
>       alert for host 10.48.36.33 and event 3418". This allows the pair
>       rule following to start a new correlation operation.
>   pass the current event to the next rule (because of continue=takenext)
>
> The next rule looks for the alert/clear pattern.
>
>  type = pair
>  desc = match the alert for host $1 and event $2
>  ptype = regexp
>  rem = same pattern as in single rule
>  pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
>  action = create alert_$1_$2
>  desc2 = match the clear for host $1 and event $2
>  ptype2 = regexp
>  pattern2 = $1.*SLAs\($2\)
>  rem = %1 and %2 are used to reference $1 and $2 from the first pattern
>  rem = since this is run after pattern2 is matched, $1 and $2 are
>  rem = reassigned by the pattern2 match
>  action2 = delete alert_%1_%2; shellcmd do something else useful
>  time = 0
>
> Ok, let's see how these work. Let's assume you have just started so
> the context alert_10.48.36.33_3418 is not defined.
>
> Scenario 1: the alert arrives. It matches the pair rule and starts a
> correlation operation looking for the clear. Then the clear comes in,
> it matches the correlation started by the pair rule and something
> useful happens.
>
> Scenario 2: the alert arrives. At this point the alert_10.48.36.33_3418
> context is defined.  Its clear is dropped. Another alert arrives. It
> now matches the first rule (type = single) and sends email. The single
> rule resets the pair rule and the alert is now passed to the pair rule
> where it creates the new alert_10.48.36.33_3418 context and waits for
> a clear to arrive. If a clear arrives, the pair rule captures it (the
> single rule ignores the clear since it doesn't match the pattern). If
> another alert arrives again email is sent and the old pair correlation
> is cancelled and a new pair correlation is opened.
>
> If the single rule didn't reset the pair rule the alert would be
> consumed and ignored by the pair rule and not start a new correlation.
> Depending on what you want to do if the pair is discovered, resetting
> the pair correlation may or may not matter.
>
> >1.1.1.1, probe 1 should have matching pairs
> >1.1.1.1, probe 2 should have a matching pair
> >2.2.2.2, probe 3 should match
> >etc.
>
> Not sure how your "probes" relate to the message sequences you show
> above, but hopefully this gives you some idea of how to handle it.
>
> (Note I didn't test the rulesets above and they may be missing
> required keywords, may not work on alternate Fridays etc. No bits were
> harmed writing this message, although some brain cells were quite annoyed
> 8-).)
>
> --
>                                -- rouilj
> John Rouillard
> ===========================================================================
> My employers don't acknowledge my existence much less my opinions.
>



-- 
______________________________________________________________

Clayton Dukes
______________________________________________________________
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
