Hi Guys,
Any insight you can give would be greatly appreciated; I would love to get
this working :-)


On Thu, Jun 3, 2010 at 11:06 AM, Clayton Dukes <cdu...@gmail.com> wrote:

> Thanks for the *awesome* response John!
>
> Here's what I've set up.
> If I run the rule below using 'sec -conf /etc/sec.conf -debug 10 -input=/var/log/syslog'
> I get:
> SEC (Simple Event Correlator) 2.4.2
> Reading configuration from /etc/sec.conf
> 2 rules loaded from /etc/sec.conf
> Creating context 'alert_10.48.36.42_4087'
> Deleting context 'alert_10.48.36.42_4087'
> Context 'alert_10.48.36.42_4087' deleted
> Feeding event 'Jun  3 16:00:05 10.48.36.42 187792: 188237: Jun  3 16:00:04.214 BST: %RTT-3-IPSLATHRESHOLD: IP SLAs(4087): Threshold Cleared for timeout' to shell command '/usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com'
> Child 26406 created for command '/usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com'
>
> So, the question is:
> Why does it trigger on a non-matching message?
> See notes in the config below for what I was trying to accomplish,
> hopefully a bit more succinct this time :-)
>
>
>
>
> # Match IP SLA events
> # Jun  3 11:39:08 10.48.36.39 334031: 334645: Jun  3 11:39:08.208 BST:
> #   %RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3550
> # Jun  3 11:39:33 10.48.36.39 334037: 334651: Jun  3 11:39:33.232 BST:
> #   %RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3550
> #
> # The two rules below should watch for a "condition occurred" event and,
> # if no matching "condition cleared" event comes in for the same IP with
> # the same probe number BEFORE a new "condition occurred" event comes in
> # for that device/probe #, then we need to trigger an alert.
> # (Note that the probe # is the "entry number =" value in the examples above.)
>
> type = single
> desc = email when an alert is seen while one is asserted
> continue = takenext
> context = alert_$1_$2
> ptype = regexp
> rem = $1 is ip address, $2 is entry number
> pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*condition occurred, entry number = (\d+)
> action = pipe '$0' /usr/bin/mail -s "Dropped syslog entry. Found alert while it was pending" cdu...@cisco.com; delete alert_$1_$2; reset +1 match the alert for host $1 and event $2
>
> type = pair
> desc = match the alert for host $1 and event $2
> ptype = regexp
> rem = same pattern as in single rule
> pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*condition cleared, entry number = (\d+)
> action = create alert_$1_$2
> desc2 = match the clear for host $1 and event $2
> ptype2 = regexp
> pattern2 = $1.*SLAs\($2\)
> rem = %1 and %2 are used to reference $1 and $2 from the first pattern
> rem = since this is run after pattern2 is matched, $1 and $2 are
> rem = reassigned by the pattern2 match
> action2 = delete alert_%1_%2; pipe '$0' /usr/bin/mail -s "IP SLA - Cleared" cdu...@cisco.com
> time = 0
>
> #; shellcmd do something else useful
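One way to see why a "Threshold Cleared" line ends up being mailed is to trace the two rules by hand. The Python sketch below is only a model, not SEC: it assumes SEC's Perl regexes behave like Python's `re` for these patterns and that `$1`/`$2` are pasted into `pattern2` verbatim, and it borrows two consecutive 10.48.36.42 lines (probe 4030) from the log quoted further down. Neither line matches the single rule. The pair rule's first pattern, however, looks for "condition cleared" rather than "condition occurred" (despite its `rem` line), so it starts an operation on the cleared condition, and the filled-in `pattern2` then matches the "Threshold Cleared" line, which is the point where `action2` would pipe that line to mail.

```python
import re

# Regexes copied from the two rules above. Assumption: for these simple
# patterns Python's re module behaves like the Perl regexes SEC uses.
IP = r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
SINGLE_PATTERN = re.compile(IP + r".*condition occurred, entry number = (\d+)")
PAIR_PATTERN = re.compile(IP + r".*condition cleared, entry number = (\d+)")
PATTERN2_TEMPLATE = r"$1.*SLAs\($2\)"

# Two consecutive lines taken from the log excerpt quoted in this thread.
CONDITION_CLEARED = ("Jun  3 03:14:01 10.48.36.42 184927: 185372: Jun  3 03:14:01.167 BST: "
                     "%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 4030")
THRESHOLD_CLEARED = ("Jun  3 03:14:02 10.48.36.42 184928: 185373: Jun  3 03:14:01.179 BST: "
                     "%RTT-3-IPSLATHRESHOLD: IP SLAs(4030): Threshold Cleared for timeout")


def fill_pattern2(ip: str, entry: str) -> str:
    """Substitute $1/$2 into pattern2 (assumed verbatim, no escaping)."""
    return PATTERN2_TEMPLATE.replace("$1", ip).replace("$2", entry)


def walk_through():
    """Return (single_hits, pair_start, pair_end) for the two sample lines."""
    single_hits = [bool(SINGLE_PATTERN.search(line))
                   for line in (CONDITION_CLEARED, THRESHOLD_CLEARED)]
    start = PAIR_PATTERN.search(CONDITION_CLEARED)   # pair rule, first pattern
    pair_start = start.groups() if start else None
    pair_end = False
    if start:
        # pattern2 is built from the first match, then tried on the next line
        pair_end = bool(re.search(fill_pattern2(*start.groups()), THRESHOLD_CLEARED))
    return single_hits, pair_start, pair_end


if __name__ == "__main__":
    print(walk_through())
```

Calling `walk_through()` returns `([False, False], ('10.48.36.42', '4030'), True)`. If this reading is right, making the pair rule's first pattern match "condition occurred", as its `rem` line intends, should keep clears from starting new pair operations.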
>
>
>
>
>
> On Wed, Jun 2, 2010 at 11:54 PM, John P. Rouillard <rou...@cs.umb.edu> wrote:
>
>>
>> In message <aanlktimeb0zy73gsrorp4ivs_ws-cxpkw7ybpn3h1...@mail.gmail.com>,
>> Clayton Dukes writes:
>> >I'm trying to come up with a way to match Cisco IP SLA syslog events.
>> >
>> >Messages come in from 10 different IP SLA shadow routers that look similar
>> >to this:
>> >Jun  3 03:13:53 10.48.36.33 394491: 379185: Jun  3 03:13:52.982 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3419): Threshold Occurred for connectionLoss
>> >Jun  3 03:13:54 10.48.36.37 330506: 331273: Jun  3 03:13:53.592 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3242): Threshold Occurred for timeout
>> >Jun  3 03:13:54 10.48.36.37 330507: 331274: Jun  3 03:13:54.688 BST:
>> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3364
>> >Jun  3 03:13:54 10.48.36.39 331498: 332112: Jun  3 03:13:53.916 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3263): Threshold Occurred for timeout
>> >Jun  3 03:13:55 10.48.36.37 330508: 331275: Jun  3 03:13:54.704 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3364): Threshold Cleared for timeout
>> >Jun  3 03:13:56 10.48.36.39 331499: 332113: Jun  3 03:13:56.816 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1398): Threshold exceeded for packetLossDS
>> >Jun  3 03:13:56 10.48.36.39 331500: 332114: Jun  3 03:13:56.916 BST:
>> >%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3321
>> >Jun  3 03:13:57 10.48.36.39 331501: 332115: Jun  3 03:13:56.932 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3321): Threshold Occurred for timeout
>> >Jun  3 03:13:57 10.48.36.39 331502: 332116: Jun  3 03:13:57.812 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1402): Threshold exceeded for packetLossDS
>> >Jun  3 03:14:01 10.48.36.42 184927: 185372: Jun  3 03:14:01.167 BST:
>> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 4030
>> >Jun  3 03:14:02 10.48.36.42 184928: 185373: Jun  3 03:14:01.179 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(4030): Threshold Cleared for timeout
>> >Jun  3 03:14:07 10.48.36.37 330510: 331277: Jun  3 03:14:06.005 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(1442): Threshold below for packetLossDS
>> >Jun  3 03:14:07 10.48.36.42 184929: 185374: Jun  3 03:14:07.936 BST:
>> >%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 4096
>> >Jun  3 03:14:08 10.48.36.33 394492: 379186: Jun  3 03:14:07.942 BST:
>> >%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418
>> >Jun  3 03:14:08 10.48.36.33 394493: 379187: Jun  3 03:14:07.954 BST:
>> >%RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout
>> >
>> >For example:
>> >I need to find a "condition occurred, entry number = 4030" and match it to
>> >"condition cleared, entry number = 4030"
>> >The time frame doesn't matter - what does matter is that if I receive
>> >another "condition occurred, entry number = 4030" before I receive a clear
>> >for that probe number (4030 in this case), then that means I lost a syslog
>> >message somewhere (since it's impossible to get a new condition without a
>> >clear). In this case, I need to trigger an email.
>> >
>> >I need to do this for every device and unique probe # (there are thousands).
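Independent of SEC, the requirement above amounts to keeping a set of open (device, probe) keys. A minimal Python sketch of that bookkeeping follows; `find_missing_clears` is a hypothetical name, and the sketch assumes only the `%RTT-4-OPER_TIMEOUT` "condition occurred"/"condition cleared" lines are relevant:

```python
import re

# Matches the OPER_TIMEOUT condition messages; group 1 = device IP,
# group 2 = occurred/cleared, group 3 = probe (entry) number.
EVENT = re.compile(
    r"([0-9]{1,3}(?:\.[0-9]{1,3}){3}).*condition (occurred|cleared), entry number = (\d+)"
)


def find_missing_clears(lines):
    """Return (ip, probe) keys whose 'occurred' repeated before a 'cleared'."""
    pending = set()   # (ip, probe) pairs with an open "condition occurred"
    alerts = []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue  # not an OPER_TIMEOUT condition message
        ip, state, probe = m.groups()
        key = (ip, probe)
        if state == "occurred":
            if key in pending:
                alerts.append(key)  # a clear was lost somewhere: send email here
            pending.add(key)        # keep waiting for the clear of the new alert
        else:
            pending.discard(key)    # cleared (harmless if it was never open)
    return alerts
```

Each line costs one regex search and one set operation, so thousands of probes are not a problem; in SEC, the per-key contexts play the role of the set.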
>>
>> Where are the probe number and device identifier in the above syslog
>> messages? Once you have that, a pair rule and a single rule will work
>> fine.
>>
>> Let's assume the alert and clear are these two messages:
>>
>> alert:
>> >Jun  3 03:14:08 10.48.36.33 394492: 379186: Jun  3 03:14:07.942 BST:
>>   %RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418
>>
>> clear:
>> >Jun  3 03:14:08 10.48.36.33 394493: 379187: Jun  3 03:14:07.954 BST:
>>   %RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout
>>
>> They match because of the occurrence of 10.48.36.33 as the device
>> identifier and the entry/probe number 3418 occurring in both messages.
>>
>> First create a rule that emails if you see an alert while you were
>> still looking for a clear for that alert.
>>
>>  type = single
>>  desc = email when an alert is seen while one is asserted
>>  continue = takenext
>>  context = alert_$1_$2
>>  ptype = regexp
>>  rem = $1 is ip address, $2 is entry number
>>  pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
>>  action = pipe '$0' /bin/mail -s "Dropped syslog entry. Found alert while it was pending" root; delete alert_$1_$2; reset +1 match the alert for host $1 and event $2
>>
>> This parses a line that matches the alert line above. It captures the
>> ip address and the entry number. It then uses that info to see if a
>> context called "alert_10.48.36.33_3418" (using the alert example
>> above) exists. If the context doesn't exist the rule is skipped and no
>> email is sent. If the context does exist:
>>
>>   send email
>>   delete the context
>>   cancel the correlation started by the next rule (+1 is the next rule
>>       in the file) that has a key (desc field) that matches "match the
>>       alert for host 10.48.36.33 and event 3418". This allows the pair
>>       rule that follows to start a new correlation operation.
>>   pass the current event to the next rule (because of continue=takenext)
>>
>> The next rule looks for the alert/clear pattern.
>>
>>  type = pair
>>  desc = match the alert for host $1 and event $2
>>  ptype = regexp
>>  rem = same pattern as in single rule
>>  pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
>>  action = create alert_$1_$2
>>  desc2 = match the clear for host $1 and event $2
>>  ptype2 = regexp
>>  pattern2 = $1.*SLAs\($2\)
>>  rem = %1 and %2 are used to reference $1 and $2 from the first pattern
>>  rem = since this is run after pattern2 is matched, $1 and $2 are
>>  rem = reassigned by the pattern2 match
>>  action2 = delete alert_%1_%2; shellcmd do something else useful
>>  time = 0
>>
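The `%1` versus `$1` distinction in the `rem` lines can be illustrated with a short Python model. It is a sketch under assumptions, not SEC's implementation: `substitute` is a hypothetical helper, values are pasted in verbatim (so the dots in the IP stay regex wildcards), and Python's `re` stands in for Perl's.

```python
import re

PATTERN2_TEMPLATE = r"$1.*SLAs\($2\)"   # pattern2 from the pair rule
ACTION2_TEMPLATE = "delete alert_%1_%2"  # first part of action2


def substitute(template, groups, sigil):
    """Replace sigil+N (e.g. $1 or %1) with groups[N-1], highest N first."""
    for n in range(len(groups), 0, -1):
        template = template.replace(f"{sigil}{n}", groups[n - 1])
    return template


first_line = ("Jun  3 03:14:08 10.48.36.33 394492: 379186: Jun  3 03:14:07.942 BST: "
              "%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418")
second_line = ("Jun  3 03:14:08 10.48.36.33 394493: 379187: Jun  3 03:14:07.954 BST: "
               "%RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout")

# First pattern: captures the IP and entry number.
first = re.search(
    r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)",
    first_line,
)
# pattern2 is built from the FIRST match's values ($1, $2).
pattern2 = substitute(PATTERN2_TEMPLATE, first.groups(), "$")
second = re.search(pattern2, second_line)

# In this model pattern2 captures nothing of its own, so the first match's
# values are reached through %1/%2 when action2 is expanded.
action2 = substitute(ACTION2_TEMPLATE, first.groups(), "%")
```

Here `pattern2` becomes `10.48.36.33.*SLAs\(3418\)`, it matches the clear line without capturing anything, and `action2` expands to `delete alert_10.48.36.33_3418`.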
>> Ok, let's see how these work. Let's assume you have just started so
>> the context alert_10.48.36.33_3418 is not defined.
>>
>> Scenario 1: the alert arrives. It matches the pair rule and starts a
>> correlation operation looking for the clear. Then the clear comes in,
>> it matches the correlation started by the pair rule and something
>> useful happens.
>>
>> Scenario 2: the alert arrives. At this point the alert_10.48.36.33_3418
>> context is defined.  Its clear is dropped. Another alert arrives. It
>> now matches the first rule (type = single) and sends email. The single
>> rule resets the pair rule and the alert is now passed to the pair rule
>> where it creates the new alert_10.48.36.33_3418 context and waits for
>> a clear to arrive. If a clear arrives, the pair rule captures it (the
>> single rule ignores the clear since it doesn't match the pattern). If
>> another alert arrives again email is sent and the old pair correlation
>> is cancelled and a new pair correlation is opened.
>>
>> If the single rule didn't reset the pair rule, the alert would be
>> consumed and ignored by the pair rule and would not start a new correlation.
>> Depending on what you want to do if the pair is discovered, resetting
>> the pair correlation may or may not matter.
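The difference the reset makes can be checked with a toy model of the two rules. This is a hedged Python sketch of the behaviour described above, not SEC code: `run` and its event tuples are hypothetical, one set stands in for contexts and another for running pair operations, and it assumes (as described above) that an alert whose key already has a running pair operation is simply absorbed by it.

```python
def run(events, use_reset=True):
    """Replay ("alert" | "clear", key) events; return (emails, handled_clears)."""
    contexts = set()    # stands in for the alert_<ip>_<entry> contexts
    operations = set()  # keys of running pair correlation operations
    emails, handled_clears = [], []
    for kind, key in events:
        if kind == "alert":
            # Single rule: acts only if the context for this key exists.
            if key in contexts:
                emails.append(key)
                contexts.discard(key)           # delete alert_$1_$2
                if use_reset:
                    operations.discard(key)     # reset +1 <desc>
            # continue = takenext: the alert also reaches the pair rule.
            if key not in operations:
                operations.add(key)             # start a new pair operation
                contexts.add(key)               # action = create alert_$1_$2
            # else: absorbed by the running operation, nothing happens
        elif kind == "clear":
            if key in operations:
                operations.discard(key)
                contexts.discard(key)           # action2 = delete alert_%1_%2
                handled_clears.append(key)
    return emails, handled_clears
```

With the reset, every repeated alert for a key produces an email; without it, only the first repeat does, because the absorbed alert never recreates the context that the single rule checks.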
>>
>> >1.1.1.1, probe 1 should have matching pairs
>> >1.1.1.1, probe 2 should have a matching pair
>> >2.2.2.2, probe 3 should match
>> >etc.
>>
>> Not sure how your "probes" relate to the message sequences you show
>> above, but hopefully this gives you some idea of how to handle it.
>>
>> (Note I didn't test the rulesets above and they may be missing
>> required keywords, may not work on alternate Fridays etc. No bits were
>> harmed writing this message, although some brain cells were quite annoyed
>> 8-).)
>>
>> --
>>                                -- rouilj
>> John Rouillard
>>
>> ===========================================================================
>> My employers don't acknowledge my existence much less my opinions.
>>
>
>
>
> --
> ______________________________________________________________
>
> Clayton Dukes
> ______________________________________________________________
>

_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users