In message <aanlktimeb0zy73gsrorp4ivs_ws-cxpkw7ybpn3h1...@mail.gmail.com>,
Clayton Dukes writes:
>I'm trying to come up with a way to match Cisco IP SLA syslog events.
>
>Messages come in from 10 different IP SLA shadow routers that look similar
>to this:
>Jun 3 03:13:53 10.48.36.33 394491: 379185: Jun 3 03:13:52.982 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3419): Threshold Occurred for connectionLoss
>Jun 3 03:13:54 10.48.36.37 330506: 331273: Jun 3 03:13:53.592 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3242): Threshold Occurred for timeout
>Jun 3 03:13:54 10.48.36.37 330507: 331274: Jun 3 03:13:54.688 BST:
>%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3364
>Jun 3 03:13:54 10.48.36.39 331498: 332112: Jun 3 03:13:53.916 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3263): Threshold Occurred for timeout
>Jun 3 03:13:55 10.48.36.37 330508: 331275: Jun 3 03:13:54.704 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3364): Threshold Cleared for timeout
>Jun 3 03:13:56 10.48.36.39 331499: 332113: Jun 3 03:13:56.816 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(1398): Threshold exceeded for packetLossDS
>Jun 3 03:13:56 10.48.36.39 331500: 332114: Jun 3 03:13:56.916 BST:
>%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 3321
>Jun 3 03:13:57 10.48.36.39 331501: 332115: Jun 3 03:13:56.932 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3321): Threshold Occurred for timeout
>Jun 3 03:13:57 10.48.36.39 331502: 332116: Jun 3 03:13:57.812 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(1402): Threshold exceeded for packetLossDS
>Jun 3 03:14:01 10.48.36.42 184927: 185372: Jun 3 03:14:01.167 BST:
>%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 4030
>Jun 3 03:14:02 10.48.36.42 184928: 185373: Jun 3 03:14:01.179 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(4030): Threshold Cleared for timeout
>Jun 3 03:14:07 10.48.36.37 330510: 331277: Jun 3 03:14:06.005 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(1442): Threshold below for packetLossDS
>Jun 3 03:14:07 10.48.36.42 184929: 185374: Jun 3 03:14:07.936 BST:
>%RTT-4-OPER_TIMEOUT: condition occurred, entry number = 4096
>Jun 3 03:14:08 10.48.36.33 394492: 379186: Jun 3 03:14:07.942 BST:
>%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418
>Jun 3 03:14:08 10.48.36.33 394493: 379187: Jun 3 03:14:07.954 BST:
>%RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout
>
>For example:
>I need to find a "condition occurred, entry number = 4030" and match it to
>"condition cleared, entry number = 4030"
>The time frame doesn't matter - what does matter is that if I receive
>another "condition occurred, entry number = 4030" before I receive a clear
>for that probe number (4030 in this case), then that means I lost a syslog
>message somewhere (since it's impossible to get a new condition without a
>clear). In this case, I need to trigger an email.
>
>I need to do this for every device and unique probe # (there are thousands).

Where are the probe number and device identifier in the above syslog
messages? Once you have that, a pair rule and a single rule will work
fine.

Let's assume the alert and clear are these two messages:

alert:
>Jun 3 03:14:08 10.48.36.33 394492: 379186: Jun 3 03:14:07.942 BST: %RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418

clear:
>Jun 3 03:14:08 10.48.36.33 394493: 379187: Jun 3 03:14:07.954 BST: %RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout

They match because 10.48.36.33 occurs as the device identifier, and the
entry/probe number 3418 occurs, in both messages.

First create a rule that sends email if you see an alert while you are
still waiting for a clear for that alert.

type = single
desc = email when an alert is seen while one is asserted
continue = takenext
context = alert_$1_$2
ptype = regexp
rem = $1 is ip address, $2 is entry number
pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
action = pipe '$0' /bin/mail -s "Dropped syslog entry. Found alert while it was pending" root; delete alert_$1_$2; reset +1 match the alert for host $1 and event $2

This parses a line that matches the alert line above. It captures the
ip address and the entry number. It then uses that info to see if a
context called "alert_10.48.36.33_3418" (using the alert example above)
exists. If the context doesn't exist, the rule is skipped and no email
is sent. If the context does exist:

  send email

  delete the context

  cancel the correlation started by the next rule (+1 is the next
  rule in the file) that has a key (desc field) that matches
  "match the alert for host 10.48.36.33 and event 3418". This
  allows the pair rule following to start a new correlation
  operation.

  pass the current event to the next rule (because of
  continue=takenext)

The next rule looks for the alert/clear pattern.

type = pair
desc = match the alert for host $1 and event $2
ptype = regexp
rem = same pattern as in single rule
pattern = ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)
action = create alert_$1_$2
desc2 = match the clear for host $1 and event $2
ptype2 = regexp
pattern2 = $1.*SLAs\($2\)
rem = %1 and %2 are used to reference $1 and $2 from the first pattern
rem = since this is run after pattern2 is matched, $1 and $2 are
rem = reassigned by the pattern2 match
action2 = delete alert_%1_%2; shellcmd do something else useful
window = 0

Ok, let's see how these work. Let's assume you have just started, so the
context alert_10.48.36.33_3418 is not defined.

Scenario 1: the alert arrives. It matches the pair rule and starts a
correlation operation looking for the clear. Then the clear comes in,
it matches the correlation started by the pair rule, and something
useful happens.

Scenario 2: the alert arrives. At this point the
alert_10.48.36.33_3418 context is defined. Its clear is dropped.
Another alert arrives. It now matches the first rule (type = single)
and sends email. The single rule resets the pair rule, and the alert is
then passed to the pair rule, where it creates the new
alert_10.48.36.33_3418 context and waits for a clear to arrive. If a
clear arrives, the pair rule captures it (the single rule ignores the
clear since it doesn't match the pattern). If another alert arrives
instead, email is sent again, the old pair correlation is cancelled,
and a new pair correlation is opened.

If the single rule didn't reset the pair rule, the alert would be
consumed and ignored by the pair rule and would not start a new
correlation. Depending on what you want to do when the pair is
discovered, resetting the pair correlation may or may not matter.

>1.1.1.1, probe 1 should have matching pairs
>1.1.1.1, probe 2 should have a matching pair
>2.2.2.2, probe 3 should match
>etc.

Not sure how your "probes" relate to the message sequences you show
above, but hopefully this gives you some idea of how to handle it.

(Note I didn't test the rulesets above and they may be missing required
keywords, may not work on alternate Fridays etc. No bits were harmed
writing this message; also, some brain cells were quite annoyed 8-).)

--
				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
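
The two scenarios in the reply can be sanity-checked outside SEC with a small Python sketch. This is not SEC and says nothing about how SEC is implemented; it only mimics the bookkeeping the two rules are meant to do, under the reply's own assumption about which message is the alert and which is the clear. The names find_dropped, ALERT_RE, CLEAR_RE, ALERT and CLEAR are made up for this illustration, and CLEAR_RE is a stand-in for pattern2 that captures the host and number itself instead of having them substituted from the first match.

```python
import re

# Same regexp as the "pattern" field of both rules:
# group 1 is the ip address, group 2 is the entry number.
ALERT_RE = re.compile(
    r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*entry number = (\d+)"
)
# Assumed stand-in for pattern2 ($1.*SLAs\($2\)).
CLEAR_RE = re.compile(
    r"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*SLAs\((\d+)\)"
)


def find_dropped(lines):
    """Return (host, entry) keys for alerts seen while one was still pending."""
    pending = set()   # plays the role of the alert_<host>_<entry> contexts
    dropped = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            key = (m.group(1), m.group(2))
            if key in pending:
                dropped.append(key)   # where the single rule would send mail
            pending.add(key)          # the pair rule (re)creates the context
            continue
        m = CLEAR_RE.search(line)
        if m:
            # where action2 would delete the context
            pending.discard((m.group(1), m.group(2)))
    return dropped


# The two example messages from the reply, used as assumed alert and clear.
ALERT = ("Jun 3 03:14:08 10.48.36.33 394492: 379186: Jun 3 03:14:07.942 BST: "
         "%RTT-4-OPER_TIMEOUT: condition cleared, entry number = 3418")
CLEAR = ("Jun 3 03:14:08 10.48.36.33 394493: 379187: Jun 3 03:14:07.954 BST: "
         "%RTT-3-IPSLATHRESHOLD: IP SLAs(3418): Threshold Cleared for timeout")

if __name__ == "__main__":
    # Scenario 1 followed by a fresh alert: nothing was lost.
    print(find_dropped([ALERT, CLEAR, ALERT]))   # []
    # Scenario 2: a second alert arrives before any clear.
    print(find_dropped([ALERT, ALERT, CLEAR]))   # [('10.48.36.33', '3418')]
```

Feeding it a saved chunk of the real log (one message per line) gives a quick feel for whether the two regexps pick out the device and probe number you expect before the rules go into a live SEC configuration.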