Hi all: I had a few hours this evening to try out some ideas, and sadly I discovered I had forgotten something I had already recognized the first time I thought this through.
In message <201208151856.q7fiuax7026...@mx1.cs.umb.edu>,
"John P. Rouillard" writes:
>In message
><cagfjscmhp+8fj9gjs+o-p47uvqcxxnynx0cfwf4z9s-nh4d...@mail.gmail.com>,
>Risto Vaarandi writes:
>
>>2012/8/14 John P. Rouillard <rou...@cs.umb.edu>:
>>> I know this should be doable, and I know my current
>>> thinking is barking up the wrong tree, but my brain is
>>> locked into doing something and it's not quite working.
>>>
>>> Here's what happened. In order to get the output from
>>> postgres into the proper form to be used by a
>>> performance analysis application, I have to change the
>>> prefix for every log message.
>>>
>>> Needless to say that will require some changes to the
>>> existing parsing rules in SEC.
>>>
>>> My initial thought was that I could set up a single rule
>>> to parse the new prefix and then use the results of that
>>> parsing in additional rules via
>>> $+{named_variables}. This would let me do 3 things:
>>>
>>> 1) limit the number of places I need to write the prefix
>>>    regexp to extract fields (it may change again in
>>>    the future)
>>> 2) only apply the regexp for the prefix extraction once
>>>    rather than dozens of times for each rule in the
>>>    ruleset
>>> 3) as a result of 2, eliminate a performance bottleneck
>>>    caused by applying the somewhat complex prefix
>>>    regexp for every rule
>>>
>>> My initial setup was:
>>>
>>>   type = single
>>>   ptype = regexp
>>>   pattern = .....
>>>   varmap = pg_prefix; date=1; host=2; db=3; user=4; level=5; \
>>>            event=6; eorder=7
>>>   action = none
>>>   continue = takenext
>>>
>>>   type = single
>>>   pattern = LOG: duration: (\d+\.\d+)
>>>   context = !pg_log_event_$+{host}_$+{event} && =( $1 > 2 )
>>>   rem = here is an action specific to the duration log lines.
>>>   action = add pg_log_event_$+{host}_$+{event} $0
>>>
>>> and so forth. For a varmap without a pg_prefix;
>>> argument, the $+{} variables aren't defined except in
>>> the scope of the rule with the varmap. This makes sense
>>> as their underlying variables $1...$N only have the scope
>>> of a single rule.
>>>
>>> However, it turns out you can't access the $+{}
>>> variables in the second rule even with the pg_prefix;
>>> argument unless the second rule is of the form:
>>>
>>>   type = single
>>>   ptype = cached
>>>   pattern = pg_prefix
>>>   ....
>>
>>Just to clarify the question -- are you trying to use $+{}
>>match variables in the second rule after the first rule
>>has matched, but use another regular expression for the
>>pattern in the second rule?
>
>Exactly.
>
>>If so, then unfortunately this way of matching does not
>>work, since $+{} and $N match variables have identical
>>scopes. They can both be referred to only within the current
>>rule, and following matches always reset them (unless they
>>are cached and then retrieved from the cache with a
>>'Cached' pattern).
>
>>One way to address the problem would be to set up
>>appropriate context(s) and/or action list variable(s) for
>>passing matching results from one rule to another.

Sadly, I thought that would be a solution, but it's not. I forgot that
"action list variables", as their name implies, are only good in
action lists. So for the following rule:

===================
type = single
desc = clean all action variables
continue = takenext
ptype = tvalue
pattern = TRUE
action = assign %hostname NULL; \
         assign %pid NULL; \
         assign %thread NULL; \
         assign %seq %EMPTY; \
         assign %user ''; \
         assign %db ; \
         assign %level ()

(Note: of these, only %thread is empty afterwards; %db and %level both
take on the value of the description %s, and %user has the value of two
single quotes. So my musings in the prior email in this chain were
wrong wrong wrong. Is there a way to actually delete an action
variable?)
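(For anyone following the scoping discussion above in another language:
the parse-once/reuse-everywhere idea is the usual "match the expensive
prefix regexp once, cache the named fields" pattern. A minimal Python
sketch of what SEC's varmap label plus 'Cached' ptype provide; the
names `PREFIX`, `cache`, and `parse_prefix` are made up for the
illustration, not SEC API:)

```python
import re

# Hypothetical sketch (plain Python, not SEC): run the prefix regexp
# once per event and stash the named fields in a cache dict keyed by
# a varmap-style label, so later "rules" can reuse the fields without
# re-matching.
PREFIX = re.compile(r'(?P<date>\S+) (?P<host>\S+) postgres\[(?P<pid>\d+)\]:')

cache = {}  # plays the role of SEC's match cache

def parse_prefix(line):
    """Apply the (expensive) prefix regexp once; later rules read the cache."""
    m = PREFIX.match(line)
    if m:
        cache['pg_prefix'] = m.groupdict()
    return m is not None

parse_prefix('2012-02-02T17:28:28+00:00 db02 postgres[12657]: LOG: ...')
print(cache['pg_prefix']['host'])
```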
type = single
desc = parse postgres event preamble and assign variables
continue = takenext
ptype = regexp
rem = # 2012-02-02T17:28:28+00:00 ops02.mia1.renesys.com postgres[12657]: \
rem = # [5-1] 2012-02-02 17:28:28 UTC dbuser@db(12657)i: LOG:
pattern = ^[^\s]+ ([\w\d._-]+) postgres\[([0-9]+)\]: \[(\d+)-(\d+)\] .{10} .{8} UTC ([]\w[]+)\@([]\w[]+).*?i: ([A-Z]+):
varmap = preamble; preamble2; hostname=1; pid=2; thread=3; seq=4; \
         user=5; db=6; level=7
action = assign %hostname $1; \
         assign %pid $2; \
         assign %thread $3; \
         assign %seq $4; \
         assign %user $5; \
         assign %db $6; \
         assign %level $7

type = single
desc = parse postgres event preamble and assign variables for continuation lines
continue = takenext
ptype = regexp
rem = Jul  1 00:08:56 dev02 postgres[18417]: [4-2] #011SELECT
pattern = ^[^\s]+ ([\w\d._-]+) postgres\[([0-9]+)\]: \[(\d+)-(\d+)\] #011
varmap = preamble2; hostname=1; pid=2; thread=3; seq=4
action = assign %hostname $1; \
         assign %pid $2; \
         assign %thread $3; \
         assign %seq $4

#****f* 30postgres.sr/report_slow_queries
# SYNOPSIS
#   Report slow queries for the past hour
# DESCRIPTION
#   Record any slow queries over the span of an hour and report when the
#   hour timer runs out. The definition of a slow query at this time is
#   determined by the postgres setting for logging slow queries. However,
#   the $1 parameter is set to the duration info, so a context like
#   =( $1 > 10000 ) could be added to segment the list of slow queries
#   into slow, really slow, and glacial if needed.
#
#   It detects the start line with the duration report, then gathers
#   any output from the same postgres process event (determined by
#   pid/thread) for the next 20 seconds to collect the multiple lines
#   of output in a slow query dump.
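(As a sanity check: the preamble pattern above really does pull the
sample line from the rem comments apart. A quick standalone Python
check with the same regexp, mapping group numbers to names exactly as
the varmap does; `pat`, `line`, and `fields` are just names for the
illustration:)

```python
import re

# The preamble pattern from the rule above, verbatim.
pat = re.compile(
    r'^[^\s]+ ([\w\d._-]+) postgres\[([0-9]+)\]: \[(\d+)-(\d+)\]'
    r' .{10} .{8} UTC ([]\w[]+)\@([]\w[]+).*?i: ([A-Z]+):')

# Sample event assembled from the rule's rem comments.
line = ('2012-02-02T17:28:28+00:00 ops02.mia1.renesys.com postgres[12657]: '
        '[5-1] 2012-02-02 17:28:28 UTC dbuser@db(12657)i: LOG:')

m = pat.match(line)
# Group numbers follow the varmap: hostname=1 pid=2 thread=3 seq=4
# user=5 db=6 level=7
fields = dict(zip(
    ['hostname', 'pid', 'thread', 'seq', 'user', 'db', 'level'],
    m.groups()))
print(fields)
```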
# INPUTS
#   Recognized input events are of the form:
#     Jun 27 04:08:16 store03 postgres[30929]: [7-1] 2010-06-27 \
#       04:08:16 UTC dbuser@database(30929)i: LOG: duration: \
#       13386.579 ms statement: COPY summary_1277607600 FROM STDIN \
#       USING DELIMITERS E'#011'
#   or
#     Jul  1 00:08:56 db02 postgres[18417]: [4-1] 2010-07-01 \
#       00:08:56 UTC dbuser@database(18417)i: LOG: duration: \
#       2196.061 ms statement: INSERT INTO feed_prefix
#     Jul  1 00:08:56 db02 postgres[18417]: [4-2] #011SELECT
#     Jul  1 00:08:56 db02 postgres[18417]: [4-3] #011   P.id AS id,
#     ...
#   The key elements are the duration keyword, the pid, and the log
#   thread number (the 4 in 4-1 above). The thread is used to identify
#   all the output lines associated with the original duration statement.
#   There can be lines in the following 20 seconds that are not
#   associated with the duration report.
# NOTES
#   Uses three single rules. One to create a
#   postgres_slow_query_on_<host> context with an hour timeout. One
#   to add to an existing postgres_slow_query_on_<host> context. One
#   to match "host postgres[pid]: [thread-" for 20 seconds after the
#   initial duration line arrives.
#
#******

type = single
desc = capture slow query, reporting context exists (1)
ptype = regexp
rem = $1 = duration
pattern = LOG:\s+duration: ([\d.]+) ms
context = postgres_slow_query_on_%{hostname}
action = add postgres_slow_query_on_%{hostname} ==================; \
         add postgres_slow_query_on_%{hostname} $0; \
         create postgres_slow_query_on_%{hostname}_%{pid}_%{thread} 20

type = single
desc = capture slow query, reporting context does not exist (2)
ptype = regexp
rem = $1 = duration
pattern = LOG:\s+duration: ([\d.]+) ms
context = !postgres_slow_query_on_%{hostname}
action = create postgres_slow_query_on_%{hostname} 3600 report \
             postgres_slow_query_on_%{hostname} \
             /bin/mailx -s "Slow postgres queries on %{hostname}" rouilj; \
         add postgres_slow_query_on_%{hostname} $0; \
         create postgres_slow_query_on_%{hostname}_%{pid}_%{thread} 20

type = single
desc = capture slow query lines after the initial trigger (3)
ptype = tvalue
pattern = true
context = postgres_slow_query_on_%{hostname}_%{pid}_%{thread}
action = add postgres_slow_query_on_%{hostname} $0
==========

(Yes, I realize these rules are sloppy; they are a testing set derived
from a production ruleset.)

Only the rule marked (2) is ever triggered. Rules marked (1) and (3)
are never triggered because the context expression:

  postgres_slow_query_on_%{hostname}_%{pid}_%{thread}

evaluates to either (in rule 3's case):

  postgres_slow_query_on___

or literally:

  postgres_slow_query_on_%{hostname}_%{pid}_%{thread}

neither of which is matched by the properly substituted contexts that
are created by the actions. 8-/

The actions that add/create contexts work as expected; however, in
testing I did end up with the action-on-delete operating in an
interesting (but correct, IMO) fashion:

  Context Name: postgres_slow_query_on_db02
  Creation Time: Wed Aug 15 21:55:28 2012
  Lifetime: 3600 seconds
  Action on delete: report postgres_slow_query_on_%{hostname}
    /bin/mailx -s "Slow postgres queries on %{hostname}" rouilj;
    (%s = capture slow query context not exist)

Note the action on delete still has multiple unexpanded %{hostname}
occurrences in it. When the context expired, the values were
substituted with the values defined at the time the context was
created (so it acts as a closure). Which is what I expected/hoped
would happen, but I can't find anywhere in the man page stating that
it should act like that.

>Yeah I mentioned the context workaround in the lower part of my
>original message. The problem with them is:
> [...]
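(The closure-like behavior I observed is easy to picture in any
language: the %-variable values get frozen into the action-on-delete
at context creation time, and the literal %{...} text in the displayed
action is just the unexpanded template. A toy Python analogue; the
function names and the 'rouilj' recipient are illustrative, not SEC
internals:)

```python
def make_delete_action(hostname, recipient='rouilj'):
    """Capture the %-variable values at context creation time, the way
    the action-on-delete apparently freezes them (closure-style)."""
    def on_delete():
        # Expansion happens with the values captured at creation time,
        # not whatever %hostname holds when the context expires.
        return ('report postgres_slow_query_on_%s '
                '/bin/mailx -s "Slow postgres queries on %s" %s'
                % (hostname, hostname, recipient))
    return on_delete

# Context created while %hostname was "db02":
action = make_delete_action('db02')
# Even if %hostname changes later, expiry still uses the captured value:
print(action())
```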
> 3) If I use contexts or a perl array, how would I use it in a context
>    or desc keyword? Answer: you don't, and there's the problem.
>
>If I use an explicit variable, then in the preamble parsing rule:
>
>  action = assign %{host} $1; assign %{pid} $2; ... for all 7 values
>
>and in my following rules I use:
>
>  type = ...
>  desc = find login/logout pair for %{host}(%{pid})
>  ....
>  context = !postgres_%{host}_%{pid}
>  action = create postgres_%{host}_%{pid}
>
>That looks like the best alternative for right now

But %{host} and %{pid} are action variables and thus don't get
properly processed in a desc or context string.

>Maybe it's time for %+{[label:]host} %+{[label:]pid} to access the
>values from the existing varmap defined with:
>
>  varmap = host=1; pid=2; ....
>
>specified without a label, or
>
>  varmap = label; host=1; pid=2; ....
>
>if specified with a label.
>
>This makes things easier from the rule writer's perspective as it
>places the housekeeping burden on SEC rather than on me.

I see three ways to make my rules above work in some fashion.

-- method 1

If the %+ mechanism were to be implemented, I would expect that
defining a varmap of:

  varmap = label1; label2; host=1; user=2

would define:

  %+{label1:host} %+{label2:host} %+{label1:user} %+{label2:user}

similar to how I think that defines:

  %+{host} %+{user}

for rules using either:

  ptype = cached
  pattern = label1

or:

  ptype = cached
  pattern = label2

-- method 2

Another way to get the same/similar effect is to import the cached
variables. Add an optional

  import = label2, label3

keyword to each rule that imports the variables defined in the cache
as though the rule were:

  ptype = cached
  pattern = label2

(I just realized you can only pull in one cached set of variables.)
But assume you could pull in more than one set of variables.
Given:

  varmap = label2; host=1; user=2    (aka host=earnie user=bert)
  varmap = label3; host=1; cookie=2  (aka host=cookie_monster cookie=oatmeal)

after:

  import = label2, label3

you would have:

  $+{host}=cookie_monster
  $+{user}=bert
  $+{cookie}=oatmeal

(i.e. label3's values took precedence over label2), but after:

  import = label3, label2

you would have:

  $+{host}=earnie
  $+{user}=bert
  $+{cookie}=oatmeal

Now if that was used with:

  type = single
  ptype = regexp
  pattern = cookie is (\w+)
  varmap = cookie=1
  import = label3, label2

and the event is "cookie is chocolate", $+ would be:

  $+{host}=earnie
  $+{user}=bert
  $+{cookie}=chocolate

This rule:

  type = single
  ptype = regexp
  pattern = cookie is (?<cookie>\w+)
  import = label3, label2

would have the same $+ array as above, with the chocolate cookie. So
the import operation would have to occur right before the pattern and
varmap are applied, so they could override the values in the imported
maps.

To implement my rules above, I would add 'import = preamble2' to my
rules and use $+{hostname} etc.

(I wonder if this method would make ptype cached obsolete. I wonder if
something like:

  context = =(defined($+{preamble2}))

may be equivalent to:

  ptype = cached
  pattern = preamble2
)

-- method 3

A third way to make this work would be to extend the scope of the
"action variables". I wonder what the implications would be of
changing action variables to be substituted like "match variables", so
they are replaced everywhere a variable like $1 or $0 would be
replaced. Well, everywhere with the possible exception of regular
expressions. If you had:

  pattern = %{prefix} #011

you would need to keep substituting and recompiling the pattern for
every event, which is really expensive.

--
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users