Hi all:

I had a few hours this evening to try out some ideas, and
sadly I found I had forgotten something I recognized the
first time I was thinking through this.

In message <201208151856.q7fiuax7026...@mx1.cs.umb.edu>,
"John P. Rouillard" writes:
>In message
><cagfjscmhp+8fj9gjs+o-p47uvqcxxnynx0cfwf4z9s-nh4d...@mail.gmail.com> ,
>Risto Vaarandi writes:
>
>>2012/8/14 John P. Rouillard <rou...@cs.umb.edu>:
>>> I know this should be doable, and I know my current
>>> thinking is barking up the wrong tree, but my brain is
>>> locked into doing something and it's not quite working.
>>>
>>> Here's what happened. In order to get the output from
>>> postgres into the proper form to be used by a
>>> performance analysis application, I have to change the
>>> prefix for every log message.
>>>
>>> Needless to say that will require some changes to the
>>> existing parsing rules in SEC.
>>>
>>> My initial thought was that I could set up a single rule
>>> to parse the new prefix and then use the results of that
>>> parsing in additional rules via
>>> $+{named_variables}. This would let me do 3 things:
>>>

>>>   1) limit the number of places I need to write the prefix
>>>      regexp to extract fields (it may change again in
>>>      the future)
>>>   2) only apply the regexp for the prefix extraction once
>>>      rather than dozens of times for each rule in the
>>>      ruleset
>>>   3) as a result of 2 eliminate a performance bottleneck
>>>      caused by applying the somewhat complex prefix
>>>      regexp for every rule
>>>
>>> My initial setup was:
>>>
>>>  type = single
>>>  ptype = regexp
>>>  pattern = .....
>>>  varmap = pg_prefix; date=1; host=2; db=3; user=4; level=5; \
>>>           event=6; eorder=7
>>>  action = none
>>>  continue=takenext
>>>
>>>  type = single
>>>  pattern = LOG:  duration: (\d+\.\d+)
>>>  context = !pg_log_event_$+{host}_$+{event} && =( $1 > 2 )
>>>  rem = here is an action specific to the duration log lines.
>>>  action = add pg_log_event_$+{host}_$+{event} $0

>>>
>>> and so forth. For a varmap without a pg_prefix;
>>> argument, the $+{} variables aren't defined except in
>>> the scope of the rule with the varmap. This makes sense
>>> as their underlying variables $1...$N only have the scope
>>> of a single rule.
>>>
>>> However, it turns out you can't access the $+{}
>>> variables in the second rule even with the pg_prefix;
>>> argument unless the second rule is of the form:
>>>
>>> type = single
>>> ptype = cached
>>> pattern = pg_prefix
>>> ....
>>
>>Just to clarify the question -- are you trying to use $+{}
>>match variables in the second rule after the first rule
>>has matched, but use another regular expression for the
>>pattern in the second rule?
>
>Exactly.
>>
>>If so, then unfortunately this way of matching does not
>>work, since $+{} and $N match variables have identical
>>scopes. They can both be referred only within the current
>>rule, and following matches always reset them (unless they
>>are cached and then retrieved from the cache with a
>>'Cached' pattern).
>
>>One way to address the problem would be to set up
>>appropriate context(s) and/or action list variable(s) for
>>passing matching results from one rule to another.

Sadly I thought that would be a solution, but it's not. I forgot
that "action list variables", as their name implies, are only valid
in action lists. So for the following rules:

===================
type = single
desc = clean all action variables
continue = takenext
ptype = tvalue
pattern = TRUE
action = assign %hostname NULL; \
         assign %pid NULL; \
         assign %thread NULL; \
         assign %seq %EMPTY; \
         assign %user ''; \
         assign %db ; \
         assign %level ()

(Note: of these, only %thread is empty afterwards; %db and %level
both take on the value of the description %s, and %user has the value
of two single quotes. So my musings in the prior email in this chain
were wrong, wrong, wrong. Is there a way to actually delete an action
variable?)

type = single
desc = parse postgres event preamble and assign variables
continue = takenext
ptype = regexp
rem = #   2012-02-02T17:28:28+00:00 ops02.mia1.renesys.com postgres[12657]: \
rem = #      [5-1] 2012-02-02 17:28:28 UTC dbuser@db(12657)i: LOG:
pattern = ^[^\s]+ ([\w\d._-]+) postgres\[([0-9]+)\]: \[(\d+)-(\d+)\] .{10} .{8} UTC ([]\w[]+)\@([]\w[]+).*?i: ([A-Z]+):
varmap = preamble; preamble2; hostname=1; pid=2; thread=3; seq=4; user=5; db=6; level=7
action = assign %hostname $1; \
         assign %pid $2; \
         assign %thread $3; \
         assign %seq $4; \
         assign %user $5; \
         assign %db $6; \
         assign %level $7;

type = single
desc = parse postgres event preamble and assign variables for continuation lines
continue = takenext
ptype = regexp
rem =  Jul  1 00:08:56 dev02 postgres[18417]: [4-2] #011SELECT
pattern = ^[^\s]+ ([\w\d._-]+) postgres\[([0-9]+)\]: \[(\d+)-(\d+)\] #011
varmap = preamble2; hostname=1; pid=2; thread=3; seq=4
action = assign %hostname $1; \
         assign %pid $2; \
         assign %thread $3; \
         assign %seq $4
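
As an aside, the "parse the prefix once, reuse the fields everywhere"
goal of the two rules above can be sketched in Python with a single
precompiled regex and named groups (the pattern here is a simplified
illustration, not the exact production regexp):

```python
import re

# Compile the (simplified) preamble regex once and reuse it for every
# event, instead of re-matching the prefix in dozens of rules.
PREAMBLE = re.compile(
    r'(?P<hostname>[\w.-]+) postgres\[(?P<pid>\d+)\]: '
    r'\[(?P<thread>\d+)-(?P<seq>\d+)\]')

def parse_preamble(line):
    """Return the preamble fields as a dict, or None if no match."""
    m = PREAMBLE.search(line)
    return m.groupdict() if m else None

print(parse_preamble('Jul  1 00:08:56 db02 postgres[18417]: [4-2] #011SELECT'))
```

Every later "rule" can then consume the dict without re-running the
prefix regexp, which is exactly the cost I'm trying to avoid in SEC.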

#****f* 30postgres.sr/report_slow_queries
# SYNOPSIS
# Report slow queries for the past hour
# DESCRIPTION
# Record any slow queries over the span of an hour and report when the
# hour timer runs out. The definition of a slow query at this time is
# determined by the postgres setting for logging slow queries. However
# the $1 parameter is set to the duration info, so a context like
# =($1 > 10000) could be added to segment the list of slow queries
# into slow, really slow, and glacial if needed.
#
# It detects the start line with the duration report then gathers
# any output from the same postgres process event (determined by pid/thread)
# for the next 20 seconds to gather the multiple lines of output
# in a slow query dump.
# INPUTS
# Recognized input events are of the form:
#    Jun 27 04:08:16 store03 postgres[30929]: [7-1] 2010-06-27 \
#         04:08:16 UTC dbuser@database(30929)i: LOG:  duration: \
#         13386.579 ms  statement: COPY summary_1277607600 FROM STDIN
#         USING DELIMITERS E'#011'
#or
#    Jul  1 00:08:56 db02 postgres[18417]: [4-1] 2010-07-01 \
#         00:08:56 UTC dbuser@database(18417)i: LOG:  duration: \
#         2196.061 ms  statement: INSERT INTO feed_prefix
#    Jul  1 00:08:56 db02 postgres[18417]: [4-2] #011SELECT
#    Jul  1 00:08:56 db02 postgres[18417]: [4-3] #011  P.id AS id,
# ...
# the key element is the duration keyword, pid and log thread
# number (the 4 in 4-1 above). The thread is used to identify
# all the output lines associated with the original duration statement.
# There can be lines in the following 20 seconds that are not associated
# with the duration report.
# NOTES
# Uses three single rules. One to create
# a postgres_slow_query_on_<host> with an hour timeout. One
# to add to an existing postgres_slow_query_on_<host>
# context. One to match "host postgres[pid]: [thread-" prefixed lines
# for 20 seconds after the initial duration line arrives.
#
#******
type = single
desc = capture slow query, reporting context exists (1)
ptype = regexp
rem = $1 = duration
pattern = LOG:\s+duration: ([\d.]+) ms
context = postgres_slow_query_on_%{hostname}
action = add postgres_slow_query_on_%{hostname} ==================; \
         add postgres_slow_query_on_%{hostname} $0; \
         create postgres_slow_query_on_%{hostname}_%{pid}_%{thread} 20

type = single
desc = capture slow query reporting context not exist (2)
ptype = regexp
rem = $1 = duration
pattern = LOG:\s+duration: ([\d.]+) ms
context = ! postgres_slow_query_on_%{hostname}
action = create postgres_slow_query_on_%{hostname} 3600 report \
            postgres_slow_query_on_%{hostname} \
            /bin/mailx -s "Slow postgres queries on %{hostname}" rouilj; \
         add postgres_slow_query_on_%{hostname} $0; \
         create postgres_slow_query_on_%{hostname}_%{pid}_%{thread} 20

type = single
desc = capture slow query lines after the initial trigger (3)
ptype = tvalue
pattern = true
context = postgres_slow_query_on_%{hostname}_%{pid}_%{thread}
action = add postgres_slow_query_on_%{hostname} $0
==========

(Yes, I realize these rules are sloppy; they are a testing set
derived from a production ruleset.)
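
For clarity, here is the logic the three rules are *intended* to
implement, as a toy in-memory Python sketch (not how SEC implements
contexts; timestamps and lifetimes are in seconds):

```python
# Rule (2) opens an hourly report buffer, rule (1) appends further
# duration lines to an existing buffer, and rule (3) gathers
# continuation lines for 20 seconds, keyed by host/pid/thread.
contexts = {}   # context name -> expiry timestamp
buffers = {}    # report context name -> collected lines

def create(name, lifetime, now):
    contexts[name] = now + lifetime

def alive(name, now):
    return contexts.get(name, 0) > now

def handle(host, pid, thread, line, now, is_duration):
    report_ctx = 'postgres_slow_query_on_%s' % host
    gather_ctx = '%s_%s_%s' % (report_ctx, pid, thread)
    if is_duration:
        if not alive(report_ctx, now):              # rule (2)
            create(report_ctx, 3600, now)
            buffers[report_ctx] = [line]
        else:                                       # rule (1)
            buffers[report_ctx] += ['==================', line]
        create(gather_ctx, 20, now)                 # open gather window
    elif alive(gather_ctx, now):                    # rule (3)
        buffers[report_ctx].append(line)

handle('db02', '18417', '4', 'LOG: duration: 2196.061 ms ...', 0, True)
handle('db02', '18417', '4', '#011SELECT', 5, False)      # within 20 s
handle('db02', '18417', '4', '#011too late', 30, False)   # window expired
print(buffers['postgres_slow_query_on_db02'])
```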

Only the rule marked (2) is ever triggered. Rules marked (1) and (3)
are never triggered because the context expressions:

  postgres_slow_query_on_%{hostname}_%{pid}_%{thread}

evaluate to either (in rule 3's case):

    postgres_slow_query_on___

or literally

    postgres_slow_query_on_%{hostname}_%{pid}_%{thread}

neither of which is matched by the properly substituted
contexts that are created by the actions. 8-/.

The actions that add/create contexts work as expected; however, in
testing I did end up with the action on delete operating in an
interesting (but correct, IMO) fashion.

Context Name: postgres_slow_query_on_db02
Creation Time: Wed Aug 15 21:55:28 2012
Lifetime: 3600 seconds
Action on delete: report postgres_slow_query_on_%{hostname} /bin/mailx -s "Slow postgres queries on %{hostname}" rouilj;  (%s = capture slow query context not exist)

Note the action on delete still has multiple unexpanded %{hostname}
references in it. When the context expired, the values were
substituted with the values defined at the time the context was
created (so it acts as a closure). That is what I expected/hoped
would happen, but I can't find anything in the man page stating that
it should act like that.
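
The closure-like binding behaves much like argument binding in a
general-purpose language. A rough Python analogy (names are
illustrative, not SEC internals):

```python
from functools import partial

# The action-on-delete apparently binds %{hostname} to its value at
# context-creation time, much like partial() binds arguments here.
def report(context_name, mail_subject):
    return 'report %s via mailx -s "%s"' % (context_name, mail_subject)

hostname = 'db02'
on_delete = partial(report,
                    'postgres_slow_query_on_%s' % hostname,
                    'Slow postgres queries on %s' % hostname)

hostname = 'db03'   # a later change does not affect the stored action
print(on_delete())
```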

>Yeah I mentioned the context work around in the lower part of my
>original message. The problem with them is:
> [...]
>   3) If I use contexts or perl array, how would I use it in a context
>     or desc  keyword?

Answer: you don't, and that's the problem.

>If I use an explicit variable then in the preamble parsing rule:
>    
>    action = assign %{host} $1; assign ${pid} $2; ... for all 7 values
>
>and in my following rules I use:
>
>      type = ...
>      desc = find login/logout pair for %{host}(%{pid})
>      ....
>      context = !postgres_%{host}_%{pid}
>      action = create postgres_%{host}_%{pid}
>
>That looks like the best alternative for right now 


But %{host} and %{pid} are action variables and thus don't get
properly processed in a desc or context string.

>Maybe it's time for %+{[label:]host} %+{[label]:pid} to access the
>values from the existing varmap defined with:
>
>  varmap = host=1; pid=2; ....
>
>specified without a label or
>
> varmap = label; host=1; pid=2; ....
>
>if specified with a label.
>
>This makes things easier from the rule writer's perspective as it
>places the housekeeping burden on SEC rather than on me.

I see three ways to make my rules above work in some
fashion.

-- method 1
If the %+ mechanism were to be implemented, I
would expect that defining a varmap of:

varmap = label1; label2; host=1; user=2

would define:

  %+{label1:host} 
  %+{label2:host} 
  %+{label1:user} 
  %+{label2:user} 
 
similar to how I think that defines:

  %+{host} 
  %+{user} 

for rules using either

   ptype = cached
   pattern = label1

or

   ptype = cached
   pattern = label2

-- method 2
Another way to get the same/similar effect is to import the
cached variables. Add an optional

  import = label2, label3

keyword to each rule that imports the variables defined in
the cache as though the rule were:

ptype = cached
pattern = label2

(I just realized you can currently only pull in one cached set of
variables, but assume you could pull in more than one.) Given:

  varmap = label2; host=1; user=2 (aka  host=earnie user=bert)

  varmap = label3; host=1; cookie=2 (aka  host=cookie_monster cookie=oatmeal)

after import = label2, label3 you would have:

  $+{host}=cookie_monster
  $+{user}=bert
  $+{cookie}=oatmeal

(i.e. label3's values took precedence over label2's), but
after import = label3, label2 you would have:

  $+{host}=earnie
  $+{user}=bert
  $+{cookie}=oatmeal

Now if that was used with:

  type=single
  ptype = regexp
  pattern = cookie is (\w+)
  varmap = cookie=1
  import = label3, label2

and the event is "cookie is chocolate", $+ would be:

  $+{host}=earnie
  $+{user}=bert
  $+{cookie}=chocolate

And this rule:

  type=single
  ptype = regexp
  pattern = cookie is (?<cookie>\w+)
  import = label3, label2

would have the same $+ array as above, with the chocolate
cookie.

So the import operation would have to occur right before the
pattern and varmap are applied so they could override the
values in the imported maps.
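
That merge order can be sketched in Python (the labels and field
values are the earnie/bert/cookie_monster examples from above):

```python
# Proposed 'import' semantics: later labels in the import list
# override earlier ones, and the rule's own match is applied last so
# it overrides both.
caches = {
    'label2': {'host': 'earnie', 'user': 'bert'},
    'label3': {'host': 'cookie_monster', 'cookie': 'oatmeal'},
}

def build_match_vars(import_list, own_match):
    merged = {}
    for label in import_list:
        merged.update(caches[label])   # later imports win
    merged.update(own_match)           # the rule's own varmap wins last
    return merged

# import = label3, label2 plus the event "cookie is chocolate"
print(build_match_vars(['label3', 'label2'], {'cookie': 'chocolate'}))
```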

So to implement my rules above, I would add 'import =
prefix2' to my rules and use $+{hostname} etc.

(I wonder if this method would make ptype cached obsolete. I
wonder if something like:

   context = =(defined($+{prefix2}))

may be equivalent to:

  ptype = cached
  pattern = prefix2
)

-- method 3
A third way to make this work would be to extend the scope of the
"action variables". I wonder what the implications would be of
changing action variables to be substituted like "match variables",
so they are replaced everywhere a variable like $1 or $0 would be
replaced.

Well, everywhere with the possible exception of regular
expressions. If you had:

 pattern = %{prefix} #011

you would need to keep substituting and recompiling the
pattern for every event, which is really expensive.
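
A small Python illustration of the cost (hypothetical data; the point
is that a pattern whose text changes per event defeats any compiled-
pattern reuse, while a fixed pattern compiles once):

```python
import re
import time

events = [('host%d' % i, 'host%d postgres: #011 SELECT' % i)
          for i in range(2000)]

fixed = re.compile(r'\S+ postgres: #011')       # compiled once

t0 = time.perf_counter()
hits_fixed = sum(1 for _, line in events if fixed.search(line))
t_fixed = time.perf_counter() - t0

# Here the pattern text differs for every event, so each one must be
# rebuilt and recompiled (the regex cache cannot help).
t0 = time.perf_counter()
hits_var = sum(1 for host, line in events
               if re.search(re.escape(host) + r' postgres: #011', line))
t_var = time.perf_counter() - t0

print(hits_fixed, hits_var, round(t_var / t_fixed, 1))
```

On a typical run the per-event-compiled version is many times slower
than the precompiled one, even though both match every line.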

--
                                -- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
