hi Richard,

just one follow-up thought -- have you considered sec's native multi-line
patterns, such as RegExpN, for handling multi-line logs? Of course, there are
scenarios where the value of N (the maximum number of lines in a multi-line
event) can be very large and difficult to predict, and for such cases an
event converter (like the one from the previous post) is probably the best
approach. However, if you know the value of N and events can be matched
with regular expressions, the RegExpN pattern type can be used for this task.
For example, if events have the brace-separated format described in the
previous post and contain up to 20 lines, one could utilize the
following RegExp20 pattern for matching:

type=Single
ptype=RegExp20
pattern=(?s)^(?:.+\n)?\{\n(.+\n)?\}$
desc=match a multiline event between braces
action=write - $1
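
To see what this pattern captures, here is a minimal Perl sketch that can be
run outside sec. It assumes that a RegExpN pattern is matched against the
last N input lines joined with newlines; match_event() and the sample lines
are hypothetical names and data used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regular expression copied from the RegExp20 rule above.
my $re = qr/(?s)^(?:.+\n)?\{\n(.+\n)?\}$/;

# match_event() is a hypothetical helper: it joins the given lines with
# newlines (assumed to mirror what sec does for RegExpN patterns) and
# returns the text captured between the braces, or undef if no match.
sub match_event {
    my @lines = @_;
    my $buffer = join("\n", @lines);
    return undef unless $buffer =~ $re;
    # $1 is undefined for an empty event ("{" directly followed by "}")
    return defined($1) ? $1 : "";
}

my $body = match_event("unrelated line", "{", "line1", "line2", "}");
print defined($body) ? "matched:\n$body" : "no match\n";
```

With this sample data the captured text consists of the lines between the
braces, including the trailing newline. The pattern matches only when the
closing brace is the most recent line, so an event still being written does
not match yet.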

Also, if you want to convert such multi-line events into a single-line
format with builtin features, the sec 'rewrite' action allows for that. In
the following example, the first rule takes the multi-line data between
braces and replaces each newline with a space character, and the resulting
single-line string (with the prefix "Converted event: ") is used for
overwriting the sec event buffer. The second rule is then able to match such
converted events:

type=Single
ptype=RegExp20
pattern=(?s)^(?:.+\n)?\{\n(?:(.+)\n)?\}$
continue=TakeNext
desc=convert a multiline event between braces to single-line format
action=lcall %ret $1 -> ( sub { my($t) = $_[0]; $t =~ s/\n/ /g; \
       return $t; } ); \
       rewrite 20 Converted event: %ret

type=Single
ptype=RegExp
pattern=Converted event: (.*)
desc=match any converted event
action=write - $1
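
The string handling in these two rules can be tried outside sec with the
following minimal Perl sketch. The to_single_line() helper and the sample
input are hypothetical; the helper only imitates the string that the
'rewrite' action would place into the buffer and does not involve sec itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same anonymous function as in the lcall action above.
my $flatten = sub { my($t) = $_[0]; $t =~ s/\n/ /g; return $t; };

# to_single_line() is a hypothetical helper that mimics the string
# produced by 'rewrite 20 Converted event: %ret' for the captured text.
sub to_single_line {
    my ($captured) = @_;
    return "Converted event: " . $flatten->($captured);
}

my $event = to_single_line("line1\nline2");
print "$event\n";

# The pattern of the second rule picks up the flattened payload.
if ($event =~ /Converted event: (.*)/) {
    print "payload: $1\n";
}
```

For the sample input the converted event is "Converted event: line1 line2",
and the second pattern captures "line1 line2".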

Hopefully the above examples are helpful for getting additional insight into
different ways of processing multi-line events.

kind regards,
risto


hi Richard,
>>>
>> ...
>>
>>> In the current code base, the end of each line is identified with
>>> a simple search for the newline character. The newline is searched for not
>>> with a regular expression, but rather with the index() function, which is
>>> much faster. It is of course possible to change the code so that a regular
>>> expression pattern is utilized instead, but that would introduce a
>>> noticeable performance penalty. For example, I ran a couple of quick tests,
>>> replacing the index() function with a regular expression that identifies
>>> the newline separator, and when testing the modified sec code against log
>>> files of 4-5 million events, cpu time consumption increased by 25%.
>>>
>>
>> Hmm, this is interesting. A fundamental question came to my mind: could
>> this penalty be reduced if the regular newline characters ("\n") were
>> replaced, and the ends of "lines" matched with a regexp, through rules (or
>> in some other, more external way) before further processing by subsequent
>> rules, instead of through a potential built-in feature (used optionally on
>> particular logfiles)?
>>
>>
> Perhaps I can add a few thoughts here. Since the number of multi-line
> formats is essentially infinite, converting a multi-line format into a
> single-line representation externally (i.e., outside sec) offers the most
> flexibility. For instance, in many cases there is no delimiter as such
> between messages, but the beginning and end of the message contain different
> character sequences that are part of the message. In addition, any lines
> that are not between a valid beginning and end should be discarded. It is
> clear that using one regular expression for matching delimiters does not
> address this scenario properly. Also, one can imagine many other
> multi-line formats, and coming up with a single builtin approach for all of
> them is not possible. On the other hand, a custom external converter allows
> for addressing a given event format exactly as we like. For example,
> suppose we are dealing with the following format, where a multi-line event
> starts with a lone opening brace on a separate line and ends with a lone
> closing brace:
>
> {
>   line1
>   line2
>   ...
> }
>
> For converting such events into a single-line format, the following simple
> wrapper could be utilized (written in 10 minutes):
>
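> #!/usr/bin/perl -w
> # the name of this wrapper is test.pl
>
> use strict;
>
> if (scalar(@ARGV) != 1) { die "Usage: $0 <file>\n"; }
> my $file = $ARGV[0];
> # list-form open runs tail without a shell, so odd file names are safe
> if (!open(FILE, "-|", "tail", "-F", $file)) {
>   die "Can't start tail for $file\n";
> }
> $| = 1;   # unbuffered output, so that sec sees each event immediately
>
> my $message;
> while (<FILE>) {
>   chomp;
>   # a lone opening brace starts a new event
>   if (/^\{$/) { $message = $_; }
>   # a lone closing brace ends the event, which is printed as one line
>   elsif (/^\}$/ && defined($message)) {
>     $message .= $_;
>     print $message, "\n";
>     $message = undef;
>   }
>   # lines inside an event are appended; lines outside are discarded
>   elsif (defined($message)) {
>     $message .= $_;
>   }
> }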
>
> If this wrapper is then started from sec with the 'spawn' or 'cspawn'
> action, multi-line events from the monitored file will appear as single-line
> synthetic events for sec. For example:
>
> type=Single
> ptype=RegExp
> pattern=^(?:SEC_STARTUP|SEC_RESTART)$
> context=SEC_INTERNAL_EVENT
> desc=fork the converter when sec is started or restarted
> action=spawn ./test.pl my.log
>
> type=Single
> ptype=RegExp
> pattern=\{importantmessage\}
> desc=test
> action=write - important message was received
>
> The second rule fires if the following 4-line event is written into my.log:
>
> {
> important
> message
> }
>
> My apologies if the above example is a bit laconic, but hopefully it
> conveys the overall idea of how to set up an event converter. Writing a
> suitable converter often does not take much time, plus you get
> something which is tailored exactly to your needs :)
>
> kind regards,
> risto
>
>
>
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
