hi Richard,

> Risto, thank you for your pre-analysis of multi-line matching with regexps,
> and also for the suggestions about a yet more sophisticated multi-file
> solution.
>
> My comments are also inline:
>
> On Wed, 27 Nov 2019 at 15:07, Risto Vaarandi <risto.vaara...@gmail.com>
> wrote:
>
>> hi Richard,
>>
> ...
>
>> In the current code base, identifying the end of each line is done with a
>> simple search for newline character. The newline is searched not with a
>> regular expression, but rather with index() function which is much faster.
>> It is of course possible to change the code, so that a regular expression
>> pattern is utilized instead, but that would introduce a noticeable
>> performance penalty. For example, I made a couple of quick tests with
>> replacing the index() function with a regular expression that identifies
>> the newline separator, and when testing modified sec code against log files
>> of 4-5 million events, cpu time consumption increased by 25%.
>>
>
> Hmm, this is interesting. A question of principle came to my mind: could
> this penalty be decreased (optimized) by replacing these regular newline
> characters ("\n") and matching the ends of "lines" with regexps through
> rules (or in some other, more external way) before further processing by
> subsequent rules, instead of relying on a potential built-in feature
> (used optionally on particular logfiles)?
>
>
Perhaps I can add a few thoughts here. Since the number of multi-line
formats is essentially infinite, converting a multi-line format into a
single-line representation externally (i.e., outside sec) offers the most
flexibility. For instance, in many cases there is no delimiter as such
between messages; instead, the beginning and end of each message are marked
by different character sequences that are part of the message. In addition,
any lines that do not fall between a valid beginning and end should be
discarded. Clearly, a single regular expression that matches delimiters
does not address this scenario properly. One can also imagine many other
multi-line formats, and coming up with a single built-in approach for all
of them is not possible. On the other hand, a custom external converter
allows for handling a given event format exactly as we like. For example,
suppose we are dealing with the following format, where a multi-line event
starts with a lone opening brace on a separate line and ends with a lone
closing brace:

{
  line1
  line2
  ...
}

For converting such events into a single-line format, the following simple
wrapper could be utilized (written in 10 minutes):

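#!/usr/bin/perl -w
# the name of this wrapper is test.pl

use strict;

if (scalar(@ARGV) != 1) { die "Usage: $0 <file>\n"; }
my $file = $ARGV[0];
# the list form of open runs tail without a shell, so odd file names are safe
if (!open(FILE, "-|", "tail", "-F", $file)) { die "Can't start tail for $file\n"; }
$| = 1;   # unbuffered output, so that sec sees each event immediately

my $message;

while (<FILE>) {
  chomp;
  # a lone opening brace starts a new event
  if (/^\{$/) { $message = $_; }
  # a lone closing brace completes the event, which is printed as one line
  elsif (/^\}$/ && defined($message)) {
    $message .= $_;
    print $message, "\n";
    $message = undef;
  }
  # any other line is appended only when an event is in progress
  elsif (defined($message)) {
    $message .= $_;
  }
}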

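If this wrapper is then started from sec with the 'spawn' or 'cspawn'
action, multi-line events from the monitored file will appear as
single-line synthetic events for sec. Note that sec generates the
SEC_STARTUP and SEC_RESTART internal events only when it is started with
the --intevents option. For example: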

type=Single
ptype=RegExp
pattern=^(?:SEC_STARTUP|SEC_RESTART)$
context=SEC_INTERNAL_EVENT
desc=fork the converter when sec is started or restarted
action=spawn ./test.pl my.log

type=Single
ptype=RegExp
pattern=\{importantmessage\}
desc=test
action=write - important message was received

The second rule fires if the following 4-line event is written into my.log:

{
important
message
}
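
If other inputs could produce lines that look like converter output, the
'cspawn' action mentioned above might be used instead of 'spawn' in order
to tag the synthetic events with an internal context. The following is only
a sketch: the context name CONVERTER_EVENT is a made-up example, and the
exact behavior of 'cspawn' should be checked against the sec man page for
the version in use.

```
type=Single
ptype=RegExp
pattern=^(?:SEC_STARTUP|SEC_RESTART)$
context=SEC_INTERNAL_EVENT
desc=fork the converter and tag its output with a context
action=cspawn CONVERTER_EVENT ./test.pl my.log

type=Single
ptype=RegExp
pattern=\{importantmessage\}
context=CONVERTER_EVENT
desc=test
action=write - important message was received
```

With this variant, the second rule is meant to match only those events
that originate from the converter.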

My apologies if the above example is a bit laconic, but hopefully it
conveys the overall idea of how to set up an event converter. Writing a
suitable converter often does not take much time, and you get something
which is tailored exactly to your needs :)
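
For trying out the conversion logic without tail or sec, the same idea can
be put into a small function. The following is a minimal sketch and not
part of sec: the function name collapse_events and the sample input are
hypothetical, and the begin and end patterns are passed in as parameters,
so that other "begin line ... end line" formats can be tested too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse multi-line events into single lines.
# $begin and $end are compiled regular expressions which match the
# first and the last line of an event, respectively.
sub collapse_events {
  my ($begin, $end, @lines) = @_;
  my ($message, @events);

  foreach my $line (@lines) {
    chomp($line);
    if ($line =~ $begin) {
      # start a new event (an unfinished previous event is dropped)
      $message = $line;
    } elsif (defined($message)) {
      # inside an event: append the line without any separator
      $message .= $line;
      if ($line =~ $end) {
        push @events, $message;
        $message = undef;
      }
    }
    # lines outside of an event are silently discarded
  }
  return @events;
}

# sample input with noise, a stray closing brace and an unfinished event
my @input = ("noise\n", "{\n", "important\n", "message\n", "}\n",
             "}\n", "{\n", "unfinished\n");

my @events = collapse_events(qr/^\{$/, qr/^\}$/, @input);
print "$_\n" foreach @events;
```

For the sample input above, this prints the single line {importantmessage},
since the noise line, the stray closing brace and the unfinished event are
all discarded.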

kind regards,
risto
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
