hi Richard,

> Risto, thank you for your pre-analysis about multi-lines with regexp, and
> also for suggestions about multi-files yet more sophisticated solution.
>
> My comments are also inline:
>
> On Wed, 27 Nov 2019 at 15:07, Risto Vaarandi <risto.vaara...@gmail.com>
> wrote:
>
>> hi Richard,
>>
> ...
>
>> In the current code base, identifying the end of each line is done with a
>> simple search for newline character. The newline is searched not with a
>> regular expression, but rather with index() function which is much faster.
>> It is of course possible to change the code, so that a regular expression
>> pattern is utilized instead, but that would introduce a noticeable
>> performance penalty. For example, I made couple of quick tests with
>> replacing the index() function with a regular expression that identifies
>> the newline separator, and when testing modified sec code against log files
>> of 4-5 million events, cpu time consumption increased by 25%.
>>
>
> Hmm, this is interesting. The philosophically principial question came to
> my mind, if this penalty could be decreased (optimized), when doing
> replacements of these regular newline characters ("\n") and matching
> endings of "lines" with regexp - through rules (or by other more external
> way) - before further processing by subsequent rules, instead of potential
> built-in feature (used optionally on particular logfiles).
>

Perhaps I can add a few thoughts here. Since the number of multi-line
formats is essentially infinite, converting a multi-line format into a
single-line representation externally (i.e., outside sec) offers the most
flexibility. For instance, in many cases there is no delimiter as such
between messages, but the beginning and end of each message contain
different character sequences that are part of the message. In addition,
any lines that are not between a valid beginning and end should be discarded.
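
To make that scenario a bit more concrete, here is a minimal sketch of the joining logic on its own, without any file handling. Everything in it is made up for illustration and is not part of sec: the helper name join_events, the BEGIN/END marker patterns, and the choice to join lines with a single space are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of sec): turns multi-line messages into
# single-line events. A message starts at a line matching $begin_re and
# ends at a line matching $end_re; both lines belong to the message.
# Lines that are not inside a begin/end pair are discarded.
sub join_events {
    my ($begin_re, $end_re, @lines) = @_;
    my (@events, $message);

    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ $begin_re) {
            $message = $text;            # start a new message
        } elsif (defined $message) {
            $message .= " " . $text;     # continue the current message
        } else {
            next;                        # outside any message: discard
        }
        if ($text =~ $end_re) {
            push @events, $message;      # message is complete
            $message = undef;
        }
    }
    return @events;
}

# Assumed sample input: two messages surrounded by unrelated lines.
my @events = join_events(
    qr/^BEGIN\b/, qr/\bEND$/,
    "noise\n", "BEGIN job 1\n", "step a\n", "status=ok END\n",
    "stray\n", "BEGIN job 2\n", "status=failed END\n",
);
print "$_\n" for @events;
# prints:
# BEGIN job 1 step a status=ok END
# BEGIN job 2 status=failed END
```

Note that a message which is started but never finished is silently dropped when the next beginning is seen. Whether that is the right behavior depends on the event format at hand.
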
It is clear that a single regular expression for matching delimiters does
not address this scenario properly. Also, one can imagine many other
multi-line formats, and coming up with a single built-in approach for all
of them is not possible. On the other hand, a custom external converter
allows for addressing a given event format exactly as we like. For example,
suppose we are dealing with the following format, where a multi-line event
starts with a lone opening brace on a separate line and ends with a lone
closing brace:

{
line1
line2
...
}

For converting such events into a single-line format, the following simple
wrapper could be utilized (written in 10 minutes):

#!/usr/bin/perl -w
# the name of this wrapper is test.pl

if (scalar(@ARGV) != 1) { die "Usage: $0 <file>\n"; }

$file = $ARGV[0];

# follow the file with tail, so that log rotation is handled
if (!open(FILE, "tail -F $file |")) { die "Can't start tail for $file\n"; }

# disable output buffering, so that sec receives each event immediately
$| = 1;

while (<FILE>) {
  chomp;
  if (/^\{$/) {
    # lone opening brace starts a new message
    $message = $_;
  } elsif (/^\}$/ && defined($message)) {
    # lone closing brace completes the message
    $message .= $_;
    print $message, "\n";
    $message = undef;
  } elsif (defined($message)) {
    # lines outside of braces are discarded
    $message .= $_;
  }
}

If this wrapper is then started from sec with the 'spawn' or 'cspawn'
action, multi-line events from the monitored file will appear as
single-line synthetic events for sec. For example:

type=Single
ptype=RegExp
pattern=^(?:SEC_STARTUP|SEC_RESTART)$
context=SEC_INTERNAL_EVENT
desc=fork the converter when sec is started or restarted
action=spawn ./test.pl my.log

type=Single
ptype=RegExp
pattern=\{importantmessage\}
desc=test
action=write - important message was received

The second rule fires if the following 4-line event is written into my.log:

{
important
message
}

(The wrapper joins the lines without any separator, so sec sees the event
as {importantmessage}.)

My apologies if the above example is a bit laconic, but hopefully it
conveys the overall idea of how to set up an event converter. Writing a
suitable converter often does not take much time, and you get something
which is tailored exactly to your needs :)

kind regards,
risto

_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users