hi Richard,

just one followup thought -- have you considered sec's native multi-line
pattern types such as RegExpN for handling multi-line logs? There are of
course scenarios where the value of N (the maximum number of lines in a
multi-line event) can be very large and difficult to predict, and for such
cases an event converter (like the one from the previous post) is probably
the best approach. However, if the value of N is known and events can be
matched with regular expressions, the RegExpN pattern type can be used for
this task. For example, if events have the brace-separated format described
in the previous post and can contain up to 20 lines, one could utilize the
following RegExp20 pattern for matching:
type=Single
ptype=RegExp20
pattern=(?s)^(?:.+\n)?\{\n(.+\n)?\}$
desc=match a multiline event between braces
action=write - $1

Also, if you want to convert such multi-line events into a single-line
format with builtin features, the sec 'rewrite' action allows for that. In
the following example, the first rule takes the multi-line data between the
braces and replaces each newline with a space character, and the resulting
single-line string (with the prefix "Converted event: ") is used for
overwriting the sec event buffer. The second rule is then able to match
such converted events:

type=Single
ptype=RegExp20
pattern=(?s)^(?:.+\n)?\{\n(?:(.+)\n)?\}$
continue=TakeNext
desc=convert a multiline event between braces to single-line format
action=lcall %ret $1 -> ( sub { my($t) = $_[0]; $t =~ s/\n/ /g; return $t; } ); \
       rewrite 20 Converted event: %ret

type=Single
ptype=RegExp
pattern=Converted event: (.*)
desc=match any event
action=write - $1

Perhaps the above examples are helpful for getting additional insight into
the different ways of processing multi-line events.

kind regards,
risto


hi Richard,

>>>
>> ...
>>
>>> In the current code base, identifying the end of each line is done with
>>> a simple search for the newline character. The newline is searched not
>>> with a regular expression, but rather with the index() function which is
>>> much faster. It is of course possible to change the code so that a
>>> regular expression pattern is utilized instead, but that would introduce
>>> a noticeable performance penalty. For example, I made a couple of quick
>>> tests replacing the index() function with a regular expression that
>>> identifies the newline separator, and when testing the modified sec code
>>> against log files of 4-5 million events, cpu time consumption increased
>>> by 25%.
>>>
>>
>> Hmm, this is interesting.
>> The philosophically principled question that came to my mind is whether
>> this penalty could be decreased (optimized) by doing the replacement of
>> these regular newline characters ("\n") and the matching of "line"
>> endings with a regexp -- through rules (or by some other, more external
>> means) -- before further processing by subsequent rules, instead of a
>> potential built-in feature (used optionally on particular logfiles).
>>
>
> Perhaps I can add a few thoughts here. Since the number of multi-line
> formats is essentially infinite, converting a multi-line format into a
> single-line representation externally (i.e., outside sec) offers the most
> flexibility. For instance, in many cases there is no delimiter as such
> between messages, but the beginning and end of the message contain
> different character sequences that are part of the message. In addition,
> any lines that are not between a valid beginning and end should be
> discarded. It is clear that using one regular expression for matching
> delimiters does not address this scenario properly. Also, one can imagine
> many other multi-line formats, and coming up with a single builtin
> approach for all of them is not possible. On the other hand, a custom
> external converter allows for addressing a given event format exactly as
> we like. For example, suppose we are dealing with the following format,
> where a multi-line event starts with a lone opening brace on a separate
> line, and ends with a lone closing brace:
>
> {
> line1
> line2
> ...
> }
>
> For converting such events into a single-line format, the following
> simple wrapper could be utilized (written in 10 minutes):
>
> #!/usr/bin/perl -w
> # the name of this wrapper is test.pl
>
> if (scalar(@ARGV) != 1) { die "Usage: $0 <file>\n"; }
> $file = $ARGV[0];
> if (!open(FILE, "tail -F $file |")) { die "Can't start tail for $file\n"; }
> $| = 1;
>
> while (<FILE>) {
>   chomp;
>   if (/^{$/) { $message = $_; }
>   elsif (/^}$/ && defined($message)) {
>     $message .= $_;
>     print $message, "\n";
>     $message = undef;
>   }
>   elsif (defined($message)) {
>     $message .= $_;
>   }
> }
>
> If this wrapper is then started from sec with the 'spawn' or 'cspawn'
> action, multi-line events from the monitored file will appear as
> single-line synthetic events for sec. For example:
>
> type=Single
> ptype=RegExp
> pattern=^(?:SEC_STARTUP|SEC_RESTART)$
> context=SEC_INTERNAL_EVENT
> desc=fork the converter when sec is started or restarted
> action=spawn ./test.pl my.log
>
> type=Single
> ptype=RegExp
> pattern=\{importantmessage\}
> desc=test
> action=write - important message was received
>
> The second rule fires if the following 4-line event is written into my.log:
>
> {
> important
> message
> }
>
> My apologies if the above example is a bit laconic, but hopefully it
> conveys the overall idea of how to set up an event converter. And writing
> a suitable converter often does not take much time, plus you get something
> tailored exactly to your needs :)
>
> kind regards,
> risto
>
>
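As an aside, the quoted discussion mentions formats where there is no
delimiter between messages, but the beginning and end of each message carry
distinct character sequences that belong to the message itself, and lines
outside a valid pair must be discarded. A converter for that scenario can
follow the same pattern as the test.pl wrapper above. Below is a minimal
sketch (reading from standard input for simplicity); the "BEGIN:" and
"END:" marker strings and the join-with-spaces behavior are made-up
assumptions for illustration, not part of any particular log format:

```perl
#!/usr/bin/perl -w
# Sketch: convert multi-line messages that start with a "BEGIN:" line and
# end with an "END:" line (both markers are part of the message) into
# single-line events, joining the lines with spaces. Any line outside a
# valid BEGIN/END pair is silently discarded.
use strict;

$| = 1;              # unbuffered output, like the test.pl wrapper
my @buffer;          # lines of the message being assembled
my $collecting = 0;  # true while between BEGIN and END markers

while (my $line = <STDIN>) {
  chomp($line);
  if ($line =~ /^BEGIN:/) {          # start marker opens (or restarts) a message
    @buffer = ($line);
    $collecting = 1;
  }
  elsif ($line =~ /^END:/ && $collecting) {
    push(@buffer, $line);            # end marker is part of the message too
    print join(" ", @buffer), "\n";  # emit the single-line representation
    @buffer = ();
    $collecting = 0;
  }
  elsif ($collecting) {
    push(@buffer, $line);
  }
  # all other lines fall through and are discarded
}
```

Feeding the lines "noise", "BEGIN:alpha", "line1", "END:omega", "noise" to
this script prints the single line "BEGIN:alpha line1 END:omega"; like
test.pl, it could be started from sec with the 'spawn' action (reading from
a "tail -F" pipe instead of standard input if needed).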
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users