>
> I mentioned as an offhand remark to Risto a profile mode that would
> count not only every rule that led to an action, but every time the
> rule executed its regular expression. Having some sort of profile mode
> (not to be run in production) would help identify these sorts of
> issues.
>
>
I have a question about how to potentially implement this feature -- would
it be useful to calculate the total CPU time spent on processing a rule
in profile mode? That would include not only the CPU time for matching the
pattern against the event, but also the CPU time for evaluating the context
expression and executing the action list. Total CPU time would be more
useful if one is dealing with frequently matching rules (say, ones
which match 80-90% of incoming events) and wants to identify the most
expensive rules. Also, some pattern types like Cached and TValue are known
to have low computational cost, but they are sometimes used with more
complex context expressions (e.g., ones involving the execution of custom
Perl code). Total CPU time would make it possible to identify such
bottlenecks.
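To illustrate the idea (in Python rather than SEC's Perl, so all names here
are illustrative, not SEC's actual code), the proposed per-rule accounting
could look roughly like this: wrap both the pattern match and the action in
a measurement taken with the process CPU clock, and accumulate the result
per rule:

```python
import re
import time

rule_cpu = {}  # rule id -> accumulated CPU seconds (hypothetical bookkeeping)

def process_rule(rule_id, pattern, action, event):
    """Match pattern against event and run action on success,
    charging all CPU time consumed to rule_id."""
    start = time.process_time()          # CPU time, not wall-clock time
    try:
        match = re.search(pattern, event)
        if match:
            action(match)                # action cost is included as well
        return match is not None
    finally:
        elapsed = time.process_time() - start
        rule_cpu[rule_id] = rule_cpu.get(rule_id, 0.0) + elapsed

process_rule("rule1", r"sshd\[\d+\]", lambda m: None, "sshd[123]: login")
print(rule_cpu)
```

Using the process CPU clock rather than wall-clock time keeps the figures
meaningful even when SEC is idle waiting for input.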

I've added some alpha-level code into SEC and here is a small example case
for a rule with badly written regular expression:

type=single
ptype=regexp
pattern=^([0-9]+,?)*[[:punct:]]$
desc=test pattern
action=write - test pattern has matched

The regular expression expects an event consisting of numbers that are
optionally separated by commas, with the sequence of numbers followed by a
single punctuation mark. However, when one provides a very long string
consisting of digits only, without a terminating punctuation mark, the
regular expression engine will attempt to divide the sequence into all
possible sub-sequences during the backtracking phase. Here is a fragment
from the SEC dump file that highlights this issue:

Performance statistics:
============================================================
Run time: 142 seconds
User time: 63.92 seconds
System time: 0.05 seconds
Child user time: 0 seconds
Child system time: 0 seconds
Processed input lines: 1

Rule usage statistics:
============================================================

Statistics for the rules from test3.sec
(loaded at Fri Apr  3 18:23:03 2020)
------------------------------------------------------------
Rule 1 line 1 matched 0 events (test pattern) CPU time: 63.71

As one can see, SEC has processed just one event which has not matched its
only rule, but the unsuccessful matching attempt took more than a minute. Of
course, it is possible to disable the unnecessary backtracking by using a
possessive quantifier:
^([0-9]+,?)*+[[:punct:]]$
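The same ambiguity can be reproduced in a short Python sketch (an assumed
translation of the pattern above; Python has no [[:punct:]] class, so an
explicit character class stands in for it):

```python
import re

# Ambiguous: a digit string of length n can be split into ([0-9]+,?)
# groups in 2^(n-1) ways, and the engine tries them all before failing,
# so a non-matching input triggers exponential backtracking.
ambiguous = re.compile(r"^([0-9]+,?)*[!.,:;?]$")

# Unambiguous rewrite: comma-separated digit runs followed by one
# punctuation mark; each input position can be consumed in only one way.
unambiguous = re.compile(r"^[0-9]+(,[0-9]+)*[!.,:;?]$")

# Both accept well-formed input...
assert ambiguous.match("12,34!")
assert unambiguous.match("12,34!")

# ...but on digits without a terminator, the ambiguous pattern takes
# time exponential in the input length (the test input is kept short).
assert ambiguous.match("1" * 18) is None
assert unambiguous.match("1" * 18) is None
```

The possessive quantifier achieves the same effect without restructuring
the expression, by forbidding the engine from giving back characters once
the group has matched.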

Finding the CPU time spent on processing a rule has one caveat --
processing of Jump rules is a recursive process that might involve further
jumps into other rulesets, and the CPU time of a Jump rule would indicate
the cost of the entire recursive process. There is no easy way to subtract
the CPU time of child rules, since they might also be invoked by other Jump
rules that have matched different events.
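The caveat can be seen in a hypothetical timing sketch (again plain Python,
not SEC code): measuring a parent rule with the process CPU clock inevitably
charges it for any ruleset it jumps into.

```python
import time

def busy(seconds):
    """Burn roughly the given amount of CPU time."""
    end = time.process_time() + seconds
    while time.process_time() < end:
        pass

def child_ruleset():
    busy(0.01)        # work done inside the jumped-to ruleset

def jump_rule():
    start = time.process_time()
    busy(0.01)        # the Jump rule's own work
    child_ruleset()   # recursive processing of the child ruleset
    return time.process_time() - start

# The reported figure covers the whole recursive process (~0.02 s of CPU
# here, not 0.01 s); attributing the child's share back to each caller
# would require tracking every jump path separately.
print(jump_rule())
```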

Does the idea outlined in this e-mail sound reasonable? I have also looked
into other implementation ideas, but the current one looks the most useful
and involves the least amount of overhead in the code base.

kind regards,
risto
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
