> I mentioned as an offhand remark to Risto a profile mode that would
> count not only every rule that led to an action, but every time the
> rule executed its regular expression. Having some sort of profile mode
> (not to be run in production) would help identify these sorts of
> issues.

I have a question about how to potentially implement this feature -- would it be useful to calculate the total CPU time spent on processing the rule in profile mode? That would include not only the CPU time for matching the pattern against the event, but also the CPU time for evaluating the context expression and executing the action list. Total CPU time would be more useful if one is dealing with frequently matching rules (say, ones which match 80-90% of incoming events) and wants to identify the most expensive rule. Also, some pattern types like Cached and TValue are known to have low computational cost, but they are sometimes used with more complex context expressions (e.g., involving the execution of custom Perl code). Total CPU time would make it possible to identify such bottlenecks.
I've added some alpha-level code into SEC, and here is a small example case for a rule with a badly written regular expression:

type=single
ptype=regexp
pattern=^([0-9]+,?)*[[:punct:]]$
desc=test pattern
action=write - test pattern has matched

The regular expression expects an event consisting of numbers that are optionally separated by commas, with the sequence of numbers followed by a single punctuation mark. However, when one provides a very long string consisting of digits only, without a terminating punctuation mark, the regular expression engine will attempt to divide the sequence into all possible sub-sequences during the backtracking phase. Here is a fragment from the SEC dump file that highlights this issue:

Performance statistics:
============================================================
Run time: 142 seconds
User time: 63.92 seconds
System time: 0.05 seconds
Child user time: 0 seconds
Child system time: 0 seconds
Processed input lines: 1

Rule usage statistics:
============================================================
Statistics for the rules from test3.sec (loaded at Fri Apr 3 18:23:03 2020)
------------------------------------------------------------
Rule 1 line 1 matched 0 events (test pattern)
CPU time: 63.71

As one can see, SEC has processed just one event which did not match its only rule, but the unsuccessful matching attempt took more than a minute of CPU time. Of course, it is possible to disable the unnecessary backtracking by using a possessive quantifier:

^([0-9]+,?)*+[[:punct:]]$

Finding the CPU time spent on processing a rule has one caveat -- the processing of Jump rules is recursive and might involve further jumps into other rulesets, so the CPU time of a Jump rule would indicate the cost of the entire recursive process. There is no easy way to subtract the CPU time of child rules, since they might also be invoked by other Jump rules that have matched different events.

Does the idea outlined in this e-mail sound reasonable?
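For illustration, here is a minimal sketch in Python of the kind of per-rule CPU-time accounting being proposed (SEC itself is written in Perl, so this is only a conceptual analogue, and the function name is an assumption, not SEC's actual internals). Python's re syntax lacks POSIX classes, so [[:punct:]] is approximated with [^\w\s]. A short digit-only input already exercises the backtracking, but stays fast enough to run:

```python
import re
import time

# The pathological pattern from the example above; [[:punct:]] is
# approximated with [^\w\s] (any non-word, non-whitespace character).
PATTERN = re.compile(r'^([0-9]+,?)*[^\w\s]$')

def match_with_cpu_time(pattern, event):
    """Match an event against a rule pattern and record CPU seconds spent.

    time.process_time() counts CPU time of the current process only,
    which is what a per-rule profile counter would accumulate.
    """
    start = time.process_time()
    result = pattern.match(event)
    elapsed = time.process_time() - start
    return result, elapsed

# A digit-only string without a terminating punctuation mark forces the
# engine to try every way of splitting the digits between iterations of
# the group before failing.
result, cpu = match_with_cpu_time(PATTERN, '1' * 18)
print(result, cpu)  # result is None; cpu grows rapidly with input length
```

Summing the elapsed values per rule across all processed events would yield exactly the "CPU time: 63.71" figure shown in the dump fragment, without touching the rule's match semantics.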
I have also looked into other implementation ideas, but the current one looks most useful and involves the least amount of overhead in the code base.

kind regards,
risto
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users