Re: [Simple-evcorr-users] SEC CPU utilization

2020-04-03 Thread Risto Vaarandi
>
> I mentioned as an offhand remark to Risto a profile mode that would
> count not only every rule that led to an action, but every time the
> rule executed its regular expression. Having some sort of profile mode
> (not to be run in production) would help identify these sorts of
> issues.
>
>
I have a question about how to potentially implement this feature -- would
it be useful to calculate the total CPU time spent on processing the rule
in profile mode? That would include not only the CPU time for matching the
pattern against the event, but also the CPU time for evaluating the context
expression and executing the action list. Total CPU time would be more
useful if one is dealing with frequently matching rules (say, ones
which match 80-90% of incoming events) and wants to identify the most
expensive rule. Also, some pattern types like Cached and TValue are known
to have low computational cost, but they are sometimes used with more
complex context expressions (e.g., involving the execution of custom Perl
code). Total CPU time would make it possible to identify such bottlenecks.
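To illustrate the accounting idea (SEC itself is written in Perl; the function and variable names below are hypothetical and not part of SEC), total CPU time could be attributed to a rule roughly like this:

```python
import time

# Hypothetical sketch: charge the CPU time of the whole rule evaluation
# (pattern match + context expression + action list) to the rule.
cpu_time_per_rule = {}

def profile_rule(rule_id, evaluate, *args):
    """Run a rule's evaluation function and add its CPU time to the tally."""
    start = time.process_time()
    try:
        return evaluate(*args)
    finally:
        elapsed = time.process_time() - start
        cpu_time_per_rule[rule_id] = cpu_time_per_rule.get(rule_id, 0.0) + elapsed
```

The most expensive rules are then simply the largest entries in the tally, regardless of whether the cost comes from the pattern, the context expression, or the actions.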

I've added some alpha-level code into SEC, and here is a small example case
for a rule with a badly written regular expression:

type=single
ptype=regexp
pattern=^([0-9]+,?)*[[:punct:]]$
desc=test pattern
action=write - test pattern has matched

The regular expression expects an event consisting of numbers that
are optionally separated by commas, where the sequence of numbers must be
followed by a single punctuation mark. However, when one provides a very
long string consisting of digits only, without a terminating punctuation mark,
the regular expression engine will attempt to divide the sequence into all
possible sub-sequences during the backtracking phase. Here is a fragment from
the SEC dump file that highlights this issue:

Performance statistics:

Run time: 142 seconds
User time: 63.92 seconds
System time: 0.05 seconds
Child user time: 0 seconds
Child system time: 0 seconds
Processed input lines: 1

Rule usage statistics:


Statistics for the rules from test3.sec
(loaded at Fri Apr  3 18:23:03 2020)

Rule 1 line 1 matched 0 events (test pattern) CPU time: 63.71

As one can see, SEC has processed just one event which has not matched its
only rule, but the unsuccessful matching process took more than a minute. Of
course, it is possible to disable the unnecessary backtracking by using a
possessive quantifier:
^([0-9]+,?)*+[[:punct:]]$
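The possessive quantifier is available in Perl, which SEC uses. For engines without possessive quantifiers, the same effect can be emulated with a lookahead-plus-backreference trick; here is an illustrative sketch in Python, with a simplified punctuation class standing in for [[:punct:]]:

```python
import re

# (?=(X))\1 emulates an atomic/possessive group: the lookahead captures
# the digit/comma sequence once, \1 consumes it, and the engine cannot
# backtrack into the group, so a missing punctuation mark fails fast.
PUNCT = r'[.,:;!?]'  # simplified stand-in for POSIX [[:punct:]]
safe = re.compile(r'^(?=((?:[0-9]+,?)*))\1' + PUNCT + '$')

assert safe.match('12,34,56!') is not None   # valid event still matches
assert safe.match('9' * 5000) is None        # fails instantly, no blowup
```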

Finding the CPU time spent on processing a rule has one caveat -- the
processing of Jump rules is a recursive process that might involve further
jumps into other rulesets, and the CPU time of a Jump rule would indicate the
cost of the entire recursive process. There is no easy way to subtract the
CPU time of child rules, since they might be invoked by other Jump rules
that have matched different events.

Does the idea outlined in this e-mail sound reasonable? I have also looked
into other implementation ideas, but the current one looks most useful and
involves the least amount of overhead in the code base.

kind regards,
risto
___
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users


Re: [Simple-evcorr-users] SEC CPU utilization

2020-04-02 Thread Risto Vaarandi
hi Richard,


> We were migrating log monitoring from HPOM to an open-source monitoring
> tool, using SEC for duplicate event flow reduction before passing events to
> the monitoring agent, in the same manner as the HPOM agent with built-in
> correlations was used, so the design of rules and correlations is tributary
> to how it was implemented in HPOM. There were hundreds to thousands of
> pattern conditions in HPOM per host, and the structure of their sections
> was as follows:
>
>- HPOM: suppress unmatched conditions -> SEC: Suppress with NRegExp
>- HPOM: suppress matched conditions -> SEC: Suppress with RegExp
>- HPOM: message conditions (with configured time-based correlations)
>-> SEC: Single with RegExp and GoTo -> duplicate suppress time-based
>correlations, each consisting of 3-4 subsequent rules (Single,
>PairWithWindow, SingleWithSuppress, depending on duplicate suppress
>correlation type)
>
> We decided to automate the conversion of HPOM configurations to SEC rules,
> so there was not much space for conceptual improvements over the HPOM
> concepts (e.g. by doing deeper analysis of configurations and actual log
> traffic), and we relied on the premise that those HPOM configurations are
> OK, tuned by years of development and operations, so the automated
> conversion was 1:1.
>
> About 50 log files per host are of several types (according to message
> structure), but each file was monitored in HPOM independently of the
> others; therefore, after the 1:1 conversion, each file is also monitored
> independently in SEC. However, there is some perhaps uglier "configuration
> redundancy" for log files of the same type, as there was in HPOM. The
> static order of conditions in HPOM is preserved in the generated SEC rules
> as well.
>
>
Since I have used HPOM in the past, perhaps I can offer some comments and
advice here. As far as I remember, the HPOM agent does not support
hierarchical arrangement of log file monitoring policies and rules. You
mentioned that the existing HPOM configuration was converted on a 1:1 basis
-- does that mean that SEC is configured to use a number of rule files,
where each rule file corresponds to some HPOM policy, and all rules are
applied against all input messages from 50 input files? Since such use of
rules is not very efficient, perhaps you could introduce a hierarchical rule
arrangement in the following way. First, assign each rule file to a
configuration file set with an Options rule. For example, if you have an
HPOM policy called "sshd", you can use the following Options rule at the
beginning of the corresponding rule file:

type=Options
joincfset=sshd
procallin=no

Secondly, run SEC with the --intcontexts flag, which will set an internal
context for each file (e.g., the file /var/log/secure will have the context
_FILE_EVENT_/var/log/secure by default, but you can also set a custom
context name in the command line). Finally, create a special rule file (e.g.,
main.sec) which will route messages to the relevant rule files. For example,
suppose that the HPOM policies "sshd" and "sudo" have been used for
/var/log/secure, and there are two SEC rule files that contain Options
rules as described above. For handling messages from /var/log/secure, you
can enter the following Jump rule into the main.sec rule file:

type=Jump
context=[ _FILE_EVENT_/var/log/secure ]
ptype=TValue
pattern=True
desc=direct /var/log/secure events to relevant rules files
cfset=sshd sudo

Such a configuration ensures that messages from /var/log/secure are
matched against relevant rules only, and rule files not associated with
/var/log/secure will not be applied. Also, only the Jump rules from
main.sec are applied against the entire event stream, while rules from
other files receive their input from Jump rules in main.sec and are thus
applied only to relevant messages. Perhaps you have already applied this
technique, but if not, I would definitely recommend using this
optimization, since it is very likely to reduce the CPU load.
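For completeness, the corresponding SEC invocation might look roughly as
follows (the file paths are illustrative; see the sec(1) manual page for
option details):

sec --intcontexts --input=/var/log/secure \
  --conf=/etc/sec/main.sec --conf=/etc/sec/sshd.sec --conf=/etc/sec/sudo.sec

With --intcontexts, each event read from /var/log/secure carries the
_FILE_EVENT_/var/log/secure context that the Jump rule above matches on.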

hope this helps,
risto
___
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users


Re: [Simple-evcorr-users] SEC CPU utilization

2020-04-01 Thread David Lang
re: multi-line rules, my thought is to convert the newlines to some escape
sequence and then treat the result as a single line.
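A minimal sketch of that idea (the escape scheme is illustrative; any
reversible encoding works), assuming the multi-line message arrives as one
string:

```python
import re

def encode(msg: str) -> str:
    """Escape backslashes first, then newlines, so decoding is unambiguous."""
    return msg.replace('\\', '\\\\').replace('\n', '\\n')

def decode(line: str) -> str:
    """Single-pass decode: escaped n becomes a newline, any other escaped char is itself."""
    return re.sub(r'\\(.)', lambda m: '\n' if m.group(1) == 'n' else m.group(1), line)

event = 'Traceback:\n  File "app.py", line 1\nValueError'
flat = encode(event)
assert '\n' not in flat        # now safe to feed to single-line rules
assert decode(flat) == event   # and the transformation is reversible
```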


I tend to not want to run SEC on each system as it can impact that system (both 
CPU utilization and disk I/O). Instead I make SEC part of the log processing 
infrastructure, ideally on a dedicated system (or set of systems) so having 
multiple instances, each eating a core, is a feature not a bug ;-)


David Lang


Re: [Simple-evcorr-users] SEC CPU utilization

2020-04-01 Thread Richard Ostrochovský
Hello guys,

thank you for the many tips. I have carefully read through the e-mails from
Clayton, Risto, John, David, and Dusan, and I am reacting to all of them in
summary with this e-mail, as many reactions were in a similar spirit.

To my primary question: I am satisfied that log messages are not lost under
"heavy load", but just (gradually) delayed (currently, delay is not a
problem in our case, as 100% CPU consumption does not occur all the time,
and SEC has time "to catch the train at the next station"). These are
application logs and our message load is variable, but not of an order as
high as Dusan's, and I am still not monitoring this flow (discussed here:
https://sourceforge.net/p/simple-evcorr/mailman/message/36910235/, a medium
priority task in my Jira).

To my secondary questions about finding bottlenecks and optimizations, I'll
summarize the recommendations (with my comments), as some were mentioned by
several of you:

   - placing the most "heavy-load" rules first (problem: unpredictability;
   potential solution: re-ordering rules dynamically according to the latest
   stats)
   - hierarchical / tree-like setup (Jump rules as in Risto's examples, or a
   GoTo section ended with Suppress, as in our case)
   - more instances of SEC (but there is also the risk of consuming not just
   1 CPU, but as many CPUs as there are SEC instances running)
   - preprocessing of log messages with some kind of classification /
   branching (outside of SEC)
      - LogZilla (Clayton,
      http://demo.logzilla.net/help/event_correlation/intro_to_event_correlation)
      - rsyslog mmnormalize (David)
      - syslog-ng PatternDB (Dusan)
   - RegExp optimization
   - reducing the number of contexts, with active deletion where suitable

As some of you are interested in the design, I would need to ask my
employer for permission to publish it, as I am not the author of the rules
design. I just created the automation of their generation (a topic discussed
here: https://sourceforge.net/p/simple-evcorr/mailman/message/36867012/)
and minor optimizations. Although I consulted a lot on this mailing list,
mainly with Risto, the original pre-consultation designs remained unchanged.
For now, I will at least try to describe it at a high level, but first it
is important to mention how it originated:

We were migrating log monitoring from HPOM to an open-source monitoring
tool, using SEC for duplicate event flow reduction before passing events to
the monitoring agent, in the same manner as the HPOM agent with built-in
correlations was used, so the design of rules and correlations is tributary
to how it was implemented in HPOM. There were hundreds to thousands of
pattern conditions in HPOM per host, and the structure of their sections
was as follows:

   - HPOM: suppress unmatched conditions -> SEC: Suppress with NRegExp
   - HPOM: suppress matched conditions -> SEC: Suppress with RegExp
   - HPOM: message conditions (with configured time-based correlations) ->
   SEC: Single with RegExp and GoTo -> duplicate suppress time-based
   correlations, each consisting of 3-4 subsequent rules (Single,
   PairWithWindow, SingleWithSuppress, depending on duplicate suppress
   correlation type)

We decided to automate the conversion of HPOM configurations to SEC rules,
so there was not much space for conceptual improvements over the HPOM
concepts (e.g. by doing deeper analysis of configurations and actual log
traffic), and we relied on the premise that those HPOM configurations are
OK, tuned by years of development and operations, so the automated
conversion was 1:1.

About 50 log files per host are of several types (according to message
structure), but each file was monitored in HPOM independently of the
others; therefore, after the 1:1 conversion, each file is also monitored
independently in SEC. However, there is some perhaps uglier "configuration
redundancy" for log files of the same type, as there was in HPOM. The
static order of conditions in HPOM is preserved in the generated SEC rules
as well.

I think the branching multi-level cascade structure with GoTo is OK. What is
harder is that the majority of log files have multi-line messages, and some
logs are multi-file (discussed here:
https://sourceforge.net/p/simple-evcorr/mailman/message/36861921/).

Only one instance of SEC is running per host, so at most a single CPU can
be consumed. No other preprocessing/classification is used; SEC is doing
all the work.

Anyway, I am happy with Risto's answer that 100% CPU utilization is not
causing losses, just delays. The migration is now over, and we'll see
whether there will be demand for further optimizations in the future, so I
don't want to burden you guys with rules analysis and review, and
concrete optimization advice.

But I am also interested in Risto's and Clayton's initiatives about rules
sharing. I think that a further step towards building a user community with
rules sharing could be some kind of formalization, "best practice rule
templates", a "rulesets / templates catalogue", possibly with the possibility
to choose 

Re: [Simple-evcorr-users] SEC CPU utilization

2020-03-26 Thread David Lang
I set up SEC using the omprog option in rsyslog, so rsyslog started SEC and fed 
the logs in via stdin rather than writing to disk.


___
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users


Re: [Simple-evcorr-users] SEC CPU utilization

2020-03-26 Thread Dusan Sovic
I am using a similar approach to the one David mentioned. I am processing
syslog messages from network devices (various vendors like Arista, Cisco,
Juniper, etc.).

We have about 7k patterns defined for matching. As SEC cannot handle such a
regexp volume, I am using syslog-ng PatternDB for pattern matching,
classification and tagging.

Then all matched logs are written in a normalized format into a file that is
monitored by SEC, and a pre-defined correlation schema is applied. This
helps me reduce the flow into SEC to 30 msg/s, while the incoming syslog
message flow is about ~20k msg/s (from about 20k unique network devices). My
SEC instance is eating ~20-30% of CPU (with about 500 active event
correlation operations and ~4k active contexts on average).


From: David Lang 
Sent: Thursday, 26 March 2020 1:24
To: Richard Ostrochovský 
Cc: simple-evcorr-users@lists.sourceforge.net 

Subject: Re: [Simple-evcorr-users] SEC CPU utilization

Much of the time you can trivially split your rules and then run multiple copies
of SEC, each processing a subset of the rules on the logs that will match them.

Rsyslog has a very efficient parsing capability (mmnormalize); you can use it to
just classify the logs, or you can go a step further and have it extract the
relevant fields and pass them to SEC in ways that make it cheaper to parse and
process.
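As an illustration, an mmnormalize rule base entry for such classification
might look roughly like this (liblognorm v1 syntax; the tag, field names and
message shape are made up for the example):

rule=sshd_fail: sshd[%pid:number%]: Failed password for %user:word% from %ip:ipv4% port %port:number% ssh2

Once tagged and parsed this way, SEC only sees pre-classified, normalized
events instead of raw log lines.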

Where you do have rules that correlate across different types of logs, you may
be able to just combine those logs into one ruleset, or you can have parallel
instances output messages to say that their part of the correlation has tested
true, and then have another instance that processes these partial correlation
messages and decides whether the combined correlation is matched.

David Lang


___
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users


Re: [Simple-evcorr-users] SEC CPU utilization

2020-03-25 Thread John P. Rouillard
In message,
Richard Ostrochovský writes:
>Hello friends,
>
>I have SEC monitoring over 50 log files with various correlations, and it
>is consuming 100% of a single CPU (luckily on a 10-CPU machine, so the
>whole system is not affected, as SEC is a single-CPU application).
>
>This could mean that SEC cannot keep up with processing all rules, and I
>am curious what the possible effects are: whether this means increasing
>delays (first in, processing, first out), skipping some lines from input
>files, or anything else (?).
>
>And how to troubleshoot and find bottlenecks? I can see quantities of log
>messages per context or log file in sec.dump, which is some indicator. Are
>there also other indicators? Is it possible, somehow, to also see the
>processing times of patterns (per rule)?

I have found two operations that are large consumers of CPU in SEC.

1) If you have a lot of contexts, context lookups can eat CPU. I
   solved a problem where SEC used 50% of a CPU by actively deleting
   contexts as soon as I could rather than letting them time out. You
   can use the USR1 signal (IIRC, see the manpage) to get a dump of all
   contexts.  I don't remember how many I had, but it was a lot. Once I
   started actively deleting contexts, the CPU dropped to 1-2% with
   ~100-500 events per second through the system.
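As an illustration, a rule of the following shape (the pattern and context
name are made up) deletes a per-user context as soon as the end-of-session
event arrives, instead of letting it time out:

type=Single
ptype=RegExp
pattern=sshd\[\d+\]: pam_unix\(sshd:session\): session closed for user (\S+)
desc=drop session context for user $1
action=delete USER_SESSION_$1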

2) Inefficient regexps that require a lot of backtracking can kill
   your CPU.  I don't know how you would find these.

You don't say how your rulesets are constructed, but if you have, say,
100 rules and the most often hit rules are at positions 99 and 100, you
are going to waste a lot of time applying regexps (even if they are
efficient). Again, a dump file will tell you which rules are being
hit. Move the most used rules so they are applied earlier in your
rulesets. Also, you can create rules so they are applied in a tree-like
structure. Look at the manpage for rulesets. This structure allows you
to have a layer of filter rules (e.g. look for ssh in the log line)
and only apply the ssh rulesets to that log event. This can cut down
dramatically on the number of regexps that have to be applied and
discarded.

Also, you can parse the event in one rule and then use more efficient
means to trigger rules (matching on a subset of the original event).
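In SEC, this parse-once approach can be expressed with pattern match
caching: one rule parses the event and stores the match under a name, and
later rules use the cheap Cached pattern type instead of re-running the
regexp. A sketch (the pattern, cache name and thresholds are illustrative):

type=Single
ptype=RegExp
pattern=sshd\[\d+\]: Failed password for (\S+) from (\S+)
varmap=SSHD_FAIL; user=1, ip=2
continue=TakeNext
desc=parse ssh login failures once
action=none

type=SingleWithThreshold
ptype=Cached
pattern=SSHD_FAIL
desc=repeated ssh failures for user $+{user}
action=write - many failed logins for $+{user} from $+{ip}
window=60
thresh=5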

I mentioned as an offhand remark to Risto a profile mode that would
count not only every rule that led to an action, but every time the
rule executed its regular expression. Having some sort of profile mode
(not to be run in production) would help identify these sorts of
issues.

Good luck, and when you figure it out, let us know how you found and
fixed it.

--
-- rouilj
John Rouillard
===
My employers don't acknowledge my existence much less my opinions.


___
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users