Hi all:

I was not happy with the performance of some of my rules that handle
~100 events/sec (with 1-2s bursts to 800 events/sec every minute or
so).

It was consuming 50% of the cpu at this rate and the lag was pretty
significant (> 100 events). This is unusual: it should be taking a few
percent of the cpu at these levels, and the lag should be less than 10
events at most times.

I rewrote some of the rules using jump and cfsets to streamline the
handling of particular high-volume event sources (slapd, firewall) so
their events went through a single ruleset rather than the entire
pipeline of rulesets.
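The routing can be sketched with a Jump rule in the main rule file and
an Options rule in the dedicated file; the pattern, set name, and
comments below are illustrative, not my actual rules:

```
# main rule file: hand slapd events to their own rule file set
type=Jump
ptype=RegExp
pattern=slapd\[\d+\]:
desc=route slapd events to the dedicated ruleset
cfset=SLAPD_RULES

# dedicated slapd rule file: join the set and opt out of the
# regular input stream, so these rules see only jumped events
type=Options
joincfset=SLAPD_RULES
procallin=no
```

With procallin=no the dedicated file no longer matches every input
line; the Jump rule's continue field controls whether the main rule
file keeps processing the event after the jump.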

This sped things up by 10-15%. But I was still seeing 35-40% cpu use.

I did a state dump (kill -USR1) and saw that I had 100K+ contexts
defined. These were created to record postgres connection information
(user, host, ports) and timed out after a day. They are created at a
rate of roughly 1.5/second by our load balancer, which verifies that
the postgres instances are up and running; another 10-20 thousand were
created by actual postgres queries over the 24-hour period. This was a
huge burden at runtime, since many of my rules use context expressions
to control their operation, so the collection of 100K contexts had to
be consulted for 50% of the rules, if not more. The virtual memory
size of the process was also close to 250MB.

I changed my postgres logging to include disconnects from the postgres
server and added a rule to delete the connection context when a
disconnect happens. This reduced the steady-state number of contexts
to approximately 800, and the memory use to 100MB. As a result my sec
process is now using 2-5% of the cpu and has a lag of less than 10
events (which is within the error range of the technique I am using to
measure the lag).
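The create/delete pairing looks roughly like the following two Single
rules; the log patterns and the context naming here are illustrative
sketches (the real rules also capture user, host, and ports):

```
# create a per-connection context when a connection is logged;
# the 24-hour (86400s) lifetime is only a fallback now
type=Single
ptype=RegExp
pattern=postgres\[(\d+)\]: .*connection authorized: user=(\S+)
desc=record postgres connection $1
action=create PG_CONN_$1 86400

# delete the context as soon as the matching disconnect is logged,
# instead of letting contexts pile up until they time out
type=Single
ptype=RegExp
pattern=postgres\[(\d+)\]: .*disconnection:
desc=drop postgres connection $1
action=delete PG_CONN_$1
```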

So what's the moral? Clean up after yourself whenever possible. Even
if you have to add additional logging events to the stream, they will
help performance if they help keep things clean.

On another topic, does anybody have a good method for measuring the
sec lag? I have a single input file (/var/log/messages) and I am
using:

  # last 75 lines written to the log
  tail -n 75 /var/log/messages > /tmp/a;
  # last 75 lines of sec's input buffer, taken from a state dump,
  # compared against the log's tail; unmatched lines are counted
  /etc/init.d/sec_rsyslog_master_mon dump | \
    sed -ne '/Content of input buffer/,+101p' | tail -n 75 | \
    comm -13 /tmp/a - | wc -l

which grabs the last 75 lines in the input file and compares them to
the last 75 lines in sec's buffer. The comm -13 prints the lines that
are unique to sec's buffer, i.e. buffered lines that have already
scrolled out of the log's last 75 lines; since sec's window trails the
log's window by the lag, counting those lines gives the lag in events.
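Here is a toy reconstruction of the idea with made-up event names (the
real inputs are log lines): if the log has written e01..e10 but sec
has only read through e08, sec's 4-line tail trails the log's by 2:

```shell
# Hypothetical tails: the log holds e01..e10, sec has read e01..e08.
log_tail=$(mktemp); buf_tail=$(mktemp)
printf 'e07\ne08\ne09\ne10\n' > "$log_tail"   # last 4 lines of the log
printf 'e05\ne06\ne07\ne08\n' > "$buf_tail"   # last 4 lines sec buffered
# Lines unique to the buffer tail are events that have scrolled out of
# the log's window -- their count is the lag.
comm -13 "$log_tail" "$buf_tail" | wc -l      # prints 2
rm -f "$log_tail" "$buf_tail"
```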

This has a few issues (comm expects sorted input, for one), but it
seems to be the most accurate mechanism I have come up with.
Obviously it falls down if you have multiple input files, separate
per-file buffers, etc.

Enjoy the new year.

--
                                -- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.




_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
