[rsyslog] Importance of impstats, and omfile cache size...

Robert J. McIntyre Sat, 12 Oct 2013 10:02:19 -0700

Not a question, but sharing a story/experience.


My rsyslog server (v. 7.4.4) takes in about 60K mps right now, does some
light filtering, and then writes the messages out to two different file
shares on Windows servers over CIFS.  I'm using dynamic filenames for both
destinations, and separate disk-assisted action queues.  It can generally
keep up, but during peak hours the two omfile queues usually run about 4
million events behind, and from time to time one spikes to more than a
hundred million events behind.  To make matters worse, the remote ruleset
periodically gets backed up as well.

 

I've been beating my head against these issues for several months, just
nursing the system along.  It's my production system, so I have to be
careful about changes and can't just try things out willy-nilly.  But,
because of a recent post by Rainer, I found this page
(http://www.rsyslog.com/rsyslog-statistic-counter/), and started to go
through my stats log with a fresh eye.

 

Looking at my dynafile stats outputs, I was getting millions of missed and
evicted files over time, but I didn't know what that meant.  After reading
the stats description, I knew immediately what was going on.

 

My dynafile template is (in pseudo-code, because I don't have the template
in front of me):

 

<sending device name, pulled from the message header using the %msg:F
field-extraction feature>-firewall-YYYY-MM-DDTHH.QH
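
To make the naming pattern concrete, here is a rough Python sketch of the filename that pseudo-template would produce.  It is not the rsyslog template itself: the function name and the sample device name are made up for illustration, and it assumes QH is the quarter hour numbered 0 to 3 and printed with two digits.

```python
from datetime import datetime


def dynafile_name(device: str, ts: datetime) -> str:
    """Build <device>-firewall-YYYY-MM-DDTHH.QH for one message.

    Assumption: QH is the quarter of the hour (0-3), zero-padded to 2 digits.
    """
    quarter = ts.minute // 15  # 0 for :00-:14, 1 for :15-:29, and so on
    return f"{device}-firewall-{ts:%Y-%m-%dT%H}.{quarter:02d}"


if __name__ == "__main__":
    # "fw01" is a hypothetical device name, not one from the real system.
    print(dynafile_name("fw01", datetime(2013, 10, 12, 10, 2, 19)))
```

The point is that each sending device gets its own file per quarter hour, so the number of files being written at once equals the number of devices sending.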

 

We have between 11 and 15 devices sending logs at any given time, so we
need at least 11, and possibly 15, files open at once.  With the default
dynafile cache size of 10, we were guaranteed a steady stream of misses and
evictions, because every file that wasn't in the cache had to be reopened
(and another one closed) whenever it was needed.  After raising the
dynafile cache size to 20 (enough to comfortably hold one full set of
files), we've eliminated the excess cache misses/evictions, and our omfile
queues rarely back up at all, with transient peak (queue maxsize) values in
the low hundreds of thousands, vs. 10's or 100's of millions.  Rest assured,
we'll be continuing to look at our impstats output with a magnifying glass
now. :)
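
The arithmetic behind this can be sketched with a small Python simulation.  This is a toy model, not rsyslog code: it assumes the cache evicts the least recently used file when full and that the devices take turns sending in strict rotation, which is a worst case chosen to keep the numbers easy to check.  The function and device names are made up for the illustration.

```python
from collections import OrderedDict


def simulate(num_files: int, cache_size: int, num_messages: int):
    """Count cache misses and evictions for round-robin writes.

    Assumptions: least-recently-used eviction, and message i goes to
    file number i % num_files.
    """
    cache = OrderedDict()  # keys are open files, oldest use first
    missed = evicted = 0
    for i in range(num_messages):
        name = f"device{i % num_files}"
        if name in cache:
            cache.move_to_end(name)  # mark as most recently used
            continue
        missed += 1  # file is not open and has to be opened
        if len(cache) >= cache_size:
            cache.popitem(last=False)  # close the least recently used file
            evicted += 1
        cache[name] = True
    return missed, evicted


if __name__ == "__main__":
    for files, size in ((11, 10), (15, 10), (11, 20), (15, 20)):
        missed, evicted = simulate(files, size, 1100)
        print(f"files={files} cache={size} missed={missed} evicted={evicted}")
```

In this model, 11 files against a cache of 10 misses on every single message, while a cache of 20 only misses once per file when it is first opened.  Real traffic is not a strict rotation, so the real miss rate would be lower than the worst case, but the shape of the problem is the same: one slot too few is enough to keep files constantly closing and reopening.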


 

I just thought I'd share this with the group, in case it helps anyone else,
or inspires you to take a closer look at what's happening with your server.

 

Cheers!

Robert
