When you said "if you are writing to a local file, you should not need to
have a separate queue", does queue equate to ruleset in this case? The main
reason I have the separate rulesets is to handle different kinds of incoming
logs on different ports, making it easier to write the rules logic for the
different types. Does that make sense?

Using rulesets to segment your rules makes lots of sense; adding a queue when everything in the ruleset is writing to local files almost never does.

Writing to disk is fast enough that the overhead of moving messages in and out of a queue really hurts. In addition, if everything is behind a separate queue, the main worker threads end up not batching messages, and that _really_ hurts performance, because the overhead of locking and unlocking the queues for each message is rather high.
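As a sketch of what that looks like in rsyslog v8 config syntax (the listener port and ruleset name here are made up for illustration; the template name matches the one defined later in this thread): a ruleset declared without any queue parameters uses the default direct queue, so messages are processed and batched by the main workers rather than being handed off.

# no queue(...) parameters on the ruleset -> direct queue, main workers batch the writes
ruleset(name="linux_servers") {
    action(type="omfile" dynaFile="LinuxProgramFile")
}
input(type="imtcp" port="5140" ruleset="linux_servers")

Only add queue parameters to a ruleset when its actions can block (e.g. forwarding over the network), not for local file writes.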

We have a large number of dynafiles, specifically for server logs, which we
break out into files by host and programname with these templates:
template(name="LinuxProgramFile" type="string"
         string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
template(name="WindowsProgramFile" type="string"
         string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log")

The result is about 32,000 total dynafiles which are then rotated out daily.

The key is how many different files are being written to at the same time. As Rainer points out, if you have 1000 files open at a time, you need dynaFileCacheSize set to more than 1000.

I have systems that write per-server, per-15-min files; for each server this creates 96 files a day. But I don't need 96 * servercount slots in the dynafile cache, I only need more than servercount, because only the current file for each server is being written to at any given moment.

The penalty for having this undersized is huge. When you are thrashing, each message that arrives requires that the system evict an entry from its cache, close that file (which triggers filesystem metadata updates), open a new file, and then write the data. Make the cache size much larger than what you need.
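For the 32,000-dynafile case described above, that would look something like this (a sketch using the omfile dynaFileCacheSize parameter; the cache value here is an illustrative guess, not a tuned number -- size it to comfortably exceed the number of files concurrently being written, not the daily total):

# cache sized well above the count of concurrently-active files
action(type="omfile" dynaFile="LinuxProgramFile"
       dynaFileCacheSize="5000")

Watch for "dynafile cache" eviction messages in rsyslog's own logs to confirm the cache is no longer thrashing.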

Looking at top, rsyslogd as a whole never goes above 120%. Most of the
threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
about 50% during peak. Although I hadn't noticed this before, watching top
for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
to almost 200% for one second, then disappears from top in the next second,
before returning to 60%. It cycled through this every 10 seconds or so.
After restarting rsyslogd, the behavior went away - it's staying around 60%.

Should I try lowering the number of threads for the rulesets/imudp? This
issue does happen with both TCP and UDP, however, which further compounds my
confusion. You mention systemd - I'll look into that as I have a case open
with RH.

if you are just dealing with logs sent from remote machines, systemd isn't going to be involved.

For the rulesets, try eliminating the queues entirely. Combined with fixing the dynaFileCacheSize issue, I'll bet CPU utilization will plummet and you won't miss the queues.

For imudp, try leaving the number of threads at 1. I've run systems receiving hundreds of thousands of logs/sec with a single thread. In most cases, it's only when you start getting close to the throughput of Gig-E that a single thread starts having problems.

Set all thread counts to 1 and watch top to see if any of them need help.
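A minimal starting point (using imudp's module-level threads parameter; the port is just an example):

module(load="imudp" threads="1")
input(type="imudp" port="514")

Only raise the thread count if top shows that single imudp thread pegged while messages are being dropped.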

From what I'm seeing, you shouldn't end up needing to do anything else.

David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
