Thank you David. From what I see, none of the queues are getting full. None of the actions - I have 73 total - have a 'failed' count of greater than zero. Four have a 'suspended' count in the single digits.
When you said "if you are writing to a local file, you should not need to have a separate queue", does queue equate to ruleset in this case? The main reason I have the separate rulesets is to handle different kinds of incoming logs on different ports, making it easier to write the rules logic for the different types. Does that make sense? There is a high number of dynafiles, specifically for server logs, which we break out into files by host and programname with these templates: template(name="LinuxProgramFile" type="string" string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log") template(name="WindowsProgramFile" type="string" string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log ") The result is about 32,000 total dynafiles which are then rotated out daily. Looking at top, rsyslogd as a whole never goes above 120%. Most of the threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting about 50% during peak. Although I hadn't noticed this before, watching top for a few minutes this morning, rsyslogd hangs out around 60%, then bursts to almost 200% for one second, then disappears from top in the next second, before returning to 60%. It cycled through this every 10 seconds or so. After restarting rsyslogd, the behavior went away - it's staying around 60%. Should I try lowering the number of threads for the rulesets/imudp? This issue does happen with both TCP and UDP, however, which further compounds my confusion. You mention systemd - I'll look into that as I have a case open with RH. Dan -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of David Lang Sent: Monday, October 5, 2015 3:23 PM To: rsyslog-users <[email protected]> Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss doing a cut-n-past since this client doesn't quote signed messages properly: David, thank you. Apologies, I had cut out the dynafile for testing. Here is some additional config file: template(name="EduroamACSFile" type="string" string="/var/log/collection/eduroam_acs.log") template(name="EduroamACSFormat" type="string" string="%timestamp% %hostname% %syslogtag%%msg:::escape-cc%\n") ruleset(name="ruleset1" queue.type="LinkedList" queue.size="750000" queue.workerthreads="4") { # at very top of ruleset - IPs obscured for privacy if $fromhost-ip == 'x.x.2.75' or $fromhost-ip == 'x.x.2.85' then { /tmp/debug.log;RSYSLOG_DebugFormat action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile") stop } #... other rules below } # end ruleset I do have impstats enabled for every 5 minutes. Here's output specific to this file for a recent sample for this file, plus the general rule set queues: Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342 evicted=0 maxused=1 closetimeouts=51 Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile: origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage: origin=impstats utime=217851481196 stime=87433080781 maxrss=628664 minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592 nvcsw=379912551 nivcsw=20999555 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet: origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0 discarded.nf=0 maxqsize=350911 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet: origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0 discarded.nf=0 maxqsize=156862 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet: origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0 discarded.nf=0 maxqsize=4243 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet: origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0 discarded.nf=0 maxqsize=456471 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0 maxqsize=1738 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564 To estimate volume of logs this is receiving, I took a recent impstats output for enqueued for all rule sets and added them, then subtracted the values from an hour ago, divided by 3600 to get msgs/second and got 29521. This is peak time of day for us so I would expect this to be the high point, dropping to less than half that in the middle of the night. Messages are still randomly lost in the middle of the night. If there's a better way to estimate volume being processed by the machine, let me know! look at the inputs, impudp shows the number of messages received then you look at the various queues (if everything is in a separate queue, that also leads to a bottleneck), are any of them getting full at peak load? Then you look at the various actions, are any of them showing errors? the number of messages enqueued for all items (or even all queues) may be substantially larger or smaller (depending on if a message is handled by multiple rules or is filtered out and never processed) so your 29K logs/sec is probably a fairly inflated value, but even that should be just fine with a single worker. if you are writing to a local file, you should not need to have a separate queue (compression could change things, only move things to a separate queue if testing shows that you run out of CPU otherwise, or if that output could block) in your dynafile actions, how many different files is it producing? for example: action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile") is hard coded to a single file in the example provided, and would be slightly faster just writing to that file. also, run top, enable per-thread reporting (press 'H') and see if any of the rsyslog threads are using 100% cpu. RHEL/CentOS7 do introduce systemd, and systemd may be doing something to prevent rsyslog from 'hogging' the cpu, but I don't know what it's defaults are. David Lang
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

