2015-10-06 14:57 GMT+02:00 Woodruff, Dan <[email protected]>: > Thank you David. From what I see, none of the queues are getting full. > > None of the actions - I have 73 total - have a 'failed' count of greater > than zero. Four have a 'suspended' count in the single digits. > > When you said "if you are writing to a local file, you should not need to > have a separate queue", does queue equate to ruleset in this case? The main > reason I have the separate rulesets is to handle different kinds of incoming > logs on different ports, making it easier to write the rules logic for the > different types. Does that make sense? > > There is a high number of dynafiles, specifically for server logs, which we > break out into files by host and programname with these templates: > template(name="LinuxProgramFile" type="string" > string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log") > template(name="WindowsProgramFile" type="string" > string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log > ") > > The result is about 32,000 total dynafiles which are then rotated out daily. > > Looking at top, rsyslogd as a whole never goes above 120%. Most of the > threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting > about 50% during peak. Although I hadn't noticed this before, watching top > for a few minutes this morning, rsyslogd hangs out around 60%, then bursts > to almost 200% for one second, then disappears from top in the next second, > before returning to 60%. It cycled through this every 10 seconds or so. > After restarting rsyslogd, the behavior went away - it's staying around 60%.
This could be caused by a too-small dynafile cache. I haven't located the conf for these actions with 32,000 files nor the stats, but depending on size it could be the problem. Please post the *full* impstats log. Rainer > > Should I try lowering the number of threads for the rulesets/imudp? This > issue does happen with both TCP and UDP, however, which further compounds my > confusion. You mention systemd - I'll look into that as I have a case open > with RH. > > Dan > > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of David Lang > Sent: Monday, October 5, 2015 3:23 PM > To: rsyslog-users <[email protected]> > Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss > > doing a cut-n-past since this client doesn't quote signed messages properly: > > > David, thank you. Apologies, I had cut out the dynafile for testing. Here is > some additional config file: > > template(name="EduroamACSFile" type="string" > string="/var/log/collection/eduroam_acs.log") > template(name="EduroamACSFormat" type="string" string="%timestamp% > %hostname% %syslogtag%%msg:::escape-cc%\n") > > ruleset(name="ruleset1" queue.type="LinkedList" queue.size="750000" > queue.workerthreads="4") { > # at very top of ruleset - IPs obscured for privacy > if $fromhost-ip == 'x.x.2.75' > or $fromhost-ip == 'x.x.2.85' > then { > /tmp/debug.log;RSYSLOG_DebugFormat > action(type="omfile" dynaFile="EduroamACSFile" > template="EduroamACSFormat" name="writeEduroamACSFile") > stop > } > #... other rules below > } # end ruleset > > I do have impstats enabled for every 5 minutes. Here's output specific to > this file for a recent sample for this file, plus the general rule set > queues: > > Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache > EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342 > evicted=0 maxused=1 closetimeouts=51 > Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile: > origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0 > resumed=0 > > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage: > origin=impstats utime=217851481196 stime=87433080781 maxrss=628664 > minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592 > nvcsw=379912551 nivcsw=20999555 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet: > origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0 > discarded.nf=0 maxqsize=350911 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet: > origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0 > discarded.nf=0 maxqsize=156862 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet: > origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0 > discarded.nf=0 maxqsize=4243 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet: > origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0 > discarded.nf=0 maxqsize=456471 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue > size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0 > maxqsize=1738 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp > called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct 5 > 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp > called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564 > > To estimate volume of logs this is receiving, I took a recent impstats > output for enqueued for all rule sets and added them, then subtracted the > values from an hour ago, divided by 3600 to get msgs/second and got 29521. > This is peak time of day for us so I would expect this to be the high point, > dropping to less than half that in the middle of the night. Messages are > still randomly lost in the middle of the night. If there's a better way to > estimate volume being processed by the machine, let me know! > > > > > > > look at the inputs, impudp shows the number of messages received > > then you look at the various queues (if everything is in a separate queue, > that also leads to a bottleneck), are any of them getting full at peak load? > > Then you look at the various actions, are any of them showing errors? > > the number of messages enqueued for all items (or even all queues) may be > substantially larger or smaller (depending on if a message is handled by > multiple rules or is filtered out and never processed) > > so your 29K logs/sec is probably a fairly inflated value, but even that > should be just fine with a single worker. > > if you are writing to a local file, you should not need to have a separate > queue (compression could change things, only move things to a separate queue > if testing shows that you run out of CPU otherwise, or if that output could > block) > > > in your dynafile actions, how many different files is it producing? > > for example: > action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" > name="writeEduroamACSFile") > is hard coded to a single file in the example provided, and would be > slightly faster just writing to that file. > > also, run top, enable per-thread reporting (press 'H') and see if any of the > rsyslog threads are using 100% cpu. > > RHEL/CentOS7 do introduce systemd, and systemd may be doing something to > prevent rsyslog from 'hogging' the cpu, but I don't know what it's defaults > are. > > David Lang > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

