2015-10-06 14:57 GMT+02:00 Woodruff, Dan <[email protected]>:
> Thank you David. From what I see, none of the queues are getting full.
>
> None of the actions - I have 73 total - have a 'failed' count of greater
> than zero. Four have a 'suspended' count in the single digits.
>
> When you said "if you are writing to a local file, you should not need to
> have a separate queue", does queue equate to ruleset in this case? The main
> reason I have the separate rulesets is to handle different kinds of incoming
> logs on different ports, making it easier to write the rules logic for the
> different types. Does that make sense?
>
> There is a high number of dynafiles, specifically for server logs, which we
> break out into files by host and programname with these templates:
> template(name="LinuxProgramFile" type="string"
> string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
> template(name="WindowsProgramFile" type="string"
> string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log
> ")
>
> The result is about 32,000 total dynafiles which are then rotated out daily.
>
> Looking at top, rsyslogd as a whole never goes above 120%. Most of the
> threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
> about 50% during peak. Although I hadn't noticed this before, watching top
> for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
> to almost 200% for one second, then disappears from top in the next second,
> before returning to 60%. It cycled through this every 10 seconds or so.
> After restarting rsyslogd, the behavior went away - it's staying around 60%.

This could be caused by a too-small dynafile cache. I haven't located
the conf for these actions with 32,000 files nor the stats, but
depending on size it could be the problem. Please post the *full*
impstats log.

Rainer
>
> Should I try lowering the number of threads for the rulesets/imudp? This
> issue does happen with both TCP and UDP, however, which further compounds my
> confusion. You mention systemd - I'll look into that as I have a case open
> with RH.
>
> Dan
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Lang
> Sent: Monday, October 5, 2015 3:23 PM
> To: rsyslog-users <[email protected]>
> Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
>
> doing a cut-n-past since this client doesn't quote signed messages properly:
>
>
> David, thank you. Apologies, I had cut out the dynafile for testing. Here is
> some additional config file:
>
> template(name="EduroamACSFile" type="string"
> string="/var/log/collection/eduroam_acs.log")
> template(name="EduroamACSFormat" type="string" string="%timestamp%
> %hostname% %syslogtag%%msg:::escape-cc%\n")
>
> ruleset(name="ruleset1" queue.type="LinkedList" queue.size="750000"
> queue.workerthreads="4") {
>          # at very top of ruleset - IPs obscured for privacy
>          if $fromhost-ip == 'x.x.2.75'
>                  or $fromhost-ip == 'x.x.2.85'
>          then {
>                  /tmp/debug.log;RSYSLOG_DebugFormat
>                  action(type="omfile" dynaFile="EduroamACSFile"
> template="EduroamACSFormat" name="writeEduroamACSFile")
>                  stop
>          }
>          #... other rules below
> } # end ruleset
>
> I do have impstats enabled for every 5 minutes. Here's output specific to
> this file for a recent sample for this file, plus the general rule set
> queues:
>
> Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache
> EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342
> evicted=0 maxused=1 closetimeouts=51
> Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile:
> origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0
> resumed=0
>
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage:
> origin=impstats utime=217851481196 stime=87433080781 maxrss=628664
> minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592
> nvcsw=379912551 nivcsw=20999555
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet:
> origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0
> discarded.nf=0 maxqsize=350911
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet:
> origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0
> discarded.nf=0 maxqsize=156862
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet:
> origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0
> discarded.nf=0 maxqsize=4243
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet:
> origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0
> discarded.nf=0 maxqsize=456471
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue
> size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0
> maxqsize=1738
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp
> called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct  5
> 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp
> called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564
>
> To estimate volume of logs this is receiving, I took a recent impstats
> output for enqueued for all rule sets and added them, then subtracted the
> values from an hour ago, divided by 3600 to get msgs/second and got 29521.
> This is peak time of day for us so I would expect this to be the high point,
> dropping to less than half that in the middle of the night. Messages are
> still randomly lost in the middle of the night. If there's a better way to
> estimate volume being processed by the machine, let me know!
>
>
>
>
>
>
> look at the inputs, impudp shows the number of messages received
>
> then you look at the various queues (if everything is in a separate queue,
> that also leads to a bottleneck), are any of them getting full at peak load?
>
> Then you look at the various actions, are any of them showing errors?
>
> the number of messages enqueued for all items (or even all queues) may be
> substantially larger or smaller (depending on if a message is handled by
> multiple rules or is filtered out and never processed)
>
> so your 29K logs/sec is probably a fairly inflated value, but even that
> should be just fine with a single worker.
>
> if you are writing to a local file, you should not need to have a separate
> queue (compression could change things, only move things to a separate queue
> if testing shows that you run out of CPU otherwise, or if that output could
> block)
>
>
> in your dynafile actions, how many different files is it producing?
>
> for example:
> action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat"
> name="writeEduroamACSFile")
> is hard coded to a single file in the example provided, and would be
> slightly faster just writing to that file.
>
> also, run top, enable per-thread reporting (press 'H') and see if any of the
> rsyslog threads are using 100% cpu.
>
> RHEL/CentOS7 do introduce systemd, and systemd may be doing something to
> prevent rsyslog from 'hogging' the cpu, but I don't know what it's defaults
> are.
>
> David Lang
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T 
> LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to