Thank you David. From what I see, none of the queues are getting full.

None of the actions - I have 73 total - have a 'failed' count of greater
than zero. Four have a 'suspended' count in the single digits. 

When you said "if you are writing to a local file, you should not need to
have a separate queue", does queue equate to ruleset in this case? The main
reason I have the separate rulesets is to handle different kinds of incoming
logs on different ports, making it easier to write the rules logic for the
different types. Does that make sense?

There is a high number of dynafiles, specifically for server logs, which we
break out into files by host and programname with these templates:
template(name="LinuxProgramFile" type="string"
string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
template(name="WindowsProgramFile" type="string"
string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log
")

The result is about 32,000 total dynafiles which are then rotated out daily.

Looking at top, rsyslogd as a whole never goes above 120%. Most of the
threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
about 50% during peak. Although I hadn't noticed this before, watching top
for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
to almost 200% for one second, then disappears from top in the next second,
before returning to 60%. It cycled through this every 10 seconds or so.
After restarting rsyslogd, the behavior went away - it's staying around 60%.

Should I try lowering the number of threads for the rulesets/imudp? This
issue does happen with both TCP and UDP, however, which further compounds my
confusion. You mention systemd - I'll look into that as I have a case open
with RH.

Dan


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Lang
Sent: Monday, October 5, 2015 3:23 PM
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

doing a cut-n-past since this client doesn't quote signed messages properly:


David, thank you. Apologies, I had cut out the dynafile for testing. Here is
some additional config file:

template(name="EduroamACSFile" type="string"
string="/var/log/collection/eduroam_acs.log")
template(name="EduroamACSFormat" type="string" string="%timestamp%
%hostname% %syslogtag%%msg:::escape-cc%\n")

ruleset(name="ruleset1" queue.type="LinkedList" queue.size="750000"
queue.workerthreads="4") {
         # at very top of ruleset - IPs obscured for privacy
         if $fromhost-ip == 'x.x.2.75'
                 or $fromhost-ip == 'x.x.2.85'
         then {
                 /tmp/debug.log;RSYSLOG_DebugFormat
                 action(type="omfile" dynaFile="EduroamACSFile"
template="EduroamACSFormat" name="writeEduroamACSFile")
                 stop
         }
         #... other rules below
} # end ruleset

I do have impstats enabled for every 5 minutes. Here's output specific to
this file for a recent sample for this file, plus the general rule set
queues:

Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache
EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342
evicted=0 maxused=1 closetimeouts=51
Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile:
origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0
resumed=0

Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage:
origin=impstats utime=217851481196 stime=87433080781 maxrss=628664
minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592
nvcsw=379912551 nivcsw=20999555
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet:
origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0
discarded.nf=0 maxqsize=350911
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet:
origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0
discarded.nf=0 maxqsize=156862
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet:
origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0
discarded.nf=0 maxqsize=4243
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet:
origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0
discarded.nf=0 maxqsize=456471
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue
size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0
maxqsize=1738
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp
called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct  5
14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp
called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564

To estimate volume of logs this is receiving, I took a recent impstats
output for enqueued for all rule sets and added them, then subtracted the
values from an hour ago, divided by 3600 to get msgs/second and got 29521.
This is peak time of day for us so I would expect this to be the high point,
dropping to less than half that in the middle of the night. Messages are
still randomly lost in the middle of the night. If there's a better way to
estimate volume being processed by the machine, let me know!






look at the inputs, impudp shows the number of messages received

then you look at the various queues (if everything is in a separate queue,
that also leads to a bottleneck), are any of them getting full at peak load?

Then you look at the various actions, are any of them showing errors?

the number of messages enqueued for all items (or even all queues) may be
substantially larger or smaller (depending on if a message is handled by
multiple rules or is filtered out and never processed)

so your 29K logs/sec is probably a fairly inflated value, but even that
should be just fine with a single worker.

if you are writing to a local file, you should not need to have a separate
queue (compression could change things, only move things to a separate queue
if testing shows that you run out of CPU otherwise, or if that output could
block)


in your dynafile actions, how many different files is it producing?

for example:
action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" 
name="writeEduroamACSFile")
is hard coded to a single file in the example provided, and would be
slightly faster just writing to that file.

also, run top, enable per-thread reporting (press 'H') and see if any of the
rsyslog threads are using 100% cpu.

RHEL/CentOS7 do introduce systemd, and systemd may be doing something to
prevent rsyslog from 'hogging' the cpu, but I don't know what it's defaults
are.

David Lang

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to