ok, the config you posted did specify threads
check and see if you are running out of entropy on your system, that could cause
encryption to be delayed
Rainer, do you know if the encryption used /dev/random or /dev/urandom? if it's
using /dev/random that could be causing it to run out of entropy and have
problems.
David Lang
On Sat, 3 Nov 2018, Rory Toma wrote:
I can't use imptcp as I'm using ssl. I'm not using threads because imtcp
doesn't recongnize it.
David Lang wrote on 11/3/18 12:51 PM:
I've run rsyslog up to 300k messages/sec (gig-e wire speed), so something
is rong here
try using imptcp instead of imtcp.
also try eliminating the threads=3, you should not need more than one
thread for this data rate. Rsyslog is very efficient but if you don't
actually have enough traffic to keep those threads busy, forcing more
threads can actually slow things down. Only increase the thread count when
you see a thread maxing out the cpu.
David Lang
On Sat, 3 Nov 2018, Rory Toma via rsyslog wrote:
Date: Sat, 3 Nov 2018 12:39:01 -0700
From: Rory Toma via rsyslog <[email protected]>
To: rsyslog-users <[email protected]>,
Rory Toma via rsyslog <[email protected]>
Cc: Rory Toma <[email protected]>
Subject: Re: [rsyslog] Need help with high volume forwarding config
My solution for this now is to set up a stream-based nginx proxy, and run
4 different rsyslog servers on the host. The nginx will least_conn balance
to 4 local ports, and I'll restart 1 of the 4 rsyslog processes every 15
minutes.
Rory Toma via rsyslog wrote on 11/2/18 9:04 PM:
Latest file made it longer, then gets stuck here: I must be hitting some
process/OS limit.
Fri Nov 2 21:03:39 2018: global: origin=dynstats
Fri Nov 2 21:03:39 2018: forward1: origin=core.action processed=2104045
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov 2 21:03:39 2018: forward2: origin=core.action processed=0
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov 2 21:03:39 2018: imtcp(110): origin=imtcp submitted=2104045
Fri Nov 2 21:03:39 2018: resource-usage: origin=impstats
utime=5376347690 stime=217621914 maxrss=242456 minflt=66148 majflt=0
inblock=8 oublock=1480 nvcsw=1120907 nivcsw=8027 openfiles=4034
Fri Nov 2 21:03:39 2018: forward1 queue[DA]: origin=core.queue size=0
enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
Fri Nov 2 21:03:39 2018: forward1 queue: origin=core.queue size=0
enqueued=2104045 full=0 discarded.full=0 discarded.nf=0 maxqsize=781
Fri Nov 2 21:03:39 2018: forward2 queue[DA]: origin=core.queue size=0
enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
Fri Nov 2 21:03:39 2018: forward2 queue: origin=core.queue size=0
enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
Fri Nov 2 21:03:39 2018: main Q: origin=core.queue size=0
enqueued=2104045 full=0 discarded.full=0 discarded.nf=0 maxqsize=575
Rory Toma wrote on 11/2/18 5:45 PM:
BTW, when it stops processing, oublock increases. I can't tell from
casual inspection of code what it's supposed to be.
Rory Toma via rsyslog wrote on 11/2/18 5:45 PM:
OK, I'll try to convert. Thx.
David Lang wrote on 11/2/18 4:41 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:
Sure.
I'm going to try and turn on TCP Keepalive next.
$DefaultNetstreamDriver gtls
# certificate files
$DefaultNetstreamDriverCAFile /opt/rsyslog/certs/ca.pem
$DefaultNetstreamDriverCertFile /opt/rsyslog/certs/cert.pem
$DefaultNetstreamDriverKeyFile /opt/rsyslog/certs/key.pem
I don't think these have any effect with the new function style
syntax. I think you have to specify these each time they are needed.
In some ways it's ugly, but it makes it so that when you are looking
at a line, you don't have to worry about what the lines before it may
have set.
$MaxOpenFiles 1000000
see if you can set this in the global() section
module(load="imtcp" threads="3" MaxSessions="1048544"
StreamDriver.Mode="1" Stre
amDriver.AuthMode="anon") # load TCP listener
module(load="impstats" interval="60" severity="7"
log.file="/export/rsyslog/imps
tats.log" log.syslog="off")
I believe that you want impstats to be loaded in the very first line
to get all your stats properly.
$WorkDirectory /export/rsyslog
also set this in the global() section.
$ActionQueueType LinkedList
$ActionQueueFileName mainmsgq
$ActionQueueSaveOnShutdown on
$ActionQueueHighWaterMark 2000000
$ActionQueueLowWaterMark 1800000
#$ActionQueueSize 4000000
$ActionQueueSize 10000
#$ActionQueueDequeueBatchSize 512
$ActionQueueDequeueBatchSize 100
$ActionQueueMaxDiskSpace 3g
$ActionQueueWorkerThreads 2
$ActionQueueWorkerThreadMinimumMessages 512
all this queue stuff has no effect on any action(), it will only
affect the next legacy action line (which you don't have any of, so
all this has no effect)
you need to convert this to the new syntax as part of the action()
statement, or group the actions into a ruleset and put the queue on
the ruleset.
This explains why we weren't seeing any queue info in the impstats
output.
template(name="json_line" type="list" )
{
constant(value="{")
constant(value="\"time\":\"")
property(name="timegenerated" dateFormat="rfc3339" format="json" )
constant(value="\",\"msg\":\"")
property(name="msg" format="json")
constant(value="\",\"host\":\"")
property(name="hostname" format="json")
constant(value="\",\"svr\":\"")
property(name="syslogseverity-text" format="json")
constant(value="\",\"process\":\"")
property(name="programname" format="json")
constant(value="\",\"tag\":\"")
property(name="syslogtag" format="json")
constant(value="\",\"uuid\":\"")
property(name="uuid" format="json")
constant(value="\",\"rsyshost\":\"")
property(name="$myhostname" format="json")
constant(value="\"}")
}
$InputTCPServerRun 110 # Telo Logs
since you use the new style syntax everywhere else, you should change
this to input() syntax
$InputTCPServerKeepAlive on
action(
type="omfwd"
name="forward1"
template="json_line"
target="cortana-relay-1.corp.ooma.com"
Port="9092"
protocol="tcp"
)
action(
type="omfwd"
name="forward2"
template="json_line"
target="cortana-relay-2.corp.ooma.com"
Port="9092"
protocol="tcp"
action.execOnlyWhenPreviousIsSuspended="on"
)
David Lang wrote on 11/2/18 4:28 PM:
I'm done for today, can you re-post your rsyslog.conf (and any
included files)?
David Lang
On Fri, 2 Nov 2018, Rory Toma wrote:
This is 8.39.
tcpdump shows tons of traffic incoming.
David Lang wrote on 11/2/18 4:25 PM:
what version of rsyslog is this? I'm not seeing the queue stats
for the disk queue that you say you have defined (I no longer have
your original message with your full config)
I do not understand what is happening, main queue size is 0 so
imtcp should be able to accept new messages, your outputs are not
showing syspended, the queue says it has never been full
David Lang
On Fri, 2 Nov 2018, Rory Toma wrote:
If I'm reading right, imtcp thinks it has no new messages?
Fri Nov 2 17:12:09 2018: forward1: origin=core.action
processed=712779 failed=0 suspended=0 suspended.duration=0
resumed=0
Fri Nov 2 17:12:09 2018: forward2: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov 2 17:12:09 2018: imtcp(110): origin=imtcp
submitted=712779
Fri Nov 2 17:12:09 2018: resource-usage: origin=impstats
utime=916388169 stime=142221083 maxrss=90124 minflt=26083
majflt=0 inblock=8 oublock=312 nvcsw=321384 nivcsw=1171
openfiles=1434
Fri Nov 2 17:12:09 2018: main Q: origin=core.queue size=0
enqueued=712779 full=0 discarded.full=0 discarded.nf=0
maxqsize=3548
Fri Nov 2 17:13:09 2018: global: origin=dynstats
Fri Nov 2 17:13:09 2018: forward1: origin=core.action
processed=712779 failed=0 suspended=0 suspended.duration=0
resumed=0
Fri Nov 2 17:13:09 2018: forward2: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov 2 17:13:09 2018: imtcp(110): origin=imtcp
submitted=712779
Fri Nov 2 17:13:09 2018: resource-usage: origin=impstats
utime=976430226 stime=142221083 maxrss=90124 minflt=26083
majflt=0 inblock=8 oublock=328 nvcsw=321385 nivcsw=1187
openfiles=1434
Fri Nov 2 17:13:09 2018: main Q: origin=core.queue size=0
enqueued=712779 full=0 discarded.full=0 discarded.nf=0
maxqsize=3548
Fri Nov 2 17:14:09 2018: global: origin=dynstats
Fri Nov 2 17:14:09 2018: forward1: origin=core.action
processed=712779 failed=0 suspended=0 suspended.duration=0
resumed=0
Fri Nov 2 17:14:09 2018: forward2: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov 2 17:14:09 2018: imtcp(110): origin=imtcp
submitted=712779
Fri Nov 2 17:14:09 2018: resource-usage: origin=impstats
utime=1036475780 stime=142221083 maxrss=90124 minflt=26083
majflt=0 inblock=8 oublock=336 nvcsw=321386 nivcsw=1203
openfiles=1434
Fri Nov 2 17:14:09 2018: main Q: origin=core.queue size=0
enqueued=712779 full=0 discarded.full=0 discarded.nf=0
maxqsize=3548
David Lang wrote on 11/2/18 4:05 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:
WHat am I looking for? As far as I can tell, the number
enqueued remains constant for 15 minutes, then starts up again.
you should see the number processed continue to climb, and the
queue sizes drop gradually, when they drop enough, it will start
accepting new messages.
to avoid your impstats logs getting delayed, write them to a
file directly (or use a separate ruleset that has it's own queue
for the impstats output)
David Lang
David Lang wrote on 11/2/18 3:53 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:
Interesting. It stopped processing for 15 minutes, then
started to spontaneously process incoming packets again,
look at the impstats output, I'll bet that it shows that it
started processing messages again, allowing for more messages
to be accepted.
David Lang
Rory Toma via rsyslog wrote on 11/2/18 4:18 PM:
OK, so the submitted maxes out for imtcp at submitted=583161
At this point, rsyslog thinks there are no more messages,
even though tcpdump shows a ton. It's as if imtcp just stops
working.
David Lang wrote on 11/2/18 2:58 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:
OK, turned off counter reset. Also, I noticed that when I
restarted, if I waited about 2 minutes for all connections
to clear out of FIN_WAIT, et al... It seems to log better.
Now we wait an hour...
So now I see:
Fri Nov 2 15:52:49 2018: imtcp(110): origin=imtcp
submitted=142983
Fri Nov 2 15:52:49 2018: resource-usage: origin=impstats
utime=12115102 stime=2320953 maxrss=56332 minflt=16662
majflt=0 inblock=8 oublock=16 nvcsw=37439 nivcsw=99
openfiles=334
Fri Nov 2 15:52:49 2018: main Q: origin=core.queue size=0
enqueued=142983 full=0 discarded.full=0 discarded.nf=0
maxqsize=58676
Fri Nov 2 15:53:49 2018: global: origin=dynstats
Fri Nov 2 15:53:49 2018: forward1: origin=core.action
processed=153749 failed=0 suspended=0 suspended.duration=0
resumed=0
Fri Nov 2 15:53:49 2018: forward2: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0
resumed=0
so this shows 142983 new messages arriving and 153749
processed (do you have a disk queue somewhere to account
for the extra messages??) but at one point it was 58676
messages behind.
let it run a while to where the throughput slows down and
let's see what it shows.
I expect that the number enqueued will continue to climb,
but the number processed will taper off.
David Lang
David Lang wrote on 11/2/18 2:47 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:
Date: Fri, 2 Nov 2018 15:44:10 -0700
From: Rory Toma <[email protected]>
To: David Lang <[email protected]>
Cc: Rory Toma via rsyslog <[email protected]>
Subject: Re: [rsyslog] Need help with high volume
forwarding config
Does this mean that the packets are not even being
forwarded?
Fri Nov 2 15:41:59 2018: main Q: origin=core.queue
size=0 enqueued=8059 full=0 discarded.full=0
discarded.nf=0 maxqsize=632
so you received 8059 messages
Fri Nov 2 15:42:59 2018: global: origin=dynstats
Fri Nov 2 15:42:59 2018: action-0-builtin:omfwd:
origin=core.action processed=2591 failed=0 suspended=0
suspended.duration=0 resumed=0
and you sent 2591 messages through action 0 (this is why
it's good to have name='something' in the action to be
sure you are looking at the right thing
Fri Nov 2 15:42:59 2018: action-1-builtin:omfwd:
origin=core.action processed=0 failed=0 suspended=0
suspended.duration=0 resumed=0
Fri Nov 2 15:42:59 2018: imtcp(110): origin=imtcp
submitted=2591
Fri Nov 2 15:42:59 2018: resource-usage:
origin=impstats utime=2954582 stime=383935 maxrss=19192
minflt=5924 majflt=0 inblock=8 oublock=32 nvcsw=3462
nivcsw=7 openfiles=250
Fri Nov 2 15:42:59 2018: main Q: origin=core.queue
size=0 enqueued=2591 full=0 discarded.full=0
discarded.nf=0 maxqsize=632
but here it says there were only 2591 messages received,
are you resetting the counters each time? if so, it's
probably best not to do that right now.
David Lang
David Lang wrote on 11/2/18 2:34 PM:
On Fri, 2 Nov 2018, Rory Toma via rsyslog wrote:
We have several rsyslog hosts that forward to a
logstash server. It runs great, then after about an
hour, data slows down until we get a trickle. I did
not see anything last time I ran impstats, so I'm
stuck. Here's my config (centos7, rsyslog 8.39) Any
advice how to debug this?
Well, logstash has lots of bottlenecks, is it keeping
up or is it refusing to accept more data?
what does impstats show? does it show the output to
logstash being suspended?
until you know that the recipient is able to receive
more logs, I don't know that it's worth looking at the
rsyslog config.
David Lang
--
Sent from Postbox
<https://www.postbox-inc.com/?utm_source=email&utm_medium=siglink&utm_campaign=reach>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.