I've run rsyslog up to 300k messages/sec (gig-e wire speed), so something is rong here

try using imptcp instead of imtcp.

also try eliminating the threads=3, you should not need more than one thread for this data rate. Rsyslog is very efficient but if you don't actually have enough traffic to keep those threads busy, forcing more threads can actually slow things down. Only increase the thread count when you see a thread maxing out the cpu.

David Lang


On Sat, 3 Nov 2018, Rory Toma via rsyslog wrote:

Date: Sat, 3 Nov 2018 12:39:01 -0700
From: Rory Toma via rsyslog <[email protected]>
To: rsyslog-users <[email protected]>,
    Rory Toma via rsyslog <[email protected]>
Cc: Rory Toma <[email protected]>
Subject: Re: [rsyslog] Need help with high volume forwarding config

My solution for this now is to set up a stream-based nginx proxy, and run 4 different rsyslog servers on the host. The nginx will least_conn balance to 4 local ports, and I'll restart 1 of the 4 rsyslog processes every 15 minutes.

Rory Toma via rsyslog wrote on 11/2/18 9:04 PM:
Latest file made it longer, then gets stuck here: I must be hitting some process/OS limit.

Fri Nov  2 21:03:39 2018: global: origin=dynstats
Fri Nov  2 21:03:39 2018: forward1: origin=core.action processed=2104045 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 21:03:39 2018: forward2: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Nov  2 21:03:39 2018: imtcp(110): origin=imtcp submitted=2104045
Fri Nov  2 21:03:39 2018: resource-usage: origin=impstats utime=5376347690 stime=217621914 maxrss=242456 minflt=66148 majflt=0 inblock=8 oublock=1480 nvcsw=1120907 nivcsw=8027 openfiles=4034 Fri Nov  2 21:03:39 2018: forward1 queue[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Nov  2 21:03:39 2018: forward1 queue: origin=core.queue size=0 enqueued=2104045 full=0 discarded.full=0 discarded.nf=0 maxqsize=781 Fri Nov  2 21:03:39 2018: forward2 queue[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Nov  2 21:03:39 2018: forward2 queue: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Nov  2 21:03:39 2018: main Q: origin=core.queue size=0 enqueued=2104045 full=0 discarded.full=0 discarded.nf=0 maxqsize=575

Rory Toma wrote on 11/2/18 5:45 PM:
BTW, when it stops processing, oublock increases. I can't tell from casual inspection of code what it's supposed to be.

Rory Toma via rsyslog wrote on 11/2/18 5:45 PM:
OK, I'll try to convert. Thx.

David Lang wrote on 11/2/18 4:41 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:

Sure.
I'm going to try and turn on TCP Keepalive next.
$DefaultNetstreamDriver gtls

# certificate files
$DefaultNetstreamDriverCAFile /opt/rsyslog/certs/ca.pem
$DefaultNetstreamDriverCertFile /opt/rsyslog/certs/cert.pem
$DefaultNetstreamDriverKeyFile /opt/rsyslog/certs/key.pem

I don't think these have any effect with the new function style syntax. I think you have to specify these each time they are needed. In some ways it's ugly, but it makes it so that when you are looking at a line, you don't have to worry about what the lines before it may have set.

$MaxOpenFiles 1000000

see if you can set this in the global() section

module(load="imtcp" threads="3" MaxSessions="1048544" StreamDriver.Mode="1" Stre
amDriver.AuthMode="anon") # load TCP listener
module(load="impstats" interval="60" severity="7" log.file="/export/rsyslog/imps
tats.log" log.syslog="off")

I believe that you want impstats to be loaded in the very first line to get all your stats properly.

$WorkDirectory /export/rsyslog

also set this in the global() section.

$ActionQueueType LinkedList
$ActionQueueFileName mainmsgq
$ActionQueueSaveOnShutdown on
$ActionQueueHighWaterMark 2000000
$ActionQueueLowWaterMark 1800000
#$ActionQueueSize 4000000
$ActionQueueSize 10000
#$ActionQueueDequeueBatchSize 512
$ActionQueueDequeueBatchSize 100
$ActionQueueMaxDiskSpace 3g
$ActionQueueWorkerThreads 2
$ActionQueueWorkerThreadMinimumMessages 512

all this queue stuff has no effect on any action(), it will only affect the next legacy action line (which you don't have any of, so all this has no effect)

you need to convert this to the new syntax as part of the action() statement, or group the actions into a ruleset and put the queue on the ruleset.

This explains why we weren't seeing any queue info in the impstats output.

template(name="json_line" type="list" )
{
constant(value="{")
constant(value="\"time\":\"")
property(name="timegenerated" dateFormat="rfc3339" format="json" )
constant(value="\",\"msg\":\"")
property(name="msg" format="json")
constant(value="\",\"host\":\"")
property(name="hostname" format="json")
constant(value="\",\"svr\":\"")
property(name="syslogseverity-text" format="json")
constant(value="\",\"process\":\"")
property(name="programname" format="json")
constant(value="\",\"tag\":\"")
property(name="syslogtag" format="json")
constant(value="\",\"uuid\":\"")
property(name="uuid" format="json")
constant(value="\",\"rsyshost\":\"")
property(name="$myhostname" format="json")
constant(value="\"}")
}

$InputTCPServerRun 110 # Telo Logs

since you use the new style syntax everywhere else, you should change this to input() syntax

$InputTCPServerKeepAlive on

action(
        type="omfwd"
        name="forward1"
        template="json_line"
        target="cortana-relay-1.corp.ooma.com"
        Port="9092"
        protocol="tcp"
)

action(
        type="omfwd"
        name="forward2"
        template="json_line"
        target="cortana-relay-2.corp.ooma.com"
        Port="9092"
        protocol="tcp"
        action.execOnlyWhenPreviousIsSuspended="on"
)


David Lang wrote on 11/2/18 4:28 PM:
I'm done for today, can you re-post your rsyslog.conf (and any included files)?

David Lang

On Fri, 2 Nov 2018, Rory Toma wrote:

This is 8.39.
tcpdump shows tons of traffic incoming.


David Lang wrote on 11/2/18 4:25 PM:
what version of rsyslog is this? I'm not seeing the queue stats for the disk queue that you say you have defined (I no longer have your original message with your full config)

I do not understand what is happening, main queue size is 0 so imtcp should be able to accept new messages, your outputs are not showing syspended, the queue says it has never been full

David Lang

On Fri, 2 Nov 2018, Rory Toma wrote:

If I'm reading right, imtcp thinks it has no new messages?
Fri Nov  2 17:12:09 2018: forward1: origin=core.action processed=712779 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:12:09 2018: forward2: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:12:09 2018: imtcp(110): origin=imtcp submitted=712779 Fri Nov  2 17:12:09 2018: resource-usage: origin=impstats utime=916388169 stime=142221083 maxrss=90124 minflt=26083 majflt=0 inblock=8 oublock=312 nvcsw=321384 nivcsw=1171 openfiles=1434 Fri Nov  2 17:12:09 2018: main Q: origin=core.queue size=0 enqueued=712779 full=0 discarded.full=0 discarded.nf=0 maxqsize=3548
Fri Nov  2 17:13:09 2018: global: origin=dynstats
Fri Nov  2 17:13:09 2018: forward1: origin=core.action processed=712779 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:13:09 2018: forward2: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:13:09 2018: imtcp(110): origin=imtcp submitted=712779 Fri Nov  2 17:13:09 2018: resource-usage: origin=impstats utime=976430226 stime=142221083 maxrss=90124 minflt=26083 majflt=0 inblock=8 oublock=328 nvcsw=321385 nivcsw=1187 openfiles=1434 Fri Nov  2 17:13:09 2018: main Q: origin=core.queue size=0 enqueued=712779 full=0 discarded.full=0 discarded.nf=0 maxqsize=3548
Fri Nov  2 17:14:09 2018: global: origin=dynstats
Fri Nov  2 17:14:09 2018: forward1: origin=core.action processed=712779 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:14:09 2018: forward2: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 17:14:09 2018: imtcp(110): origin=imtcp submitted=712779 Fri Nov  2 17:14:09 2018: resource-usage: origin=impstats utime=1036475780 stime=142221083 maxrss=90124 minflt=26083 majflt=0 inblock=8 oublock=336 nvcsw=321386 nivcsw=1203 openfiles=1434 Fri Nov  2 17:14:09 2018: main Q: origin=core.queue size=0 enqueued=712779 full=0 discarded.full=0 discarded.nf=0 maxqsize=3548

David Lang wrote on 11/2/18 4:05 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:

WHat am I looking for? As far as I can tell, the number enqueued remains constant for 15 minutes, then starts up again.

you should see the number processed continue to climb, and the queue sizes drop gradually, when they drop enough, it will start accepting new messages.

to avoid your impstats logs getting delayed, write them to a file directly (or use a separate ruleset that has it's own queue for the impstats output)

David Lang

David Lang wrote on 11/2/18 3:53 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:

Interesting. It stopped processing for 15 minutes, then started to spontaneously process incoming packets again,

look at the impstats output, I'll bet that it shows that it started processing messages again, allowing for more messages to be accepted.

David Lang

Rory Toma via rsyslog wrote on 11/2/18 4:18 PM:
OK, so the submitted maxes out for imtcp at submitted=583161

At this point, rsyslog thinks there are no more messages, even though tcpdump shows a ton. It's as if imtcp just stops working.

David Lang wrote on 11/2/18 2:58 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:

OK, turned off counter reset. Also, I noticed that when I restarted, if I waited about 2 minutes for all connections to clear out of FIN_WAIT, et al... It seems to log better.
Now we wait an hour...

So now I see:
Fri Nov  2 15:52:49 2018: imtcp(110): origin=imtcp submitted=142983 Fri Nov  2 15:52:49 2018: resource-usage: origin=impstats utime=12115102 stime=2320953 maxrss=56332 minflt=16662 majflt=0 inblock=8 oublock=16 nvcsw=37439 nivcsw=99 openfiles=334 Fri Nov  2 15:52:49 2018: main Q: origin=core.queue size=0 enqueued=142983  full=0 discarded.full=0 discarded.nf=0 maxqsize=58676
Fri Nov  2 15:53:49 2018: global: origin=dynstats
Fri Nov  2 15:53:49 2018: forward1: origin=core.action processed=153749 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 15:53:49 2018: forward2: origin=core.action processed=0 failed=0  suspended=0 suspended.duration=0 resumed=0

so this shows 142983 new messages arriving and 153749 processed (do you have a disk queue somewhere to account for the extra messages??) but at one point it was 58676 messages behind.

let it run a while to where the throughput slows down and let's see what it shows.

I expect that the number enqueued will continue to climb, but the number processed will taper off.

David Lang


David Lang wrote on 11/2/18 2:47 PM:
On Fri, 2 Nov 2018, Rory Toma wrote:

Date: Fri, 2 Nov 2018 15:44:10 -0700
From: Rory Toma <[email protected]>
To: David Lang <[email protected]>
Cc: Rory Toma via rsyslog <[email protected]>
Subject: Re: [rsyslog] Need help with high volume forwarding config

Does this mean that the packets are not even being forwarded?

Fri Nov  2 15:41:59 2018: main Q: origin=core.queue size=0 enqueued=8059 full=0 discarded.full=0 discarded.nf=0 maxqsize=632

so you received 8059 messages

Fri Nov  2 15:42:59 2018: global: origin=dynstats
Fri Nov  2 15:42:59 2018: action-0-builtin:omfwd: origin=core.action processed=2591 failed=0 suspended=0 suspended.duration=0 resumed=0

and you sent 2591 messages through action 0 (this is why it's good to have name='something' in the action to be sure you are looking at the right thing

Fri Nov  2 15:42:59 2018: action-1-builtin:omfwd: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Nov  2 15:42:59 2018: imtcp(110): origin=imtcp submitted=2591 Fri Nov  2 15:42:59 2018: resource-usage: origin=impstats utime=2954582 stime=383935 maxrss=19192 minflt=5924 majflt=0 inblock=8 oublock=32 nvcsw=3462 nivcsw=7 openfiles=250 Fri Nov  2 15:42:59 2018: main Q: origin=core.queue size=0 enqueued=2591 full=0 discarded.full=0 discarded.nf=0 maxqsize=632

but here it says there were only 2591 messages received, are you resetting the counters each time? if so, it's probably best not to do that right now.

David Lang


David Lang wrote on 11/2/18 2:34 PM:
On Fri, 2 Nov 2018, Rory Toma via rsyslog wrote:

We have several rsyslog hosts that forward to a logstash server. It runs great, then after about an hour, data slows down until we get a trickle. I did not see anything last time I ran impstats, so I'm stuck. Here's my config (centos7, rsyslog 8.39) Any advice how to debug this?

Well, logstash has lots of bottlenecks, is it keeping up or is it refusing to accept more data?

what does impstats show? does it show the output to logstash being suspended?

until you know that the recipient is able to receive more logs, I don't know that it's worth looking at the rsyslog config.

David Lang

















--
Sent from Postbox
<https://www.postbox-inc.com/?utm_source=email&utm_medium=siglink&utm_campaign=reach>




_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to