On Tue, 3 Mar 2015, Tim Powell wrote:
I've run into an unusual issue on one of my syslog hosts over the last
couple of days, where the main queue (viewed from debug log) is accepting
the incoming messages (UDP traffic) but appears to stop processing any of
them after the first few.
you have an action queue for the local file, so if the filesystem fills up it
will continue to try and send to the remote server, but you don't have an action
queue on the remote sender, so if there is any communication problem with the
remote sender, all processing will stall until it clears up.
Normally, having an action queue on writes to local files is a waste of time. It
actually takes significantly longer for rsyslog to move the log to the action
queue when competing with the worker thread that reads from the action queue and
writes to disk than it would take to just write to disk. You would only want to
have such a queue if you expect the disk to not be available on a regular basis
(unlikely, but anything can happen, and somewhere will happen :-)
You probably do want to have an action queue on remote connections, so that when
the remote system isn't available, or a firewall blocks the connection, the
output to local files will continue.
ElasticSearch probably needs an action queue as well.
I would suggest that you add impstats to your config so that you can see the
stats for each queue and output.
David Lang
The syslog host in question receives around 200 messages per second, each
of those messages is logged to a local file per host (which is kept for a
couple of days), forwarded onto a third party collector for a security
monitoring service, and forwarded internally to an elasticsearch.
The config is simple (some parts edited for security) :
module(load="imuxsock
module(load="imklog")
module(load="omelasticsearch")
#UDP syslog reception
module(load="imudp" threads="2" timeRequery="8" batchSize="128")
input(type="imudp" port="514")
#Main queue configuration options
main_queue(queue.size="1000000"
queue.dequeuebatchsize="1000"
queue.workerthreads="4"
#Local file action queue
template(name="local"
type="string"
string="/rsyslog/%$now%/%fromhost-ip%"
)
*.* action(name="LocalFile"
type="omfile"
dirCreateMode="0644"
FileCreateMode="0644"
dynafile="local"
dynaFileCacheSize="200"
ioBufferSize="64k"
flushOnTXEnd="off"
asyncWriting="on"
)
#Offsite sender
*.* action(name="Offsite"
type="omfwd"
Target="**.**.200.15"
Port="514"
Protocol="udp"
)
The rest of some elasticsearch templates, with disk backed action queues.
However even if I strip the config down to just the listener and the omfwd
or omfile action I see the same behaviour.
The ruleset loads without error and are assigned to the parser list, the
queues are created, and rsyslog starts processing messages normally,
however after a small number the main_queue starts to grow and nothing is
passed out to the additional queues. If I reduce the size of the main
queue it does actually seem to start processing events in order to clear
space once full, though it's possible it is just deleting them.
Initially the first few messages process correctly (this is with just the
omfwd action enabled) :
0505.662792001:7f05a605d700: recv(6,665),acl:1,msg:<134>Message removed
0505.662826814:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<134>removed'
0505.662842656:7f05a605d700: parse using parser list 0x2519de0 (the
default list).
0505.662858651:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160
0505.662878951:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).
0505.662922371:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0
0505.662942472:7f05a605d700: imudp: recvmmsg returned -1
0505.662961848:7f05a605d700: main Q: qqueueAdd: entry added, size now log
1, phys 1 entries
0505.663003721:main Q:Reg/w0 : wti 0x254e440: worker awoke from idle
processing
0505.663023752:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
and enqueued 0 objects
0505.663037415:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new sizes: log 1, phys 1
0505.663067689:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
processed
0505.663075034:main Q:Reg/w0 : processBATCH: next msg 0: <134>removed
0505.663081549:main Q:Reg/w0 : ACTION 0
[builtin:omfwd:action(type="builtin:omfwd" ...)]
0505.663094439:main Q:Reg/w0 : executing action 0
0505.663100148:main Q:Reg/w0 : Called action, logging to builtin:omfwd
0505.663113302:main Q:Reg/w0 : dnscache: entry (nil) found
0505.663522101:7f05a605d700: main Q: MultiEnqObj advised worker start
0505.664168111:main Q:Reg/w0 : action 0 is transactional - executing in
commit phase
0505.664277140:main Q:Reg/w0 : omfwd: beginTransaction
0505.664343941:main Q:Reg/w0 : Action 0 transitioned to state: itx
0505.664386642:main Q:Reg/w0 : END batch execution phase, entering to
commit phase
0505.664417442:main Q:Reg/w0 : actionCommitAll: action 0, state 1, nbr to
commit 1 isTransactional 1
0505.664454895:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x2527160
0505.664493525:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 0, nMsgs 1
0505.664530029:main Q:Reg/w0 : **.**.200.15:514/udp
0505.664581992:main Q:Reg/w0 : omfwd: endTransaction, offsSndBuf 0, iRet
0
0505.664643366:main Q:Reg/w0 : Action 0 transitioned to state: rdy
0505.664748610:main Q:Reg/w0 : omfwd: beginTransaction
0505.664809454:main Q:Reg/w0 : Action 0 transitioned to state: itx
0505.664907608:main Q:Reg/w0 : Action 0 transitioned to state: rdy
0505.665064133:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
0505.665225575:main Q:Reg/w0 : processBATCH: batch of 1 elements has been
processed
0505.665350767:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0
sz phys 1
0505.665452514:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects
and enqueued 0 objects
0505.665615316:imudp.c : imudp: epoll_wait() returned with 1 fds
0505.665657702:imudp.c : imudp: recvmmsg returned 2
Then we start to stop taking any further action
0505.765150069:7f05a605d700: recv(6,568),acl:1,msg:<134>(message removed)
0505.765262198:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<134>hostname <Tue, 03 Mar 2015 16:35'
0505.765323278:7f05a605d700: parse using parser list 0x2519de0 (the
default list).
0505.765412683:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160
0505.765474091:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).
0505.765588189:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0
0505.765675642:7f05a605d700: imudp: recvmmsg returned -1
0505.765745277:7f05a605d700: main Q: qqueueAdd: entry added, size now log
1, phys 5 entries
0505.765850825:7f05a605d700: main Q: MultiEnqObj advised worker start
0505.767099018:7f05a605d700: imudp: epoll_wait() returned with 1 fds
0505.767246991:imudp.c : imudp: epoll_wait() returned with 1 fds
0505.767316623:imudp.c : imudp: recvmmsg returned -1
0505.767409356:7f05a605d700: imudp: recvmmsg returned 1
And we end up with a main queue full of unprocessed messages, tcpdump at
the same time shows no outbound traffic.
0520.730971322:7f05a605d700: recv(6,143),acl:1,msg:<182>(message removed)
0520.731028834:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<182>removed'
0520.731060134:7f05a605d700: parse using parser list 0x2519de0 (the
default list).
0520.731087300:7f05a605d700: dropped LF at very end of message
(DropTrailingLF is set)
0520.731140689:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160
0520.731208366:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).
0520.731272740:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0
0520.731340593:7f05a605d700: imudp: recvmmsg returned -1
0520.731389028:7f05a605d700: main Q: qqueueAdd: entry added, size now log
3836, phys 4836 entries
0520.731421704:7f05a605d700: main Q: MultiEnqObj advised worker start
0520.741245762:7f05a605d700: imudp: epoll_wait() returned with 1 fds
0520.741409724:7f05a605d700: imudp: recvmmsg returned 2
On stopping syslog everything is has queued is lost :
0525.902788727:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 267
objects and enqueued 733 objects
0525.902839113:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new sizes: log 4593, phys 4593
0525.902936061:main Q:Reg/w0 : main Q:Reg/w0: done cancelation cleanup
handler.
0525.903050386:main Q:Reg/w0 : main Q:Reg: Worker thread 254e440
requested to be cancelled.
0525.903146548:main Q:Reg/w0 : main Q:Reg: Worker thread 254e440,
terminated, num workers now 0
0525.903257112:main Q:Reg/w0 : destructor for debug call stack
0x7f05980008c0 called
0525.906195836:main thread : main Q: worker threads terminated,
remaining queue size log 4593, phys 4593.
0525.906253112:main thread : main Q: error -7 persisting queue - data
lost!
0525.906266571:main thread : main Q: queue (type 0) will lose 4593
messages, destroying...
This issue actually started with our rsyslog host running 7.6.7, I have
updated to 8.8 off the back of this issue as an attempt to resolve it.
I have tried this with RHEL v6, CentOS v6 and Ubuntu 14.04, the same
behaviour is seen on each. I suspect I must be doing something wrong with
the config, however stripping it down to even the most basic action still
fails which makes me think perhaps something else is happening with this.
Any ideas appreciated.
Tim
Notice:
This message contains information that may be privileged or confidential and is
the property of Skyscape. It is intended only for the person to whom it is
addressed. If you are not the intended recipient, you are not authorised to
read, print, retain, copy, disseminate, distribute, or use this message or any
part thereof. If you receive this message in error, please notify the sender
immediately and delete all copies of this message. Skyscape reserves the right
to monitor all e-mail communications through its networks. Skyscape Cloud
Services Limited is registered in England and Wales: Company No: 07619797.
Registered office: Hartham Park, Hartham, Corsham, Wiltshire SN13 0RP.
______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
______________________________________________________________________
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.