On Tue, 3 Mar 2015, Tim Powell wrote:

I've run into an unusual issue on one of my syslog hosts over the last
couple of days, where the main queue (viewed from debug log) is accepting
the incoming messages (UDP traffic) but appears to stop processing any of
them after the first few.

you have an action queue for the local file, so if the filesystem fills up it will continue to try and send to the remote server, but you don't have an action queue on the remote sender, so if there is any communication problem with the remote sender, all processing will stall until it clears up.

Normally, having an action queue on writes to local files is a waste of time. It actually takes significantly longer for rsyslog to move the log to the action queue when competing with the worker thread that reads from the action queue and writes to disk than it would take to just write to disk. You would only want to have such a queue if you expect the disk to not be available on a regular basis (unlikely, but anything can happen, and somewhere will happen :-)

You probably do want to have an action queue on remote connections, so that when the remote system isn't available, or a firewall blocks the connection, the output to local files will continue.

ElasticSearch probably needs an action queue as well.

I would suggest that you add impstats to your config so that you can see the stats for each queue and output.

David Lang



The syslog host in question receives around 200 messages per second, each
of those messages  is logged to a local file per host (which is kept for a
couple of days), forwarded onto a third party collector for a security
monitoring service, and forwarded internally to an elasticsearch.



The config is simple (some parts edited for security) :



module(load="imuxsock

module(load="imklog")

module(load="omelasticsearch")



#UDP syslog reception

module(load="imudp" threads="2" timeRequery="8" batchSize="128")

input(type="imudp" port="514")



#Main queue configuration options

main_queue(queue.size="1000000"

          queue.dequeuebatchsize="1000"

          queue.workerthreads="4"



#Local file action queue

template(name="local"

        type="string"

        string="/rsyslog/%$now%/%fromhost-ip%"

)



*.* action(name="LocalFile"

          type="omfile"

          dirCreateMode="0644"

          FileCreateMode="0644"

          dynafile="local"

          dynaFileCacheSize="200"

          ioBufferSize="64k"

          flushOnTXEnd="off"

          asyncWriting="on"

)



#Offsite sender

*.* action(name="Offsite"

          type="omfwd"

          Target="**.**.200.15"

          Port="514"

          Protocol="udp"

)



The rest of some elasticsearch templates, with disk backed action queues.



However even if I strip the config down to just the listener and the omfwd
or omfile action I see the same behaviour.



The ruleset loads without error and are assigned to the parser list, the
queues are created, and rsyslog starts processing messages normally,
however after a small number the main_queue starts to grow and nothing is
passed out to the additional queues. If I reduce the size of the main
queue it does actually seem to start processing events in order to clear
space once full, though it's possible it is just deleting them.



Initially the first few messages process correctly (this is with just the
omfwd action enabled) :



0505.662792001:7f05a605d700: recv(6,665),acl:1,msg:<134>Message removed

0505.662826814:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<134>removed'

0505.662842656:7f05a605d700: parse using parser list 0x2519de0 (the
default list).

0505.662858651:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160

0505.662878951:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).

0505.662922371:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0

0505.662942472:7f05a605d700: imudp: recvmmsg returned -1

0505.662961848:7f05a605d700: main Q: qqueueAdd: entry added, size now log
1, phys 1 entries

0505.663003721:main Q:Reg/w0  : wti 0x254e440: worker awoke from idle
processing

0505.663023752:main Q:Reg/w0  : DeleteProcessedBatch: we deleted 0 objects
and enqueued 0 objects

0505.663037415:main Q:Reg/w0  : doDeleteBatch: delete batch from store,
new sizes: log 1, phys 1

0505.663067689:main Q:Reg/w0  : processBATCH: batch of 1 elements must be
processed

0505.663075034:main Q:Reg/w0  : processBATCH: next msg 0: <134>removed

0505.663081549:main Q:Reg/w0  :     ACTION 0
[builtin:omfwd:action(type="builtin:omfwd" ...)]

0505.663094439:main Q:Reg/w0  : executing action 0

0505.663100148:main Q:Reg/w0  : Called action, logging to builtin:omfwd

0505.663113302:main Q:Reg/w0  : dnscache: entry (nil) found

0505.663522101:7f05a605d700: main Q: MultiEnqObj advised worker start

0505.664168111:main Q:Reg/w0  : action 0 is transactional - executing in
commit phase

0505.664277140:main Q:Reg/w0  : omfwd: beginTransaction

0505.664343941:main Q:Reg/w0  : Action 0 transitioned to state: itx

0505.664386642:main Q:Reg/w0  : END batch execution phase, entering to
commit phase

0505.664417442:main Q:Reg/w0  : actionCommitAll: action 0, state 1, nbr to
commit 1 isTransactional 1

0505.664454895:main Q:Reg/w0  : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x2527160

0505.664493525:main Q:Reg/w0  : entering actionCallCommitTransaction(),
state: itx, actionNbr 0, nMsgs 1

0505.664530029:main Q:Reg/w0  :  **.**.200.15:514/udp

0505.664581992:main Q:Reg/w0  : omfwd: endTransaction, offsSndBuf 0, iRet
0

0505.664643366:main Q:Reg/w0  : Action 0 transitioned to state: rdy

0505.664748610:main Q:Reg/w0  : omfwd: beginTransaction

0505.664809454:main Q:Reg/w0  : Action 0 transitioned to state: itx

0505.664907608:main Q:Reg/w0  : Action 0 transitioned to state: rdy

0505.665064133:main Q:Reg/w0  : actionCommit, in retry loop, iRet 0

0505.665225575:main Q:Reg/w0  : processBATCH: batch of 1 elements has been
processed

0505.665350767:main Q:Reg/w0  : regular consumer finished, iret=0, szlog 0
sz phys 1

0505.665452514:main Q:Reg/w0  : DeleteProcessedBatch: we deleted 1 objects
and enqueued 0 objects

0505.665615316:imudp.c        : imudp: epoll_wait() returned with 1 fds

0505.665657702:imudp.c        : imudp: recvmmsg returned 2



Then we start to stop taking any further action



0505.765150069:7f05a605d700: recv(6,568),acl:1,msg:<134>(message removed)

0505.765262198:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<134>hostname <Tue, 03 Mar 2015 16:35'

0505.765323278:7f05a605d700: parse using parser list 0x2519de0 (the
default list).

0505.765412683:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160

0505.765474091:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).

0505.765588189:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0

0505.765675642:7f05a605d700: imudp: recvmmsg returned -1

0505.765745277:7f05a605d700: main Q: qqueueAdd: entry added, size now log
1, phys 5 entries

0505.765850825:7f05a605d700: main Q: MultiEnqObj advised worker start

0505.767099018:7f05a605d700: imudp: epoll_wait() returned with 1 fds

0505.767246991:imudp.c        : imudp: epoll_wait() returned with 1 fds

0505.767316623:imudp.c        : imudp: recvmmsg returned -1

0505.767409356:7f05a605d700: imudp: recvmmsg returned 1



And we end up with a main queue full of unprocessed messages, tcpdump at
the same time shows no outbound traffic.



0520.730971322:7f05a605d700: recv(6,143),acl:1,msg:<182>(message removed)

0520.731028834:7f05a605d700: msg parser: flags 70, from '~NOTRESOLVED~',
msg '<182>removed'

0520.731060134:7f05a605d700: parse using parser list 0x2519de0 (the
default list).

0520.731087300:7f05a605d700: dropped LF at very end of message
(DropTrailingLF is set)

0520.731140689:7f05a605d700: Parser 'rsyslog.rfc5424' returned -2160

0520.731208366:7f05a605d700: Message will now be parsed by the legacy
syslog parser (one size fits all... ;)).

0520.731272740:7f05a605d700: Parser 'rsyslog.rfc3164' returned 0

0520.731340593:7f05a605d700: imudp: recvmmsg returned -1

0520.731389028:7f05a605d700: main Q: qqueueAdd: entry added, size now log
3836, phys 4836 entries

0520.731421704:7f05a605d700: main Q: MultiEnqObj advised worker start

0520.741245762:7f05a605d700: imudp: epoll_wait() returned with 1 fds

0520.741409724:7f05a605d700: imudp: recvmmsg returned 2



On stopping syslog everything is has queued is lost :



0525.902788727:main Q:Reg/w0  : DeleteProcessedBatch: we deleted 267
objects and enqueued 733 objects

0525.902839113:main Q:Reg/w0  : doDeleteBatch: delete batch from store,
new sizes: log 4593, phys 4593

0525.902936061:main Q:Reg/w0  : main Q:Reg/w0: done cancelation cleanup
handler.

0525.903050386:main Q:Reg/w0  : main Q:Reg: Worker thread 254e440
requested to be cancelled.

0525.903146548:main Q:Reg/w0  : main Q:Reg: Worker thread 254e440,
terminated, num workers now 0

0525.903257112:main Q:Reg/w0  : destructor for debug call stack
0x7f05980008c0 called

0525.906195836:main thread    : main Q: worker threads terminated,
remaining queue size log 4593, phys 4593.

0525.906253112:main thread    : main Q: error -7 persisting queue - data
lost!

0525.906266571:main thread    : main Q: queue (type 0) will lose 4593
messages, destroying...



This issue actually started with our rsyslog host running 7.6.7, I have
updated to 8.8 off the back of this issue as an attempt to resolve it.



I have tried this with RHEL v6, CentOS v6 and Ubuntu 14.04, the same
behaviour is seen on each. I suspect I must be doing something wrong with
the config, however stripping it down to even the most basic action still
fails which makes me think perhaps something else is happening with this.



Any ideas appreciated.



Tim



Notice:
This message contains information that may be privileged or confidential and is 
the property of Skyscape. It is intended only for the person to whom it is 
addressed. If you are not the intended recipient, you are not authorised to 
read, print, retain, copy, disseminate, distribute, or use this message or any 
part thereof. If you receive this message in error, please notify the sender 
immediately and delete all copies of this message. Skyscape reserves the right 
to monitor all e-mail communications through its networks. Skyscape Cloud 
Services Limited is registered in England and Wales: Company No: 07619797. 
Registered office: Hartham Park, Hartham, Corsham, Wiltshire SN13 0RP.

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
______________________________________________________________________
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to