On Fri, Mar 6, 2015 at 8:53 AM, Andrew Ruch <[email protected]> wrote: > On Thu, Mar 5, 2015 at 11:45 AM, Andrew Ruch <[email protected]> wrote: >> On Thu, Mar 5, 2015 at 11:38 AM, David Lang <[email protected]> wrote: >>> What do you mean when you say it's getting corrupted? >>> >>> What is the log message that's being delivered, and what does it look like >>> after the corruption? >>> >>> David Lang >>> >>> >>> On Thu, 5 Mar 2015, Andrew Ruch wrote: >>> >>>> Hello, >>>> >>>> I'm using Rsyslog 8.8.0 on RHEL 6.6 on the sender and receiver. >>>> >>>> I'm experiencing an issue where my disk queue is getting corrupted, >>>> even just sending one message. The Rsyslog server is configured to >>>> receive messages over TCP. The message is then put into a disk queue >>>> before being sent to another local application. When I comment out the >>>> 4 “queue” lines on the action, everything works fine. The remote logs >>>> are received and written to “/var/log/remote.log” successfully. Is >>>> there something else I need to configure for the disk queue to work? >>>> >>>> >>>> Below is a simple configuration file created for debugging this >>>> problem. Also included are some relevant lines from the Rsyslog debug >>>> log. >>>> >>>> >>>> ########################################################################## >>>> >>>> SERVER CONFIG FILE >>>> >>>> ########################################################################## >>>> >>>> # rsyslog v8 configuration file (server) >>>> >>>> #### GLOBAL DIRECTIVES #### >>>> >>>> $DebugLevel 2 >>>> $DebugFile /var/log/rsyslog-debug.log >>>> >>>> $umask 0000 >>>> >>>> # Working directory for spooling and file monitoring >>>> $WorkDirectory /var/lib/rsyslog >>>> >>>> # Set the message size to 32k (up from 2k default) >>>> $MaxMessageSize 32768 >>>> >>>> #### MODULES #### >>>> >>>> # Configure file output defaults >>>> module( load="builtin:omfile" >>>> template="RSYSLOG_TraditionalFileFormat" >>>> fileCreateMode="0600" >>>> ) >>>> >>>> # provides input for tcp connections >>>> module( load="imtcp" >>>> NotifyOnConnectionClose="on" >>>> ) >>>> >>>> #### NETSTREAM DRIVER #### >>>> $DefaultNetstreamDriver ptcp >>>> >>>> #### RULESETS #### >>>> >>>> ruleset(name="remote"){ >>>> >>>> # To file >>>> *.* /var/log/remote.log >>>> >>>> # To local process >>>> action( template="RSYSLOG_SyslogProtocol23Format" >>>> name="remotefwd" >>>> type="omfwd" >>>> action.resumeRetryCount="-1" >>>> target="127.0.0.1" >>>> port="7871" >>>> protocol="tcp" >>>> TCP_Framing="octet-counted" >>>> queue.filename="remotefwd" >>>> queue.size="1000000" >>>> queue.type="Disk" >>>> queue.saveonshutdown="on" >>>> StreamDriver="ptcp" >>>> ) >>>> >>>> } >>>> >>>> #### INPUTS #### >>>> input(type="imtcp" port="10514" ruleset="remote") >>>> >>>> >>>> #### LOCAL RULES #### >>>> >>>> *.* /var/log/messages >>>> >>>> >>>> >>>> >>>> ########################################################################## >>>> >>>> DEBUG LOG >>>> >>>> ########################################################################## >>>> >>>> 2608.731281300:main Q:Reg/w0 : processBATCH: next msg 0: <166>1 >>>> 2015-03-05T16:23:28.729746+00:00 localci-hdf dev - - - test 4d >>>> 2608.731286050:main Q:Reg/w0 : ACTION 0 >>>> [builtin:omfile:/var/log/remote.log] >>>> 2608.731296440:main Q:Reg/w0 : executing action 0 >>>> 2608.731300781:main Q:Reg/w0 : Called action, logging to builtin:omfile >>>> 2608.731321713:main Q:Reg/w0 : action 0 is transactional - executing >>>> in commit phase >>>> 2608.731327093:main Q:Reg/w0 : wti 0x7f697aa7e200: we need to create >>>> a new action worker instance for action 0 >>>> 2608.731331936:main Q:Reg/w0 : Action 0 transitioned to state: itx >>>> 2608.731336015:main Q:Reg/w0 : ACTION 1 >>>> [builtin:omfwd:action(type="builtin:omfwd" ...)] >>>> 2608.731345835:main Q:Reg/w0 : executing action 1 >>>> 2608.731349861:main Q:Reg/w0 : Called action, logging to builtin:omfwd >>>> 2608.731373560:main Q:Reg/w0 : strm 0x7f697aa79d70: file >>>> -1(remotefwd) flush, buflen 575 >>>> 2608.731378696:main Q:Reg/w0 : strmPhysWrite, stream 0x7f697aa79d70, len >>>> 575 >>>> 2608.731440218:main Q:Reg/w0 : file >>>> '/var/lib/rsyslog/remotefwd.00000001' opened as #2 with mode 384 >>>> 2608.731447636:main Q:Reg/w0 : strm 0x7f697aa79d70: opened file >>>> '/var/lib/rsyslog/remotefwd.00000001' for WRITE as 2 >>>> 2608.731457063:main Q:Reg/w0 : strm 0x7f697aa79d70: file 2 write >>>> wrote 575 bytes >>>> 2608.731462184:main Q:Reg/w0 : remotefwd queue: write wrote 575 >>>> octets to disk, queue disk size now 575 octets, EnqOnly:0 >>>> 2608.731466581:main Q:Reg/w0 : remotefwd queue: qqueueAdd: entry >>>> added, size now log 1, phys 1 entries >>>> 2608.731471191:main Q:Reg/w0 : remotefwd queue:Reg: high activity - >>>> starting 1 additional worker thread(s). >>>> 2608.731525047:main Q:Reg/w0 : remotefwd queue:Reg: started with >>>> state 0, num workers now 1 >>>> 2608.731534075:main Q:Reg/w0 : remotefwd queue: EnqueueMsg advised worker >>>> start >>>> 2608.731540956:main Q:Reg/w0 : END batch execution phase, entering to >>>> commit phase >>>> 2608.731553966:main Q:Reg/w0 : actionCommitAll: action 0, state 1, >>>> nbr to commit 1 isTransactional 1 >>>> 2608.731567908:main Q:Reg/w0 : doTransaction: have commitTransaction >>>> IF, using that, pWrkrInfo 0x7f697aa7e2d0 >>>> 2608.731580406:main Q:Reg/w0 : entering >>>> actionCallCommitTransaction(), state: itx, actionNbr 0, nMsgs 1 >>>> 2608.731598707:7f69754f4700: thread created, tid 3379, name >>>> 'rs:remotefwd queue:Reg' >>>> 2608.731642781:remotefwd queue:Reg/w0: wti 0x7f697aa7d4f0: worker starting >>>> 2608.731656561:remotefwd queue:Reg/w0: DeleteProcessedBatch: we >>>> deleted 0 objects and enqueued 0 objects >>>> 2608.731669096:remotefwd queue:Reg/w0: doDeleteBatch: delete batch >>>> from store, new sizes: log 1, phys 1 >>>> 2608.731685925:main Q:Reg/w0 : file stream remote.log params: flush >>>> interval 0, async write 0 >>>> 2608.731711325:main Q:Reg/w0 : omfile: write to stream, pData->pStrm >>>> 0x7f6968003980, lenBuf 40, strt data Mar 5 16:23:28 localci-hdf dev >>>> test 4d >>>> 2608.731720487:main Q:Reg/w0 : strm 0x7f6968003980: file >>>> -1(remote.log) flush, buflen 40 >>>> 2608.731726623:main Q:Reg/w0 : strmPhysWrite, stream 0x7f6968003980, len >>>> 40 >>>> 2608.731743671:main Q:Reg/w0 : file '/var/log/remote.log' opened as >>>> #7 with mode 420 >>>> 2608.731756634:main Q:Reg/w0 : strm 0x7f6968003980: opened file >>>> '/var/log/remote.log' for WRITE as 7 >>>> 2608.731768282:main Q:Reg/w0 : strm 0x7f6968003980: file 7 write wrote 40 >>>> bytes >>>> 2608.731774441:main Q:Reg/w0 : Action 0 transitioned to state: rdy >>>> 2608.731781078:main Q:Reg/w0 : Action 0 transitioned to state: itx >>>> 2608.731786633:main Q:Reg/w0 : Action 0 transitioned to state: rdy >>>> 2608.731791771:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 >>>> 2608.731797654:main Q:Reg/w0 : actionCommitAll: action 2, state 0, >>>> nbr to commit 0 isTransactional 1 >>>> 2608.731807120:main Q:Reg/w0 : processBATCH: batch of 1 elements has >>>> been processed >>>> 2608.731812961:main Q:Reg/w0 : regular consumer finished, iret=0, >>>> szlog 0 sz phys 1 >>>> 2608.731820159:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 >>>> objects and enqueued 0 objects >>>> 2608.731825717:main Q:Reg/w0 : doDeleteBatch: delete batch from >>>> store, new sizes: log 0, phys 0 >>>> 2608.731831642:main Q:Reg/w0 : regular consumer finished, iret=4, >>>> szlog 0 sz phys 0 >>>> 2608.731837955:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for >>>> work. >>>> 2608.731967228:remotefwd queue:Reg/w0: file >>>> '/var/lib/rsyslog/remotefwd.00000001' opened as #6 with mode 384 >>>> 2608.731981761:remotefwd queue:Reg/w0: strm 0x7f697aa7aff0: opened >>>> file '/var/lib/rsyslog/remotefwd.00000001' for READ as 6 >>>> 2608.731989606:remotefwd queue:Reg/w0: strm 0x7f697aa7aff0: file 6 >>>> read 575 bytes >>>> 2608.732006243:remotefwd queue:Reg/w0: remotefwd queue: error -2308 >>>> dequeueing element - ignoring, but strange things may happen >>>> 2608.732011036:remotefwd queue:Reg/w0: regular consumer finished, >>>> iret=-2308, szlog 0 sz phys 1 >>>> 2608.732015422:remotefwd queue:Reg/w0: DeleteProcessedBatch: we >>>> deleted 0 objects and enqueued 0 objects >>>> 2608.732019638:remotefwd queue:Reg/w0: doDeleteBatch: delete batch >>>> from store, new sizes: log 0, phys 1 >>>> 2608.732024016:remotefwd queue:Reg/w0: regular consumer finished, >>>> iret=4, szlog 0 sz phys 1 >>>> 2608.732028228:remotefwd queue:Reg/w0: remotefwd queue:Reg/w0: worker >>>> IDLE, waiting for work. >>>> >>>> >>>> Thanks, >>>> Andrew >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>> DON'T >>>> LIKE THAT. >>> >>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com/professional-services/ >>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of >>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >>> LIKE THAT. >> >> >> By corrupt I mean the log never leaves the queue. The text of the log >> message that corresponds to the debug log I included is "test 4d". The >> debug log shows this error message towards the bottom: >> >> 2608.732006243:remotefwd queue:Reg/w0: remotefwd queue: error -2308 >> dequeueing element - ignoring, but strange things may happen >> >> >> Something bad is happening when the message is read from the queue. >> >> >> Thanks, >> Andrew > > > In doing some more research, it appears that local messages work just > fine when being sent to the disk queue. (Note that I changed the > server configuration slightly to allow for local messages to be sent > to the "remote" action.) > > Here are the steps: > > 1) Call "logger" from local server. This log message is successfully > passed through the queue and sent to my local application using omfwd. > > RSYSLOG DEBUG LOGS: > > 4981.388342088:main Q:Reg/w0 : remotefwd queue: write wrote 395 > octets to disk, queue disk size now 790 octets, EnqOnly:0 > 4981.388462240:remotefwd queue:Reg/w0: strm 0x7f4faae85050: file 2 > read 395 bytes > > 2) Call “logger” from remote client. This log message causes an error > in the queue. > > RSYSLOG DEBUG LOGS: > > 5000.069186937:main Q:Reg/w0 : remotefwd queue: write wrote 572 > octets to disk, queue disk size now 1362 octets, EnqOnly:0 > 5000.069278979:remotefwd queue:Reg/w0: strm 0x7f4faae85050: file 2 > read 572 bytes > 5000.069308645:remotefwd queue:Reg/w0: remotefwd queue: error -2308 > dequeueing element - ignoring, but strange things may happen > > 3) Call “logger” from local server again. This log message is successful. > > RSYSLOG DEBUG LOGS: > > 5033.404190090:main Q:Reg/w0 : remotefwd queue: write wrote 395 > octets to disk, queue disk size now 1757 octets, EnqOnly:0 > 5033.404195439:main Q:Reg/w0 : remotefwd queue: qqueueAdd: entry > added, size now log 1, phys 2 entries > 5033.404273139:remotefwd queue:Reg/w0: doDeleteBatch: delete batch > from store, new sizes: log 1, phys 2 > 5033.404279061:remotefwd queue:Reg/w0: objDeserialize error -2029 > during header processing - trying to recover > 5033.404348551:remotefwd queue:Reg/w0: strm 0x7f4faae85050: file 2 > read 395 bytes > 5033.404362158:remotefwd queue:Reg/w0: deserializer has possibly been > able to re-sync and recover, state 0 > > > > To me, this implies that something is different about the message > being written to disk when it originates from imtcp. However, it still > works to write this message to file, so I think everything is working > as far as the TCP transfer of the message. > > Andrew
The MsgSerialize and MsgDeserialize functions are out of sync in msg.c. They label the structured data field differently and expect it in a different order. When I fix the name and fix the ordering in MsgDeserialize, my systems work fine. I created a bug (#264). _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

