Bump. Rainer, what are your thoughts on what I've posted above? Also, I think I now understand why rsyslog would drop messages, but for our use case it's better to have duplicates than to drop messages (we want at-least-once semantics). Is it possible to configure rsyslog that way (or to add this functionality)?
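To make the question concrete: the combination I have been trying is an infinite-retry action (action.resumeRetryCount="-1" so the action is never suspended and messages are never discarded on failure) in front of a disk queue with queue.saveonshutdown="on" (so the spool should survive a restart). All parameter names below are taken from my config earlier in this thread; whether this combination actually yields at-least-once delivery is exactly what I'm asking:

```
module(load="omrelp")
local0.* action(type="omrelp"
         target="10.0.0.1"
         port="2514"
         name="debug_relp"
         action.resumeRetryCount="-1"
         action.resumeInterval="1"
         queue.type="Disk"
         queue.filename="relp_queue"
         queue.spoolDirectory="/var/spool/rsyslog"
         queue.maxdiskspace="1G"
         queue.saveonshutdown="on"
)
```

As the debug traces below show, even with these settings the spool file is deleted on shutdown while the RELP transaction is still in its retry loop, and the message is lost.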
Thanks! On Tue, Mar 29, 2016 at 10:46 AM, Kane Kim <[email protected]> wrote: > Updated config to: > module(load="omrelp") > local0.* action(type="omrelp" > target="10.0.0.1" > port="2514" > name="debug_relp" > queue.filename="relp_queue" > queue.maxdiskspace="1G" > queue.saveonshutdown="on" > queue.spoolDirectory="/var/spool/rsyslog" > queue.type="Disk" > action.resumeRetryCount="-1" > action.resumeInterval="1" > ) > > Still message is lost: > > 3489.658349703:imuxsock.c : Message from UNIX socket: #3 > 3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size now > log 1, phys 1 entries > 3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker start > 3489.658374985:imuxsock.c : --------imuxsock calling select, active > file descriptors (max 3): 3 > 3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle > processing > 3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects > and enqueued 0 objects > 3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from store, > new sizes: log 1, phys 1 > 3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements must be > processed > 3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29 > 17:44:49 logger: test > 3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*' > 3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X X > X X X X X X FF X X X X X X X X X > 3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1 > 3489.658414686:main Q:Reg/w0 : ACTION 0 [omrelp:action(type="omrelp" > ...)] > 3489.658417648:main Q:Reg/w0 : executing action 0 > 3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging to > omrelp (susp 0/0, direct q 0) > 3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue) flush, > buflen 410 > 3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len 410 > 3489.658484243:main Q:Reg/w0 : file > '/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384 > 3489.658487347:main Q:Reg/w0 : strm 
0x23b74c0: opened file > '/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5 > 3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote 410 > bytes > 3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410 octets > to disk, queue disk size now 410 octets, EnqOnly:0 > 3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry added, > size now log 1, phys 1 entries > 3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity - > starting 1 additional worker thread(s). > 3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with state > 0, num workers now 1 > 3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised > worker start > 3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended state > to 0 > 3489.658533672:main Q:Reg/w0 : PRIFILT > '*.info;mail.none;authpriv.none;cron.none' > 3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X > X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F > 3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1 > 3489.658553393:main Q:Reg/w0 : ACTION 1 > [builtin:omfile:/var/log/messages] > 3489.658556301:main Q:Reg/w0 : executing action 1 > 3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to > builtin:omfile (susp 0/0, direct q 1) > 3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional - > executing in commit phase > 3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx > 3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended state to 0 > 3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*' > 3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X X > FF X X X X X X X X X X X X X X X > 3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*' > 3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X X > X X X X X X X X X X X X X X X X > 3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*' > 
3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X FF > X X X X X X X X X X X X X X X X > 3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg' > 3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > 3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' > 3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X > X X X X X X X X X X X X X X X X > 3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*' > 3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X X > X X X X X X X X X X X X X FF X X > 3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0 > 3489.658675686:main Q:Reg/w0 : END batch execution phase, entering to > commit phase [processed 1 of 1 messages] > 3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to > commit 0 isTransactional 1 > 3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction IF, > using that, pWrkrInfo 0x23c8ce0 > 3489.658680668:main Q:Reg/w0 : entering actionCallCommitTransaction(), > state: itx, actionNbr 1, nMsgs 1 > 3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm > 0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0 logger: > test > > 3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages) > flush, buflen 42 > 3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len > 42 > 3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote 42 > bytes > 3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy > 3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx > 3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy > 3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 > 3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has been > processed 
> 3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 > sz phys 1 > 3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects > and enqueued 0 objects > 3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from store, > new sizes: log 0, phys 0 > 3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 > sz phys 0 > 3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for > work. > 3489.660013052:7f1efbb14700: thread created, tid 30310, name > 'rs:debug_relp queue:Reg' > 3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker starting > 3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 0 > objects and enqueued 0 objects > 3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from > store, new sizes: log 1, phys 1 > 3489.660947418:debug_relp queue:Reg/w0: file > '/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384 > 3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file > '/var/spool/rsyslog/relp_queue.00000001' for READ as 6 > 3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read 410 > bytes > 3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is > transactional - executing in commit phase > 3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to create a > new action worker instance for action 0 > 3489.661789583:debug_relp queue:Reg/w0: relp engine created new client > 0x7f1ef00027e0 > 3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action > worker instance 1 for action 0 > 3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction > ^C3497.710058579:main thread : DoDie called. > 3497.710272055:main thread : exiting on signal 2 > 3497.710576430:main thread : Terminating input threads... 
> 3497.710765753:main thread : request term via SIGTTIN for input thread > 'imuxsock' 0xfcb16700 > 3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock - > 0x7f1efcb16700 returned with iRet 0, exiting now. > 3497.711447372:main thread : non-cancel input thread termination > succeeded for thread imuxsock 0xfcb16700 > 3497.713028584:imuxsock.c : destructor for debug call stack > 0x7f1eec0008c0 called > 3497.713390963:main thread : imuxsock: unlinking unix socket file[0] > /dev/log > 3497.713734513:main thread : main Q: qqueueAdd: entry added, size now > log 1, phys 1 entries > 3497.713906635:main thread : main Q: EnqueueMsg advised worker start > 3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle > processing > 3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects > and enqueued 0 objects > 3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from store, > new sizes: log 1, phys 1 > 3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements must be > processed > 3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin > software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info=" > http://www.rsyslog.com"] exiting on signal 2. 
> 3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*' > 3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X X > X X X X X X FF X X X X X X X X X > 3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.716337713:main Q:Reg/w0 : PRIFILT > '*.info;mail.none;authpriv.none;cron.none' > 3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X > X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F > 3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1 > 3497.716429241:main Q:Reg/w0 : ACTION 1 > [builtin:omfile:/var/log/messages] > 3497.716439928:main Q:Reg/w0 : executing action 1 > 3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to > builtin:omfile (susp 0/0, direct q 1) > 3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional - > executing in commit phase > 3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx > 3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended state to 0 > 3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*' > 3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X X > FF X X X X X X X X X X X X X X X > 3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*' > 3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X X > X X X X X X X X X X X X X X X X > 3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*' > 3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X FF > X X X X X X X X X X X X X X X X > 3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg' > 3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > 3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' > 3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X > X X X X X X X X X X X X X X X X > 3497.718043828:main Q:Reg/w0 : PRIFILT 
condition result is 0 > 3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*' > 3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X X > X X X X X X X X X X X X X FF X X > 3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0 > 3497.718842803:main Q:Reg/w0 : END batch execution phase, entering to > commit phase [processed 1 of 1 messages] > 3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to > commit 0 isTransactional 1 > 3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction IF, > using that, pWrkrInfo 0x23c8ce0 > 3497.718857172:main Q:Reg/w0 : entering actionCallCommitTransaction(), > state: itx, actionNbr 1, nMsgs 1 > 3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm > 0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0 > rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master" > x-pid="30306" x-info="http://www.rs > 3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages) > flush, buflen 161 > 3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len > 161 > 3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote > 161 bytes > 3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy > 3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx > 3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy > 3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 > 3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has been > processed > 3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 > sz phys 1 > 3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects > and enqueued 0 objects > 3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from store, > new sizes: log 0, phys 0 > 3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 > sz phys 0 > 3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for > work. 
> 3497.718942403:main thread : Terminating main queue... > 3497.718946280:main thread : main Q: initiating worker thread shutdown > sequence > 3497.718950170:main thread : main Q: trying shutdown of regular workers > 3497.718957067:main thread : sent SIGTTIN to worker thread 0xfc315700 > 3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker > thread termination, 1 still running > 3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle > processing > 3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects > and enqueued 0 objects > 3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from store, > new sizes: log 0, phys 0 > 3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 > sz phys 0 > 3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker > terminateRet=5, bInactivityTOOccured=0 > 3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup action > instances > 3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil) > 3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr 0x7f1ef40022a0 > 3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil) > 3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil) > 3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil) > 3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil) > 3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil) > 3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil) > 3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting > 3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0, > terminated, num workers now 0 > 3497.719090631:main thread : main Q: regular queue workers shut down. > 3497.719097597:main thread : main Q: checking to see if we need to > cancel any worker threads of the primary queue > 3497.719101335:main thread : main Q: worker threads terminated, > remaining queue size log 0, phys 0. 
> 3497.719111115:main thread : main Q: queue (type 0) will lose 0 > messages, destroying... > 3497.719501847:main Q:Reg/w0 : destructor for debug call stack > 0x7f1ef40008c0 called > 3497.719547524:main thread : Terminating outputs... > 3497.719553777:main thread : destructing ruleset 0x23ab150, name > 0x23ab190 > 3497.719558276:main thread : debug_relp queue: initiating worker thread > shutdown sequence > 3497.719561940:main thread : debug_relp queue: trying shutdown of > regular workers > 3497.719568068:main thread : sent SIGTTIN to worker thread 0xfbb14700 > 3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on > worker thread termination, 1 still running > 3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 > 3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to state: > rtry > 3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0, > currIParam 1 > 3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter > loop, iRetries=0 > 3497.719677407:main thread : debug_relp queue:Reg: timeout waiting on > worker thread termination > 3497.719677554:main thread : sent SIGTTIN to worker thread 0xfbb14700 > 3497.719677725:main thread : debug_relp queue: regular shutdown timed > out on primary queue (this is OK) > 3497.719677897:main thread : debug_relp queue: trying to shutdown > workers within Action Timeoutsetting EnqOnly mode > 3497.719678191:main thread : debug_relp queue: trying immediate > shutdown of regular workers (if any) > 3497.719678362:main thread : sent SIGTTIN to worker thread 0xfbb14700 > 3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms on > worker thread termination, 1 still running > 3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 > 3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp > action->tryResume returned -2007 > 3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check > for max retries, iResumeRetryCount -1, 
iRetries 0 > 3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action 0x23bebf0 > state: rtry, next retry (if applicable): 0 [now 1459273498] > 3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet > -2007 > 3498.721034552:debug_relp queue:Reg/w0: regular consumer finished, > iret=-2007, szlog 0 sz phys 1 > 3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 1 > objects and enqueued 0 objects > 3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from > store, new sizes: log 0, phys 0 > 3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating worker > because of TERMINATE_NOW mode, del iRet 0 > 3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker > cleanup action instances > 3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr > 0x7f1ef00020d0 > 3498.722431811:debug_relp queue:Reg/w0: relp engine destructing client > 0x7f1ef00027e0 > 3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr (nil) > 3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr (nil) > 3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr (nil) > 3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr (nil) > 3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr (nil) > 3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr (nil) > 3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr (nil) > 3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting > 3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker > thread 23c78b0, terminated, num workers now 0 > 3498.724243673:debug_relp queue:Reg/w0: destructor for debug call stack > 0x7f1ef00008c0 called > 3498.724371931:main thread : debug_relp queue: checking to see if we > need to cancel any worker threads of the primary queue > 3498.724379712:main thread : debug_relp queue: worker threads > terminated, remaining queue 
size log 0, phys 0. > 3498.724386288:main thread : debug_relp queue: persisting queue to > disk, 0 entries... > 3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue) closing > 3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue) flush, > buflen 0 (no need to flush) > 3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue) closing > 3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue) closing > 3498.724423956:main thread : strmCloseFile: deleting > '/var/spool/rsyslog/relp_queue.00000001' > 3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages) > closing > 3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages) > flush, buflen 0 (no need to flush) > 3498.724731690:main thread : all primary multi-thread sources have been > terminated - now doing aux cleanup... > 3498.724737248:main thread : destructing current config... > 3498.724741760:main thread : calling freeCnf(0x23a91e0) for module > 'builtin:omfile' > 3498.724745773:main thread : calling freeCnf(0x23a93a0) for module > 'builtin:ompipe' > 3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module > 'builtin:omfwd' > 3498.724754588:main thread : calling freeCnf(0x23b9b00) for module > 'imuxsock' > 3498.724758956:main thread : calling freeCnf(0x23bac70) for module > 'imklog' > 3498.724807014:main thread : destructing parser 'rsyslog.rfc5424' > 3498.724871141:main thread : destructing parser 'rsyslog.rfc3164' > 3498.724876440:main thread : pmrfc3164: free parser instance 0x23aa8b0 > 3498.724887558:main thread : file rsyslogd.c released module 'lmnet', > reference count now 2 > 3498.724892004:main thread : module lmnet NOT unloaded because it still > has a refcount of 2 > 3498.724895745:main thread : Unloading module builtin:omfile > 3498.724900191:main thread : module lmnet NOT unloaded because it still > has a refcount of 2 > 3498.724904194:main thread : Unloading module builtin:ompipe > 3498.724908035:main thread : module lmnet NOT unloaded 
because it still > has a refcount of 2 > 3498.724911707:main thread : Unloading module builtin-shell > 3498.725044125:main thread : module lmnet NOT unloaded because it still > has a refcount of 2 > 3498.725050401:main thread : Unloading module builtin:omdiscard > 3498.725054843:main thread : module lmnet NOT unloaded because it still > has a refcount of 2 > 3498.725058772:main thread : Unloading module builtin:omfwd > 3498.725064464:main thread : file omfwd.c released module 'lmnet', > reference count now 1 > 3498.725068747:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725072460:main thread : Unloading module builtin:omusrmsg > 3498.725076324:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725080023:main thread : Unloading module builtin:pmrfc5424 > 3498.725084039:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725087646:main thread : Unloading module builtin:pmrfc3164 > 3498.725091651:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725095255:main thread : Unloading module builtin:smfile > 3498.725099010:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725102601:main thread : Unloading module builtin:smtradfile > 3498.725106419:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725110003:main thread : Unloading module builtin:smfwd > 3498.725113689:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725117357:main thread : Unloading module builtin:smtradfwd > 3498.725121132:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725124782:main thread : Unloading module imuxsock > 3498.725157459:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.725162797:main thread : Unloading module imklog > 3498.725179208:main thread : module lmnet NOT 
unloaded because it still > has a refcount of 1 > 3498.726112994:main thread : Unloading module omrelp > 3498.726184405:main thread : module lmnet NOT unloaded because it still > has a refcount of 1 > 3498.726189821:main thread : module lmregexp NOT unloaded because it > still has a refcount of 1 > 3498.726195161:main thread : file conf.c released module 'lmnet', > reference count now 0 > 3498.726198620:main thread : module 'lmnet' has zero reference count, > unloading... > 3498.726201936:main thread : Unloading module lmnet > 3498.726218655:main thread : rsyslog runtime de-initialized, current > users 0 > 3498.726222660:main thread : Clean shutdown completed, bye > > > On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]> wrote: > >> Hello Reiner, >> >> I'm seeing the same behavior with omrelp module. I've configured omrelp >> to non-routable ip address: >> module(load="omrelp") >> local0.* action(type="omrelp" >> target="10.0.0.1" >> port="2514" >> name="debug_relp" >> queue.filename="relp_queue" >> queue.maxdiskspace="1G" >> queue.saveonshutdown="on" >> queue.spoolDirectory="/var/spool/rsyslog" >> queue.type="Disk" >> ) >> Then I try to log a line: >> [root@314959ac8cb0 prism]# logger -p local0.info test >> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/ >> relp_queue.00000001 >> # I shutdown rsyslog at this point >> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/ >> # after shutdown queue is deleted and message is lost >> [root@314959ac8cb0 prism]# >> >> Here is rsyslog log, so omrelp starts transaction and then I shutdown >> rsyslog, it seems at some point queue is deleted: >> >> 8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction >> ^C8194.245122305:main thread : DoDie called. >> 8194.245301525:main thread : exiting on signal 2 >> 8194.245532709:main thread : Terminating input threads... 
>> 8194.245873683:main thread : request term via SIGTTIN for input thread >> 'imuxsock' 0xcb767700 >> 8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock - >> 0x7f9bcb767700 returned with iRet 0, exiting now. >> 8194.247556671:main thread : non-cancel input thread termination >> succeeded for thread imuxsock 0xcb767700 >> 8194.249357547:imuxsock.c : destructor for debug call stack >> 0x7f9bbc0008c0 called >> 8194.249784271:main thread : imuxsock: unlinking unix socket file[0] >> /dev/log >> 8194.250043368:main thread : main Q: qqueueAdd: entry added, size now >> log 1, phys 1 entries >> 8194.250321193:main thread : main Q: EnqueueMsg advised worker start >> 8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle >> processing >> 8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 >> objects and enqueued 0 objects >> 8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 1, phys 1 >> 8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements must be >> processed >> 8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin >> software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info=" >> http://www.rsyslog.com"] exiting on signal 2. 
>> 8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*' >> 8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X FF X X X X X X X X X >> 8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.252902083:main Q:Reg/w0 : PRIFILT >> '*.info;mail.none;authpriv.none;cron.none' >> 8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X >> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F >> 8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1 >> 8194.252959849:main Q:Reg/w0 : ACTION 1 >> [builtin:omfile:/var/log/messages] >> 8194.252967510:main Q:Reg/w0 : executing action 1 >> 8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging to >> builtin:omfile (susp 0/0, direct q 1) >> 8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional - >> executing in commit phase >> 8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended state to >> 0 >> 8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*' >> 8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X X X >> FF X X X X X X X X X X X X X X X >> 8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*' >> 8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X X X >> X X X X X X X X X X X X X X X X >> 8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*' >> 8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X X FF >> X X X X X X X X X X X X X X X X >> 8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg' >> 8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 >> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> 8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' >> 8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X >> X X X X X X X X X X X X X X X X >> 
8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*' >> 8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X X X X X X X X FF X X >> 8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0 >> 8194.254450651:main Q:Reg/w0 : END batch execution phase, entering to >> commit phase [processed 1 of 1 messages] >> 8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr >> to commit 0 isTransactional 1 >> 8194.254461646:main Q:Reg/w0 : doTransaction: have commitTransaction IF, >> using that, pWrkrInfo 0x1c13cf0 >> 8194.254466238:main Q:Reg/w0 : entering actionCallCommitTransaction(), >> state: itx, actionNbr 1, nMsgs 1 >> 8194.254471876:main Q:Reg/w0 : omfile: write to stream, pData->pStrm >> 0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0 >> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master" >> x-pid="30298" x-info="http://www.rs >> 8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4(messages) >> flush, buflen 161 >> 8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream 0x7f9bc4002480, len >> 161 >> 8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write wrote >> 161 bytes >> 8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 >> 8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements has >> been processed >> 8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0, szlog >> 0 sz phys 1 >> 8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 >> objects and enqueued 0 objects >> 8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 0, phys 0 >> 8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4, szlog >> 0 sz phys 0 >> 
8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for >> work. >> 8194.254556418:main thread : Terminating main queue... >> 8194.254560776:main thread : main Q: initiating worker thread shutdown >> sequence >> 8194.254564653:main thread : main Q: trying shutdown of regular workers >> 8194.254572961:main thread : sent SIGTTIN to worker thread 0xcaf66700 >> 8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker >> thread termination, 1 still running >> 8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle >> processing >> 8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 >> objects and enqueued 0 objects >> 8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 0, phys 0 >> 8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4, szlog >> 0 sz phys 0 >> 8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker >> terminateRet=5, bInactivityTOOccured=0 >> 8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup >> action instances >> 8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil) >> 8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr >> 0x7f9bc40022a0 >> 8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil) >> 8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil) >> 8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil) >> 8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil) >> 8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil) >> 8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil) >> 8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting >> 8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0, >> terminated, num workers now 0 >> 8194.254656279:main thread : main Q: regular queue workers shut down. 
>> 8194.254656402:main thread : main Q: checking to see if we need to cancel any worker threads of the primary queue
>> 8194.254656524:main thread : main Q: worker threads terminated, remaining queue size log 0, phys 0.
>> 8194.254656647:main thread : main Q: queue (type 0) will lose 0 messages, destroying...
>> 8194.255021761:main Q:Reg/w0 : destructor for debug call stack 0x7f9bc40008c0 called
>> 8194.255069521:main thread : Terminating outputs...
>> 8194.255084818:main thread : destructing ruleset 0x1bf6150, name 0x1bf6190
>> 8194.255089525:main thread : debug_relp queue: initiating worker thread shutdown sequence
>> 8194.255093230:main thread : debug_relp queue: trying shutdown of regular workers
>> 8194.255099210:main thread : sent SIGTTIN to worker thread 0xca765700
>> 8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on worker thread termination, 1 still running
>> 8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
>> 8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to state: rtry
>> 8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0, currIParam 1
>> 8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter loop, iRetries=0
>> 8194.255210939:main thread : debug_relp queue:Reg: timeout waiting on worker thread termination
>> 8194.255211086:main thread : sent SIGTTIN to worker thread 0xca765700
>> 8194.255211257:main thread : debug_relp queue: regular shutdown timed out on primary queue (this is OK)
>> 8194.255211429:main thread : debug_relp queue: trying to shutdown workers within Action Timeoutsetting EnqOnly mode
>> 8194.255215492:main thread : debug_relp queue: trying immediate shutdown of regular workers (if any)
>> 8194.255221059:main thread : sent SIGTTIN to worker thread 0xca765700
>> 8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms on worker thread termination, 1 still running
>> 8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
>> 8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp action->tryResume returned -2007
>> 8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check for max retries, iResumeRetryCount 0, iRetries 0
>> 8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to state: susp
>> 8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action 'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016
>> 8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not submit message - queue does (no longer?) exist - ignored
>> 8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp' suspended, earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration 30
>> 8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action 0x1c09af0 state: susp, next retry (if applicable): 1459208224 [now 1459208194]
>> 8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet -2123
>> 8194.255282409:debug_relp queue:Reg/w0: regular consumer finished, iret=-2123, szlog 0 sz phys 1
>> 8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
>> 8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
>> 8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating worker because of TERMINATE_NOW mode, del iRet 0
>> 8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker cleanup action instances
>> 8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr 0x7f9bc00020d0
>> 8194.255305929:debug_relp queue:Reg/w0: relp engine destructing client 0x7f9bc00027e0
>> 8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr (nil)
>> 8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr (nil)
>> 8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr (nil)
>> 8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr (nil)
>> 8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr (nil)
>> 8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr (nil)
>> 8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr (nil)
>> 8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker exiting
>> 8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker thread 1c12880, terminated, num workers now 0
>> 8194.255344540:debug_relp queue:Reg/w0: destructor for debug call stack 0x7f9bc00008c0 called
>> 8194.255371534:main thread : debug_relp queue: checking to see if we need to cancel any worker threads of the primary queue
>> 8194.255376798:main thread : debug_relp queue: worker threads terminated, remaining queue size log 0, phys 0.
>> 8194.255381256:main thread : debug_relp queue: persisting queue to disk, 0 entries...
>> 8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue) closing
>> 8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue) flush, buflen 0 (no need to flush)
>> 8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue) closing
>> 8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue) closing
>> 8194.255409145:main thread : strmCloseFile: deleting '/var/spool/rsyslog/relp_queue.00000001'
>> 8194.255460812:main thread : strm 0x7f9bc4002480: file 4(messages) closing
>> 8194.255466309:main thread : strm 0x7f9bc4002480: file 4(messages) flush, buflen 0 (no need to flush)
>> 8194.255479325:main thread : all primary multi-thread sources have been terminated - now doing aux cleanup...
>> 8194.255482809:main thread : destructing current config...
>> 8194.255486662:main thread : calling freeCnf(0x1bf41e0) for module 'builtin:omfile'
>> 8194.255490098:main thread : calling freeCnf(0x1bf43a0) for module 'builtin:ompipe'
>> 8194.255493809:main thread : calling freeCnf(0x1bf53a0) for module 'builtin:omfwd'
>> 8194.255497793:main thread : calling freeCnf(0x1c04b00) for module 'imuxsock'
>> 8194.255501565:main thread : calling freeCnf(0x1c05c70) for module 'imklog'
>> 8194.255543579:main thread : destructing parser 'rsyslog.rfc5424'
>> 8194.255547468:main thread : destructing parser 'rsyslog.rfc3164'
>> 8194.255550784:main thread : pmrfc3164: free parser instance 0x1bf58b0
>> 8194.255559659:main thread : file rsyslogd.c released module 'lmnet', reference count now 2
>> 8194.255563470:main thread : module lmnet NOT unloaded because it still has a refcount of 2
>> 8194.255566676:main thread : Unloading module builtin:omfile
>> 8194.255570420:main thread : module lmnet NOT unloaded because it still has a refcount of 2
>> 8194.255573674:main thread : Unloading module builtin:ompipe
>> 8194.255576857:main thread : module lmnet NOT unloaded because it still has a refcount of 2
>> 8194.255579938:main thread : Unloading module builtin-shell
>> 8194.255583060:main thread : module lmnet NOT unloaded because it still has a refcount of 2
>> 8194.255586144:main thread : Unloading module builtin:omdiscard
>> 8194.255589509:main thread : module lmnet NOT unloaded because it still has a refcount of 2
>> 8194.255592658:main thread : Unloading module builtin:omfwd
>> 8194.255596441:main thread : file omfwd.c released module 'lmnet', reference count now 1
>> 8194.255599981:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255603068:main thread : Unloading module builtin:omusrmsg
>> 8194.255606240:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255609287:main thread : Unloading module builtin:pmrfc5424
>> 8194.255612755:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255615826:main thread : Unloading module builtin:pmrfc3164
>> 8194.255619252:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255622315:main thread : Unloading module builtin:smfile
>> 8194.255625406:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255628486:main thread : Unloading module builtin:smtradfile
>> 8194.255631753:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255634819:main thread : Unloading module builtin:smfwd
>> 8194.255637916:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255640987:main thread : Unloading module builtin:smtradfwd
>> 8194.255644187:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255647267:main thread : Unloading module imuxsock
>> 8194.255676232:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255680751:main thread : Unloading module imklog
>> 8194.255694428:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255698085:main thread : Unloading module omrelp
>> 8194.255750953:main thread : module lmnet NOT unloaded because it still has a refcount of 1
>> 8194.255755236:main thread : module lmregexp NOT unloaded because it still has a refcount of 1
>> 8194.255759705:main thread : file conf.c released module 'lmnet', reference count now 0
>> 8194.255762875:main thread : module 'lmnet' has zero reference count, unloading...
>> 8194.255765901:main thread : Unloading module lmnet
>> 8194.255780970:main thread : rsyslog runtime de-initialized, current users 0
>> 8194.255784704:main thread : Clean shutdown completed, bye
>>
>> On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote:
>>
>>> On Sun, 13 Mar 2016, Kane Kim wrote:
>>>
>>>> +1 to David; having the module handle the errors is not 100% reliable. What if the module dies before having a chance to save the error log? Those messages will be lost.
>>>
>>> well, the module is a thread of rsyslog, so if it dies, rsyslog has died.
>>>
>>> But if you are doing large batches (hundreds or thousands), putting all of them in a 'bad messages' file because one has a problem seems like the wrong thing to do.
>>>
>>> given that the error is not detected until the batch is being flushed for the output, any outputs that have already flushed their batches can't be changed, so the question is whether it's worse to have messages resent to those outputs as the batch gets re-processed, or to have the output go to a "couldn't deliver" file.
>>>
>>> It's bad enough if the problem is a corrupt message that can never be delivered and the error is a permanent one, but if the problem is instead a temporary error (server is down), you will have a LOT of messages going into this 'couldn't deliver' bucket, when what should really be happening is that rsyslog pauses the processing of messages, retrying after the suspend timeout.
>>>
>>> David Lang
>>>
>>>> On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:
>>>>
>>>>> On Sun, 13 Mar 2016, Rainer Gerhards wrote:
>>>>>
>>>>>>> Rainer, can you explain what happens when a batch fails on an action?
>>>>>>
>>>>>> It depends on when it happens and whether we have batches of one or batches of many. For batches of one, the failure is immediately detected and acted upon.
>>>>>> For batches of many, all action strings are stored temporarily. During that time, it is assumed that the messages could be processed. There is no way around this assumption, because we now run each message completely through the rule engine before the next one begins processing, which means that other parts of the config have already acted on a message, which means we cannot undo that state. If it is necessary to act on the "real" state, we need to process batches of one. There simply is no other solution possible. So let's stay with batches of many. Once the batch is completed, the temporary strings are passed to the output's commitTransaction entry point. It is currently assumed that the output handles this case intelligently, and as it looks, most do this well (no regression tests failed, no bad reports since we introduced this with the initial v8 version, 15+ months ago). If the action fails, status is reflected accordingly. Also, the core engine *should* (but obviously does not right now) write an error file with the messages in error (just like omelasticsearch does, which has proven to be quite good).
>>>>>
>>>>> well, it's not in the specs that modules should have local error files the way omelasticsearch does.
>>>>>
>>>>> What I would expect if there is a problem when committing the batch is that the entire batch would be marked as not delivered and tried again with half of it.
>>>>>
>>>>> This will cause messages to other outputs working from the same queue to get duplicate messages, but that's better than losing an entire batch of messages because one is corrupt in some way or there is a temporary problem with the output.
>>>>>
>>>>> David Lang
>>>>>
>>>>>> I have managed to look at least briefly at the commitTransaction core engine code. I think we can even improve it and may get some ability to do retries if the action itself does not handle that. At the minimum, it should have the ability (which was intended originally, but pushed aside by more urgent work) to write that error file.
>>>>>>
>>>>>> I am very grateful that Kane is working on a repro with the standard modules. Once we have that, I can go deeper into the code and check what I am thinking about. It might be that we cannot manage to repro this with the standard modules; then we need to think about a way, via the testing module, to do the same thing.
>>>>>>
>>>>>> I will try to address this within the 8.18 timeframe, but cannot totally commit to it. There has been a lot of work going on, and all that CI testing in particular has taken up a lot of it. Nevertheless, I've been able to iron out quite some nits, so it's probably worth it. I would still like to continue to get the CI environment into better shape, because now I know what I am doing; if I stop again and let it go for a couple of weeks, I would lose a lot of momentum. I also have to do some work for support customers, which keeps me able to work on rsyslog in general, so this, too, has priority (and validly so, because without support customers I would be totally unable to work on rsyslog other than as a hobby).
>>>>>>
>>>>>> I hope this explains the current state.
>>>>>> Rainer
>>>>>>
>>>>>>> Kane, can you try and duplicate your problem using standard modules? I'm thinking that if you define an omrelp action that doesn't go anywhere, it should return errors similar to what you are trying to do, so you should be able to duplicate the problem that way.
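[Editor's note: the batches-of-many flow Rainer describes above can be sketched as a toy model. This is Python for illustration only, not rsyslog's actual C plugin interface; the class and method names are invented. The point it shows: doAction can only buffer the rendered strings, and delivery success or failure is learned only at commitTransaction time, when it applies to the whole buffered batch at once.]

```python
class TransactionalOutput:
    """Toy model of a v8-style transactional output (illustrative only)."""

    def __init__(self, sink):
        self.sink = sink    # callable that delivers a list of messages, may raise
        self.buffer = []    # "action strings stored temporarily"

    def do_action(self, rendered_msg):
        # During batch execution we can only buffer: we must assume delivery
        # will succeed, because the rule engine has already moved on.
        self.buffer.append(rendered_msg)

    def commit_transaction(self):
        # Only here do we learn whether delivery works. On failure the whole
        # buffered batch is affected at once; per-message blame is no longer
        # possible at this point.
        try:
            self.sink(self.buffer)
            self.buffer = []
            return "ok"
        except IOError:
            return "suspended"   # engine would suspend the action and retry


delivered = []
out = TransactionalOutput(lambda batch: delivered.extend(batch))
for m in ["msg1", "msg2", "msg3"]:
    out.do_action(m)
assert out.commit_transaction() == "ok"


def failing_sink(batch):
    raise IOError("connection refused")  # simulates the relp target being down


broken = TransactionalOutput(failing_sink)
broken.do_action("msg4")
assert broken.commit_transaction() == "suspended"
assert broken.buffer == ["msg4"]  # batch kept for retry, not silently dropped
```

Note that whether the suspended batch is retried or discarded is up to the engine and the shutdown path, which is exactly where the debug trace above shows the message being deleted.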
>>>>>>> David Lang
>>>>>>>
>>>>>>> On Tue, 23 Feb 2016, Kane Kim wrote:
>>>>>>>
>>>>>>>> Looking at the omkafka module source code, it seems that it relies on rsyslog retries in DoAction, returning RS_RET_SUSPENDED:
>>>>>>>>
>>>>>>>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>>>>>>>> if(iRet != RS_RET_OK) {
>>>>>>>>         iRet = RS_RET_SUSPENDED;
>>>>>>>> }
>>>>>>>>
>>>>>>>> I've tried similar code in DoAction; it seems that action processing is suspended at that point and messages are not retried to the module.
>>>>>>>>
>>>>>>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for your help, Rainer. I'll try to debug what's going on; so far it seems rsyslog doesn't retry even with batch size 1. It doesn't retry if I return RS_RET_SUSPENDED from DoAction, either.
>>>>>>>>>
>>>>>>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some context: I want to write a module that both uses batching and also can't lose messages under any circumstances. Are you saying it is by design that rsyslog can't do that together? According to the documentation, rsyslog will retry if the module returns any error.
>>>>>>>>>>>>> Do you plan to fix this in rsyslog, or update the documentation to say that batching and retries don't work together?
>>>>>>>>>>>>
>>>>>>>>>>>> It depends on many things. In almost all cases, the retry should work well (and does so in practice). Unfortunately, I am pretty swamped. I need to go to a conference tomorrow and have had quite some unexpected work today. It would probably be good if you could ping me next week to see if we can look in more detail at what is causing you pain. But I can't guarantee that I will be available early next week.
>>>>>>>>>>>>
>>>>>>>>>>>> In general, we cannot handle a fatal error here from an engine PoV, because everything is already processed and we no longer have the original messages. This is simply needed if you want to process messages one after another through the full config (a goal for v8 that was muuuuch requested). As I said, the solution is to use batches of one, because otherwise we would really need to turn back time and undo everything that was already done on the messages in question by other modules (including state advances).
>>>>>>>>>>>
>>>>>>>>>>> I thought that if a batch failed, it pushed all the messages back on the queue and retried with a half-size batch until it got to the one message that could not be processed, and only did a fatal fail on that message.
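[Editor's note: the halving strategy David expects here, which per Rainer's reply rsyslog v8 no longer does generically, is easy to sketch. On failure, split the batch and recurse, so only the genuinely undeliverable message ends up fatally failed. Illustrative Python only, not rsyslog code; `send` and `dead_letter` are invented names.]

```python
def deliver_with_halving(batch, send, dead_letter):
    """Retry a failed batch by splitting it in half to isolate bad messages.

    send(msgs) raises ValueError if any message in msgs is undeliverable.
    Illustrative sketch only; not how rsyslog v8 actually behaves.
    """
    if not batch:
        return
    try:
        send(batch)
    except ValueError:
        if len(batch) == 1:
            dead_letter.append(batch[0])  # fatal fail only on the single bad message
        else:
            mid = len(batch) // 2
            deliver_with_halving(batch[:mid], send, dead_letter)
            deliver_with_halving(batch[mid:], send, dead_letter)


sent, dead = [], []


def send(msgs):
    # Hard error: a corrupt message can never be delivered, no matter
    # how often it is retried.
    if any(m == "corrupt" for m in msgs):
        raise ValueError("undeliverable message in batch")
    sent.extend(msgs)


deliver_with_halving(["a", "b", "corrupt", "c"], send, dead)
# sent == ["a", "b", "c"]; dead == ["corrupt"]
```

As David notes, with multiple outputs reading the same queue this approach trades duplicates (re-sent good messages) for not losing a whole batch, i.e. at-least-once rather than exactly-once delivery.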
>>>>>>>>>>> Now, there is a big difference between a module giving a hard error ("this message is never going to be able to be processed no matter how many times it's retried") vs a soft error ("there is a problem delivering things to this destination right now, retry later"). I thought the batch processing handled these differently.
>>>>>>>>>>
>>>>>>>>>> That's no longer possible with v8, at least generically. As I said above, we would need to turn back time.
>>>>>>>>>>
>>>>>>>>>> But I really have run out of time now...
>>>>>>>>>>
>>>>>>>>>> Rainer
>>>>>>>>>>
>>>>>>>>>>> David Lang
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

