2016-04-01 15:58 GMT+02:00 Kane Kim <[email protected]>: > Bump, Rainer, what's your thoughts on what I've posted above?^
one thing after another, please ;) I did not yet have time to dig deeper into that (but much of the work I did was kind of prep work). I hope to be able to look some time next week into it. Rainer > Also, it seems I understand why rsyslog would drop messages, but for our > use case it's better to have duplicates than drop them (we want at least > once semantics), is it possible to configure rsyslog that way (or add this > functionality)? > > Thanks! > > On Tue, Mar 29, 2016 at 10:46 AM, Kane Kim <[email protected]> wrote: > >> Updated config to: >> module(load="omrelp") >> local0.* action(type="omrelp" >> target="10.0.0.1" >> port="2514" >> name="debug_relp" >> queue.filename="relp_queue" >> queue.maxdiskspace="1G" >> queue.saveonshutdown="on" >> queue.spoolDirectory="/var/spool/rsyslog" >> queue.type="Disk" >> action.resumeRetryCount="-1" >> action.resumeInterval="1" >> ) >> >> Still message is lost: >> >> 3489.658349703:imuxsock.c : Message from UNIX socket: #3 >> 3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size now >> log 1, phys 1 entries >> 3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker start >> 3489.658374985:imuxsock.c : --------imuxsock calling select, active >> file descriptors (max 3): 3 >> 3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle >> processing >> 3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects >> and enqueued 0 objects >> 3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 1, phys 1 >> 3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements must be >> processed >> 3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29 >> 17:44:49 logger: test >> 3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*' >> 3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X FF X X X X X X X X X >> 3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1 >> 3489.658414686:main Q:Reg/w0 : ACTION 0 [omrelp:action(type="omrelp" >> ...)] >> 3489.658417648:main Q:Reg/w0 : executing action 0 >> 3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging to >> omrelp (susp 0/0, direct q 0) >> 3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue) flush, >> buflen 410 >> 3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len 410 >> 3489.658484243:main Q:Reg/w0 : file >> '/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384 >> 3489.658487347:main Q:Reg/w0 : strm 0x23b74c0: opened file >> '/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5 >> 3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote 410 >> bytes >> 3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410 octets >> to disk, queue disk size now 410 octets, EnqOnly:0 >> 3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry added, >> size now log 1, phys 1 entries >> 3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity - >> starting 1 additional worker thread(s). >> 3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with state >> 0, num workers now 1 >> 3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised >> worker start >> 3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended state >> to 0 >> 3489.658533672:main Q:Reg/w0 : PRIFILT >> '*.info;mail.none;authpriv.none;cron.none' >> 3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X >> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F >> 3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1 >> 3489.658553393:main Q:Reg/w0 : ACTION 1 >> [builtin:omfile:/var/log/messages] >> 3489.658556301:main Q:Reg/w0 : executing action 1 >> 3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to >> builtin:omfile (susp 0/0, direct q 1) >> 3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional - >> executing in commit phase >> 3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended state to 0 >> 3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*' >> 3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X X >> FF X X X X X X X X X X X X X X X >> 3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*' >> 3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X X >> X X X X X X X X X X X X X X X X >> 3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*' >> 3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X FF >> X X X X X X X X X X X X X X X X >> 3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg' >> 3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 >> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> 3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' >> 3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X >> X X X X X X X X X X X X X X X X >> 3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*' >> 3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X X X X X X X X FF X X >> 3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3489.658675686:main Q:Reg/w0 : END batch execution phase, entering to >> commit phase [processed 1 of 1 messages] >> 3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to >> commit 0 isTransactional 1 >> 3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction IF, >> using that, pWrkrInfo 0x23c8ce0 >> 3489.658680668:main Q:Reg/w0 : entering actionCallCommitTransaction(), >> state: itx, actionNbr 1, nMsgs 1 >> 3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm >> 0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0 logger: >> test >> >> 3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages) >> flush, buflen 42 >> 3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len >> 42 >> 3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote 42 >> bytes >> 3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 >> 3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has been >> processed >> 3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 >> sz phys 1 >> 3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects >> and enqueued 0 objects >> 3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 0, phys 0 >> 3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 >> sz phys 0 >> 3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for >> work. >> 3489.660013052:7f1efbb14700: thread created, tid 30310, name >> 'rs:debug_relp queue:Reg' >> 3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker starting >> 3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 0 >> objects and enqueued 0 objects >> 3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from >> store, new sizes: log 1, phys 1 >> 3489.660947418:debug_relp queue:Reg/w0: file >> '/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384 >> 3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file >> '/var/spool/rsyslog/relp_queue.00000001' for READ as 6 >> 3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read 410 >> bytes >> 3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is >> transactional - executing in commit phase >> 3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to create a >> new action worker instance for action 0 >> 3489.661789583:debug_relp queue:Reg/w0: relp engine created new client >> 0x7f1ef00027e0 >> 3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action >> worker instance 1 for action 0 >> 3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction >> ^C3497.710058579:main thread : DoDie called. >> 3497.710272055:main thread : exiting on signal 2 >> 3497.710576430:main thread : Terminating input threads... >> 3497.710765753:main thread : request term via SIGTTIN for input thread >> 'imuxsock' 0xfcb16700 >> 3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock - >> 0x7f1efcb16700 returned with iRet 0, exiting now. >> 3497.711447372:main thread : non-cancel input thread termination >> succeeded for thread imuxsock 0xfcb16700 >> 3497.713028584:imuxsock.c : destructor for debug call stack >> 0x7f1eec0008c0 called >> 3497.713390963:main thread : imuxsock: unlinking unix socket file[0] >> /dev/log >> 3497.713734513:main thread : main Q: qqueueAdd: entry added, size now >> log 1, phys 1 entries >> 3497.713906635:main thread : main Q: EnqueueMsg advised worker start >> 3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle >> processing >> 3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects >> and enqueued 0 objects >> 3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 1, phys 1 >> 3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements must be >> processed >> 3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin >> software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info=" >> http://www.rsyslog.com"] exiting on signal 2. >> 3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*' >> 3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X FF X X X X X X X X X >> 3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.716337713:main Q:Reg/w0 : PRIFILT >> '*.info;mail.none;authpriv.none;cron.none' >> 3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X >> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F >> 3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1 >> 3497.716429241:main Q:Reg/w0 : ACTION 1 >> [builtin:omfile:/var/log/messages] >> 3497.716439928:main Q:Reg/w0 : executing action 1 >> 3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to >> builtin:omfile (susp 0/0, direct q 1) >> 3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional - >> executing in commit phase >> 3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended state to 0 >> 3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*' >> 3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X X >> FF X X X X X X X X X X X X X X X >> 3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*' >> 3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X X >> X X X X X X X X X X X X X X X X >> 3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*' >> 3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X FF >> X X X X X X X X X X X X X X X X >> 3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg' >> 3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 >> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> 3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' >> 3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X >> X X X X X X X X X X X X X X X X >> 3497.718043828:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*' >> 3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X X >> X X X X X X X X X X X X X FF X X >> 3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0 >> 3497.718842803:main Q:Reg/w0 : END batch execution phase, entering to >> commit phase [processed 1 of 1 messages] >> 3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to >> commit 0 isTransactional 1 >> 3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction IF, >> using that, pWrkrInfo 0x23c8ce0 >> 3497.718857172:main Q:Reg/w0 : entering actionCallCommitTransaction(), >> state: itx, actionNbr 1, nMsgs 1 >> 3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm >> 0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0 >> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master" >> x-pid="30306" x-info="http://www.rs >> 3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages) >> flush, buflen 161 >> 3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len >> 161 >> 3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote >> 161 bytes >> 3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx >> 3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy >> 3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 >> 3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has been >> processed >> 3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 >> sz phys 1 >> 3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects >> and enqueued 0 objects >> 3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 0, phys 0 >> 3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 >> sz phys 0 >> 3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for >> work. >> 3497.718942403:main thread : Terminating main queue... >> 3497.718946280:main thread : main Q: initiating worker thread shutdown >> sequence >> 3497.718950170:main thread : main Q: trying shutdown of regular workers >> 3497.718957067:main thread : sent SIGTTIN to worker thread 0xfc315700 >> 3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker >> thread termination, 1 still running >> 3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle >> processing >> 3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects >> and enqueued 0 objects >> 3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >> new sizes: log 0, phys 0 >> 3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 >> sz phys 0 >> 3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker >> terminateRet=5, bInactivityTOOccured=0 >> 3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup action >> instances >> 3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil) >> 3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr 0x7f1ef40022a0 >> 3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil) >> 3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil) >> 3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil) >> 3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil) >> 3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil) >> 3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil) >> 3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting >> 3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0, >> terminated, num workers now 0 >> 3497.719090631:main thread : main Q: regular queue workers shut down. >> 3497.719097597:main thread : main Q: checking to see if we need to >> cancel any worker threads of the primary queue >> 3497.719101335:main thread : main Q: worker threads terminated, >> remaining queue size log 0, phys 0. >> 3497.719111115:main thread : main Q: queue (type 0) will lose 0 >> messages, destroying... >> 3497.719501847:main Q:Reg/w0 : destructor for debug call stack >> 0x7f1ef40008c0 called >> 3497.719547524:main thread : Terminating outputs... >> 3497.719553777:main thread : destructing ruleset 0x23ab150, name >> 0x23ab190 >> 3497.719558276:main thread : debug_relp queue: initiating worker thread >> shutdown sequence >> 3497.719561940:main thread : debug_relp queue: trying shutdown of >> regular workers >> 3497.719568068:main thread : sent SIGTTIN to worker thread 0xfbb14700 >> 3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on >> worker thread termination, 1 still running >> 3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 >> 3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to state: >> rtry >> 3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0, >> currIParam 1 >> 3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter >> loop, iRetries=0 >> 3497.719677407:main thread : debug_relp queue:Reg: timeout waiting on >> worker thread termination >> 3497.719677554:main thread : sent SIGTTIN to worker thread 0xfbb14700 >> 3497.719677725:main thread : debug_relp queue: regular shutdown timed >> out on primary queue (this is OK) >> 3497.719677897:main thread : debug_relp queue: trying to shutdown >> workers within Action Timeoutsetting EnqOnly mode >> 3497.719678191:main thread : debug_relp queue: trying immediate >> shutdown of regular workers (if any) >> 3497.719678362:main thread : sent SIGTTIN to worker thread 0xfbb14700 >> 3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms on >> worker thread termination, 1 still running >> 3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 >> 3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp >> action->tryResume returned -2007 >> 3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check >> for max retries, iResumeRetryCount -1, iRetries 0 >> 3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action 0x23bebf0 >> state: rtry, next retry (if applicable): 0 [now 1459273498] >> 3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet >> -2007 >> 3498.721034552:debug_relp queue:Reg/w0: regular consumer finished, >> iret=-2007, szlog 0 sz phys 1 >> 3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 1 >> objects and enqueued 0 objects >> 3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from >> store, new sizes: log 0, phys 0 >> 3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating worker >> because of TERMINATE_NOW mode, del iRet 0 >> 3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker >> cleanup action instances >> 3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr >> 0x7f1ef00020d0 >> 3498.722431811:debug_relp queue:Reg/w0: relp engine destructing client >> 0x7f1ef00027e0 >> 3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr (nil) >> 3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr (nil) >> 3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr (nil) >> 3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr (nil) >> 3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr (nil) >> 3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr (nil) >> 3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr (nil) >> 3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting >> 3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker >> thread 23c78b0, terminated, num workers now 0 >> 3498.724243673:debug_relp queue:Reg/w0: destructor for debug call stack >> 0x7f1ef00008c0 called >> 3498.724371931:main thread : debug_relp queue: checking to see if we >> need to cancel any worker threads of the primary queue >> 3498.724379712:main thread : debug_relp queue: worker threads >> terminated, remaining queue size log 0, phys 0. >> 3498.724386288:main thread : debug_relp queue: persisting queue to >> disk, 0 entries... >> 3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue) closing >> 3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue) flush, >> buflen 0 (no need to flush) >> 3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue) closing >> 3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue) closing >> 3498.724423956:main thread : strmCloseFile: deleting >> '/var/spool/rsyslog/relp_queue.00000001' >> 3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages) >> closing >> 3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages) >> flush, buflen 0 (no need to flush) >> 3498.724731690:main thread : all primary multi-thread sources have been >> terminated - now doing aux cleanup... >> 3498.724737248:main thread : destructing current config... >> 3498.724741760:main thread : calling freeCnf(0x23a91e0) for module >> 'builtin:omfile' >> 3498.724745773:main thread : calling freeCnf(0x23a93a0) for module >> 'builtin:ompipe' >> 3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module >> 'builtin:omfwd' >> 3498.724754588:main thread : calling freeCnf(0x23b9b00) for module >> 'imuxsock' >> 3498.724758956:main thread : calling freeCnf(0x23bac70) for module >> 'imklog' >> 3498.724807014:main thread : destructing parser 'rsyslog.rfc5424' >> 3498.724871141:main thread : destructing parser 'rsyslog.rfc3164' >> 3498.724876440:main thread : pmrfc3164: free parser instance 0x23aa8b0 >> 3498.724887558:main thread : file rsyslogd.c released module 'lmnet', >> reference count now 2 >> 3498.724892004:main thread : module lmnet NOT unloaded because it still >> has a refcount of 2 >> 3498.724895745:main thread : Unloading module builtin:omfile >> 3498.724900191:main thread : module lmnet NOT unloaded because it still >> has a refcount of 2 >> 3498.724904194:main thread : Unloading module builtin:ompipe >> 3498.724908035:main thread : module lmnet NOT unloaded because it still >> has a refcount of 2 >> 3498.724911707:main thread : Unloading module builtin-shell >> 3498.725044125:main thread : module lmnet NOT unloaded because it still >> has a refcount of 2 >> 3498.725050401:main thread : Unloading module builtin:omdiscard >> 3498.725054843:main thread : module lmnet NOT unloaded because it still >> has a refcount of 2 >> 3498.725058772:main thread : Unloading module builtin:omfwd >> 3498.725064464:main thread : file omfwd.c released module 'lmnet', >> reference count now 1 >> 3498.725068747:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725072460:main thread : Unloading module builtin:omusrmsg >> 3498.725076324:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725080023:main thread : Unloading module builtin:pmrfc5424 >> 3498.725084039:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725087646:main thread : Unloading module builtin:pmrfc3164 >> 3498.725091651:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725095255:main thread : Unloading module builtin:smfile >> 3498.725099010:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725102601:main thread : Unloading module builtin:smtradfile >> 3498.725106419:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725110003:main thread : Unloading module builtin:smfwd >> 3498.725113689:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725117357:main thread : Unloading module builtin:smtradfwd >> 3498.725121132:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725124782:main thread : Unloading module imuxsock >> 3498.725157459:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.725162797:main thread : Unloading module imklog >> 3498.725179208:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.726112994:main thread : Unloading module omrelp >> 3498.726184405:main thread : module lmnet NOT unloaded because it still >> has a refcount of 1 >> 3498.726189821:main thread : module lmregexp NOT unloaded because it >> still has a refcount of 1 >> 3498.726195161:main thread : file conf.c released module 'lmnet', >> reference count now 0 >> 3498.726198620:main thread : module 'lmnet' has zero reference count, >> unloading... >> 3498.726201936:main thread : Unloading module lmnet >> 3498.726218655:main thread : rsyslog runtime de-initialized, current >> users 0 >> 3498.726222660:main thread : Clean shutdown completed, bye >> >> >> On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]> wrote: >> >>> Hello Reiner, >>> >>> I'm seeing the same behavior with omrelp module. I've configured omrelp >>> to non-routable ip address: >>> module(load="omrelp") >>> local0.* action(type="omrelp" >>> target="10.0.0.1" >>> port="2514" >>> name="debug_relp" >>> queue.filename="relp_queue" >>> queue.maxdiskspace="1G" >>> queue.saveonshutdown="on" >>> queue.spoolDirectory="/var/spool/rsyslog" >>> queue.type="Disk" >>> ) >>> Then I try to log a line: >>> [root@314959ac8cb0 prism]# logger -p local0.info test >>> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/ >>> relp_queue.00000001 >>> # I shutdown rsyslog at this point >>> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/ >>> # after shutdown queue is deleted and message is lost >>> [root@314959ac8cb0 prism]# >>> >>> Here is rsyslog log, so omrelp starts transaction and then I shutdown >>> rsyslog, it seems at some point queue is deleted: >>> >>> 8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction >>> ^C8194.245122305:main thread : DoDie called. >>> 8194.245301525:main thread : exiting on signal 2 >>> 8194.245532709:main thread : Terminating input threads... >>> 8194.245873683:main thread : request term via SIGTTIN for input thread >>> 'imuxsock' 0xcb767700 >>> 8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock - >>> 0x7f9bcb767700 returned with iRet 0, exiting now. >>> 8194.247556671:main thread : non-cancel input thread termination >>> succeeded for thread imuxsock 0xcb767700 >>> 8194.249357547:imuxsock.c : destructor for debug call stack >>> 0x7f9bbc0008c0 called >>> 8194.249784271:main thread : imuxsock: unlinking unix socket file[0] >>> /dev/log >>> 8194.250043368:main thread : main Q: qqueueAdd: entry added, size now >>> log 1, phys 1 entries >>> 8194.250321193:main thread : main Q: EnqueueMsg advised worker start >>> 8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle >>> processing >>> 8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 >>> objects and enqueued 0 objects >>> 8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >>> new sizes: log 1, phys 1 >>> 8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements must be >>> processed >>> 8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin >>> software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info=" >>> http://www.rsyslog.com"] exiting on signal 2. >>> 8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*' >>> 8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X X X >>> X X X X X X FF X X X X X X X X X >>> 8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.252902083:main Q:Reg/w0 : PRIFILT >>> '*.info;mail.none;authpriv.none;cron.none' >>> 8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X >>> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F >>> 8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1 >>> 8194.252959849:main Q:Reg/w0 : ACTION 1 >>> [builtin:omfile:/var/log/messages] >>> 8194.252967510:main Q:Reg/w0 : executing action 1 >>> 8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging to >>> builtin:omfile (susp 0/0, direct q 1) >>> 8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional - >>> executing in commit phase >>> 8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx >>> 8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended state to >>> 0 >>> 8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*' >>> 8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X X X >>> FF X X X X X X X X X X X X X X X >>> 8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*' >>> 8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X X X >>> X X X X X X X X X X X X X X X X >>> 8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*' >>> 8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X X FF >>> X X X X X X X X X X X X X X X X >>> 8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg' >>> 8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 >>> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >>> 8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit' >>> 8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X >>> X X X X X X X X X X X X X X X X >>> 8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*' >>> 8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X X X >>> X X X X X X X X X X X X X FF X X >>> 8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0 >>> 8194.254450651:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 1 of 1 messages] >>> 8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr >>> to commit 0 isTransactional 1 >>> 8194.254461646:main Q:Reg/w0 : doTransaction: have commitTransaction IF, >>> using that, pWrkrInfo 0x1c13cf0 >>> 8194.254466238:main Q:Reg/w0 : entering actionCallCommitTransaction(), >>> state: itx, actionNbr 1, nMsgs 1 >>> 8194.254471876:main Q:Reg/w0 : omfile: write to stream, pData->pStrm >>> 0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0 >>> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master" >>> x-pid="30298" x-info="http://www.rs >>> 8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4(messages) >>> flush, buflen 161 >>> 8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream 0x7f9bc4002480, len >>> 161 >>> 8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write wrote >>> 161 bytes >>> 8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy >>> 8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx >>> 8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy >>> 8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0 >>> 8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements has >>> been processed >>> 8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0, szlog >>> 0 sz phys 1 >>> 8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 >>> objects and enqueued 0 objects >>> 8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >>> new sizes: log 0, phys 0 >>> 8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4, szlog >>> 0 sz phys 0 >>> 8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for >>> work. >>> 8194.254556418:main thread : Terminating main queue... >>> 8194.254560776:main thread : main Q: initiating worker thread shutdown >>> sequence >>> 8194.254564653:main thread : main Q: trying shutdown of regular workers >>> 8194.254572961:main thread : sent SIGTTIN to worker thread 0xcaf66700 >>> 8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker >>> thread termination, 1 still running >>> 8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle >>> processing >>> 8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 >>> objects and enqueued 0 objects >>> 8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from store, >>> new sizes: log 0, phys 0 >>> 8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4, szlog >>> 0 sz phys 0 >>> 8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker >>> terminateRet=5, bInactivityTOOccured=0 >>> 8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup >>> action instances >>> 8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil) >>> 8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr >>> 0x7f9bc40022a0 >>> 8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil) >>> 8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil) >>> 8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil) >>> 8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil) >>> 8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil) >>> 8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil) >>> 8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting >>> 8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0, >>> terminated, num workers now 0 >>> 8194.254656279:main thread : main Q: regular queue workers shut down. >>> 8194.254656402:main thread : main Q: checking to see if we need to >>> cancel any worker threads of the primary queue >>> 8194.254656524:main thread : main Q: worker threads terminated, >>> remaining queue size log 0, phys 0. >>> 8194.254656647:main thread : main Q: queue (type 0) will lose 0 >>> messages, destroying... >>> 8194.255021761:main Q:Reg/w0 : destructor for debug call stack >>> 0x7f9bc40008c0 called >>> 8194.255069521:main thread : Terminating outputs... >>> 8194.255084818:main thread : destructing ruleset 0x1bf6150, name >>> 0x1bf6190 >>> 8194.255089525:main thread : debug_relp queue: initiating worker >>> thread shutdown sequence >>> 8194.255093230:main thread : debug_relp queue: trying shutdown of >>> regular workers >>> 8194.255099210:main thread : sent SIGTTIN to worker thread 0xca765700 >>> 8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on >>> worker thread termination, 1 still running >>> 8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 >>> 8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to state: >>> rtry >>> 8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0, >>> currIParam 1 >>> 8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter >>> loop, iRetries=0 >>> 8194.255210939:main thread : debug_relp queue:Reg: timeout waiting on >>> worker thread termination >>> 8194.255211086:main thread : sent SIGTTIN to worker thread 0xca765700 >>> 8194.255211257:main thread : debug_relp queue: regular shutdown timed >>> out on primary queue (this is OK) >>> 8194.255211429:main thread : debug_relp queue: trying to shutdown >>> workers within Action Timeoutsetting EnqOnly mode >>> 8194.255215492:main thread : debug_relp queue: trying immediate >>> shutdown of regular workers (if any) >>> 8194.255221059:main thread : sent SIGTTIN to worker thread 0xca765700 >>> 8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms on >>> worker thread termination, 1 still running >>> 8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014 >>> 8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp >>> action->tryResume returned -2007 >>> 8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check >>> for max retries, iResumeRetryCount 0, iRetries 0 >>> 8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to state: >>> susp >>> 8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action >>> 'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016 >>> 8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not submit >>> message - queue does (no longer?) exist - ignored >>> 8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp' suspended, >>> earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration 30 >>> 8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action 0x1c09af0 >>> state: susp, next retry (if applicable): 1459208224 [now 1459208194] >>> 8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet >>> -2123 >>> 8194.255282409:debug_relp queue:Reg/w0: regular consumer finished, >>> iret=-2123, szlog 0 sz phys 1 >>> 8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted >>> 1 objects and enqueued 0 objects >>> 8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from >>> store, new sizes: log 0, phys 0 >>> 8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating worker >>> because of TERMINATE_NOW mode, del iRet 0 >>> 8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker >>> cleanup action instances >>> 8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr >>> 0x7f9bc00020d0 >>> 8194.255305929:debug_relp queue:Reg/w0: relp engine destructing client >>> 0x7f9bc00027e0 >>> 8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr (nil) >>> 8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr (nil) >>> 8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr (nil) >>> 8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr (nil) >>> 8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr (nil) >>> 8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr (nil) >>> 8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr (nil) >>> 8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker exiting >>> 8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker >>> thread 1c12880, terminated, num workers now 0 >>> 8194.255344540:debug_relp queue:Reg/w0: destructor for debug call stack >>> 0x7f9bc00008c0 called >>> 8194.255371534:main thread : debug_relp queue: checking to see if we >>> need to cancel any worker threads of the primary queue >>> 8194.255376798:main thread : debug_relp queue: worker threads >>> terminated, remaining queue size log 0, phys 0. >>> 8194.255381256:main thread : debug_relp queue: persisting queue to >>> disk, 0 entries... >>> 8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue) closing >>> 8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue) flush, >>> buflen 0 (no need to flush) >>> 8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue) closing >>> 8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue) >>> closing >>> 8194.255409145:main thread : strmCloseFile: deleting >>> '/var/spool/rsyslog/relp_queue.00000001' >>> 8194.255460812:main thread : strm 0x7f9bc4002480: file 4(messages) >>> closing >>> 8194.255466309:main thread : strm 0x7f9bc4002480: file 4(messages) >>> flush, buflen 0 (no need to flush) >>> 8194.255479325:main thread : all primary multi-thread sources have >>> been terminated - now doing aux cleanup... >>> 8194.255482809:main thread : destructing current config... >>> 8194.255486662:main thread : calling freeCnf(0x1bf41e0) for module >>> 'builtin:omfile' >>> 8194.255490098:main thread : calling freeCnf(0x1bf43a0) for module >>> 'builtin:ompipe' >>> 8194.255493809:main thread : calling freeCnf(0x1bf53a0) for module >>> 'builtin:omfwd' >>> 8194.255497793:main thread : calling freeCnf(0x1c04b00) for module >>> 'imuxsock' >>> 8194.255501565:main thread : calling freeCnf(0x1c05c70) for module >>> 'imklog' >>> 8194.255543579:main thread : destructing parser 'rsyslog.rfc5424' >>> 8194.255547468:main thread : destructing parser 'rsyslog.rfc3164' >>> 8194.255550784:main thread : pmrfc3164: free parser instance 0x1bf58b0 >>> 8194.255559659:main thread : file rsyslogd.c released module 'lmnet', >>> reference count now 2 >>> 8194.255563470:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 2 >>> 8194.255566676:main thread : Unloading module builtin:omfile >>> 8194.255570420:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 2 >>> 8194.255573674:main thread : Unloading module builtin:ompipe >>> 8194.255576857:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 2 >>> 8194.255579938:main thread : Unloading module builtin-shell >>> 8194.255583060:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 2 >>> 8194.255586144:main thread : Unloading module builtin:omdiscard >>> 8194.255589509:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 2 >>> 8194.255592658:main thread : Unloading module builtin:omfwd >>> 8194.255596441:main thread : file omfwd.c released module 'lmnet', >>> reference count now 1 >>> 8194.255599981:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255603068:main thread : Unloading module builtin:omusrmsg >>> 8194.255606240:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255609287:main thread : Unloading module builtin:pmrfc5424 >>> 8194.255612755:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255615826:main thread : Unloading module builtin:pmrfc3164 >>> 8194.255619252:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255622315:main thread : Unloading module builtin:smfile >>> 8194.255625406:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255628486:main thread : Unloading module builtin:smtradfile >>> 8194.255631753:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255634819:main thread : Unloading module builtin:smfwd >>> 8194.255637916:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255640987:main thread : Unloading module builtin:smtradfwd >>> 8194.255644187:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255647267:main thread : Unloading module imuxsock >>> 8194.255676232:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255680751:main thread : Unloading module imklog >>> 8194.255694428:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255698085:main thread : Unloading module omrelp >>> 8194.255750953:main thread : module lmnet NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255755236:main thread : module lmregexp NOT unloaded because it >>> still has a refcount of 1 >>> 8194.255759705:main thread : file conf.c released module 'lmnet', >>> reference count now 0 >>> 8194.255762875:main thread : module 'lmnet' has zero reference count, >>> unloading... >>> 8194.255765901:main thread : Unloading module lmnet >>> 8194.255780970:main thread : rsyslog runtime de-initialized, current >>> users 0 >>> 8194.255784704:main thread : Clean shutdown completed, bye >>> >>> >>> >>> On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote: >>> >>>> On Sun, 13 Mar 2016, Kane Kim wrote: >>>> >>>> +1to David, having module handle the errors is not 100% reliable, what if >>>>> module dies before having a chance to save error log, those messages >>>>> will >>>>> be lost. >>>>> >>>> >>>> well, the module is a thread of rsyslog, so if it dies, rsyslog has died. >>>> >>>> But if you are doing large batches (hundreds or thousands), putting all >>>> of them in a 'bad messages' file because one has a problem seems like the >>>> wrong thing to do. >>>> >>>> given that the error is not detected until the batch is being flushed >>>> for the output, any outputs that have already flushed their batches can't >>>> be changed, so the question is if it's worse to have messages resent to >>>> those outputs as the batch gets re-processed, or having to have the output >>>> go to a "couldn't deliver" file. >>>> >>>> It's bad enough if the problem is a corrupt message that can never be >>>> delivered and the error is a permanent one, but if the problem is instead a >>>> temporary error (server is down), you will have a LOT of messages going >>>> into this 'couldn't deliver' bucket, when what should really be happening >>>> is that rsyslog should be pausing in the processing of messages, retrying >>>> after the suspended timeout. >>>> >>>> David Lang >>>> >>>> >>>> On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote: >>>>> >>>>> On Sun, 13 Mar 2016, Rainer Gerhards wrote: >>>>>> >>>>>> Rainer, can you explain what happens when a batch fails on an action? >>>>>> >>>>>>> >>>>>>>> >>>>>>> It depends on when what happens and if we have batches of one or >>>>>>> batches of many. For batches of one, the failure is immediately >>>>>>> detected and acted upon. For batches of many,all action strings or >>>>>>> stored temporarily. During that time, it is assume that the messages >>>>>>> could be processed. There is no way around this assumption, because we >>>>>>> now run each action completely through the rule engine before the next >>>>>>> one is begun to process, which means that other parts of the config >>>>>>> have actually acted on a message, which means we cannot undo that >>>>>>> state. If is needed to act on the "real" state, we need to process >>>>>>> batches of one. There simply is no other solution possible. So let's >>>>>>> stay with batches of many. Once the batch is completed, the temporary >>>>>>> strings are passed to the output's commitTransaction entry point. It >>>>>>> is currently assumed that the output does inteligently handle the >>>>>>> case, and as it looks most do this well (no regression tests failed, >>>>>>> no bad reports since we introduced this with the initial v8 version >>>>>>> (15+ month ago). If the action fails, status is reflected accordingly. >>>>>>> Also, the core engine *should* (but obviously does not support this >>>>>>> right now) write an error file with the messages in error (just like >>>>>>> omelasticsearch does, which has proven to be quite good). >>>>>>> >>>>>>> >>>>>> well, it's not in the specs that modules should have local error files >>>>>> the >>>>>> way omelasticsearch does. >>>>>> >>>>>> What I would expect if there is a problem when commiting the batch is >>>>>> that >>>>>> the entire batch would be marked as not delivered and try again with >>>>>> half >>>>>> of it. >>>>>> >>>>>> This will cause messages to other outputs working from the same queue >>>>>> to >>>>>> get duplciate messages, but it's better than loosing an entire batch of >>>>>> messages because one is corrupt in some way. or there is a temporary >>>>>> problem with the output. >>>>>> >>>>>> David Lang >>>>>> >>>>>> I have managed to look at least briefly at the commitTransaction core >>>>>> >>>>>>> engine code. I think we can even improve it and may get some ability >>>>>>> to do retries if the action itself does not handle that. At the >>>>>>> minimum, it should have the ability (which was intended originally, >>>>>>> but pushed away by more urgent work) to write that error file. >>>>>>> >>>>>>> I am very grateful that Kane is working on a repro with the standard >>>>>>> modules. Once we have that, I can go deeper into the code and check >>>>>>> what I am thinking about. It might be possible that we cannot manage >>>>>>> to repro this with the standard modules. Then we need to think about a >>>>>>> way via the testing module to do the same thing. >>>>>>> >>>>>>> I will try to address this within the 8.18 timeframe, but cannot >>>>>>> totally commit to it. There has a lot of work been going on, and >>>>>>> especially all that CI testing has taken up lots of works. >>>>>>> Nevertheless, I've been able to iron out quite some nits, so it's >>>>>>> probably worth it. I would still like to continue to get the CI >>>>>>> environment into better shape, because now I know what I am doing. If >>>>>>> I stop again and let it go for a couple of weeks, I would lose a lot >>>>>>> of momentum. I have also do to some work for support customers, which >>>>>>> keeps me able to work on rsyslog in general, so this, too, has >>>>>>> priority (and validly has so because without support customers I would >>>>>>> be totally unable to work on rsyslog other than as a hobby) >>>>>>> >>>>>>> I hope this explains the current state. >>>>>>> Rainer >>>>>>> >>>>>>> >>>>>>> Kane, can you try and duplicate your problem using standard modules? >>>>>>>> I'm >>>>>>>> thinking that if you define a omrelp action that doesn't go >>>>>>>> anywhere, it >>>>>>>> should return errors similar to what you are trying to do, so you >>>>>>>> should >>>>>>>> be >>>>>>>> able to duplicate the problem that way. >>>>>>>> >>>>>>>> David Lang >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 23 Feb 2016, Kane Kim wrote: >>>>>>>> >>>>>>>> Looking at omkafka module source code it seems that it relies on >>>>>>>> rsyslog >>>>>>>> >>>>>>>>> retries in DoAction, returning RS_RET_SUSPENDED: >>>>>>>>> >>>>>>>>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet); >>>>>>>>> if(iRet != RS_RET_OK) { >>>>>>>>> iRet = RS_RET_SUSPENDED; >>>>>>>>> } >>>>>>>>> >>>>>>>>> I've tried similar code in DoAction, it seems that action processing >>>>>>>>> is suspended at that point and messages are not retried to module. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks for your help Rainer, I'll try to debug what's going on, so >>>>>>>>> far >>>>>>>>> >>>>>>>>>> it >>>>>>>>>> seems rsyslog doesn't retry even with batch size 1. It doesn't >>>>>>>>>> retry >>>>>>>>>> if I >>>>>>>>>> return RS_RET_SUSPENDED from DoAction as well. >>>>>>>>>> >>>>>>>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards >>>>>>>>>> <[email protected] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote: >>>>>>>>>>>> >>>>>>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some >>>>>>>>>>>>>> context: >>>>>>>>>>>>>> I >>>>>>>>>>>>>> want >>>>>>>>>>>>>> to write module that both using batching and also can't loose >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> messages in >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> any circumstances. Are you saying it is by design that rsyslog >>>>>>>>>>>>>> can't >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> do >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> that together? According to documentation rsyslog will retry if >>>>>>>>>>>>>> module >>>>>>>>>>>>>> returns any error. Do you plan to fix this in rsyslog or update >>>>>>>>>>>>>> documentation to say batching and retries don't work? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It depends on many things. In almost all cases, the retry should >>>>>>>>>>>>> work >>>>>>>>>>>>> well (and does so in practice). Unfortunately, I am pretty >>>>>>>>>>>>> swamped. >>>>>>>>>>>>> I >>>>>>>>>>>>> need to go to a conference tomorrow and have had quite some >>>>>>>>>>>>> unexpected >>>>>>>>>>>>> work today. It would probably be good if you could ping me next >>>>>>>>>>>>> week >>>>>>>>>>>>> to see if we can look into more details what is causing you >>>>>>>>>>>>> pain. >>>>>>>>>>>>> But >>>>>>>>>>>>> I can't guarantee that I will be available early next week. >>>>>>>>>>>>> >>>>>>>>>>>>> In general, we cannot handle a fatal error here from an engine >>>>>>>>>>>>> PoV, >>>>>>>>>>>>> because everything is already processed and we do no longer >>>>>>>>>>>>> have the >>>>>>>>>>>>> original messages. This is simply needed if you want to process >>>>>>>>>>>>> messages one after another through the full config (a goal for >>>>>>>>>>>>> v8 >>>>>>>>>>>>> that >>>>>>>>>>>>> was muuuuch requested). As I said, the solution is to use >>>>>>>>>>>>> batches of >>>>>>>>>>>>> one, because otherwise we would really need to turn back time >>>>>>>>>>>>> and >>>>>>>>>>>>> undo >>>>>>>>>>>>> everything that was already done on the messages in question by >>>>>>>>>>>>> other >>>>>>>>>>>>> modules (including state advances). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I thought that if a batch failed, it pushed all the messages >>>>>>>>>>>> back on >>>>>>>>>>>> the >>>>>>>>>>>> queue and retried with a half size batch until it got to the one >>>>>>>>>>>> message >>>>>>>>>>>> that could not be processed and only did a fatal fail on that >>>>>>>>>>>> message. >>>>>>>>>>>> >>>>>>>>>>>> Now, there is a big difference between a module giving a hard >>>>>>>>>>>> error >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> "this >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> message is never going to be able to be processed no matter how >>>>>>>>>>>> many >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> times >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> it's retried" vs a soft error "there is a problem delivering >>>>>>>>>>>> things >>>>>>>>>>>> to >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> this >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> destination right now, retry later". I thought the batch >>>>>>>>>>>> processing >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> handled >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> these differently. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> That's no longer possible with v8, at least generically. As I said >>>>>>>>>>> above, we would need to turn back time. >>>>>>>>>>> >>>>>>>>>>> But I really run out of time now... >>>>>>>>>>> >>>>>>>>>>> Rainer >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> David Lang >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> myriad of >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>>>>>>> you >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> DON'T >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> LIKE THAT. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> rsyslog mailing list >>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>>> myriad >>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>>>>>>>> if you >>>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> >>>>>>>>> rsyslog mailing list >>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>> myriad >>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>>>> you >>>>>>>>> DON'T >>>>>>>>> LIKE THAT. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>>> rsyslog mailing list >>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>> myriad >>>>>>>> of >>>>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>>>>> DON'T >>>>>>>> LIKE THAT. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>> rsyslog mailing list >>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>> myriad >>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>>>> DON'T LIKE THAT. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com/professional-services/ >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>> myriad >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>>> DON'T LIKE THAT. >>>>>> >>>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>> you DON'T LIKE THAT. >>>>> >>>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>> DON'T LIKE THAT. >>>> >>> >>> >> > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

