Updated config to:
module(load="omrelp")
local0.* action(type="omrelp"
target="10.0.0.1"
port="2514"
name="debug_relp"
queue.filename="relp_queue"
queue.maxdiskspace="1G"
queue.saveonshutdown="on"
queue.spoolDirectory="/var/spool/rsyslog"
queue.type="Disk"
action.resumeRetryCount="-1"
action.resumeInterval="1"
)
Still message is lost:
3489.658349703:imuxsock.c : Message from UNIX socket: #3
3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size now
log 1, phys 1 entries
3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker start
3489.658374985:imuxsock.c : --------imuxsock calling select, active
file descriptors (max 3): 3
3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
and enqueued 0 objects
3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new
sizes: log 1, phys 1
3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
processed
3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29
17:44:49 logger: test
3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*'
3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X X X
X X X X X FF X X X X X X X X X
3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658414686:main Q:Reg/w0 : ACTION 0 [omrelp:action(type="omrelp"
...)]
3489.658417648:main Q:Reg/w0 : executing action 0
3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging to
omrelp (susp 0/0, direct q 0)
3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue) flush,
buflen 410
3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len 410
3489.658484243:main Q:Reg/w0 : file
'/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384
3489.658487347:main Q:Reg/w0 : strm 0x23b74c0: opened file
'/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5
3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote 410 bytes
3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410 octets to
disk, queue disk size now 410 octets, EnqOnly:0
3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry added,
size now log 1, phys 1 entries
3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity -
starting 1 additional worker thread(s).
3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with state 0,
num workers now 1
3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised worker
start
3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended state to
0
3489.658533672:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658553393:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3489.658556301:main Q:Reg/w0 : executing action 1
3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended state to 0
3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X X FF
X X X X X X X X X X X X X X X
3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*'
3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X X X
X X X X X X X X X X X X X X X
3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*'
3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X FF X
X X X X X X X X X X X X X X X
3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg'
3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X X
X X X X X X X X X X X X X X X
3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*'
3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X X X
X X X X X X X X X X X X FF X X
3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658675686:main Q:Reg/w0 : END batch execution phase, entering to
commit phase [processed 1 of 1 messages]
3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to
commit 0 isTransactional 1
3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x23c8ce0
3489.658680668:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0 logger:
test
3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 42
3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len 42
3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote 42
bytes
3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has been
processed
3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0
sz phys 1
3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects
and enqueued 0 objects
3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new
sizes: log 0, phys 0
3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0
sz phys 0
3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
work.
3489.660013052:7f1efbb14700: thread created, tid 30310, name 'rs:debug_relp
queue:Reg'
3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker starting
3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 0
objects and enqueued 0 objects
3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
store, new sizes: log 1, phys 1
3489.660947418:debug_relp queue:Reg/w0: file
'/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384
3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file
'/var/spool/rsyslog/relp_queue.00000001' for READ as 6
3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read 410
bytes
3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is
transactional - executing in commit phase
3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to create a
new action worker instance for action 0
3489.661789583:debug_relp queue:Reg/w0: relp engine created new client
0x7f1ef00027e0
3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action
worker instance 1 for action 0
3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction
^C3497.710058579:main thread : DoDie called.
3497.710272055:main thread : exiting on signal 2
3497.710576430:main thread : Terminating input threads...
3497.710765753:main thread : request term via SIGTTIN for input thread
'imuxsock' 0xfcb16700
3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
0x7f1efcb16700 returned with iRet 0, exiting now.
3497.711447372:main thread : non-cancel input thread termination
succeeded for thread imuxsock 0xfcb16700
3497.713028584:imuxsock.c : destructor for debug call stack
0x7f1eec0008c0 called
3497.713390963:main thread : imuxsock: unlinking unix socket file[0]
/dev/log
3497.713734513:main thread : main Q: qqueueAdd: entry added, size now
log 1, phys 1 entries
3497.713906635:main thread : main Q: EnqueueMsg advised worker start
3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
and enqueued 0 objects
3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new
sizes: log 1, phys 1
3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
processed
3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin
software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info="
http://www.rsyslog.com"] exiting on signal 2.
3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*'
3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X X X
X X X X X FF X X X X X X X X X
3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716337713:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1
3497.716429241:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3497.716439928:main Q:Reg/w0 : executing action 1
3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended state to 0
3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X X FF
X X X X X X X X X X X X X X X
3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*'
3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X X X
X X X X X X X X X X X X X X X
3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*'
3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X FF X
X X X X X X X X X X X X X X X
3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg'
3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X X
X X X X X X X X X X X X X X X
3497.718043828:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*'
3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X X X
X X X X X X X X X X X X FF X X
3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718842803:main Q:Reg/w0 : END batch execution phase, entering to
commit phase [processed 1 of 1 messages]
3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to
commit 0 isTransactional 1
3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x23c8ce0
3497.718857172:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0
rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
x-pid="30306" x-info="http://www.rs
3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 161
3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len
161
3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote 161
bytes
3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has been
processed
3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0
sz phys 1
3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects
and enqueued 0 objects
3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new
sizes: log 0, phys 0
3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0
sz phys 0
3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
work.
3497.718942403:main thread : Terminating main queue...
3497.718946280:main thread : main Q: initiating worker thread shutdown
sequence
3497.718950170:main thread : main Q: trying shutdown of regular workers
3497.718957067:main thread : sent SIGTTIN to worker thread 0xfc315700
3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker thread
termination, 1 still running
3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
and enqueued 0 objects
3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new
sizes: log 0, phys 0
3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0
sz phys 0
3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker
terminateRet=5, bInactivityTOOccured=0
3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup action
instances
3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil)
3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr 0x7f1ef40022a0
3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil)
3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil)
3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil)
3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil)
3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil)
3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil)
3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting
3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0,
terminated, num workers now 0
3497.719090631:main thread : main Q: regular queue workers shut down.
3497.719097597:main thread : main Q: checking to see if we need to
cancel any worker threads of the primary queue
3497.719101335:main thread : main Q: worker threads terminated,
remaining queue size log 0, phys 0.
3497.719111115:main thread : main Q: queue (type 0) will lose 0
messages, destroying...
3497.719501847:main Q:Reg/w0 : destructor for debug call stack
0x7f1ef40008c0 called
3497.719547524:main thread : Terminating outputs...
3497.719553777:main thread : destructing ruleset 0x23ab150, name
0x23ab190
3497.719558276:main thread : debug_relp queue: initiating worker thread
shutdown sequence
3497.719561940:main thread : debug_relp queue: trying shutdown of
regular workers
3497.719568068:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on worker
thread termination, 1 still running
3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to state: rtry
3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0, currIParam
1
3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter
loop, iRetries=0
3497.719677407:main thread : debug_relp queue:Reg: timeout waiting on
worker thread termination
3497.719677554:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719677725:main thread : debug_relp queue: regular shutdown timed
out on primary queue (this is OK)
3497.719677897:main thread : debug_relp queue: trying to shutdown
workers within Action Timeoutsetting EnqOnly mode
3497.719678191:main thread : debug_relp queue: trying immediate shutdown
of regular workers (if any)
3497.719678362:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms on
worker thread termination, 1 still running
3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
action->tryResume returned -2007
3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check for
max retries, iResumeRetryCount -1, iRetries 0
3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action 0x23bebf0
state: rtry, next retry (if applicable): 0 [now 1459273498]
3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet
-2007
3498.721034552:debug_relp queue:Reg/w0: regular consumer finished,
iret=-2007, szlog 0 sz phys 1
3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 1
objects and enqueued 0 objects
3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
store, new sizes: log 0, phys 0
3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating worker
because of TERMINATE_NOW mode, del iRet 0
3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker cleanup
action instances
3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr
0x7f1ef00020d0
3498.722431811:debug_relp queue:Reg/w0: relp engine destructing client
0x7f1ef00027e0
3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr (nil)
3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr (nil)
3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr (nil)
3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr (nil)
3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr (nil)
3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr (nil)
3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr (nil)
3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting
3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker thread
23c78b0, terminated, num workers now 0
3498.724243673:debug_relp queue:Reg/w0: destructor for debug call stack
0x7f1ef00008c0 called
3498.724371931:main thread : debug_relp queue: checking to see if we
need to cancel any worker threads of the primary queue
3498.724379712:main thread : debug_relp queue: worker threads
terminated, remaining queue size log 0, phys 0.
3498.724386288:main thread : debug_relp queue: persisting queue to disk,
0 entries...
3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue) closing
3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue) flush,
buflen 0 (no need to flush)
3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue) closing
3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue) closing
3498.724423956:main thread : strmCloseFile: deleting
'/var/spool/rsyslog/relp_queue.00000001'
3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages)
closing
3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 0 (no need to flush)
3498.724731690:main thread : all primary multi-thread sources have been
terminated - now doing aux cleanup...
3498.724737248:main thread : destructing current config...
3498.724741760:main thread : calling freeCnf(0x23a91e0) for module
'builtin:omfile'
3498.724745773:main thread : calling freeCnf(0x23a93a0) for module
'builtin:ompipe'
3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module
'builtin:omfwd'
3498.724754588:main thread : calling freeCnf(0x23b9b00) for module
'imuxsock'
3498.724758956:main thread : calling freeCnf(0x23bac70) for module
'imklog'
3498.724807014:main thread : destructing parser 'rsyslog.rfc5424'
3498.724871141:main thread : destructing parser 'rsyslog.rfc3164'
3498.724876440:main thread : pmrfc3164: free parser instance 0x23aa8b0
3498.724887558:main thread : file rsyslogd.c released module 'lmnet',
reference count now 2
3498.724892004:main thread : module lmnet NOT unloaded because it still
has a refcount of 2
3498.724895745:main thread : Unloading module builtin:omfile
3498.724900191:main thread : module lmnet NOT unloaded because it still
has a refcount of 2
3498.724904194:main thread : Unloading module builtin:ompipe
3498.724908035:main thread : module lmnet NOT unloaded because it still
has a refcount of 2
3498.724911707:main thread : Unloading module builtin-shell
3498.725044125:main thread : module lmnet NOT unloaded because it still
has a refcount of 2
3498.725050401:main thread : Unloading module builtin:omdiscard
3498.725054843:main thread : module lmnet NOT unloaded because it still
has a refcount of 2
3498.725058772:main thread : Unloading module builtin:omfwd
3498.725064464:main thread : file omfwd.c released module 'lmnet',
reference count now 1
3498.725068747:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725072460:main thread : Unloading module builtin:omusrmsg
3498.725076324:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725080023:main thread : Unloading module builtin:pmrfc5424
3498.725084039:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725087646:main thread : Unloading module builtin:pmrfc3164
3498.725091651:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725095255:main thread : Unloading module builtin:smfile
3498.725099010:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725102601:main thread : Unloading module builtin:smtradfile
3498.725106419:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725110003:main thread : Unloading module builtin:smfwd
3498.725113689:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725117357:main thread : Unloading module builtin:smtradfwd
3498.725121132:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725124782:main thread : Unloading module imuxsock
3498.725157459:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.725162797:main thread : Unloading module imklog
3498.725179208:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.726112994:main thread : Unloading module omrelp
3498.726184405:main thread : module lmnet NOT unloaded because it still
has a refcount of 1
3498.726189821:main thread : module lmregexp NOT unloaded because it
still has a refcount of 1
3498.726195161:main thread : file conf.c released module 'lmnet',
reference count now 0
3498.726198620:main thread : module 'lmnet' has zero reference count,
unloading...
3498.726201936:main thread : Unloading module lmnet
3498.726218655:main thread : rsyslog runtime de-initialized, current
users 0
3498.726222660:main thread : Clean shutdown completed, bye
On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]> wrote:
> Hello Reiner,
>
> I'm seeing the same behavior with omrelp module. I've configured omrelp to
> non-routable ip address:
> module(load="omrelp")
> local0.* action(type="omrelp"
> target="10.0.0.1"
> port="2514"
> name="debug_relp"
> queue.filename="relp_queue"
> queue.maxdiskspace="1G"
> queue.saveonshutdown="on"
> queue.spoolDirectory="/var/spool/rsyslog"
> queue.type="Disk"
> )
> Then I try to log a line:
> [root@314959ac8cb0 prism]# logger -p local0.info test
> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
> relp_queue.00000001
> # I shutdown rsyslog at this point
> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
> # after shutdown queue is deleted and message is lost
> [root@314959ac8cb0 prism]#
>
> Here is rsyslog log, so omrelp starts transaction and then I shutdown
> rsyslog, it seems at some point queue is deleted:
>
> 8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction
> ^C8194.245122305:main thread : DoDie called.
> 8194.245301525:main thread : exiting on signal 2
> 8194.245532709:main thread : Terminating input threads...
> 8194.245873683:main thread : request term via SIGTTIN for input thread
> 'imuxsock' 0xcb767700
> 8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
> 0x7f9bcb767700 returned with iRet 0, exiting now.
> 8194.247556671:main thread : non-cancel input thread termination
> succeeded for thread imuxsock 0xcb767700
> 8194.249357547:imuxsock.c : destructor for debug call stack
> 0x7f9bbc0008c0 called
> 8194.249784271:main thread : imuxsock: unlinking unix socket file[0]
> /dev/log
> 8194.250043368:main thread : main Q: qqueueAdd: entry added, size now
> log 1, phys 1 entries
> 8194.250321193:main thread : main Q: EnqueueMsg advised worker start
> 8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
> processing
> 8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
> and enqueued 0 objects
> 8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
> new sizes: log 1, phys 1
> 8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
> processed
> 8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin
> software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info="
> http://www.rsyslog.com"] exiting on signal 2.
> 8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*'
> 8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X X X
> X X X X X X FF X X X X X X X X X
> 8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.252902083:main Q:Reg/w0 : PRIFILT
> '*.info;mail.none;authpriv.none;cron.none'
> 8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X
> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
> 8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1
> 8194.252959849:main Q:Reg/w0 : ACTION 1
> [builtin:omfile:/var/log/messages]
> 8194.252967510:main Q:Reg/w0 : executing action 1
> 8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging to
> builtin:omfile (susp 0/0, direct q 1)
> 8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional -
> executing in commit phase
> 8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx
> 8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended state to 0
> 8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*'
> 8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X X X
> FF X X X X X X X X X X X X X X X
> 8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*'
> 8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X X X
> X X X X X X X X X X X X X X X X
> 8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*'
> 8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X X FF
> X X X X X X X X X X X X X X X X
> 8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg'
> 8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
> 8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X
> X X X X X X X X X X X X X X X X
> 8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*'
> 8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X X X
> X X X X X X X X X X X X X FF X X
> 8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0
> 8194.254450651:main Q:Reg/w0 : END batch execution phase, entering to
> commit phase [processed 1 of 1 messages]
> 8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr to
> commit 0 isTransactional 1
> 8194.254461646:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
> using that, pWrkrInfo 0x1c13cf0
> 8194.254466238:main Q:Reg/w0 : entering actionCallCommitTransaction(),
> state: itx, actionNbr 1, nMsgs 1
> 8194.254471876:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
> 0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0
> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
> x-pid="30298" x-info="http://www.rs
> 8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4(messages)
> flush, buflen 161
> 8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream 0x7f9bc4002480, len
> 161
> 8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write wrote
> 161 bytes
> 8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy
> 8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx
> 8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy
> 8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
> 8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements has been
> processed
> 8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0
> sz phys 1
> 8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects
> and enqueued 0 objects
> 8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
> new sizes: log 0, phys 0
> 8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0
> sz phys 0
> 8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
> work.
> 8194.254556418:main thread : Terminating main queue...
> 8194.254560776:main thread : main Q: initiating worker thread shutdown
> sequence
> 8194.254564653:main thread : main Q: trying shutdown of regular workers
> 8194.254572961:main thread : sent SIGTTIN to worker thread 0xcaf66700
> 8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker
> thread termination, 1 still running
> 8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
> processing
> 8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects
> and enqueued 0 objects
> 8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
> new sizes: log 0, phys 0
> 8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0
> sz phys 0
> 8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker
> terminateRet=5, bInactivityTOOccured=0
> 8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup action
> instances
> 8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil)
> 8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr 0x7f9bc40022a0
> 8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil)
> 8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil)
> 8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil)
> 8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil)
> 8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil)
> 8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil)
> 8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting
> 8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0,
> terminated, num workers now 0
> 8194.254656279:main thread : main Q: regular queue workers shut down.
> 8194.254656402:main thread : main Q: checking to see if we need to
> cancel any worker threads of the primary queue
> 8194.254656524:main thread : main Q: worker threads terminated,
> remaining queue size log 0, phys 0.
> 8194.254656647:main thread : main Q: queue (type 0) will lose 0
> messages, destroying...
> 8194.255021761:main Q:Reg/w0 : destructor for debug call stack
> 0x7f9bc40008c0 called
> 8194.255069521:main thread : Terminating outputs...
> 8194.255084818:main thread : destructing ruleset 0x1bf6150, name
> 0x1bf6190
> 8194.255089525:main thread : debug_relp queue: initiating worker thread
> shutdown sequence
> 8194.255093230:main thread : debug_relp queue: trying shutdown of
> regular workers
> 8194.255099210:main thread : sent SIGTTIN to worker thread 0xca765700
> 8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on
> worker thread termination, 1 still running
> 8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
> 8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to state:
> rtry
> 8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0,
> currIParam 1
> 8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter
> loop, iRetries=0
> 8194.255210939:main thread : debug_relp queue:Reg: timeout waiting on
> worker thread termination
> 8194.255211086:main thread : sent SIGTTIN to worker thread 0xca765700
> 8194.255211257:main thread : debug_relp queue: regular shutdown timed
> out on primary queue (this is OK)
> 8194.255211429:main thread : debug_relp queue: trying to shutdown
> workers within Action Timeoutsetting EnqOnly mode
> 8194.255215492:main thread : debug_relp queue: trying immediate
> shutdown of regular workers (if any)
> 8194.255221059:main thread : sent SIGTTIN to worker thread 0xca765700
> 8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms on
> worker thread termination, 1 still running
> 8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
> 8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
> action->tryResume returned -2007
> 8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check
> for max retries, iResumeRetryCount 0, iRetries 0
> 8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to state:
> susp
> 8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action
> 'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016
> 8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not submit
> message - queue does (no longer?) exist - ignored
> 8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp' suspended,
> earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration 30
> 8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action 0x1c09af0
> state: susp, next retry (if applicable): 1459208224 [now 1459208194]
> 8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet
> -2123
> 8194.255282409:debug_relp queue:Reg/w0: regular consumer finished,
> iret=-2123, szlog 0 sz phys 1
> 8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted 1
> objects and enqueued 0 objects
> 8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
> store, new sizes: log 0, phys 0
> 8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating worker
> because of TERMINATE_NOW mode, del iRet 0
> 8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker
> cleanup action instances
> 8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr
> 0x7f9bc00020d0
> 8194.255305929:debug_relp queue:Reg/w0: relp engine destructing client
> 0x7f9bc00027e0
> 8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr (nil)
> 8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr (nil)
> 8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr (nil)
> 8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr (nil)
> 8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr (nil)
> 8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr (nil)
> 8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr (nil)
> 8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker exiting
> 8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
> thread 1c12880, terminated, num workers now 0
> 8194.255344540:debug_relp queue:Reg/w0: destructor for debug call stack
> 0x7f9bc00008c0 called
> 8194.255371534:main thread : debug_relp queue: checking to see if we
> need to cancel any worker threads of the primary queue
> 8194.255376798:main thread : debug_relp queue: worker threads
> terminated, remaining queue size log 0, phys 0.
> 8194.255381256:main thread : debug_relp queue: persisting queue to
> disk, 0 entries...
> 8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue) closing
> 8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue) flush,
> buflen 0 (no need to flush)
> 8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue) closing
> 8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue) closing
> 8194.255409145:main thread : strmCloseFile: deleting
> '/var/spool/rsyslog/relp_queue.00000001'
> 8194.255460812:main thread : strm 0x7f9bc4002480: file 4(messages)
> closing
> 8194.255466309:main thread : strm 0x7f9bc4002480: file 4(messages)
> flush, buflen 0 (no need to flush)
> 8194.255479325:main thread : all primary multi-thread sources have been
> terminated - now doing aux cleanup...
> 8194.255482809:main thread : destructing current config...
> 8194.255486662:main thread : calling freeCnf(0x1bf41e0) for module
> 'builtin:omfile'
> 8194.255490098:main thread : calling freeCnf(0x1bf43a0) for module
> 'builtin:ompipe'
> 8194.255493809:main thread : calling freeCnf(0x1bf53a0) for module
> 'builtin:omfwd'
> 8194.255497793:main thread : calling freeCnf(0x1c04b00) for module
> 'imuxsock'
> 8194.255501565:main thread : calling freeCnf(0x1c05c70) for module
> 'imklog'
> 8194.255543579:main thread : destructing parser 'rsyslog.rfc5424'
> 8194.255547468:main thread : destructing parser 'rsyslog.rfc3164'
> 8194.255550784:main thread : pmrfc3164: free parser instance 0x1bf58b0
> 8194.255559659:main thread : file rsyslogd.c released module 'lmnet',
> reference count now 2
> 8194.255563470:main thread : module lmnet NOT unloaded because it still
> has a refcount of 2
> 8194.255566676:main thread : Unloading module builtin:omfile
> 8194.255570420:main thread : module lmnet NOT unloaded because it still
> has a refcount of 2
> 8194.255573674:main thread : Unloading module builtin:ompipe
> 8194.255576857:main thread : module lmnet NOT unloaded because it still
> has a refcount of 2
> 8194.255579938:main thread : Unloading module builtin-shell
> 8194.255583060:main thread : module lmnet NOT unloaded because it still
> has a refcount of 2
> 8194.255586144:main thread : Unloading module builtin:omdiscard
> 8194.255589509:main thread : module lmnet NOT unloaded because it still
> has a refcount of 2
> 8194.255592658:main thread : Unloading module builtin:omfwd
> 8194.255596441:main thread : file omfwd.c released module 'lmnet',
> reference count now 1
> 8194.255599981:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255603068:main thread : Unloading module builtin:omusrmsg
> 8194.255606240:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255609287:main thread : Unloading module builtin:pmrfc5424
> 8194.255612755:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255615826:main thread : Unloading module builtin:pmrfc3164
> 8194.255619252:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255622315:main thread : Unloading module builtin:smfile
> 8194.255625406:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255628486:main thread : Unloading module builtin:smtradfile
> 8194.255631753:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255634819:main thread : Unloading module builtin:smfwd
> 8194.255637916:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255640987:main thread : Unloading module builtin:smtradfwd
> 8194.255644187:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255647267:main thread : Unloading module imuxsock
> 8194.255676232:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255680751:main thread : Unloading module imklog
> 8194.255694428:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255698085:main thread : Unloading module omrelp
> 8194.255750953:main thread : module lmnet NOT unloaded because it still
> has a refcount of 1
> 8194.255755236:main thread : module lmregexp NOT unloaded because it
> still has a refcount of 1
> 8194.255759705:main thread : file conf.c released module 'lmnet',
> reference count now 0
> 8194.255762875:main thread : module 'lmnet' has zero reference count,
> unloading...
> 8194.255765901:main thread : Unloading module lmnet
> 8194.255780970:main thread : rsyslog runtime de-initialized, current
> users 0
> 8194.255784704:main thread : Clean shutdown completed, bye
>
>
>
> On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote:
>
>> On Sun, 13 Mar 2016, Kane Kim wrote:
>>
>> +1to David, having module handle the errors is not 100% reliable, what if
>>> module dies before having a chance to save error log, those messages will
>>> be lost.
>>>
>>
>> well, the module is a thread of rsyslog, so if it dies, rsyslog has died.
>>
>> But if you are doing large batches (hundreds or thousands), putting all
>> of them in a 'bad messages' file because one has a problem seems like the
>> wrong thing to do.
>>
>> given that the error is not detected until the batch is being flushed for
>> the output, any outputs that have already flushed their batches can't be
>> changed, so the question is if it's worse to have messages resent to those
>> outputs as the batch gets re-processed, or having to have the output go to
>> a "couldn't deliver" file.
>>
>> It's bad enough if the problem is a corrupt message that can never be
>> delivered and the error is a permanent one, but if the problem is instead a
>> temporary error (server is down), you will have a LOT of messages going
>> into this 'couldn't deliver' bucket, when what should really be happening
>> is that rsyslog should be pausing in the processing of messages, retrying
>> after the suspended timeout.
>>
>> David Lang
>>
>>
>> On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:
>>>
>>> On Sun, 13 Mar 2016, Rainer Gerhards wrote:
>>>>
>>>> Rainer, can you explain what happens when a batch fails on an action?
>>>>
>>>>>
>>>>>>
>>>>> It depends on when what happens and if we have batches of one or
>>>>> batches of many. For batches of one, the failure is immediately
>>>>> detected and acted upon. For batches of many,all action strings or
>>>>> stored temporarily. During that time, it is assume that the messages
>>>>> could be processed. There is no way around this assumption, because we
>>>>> now run each action completely through the rule engine before the next
>>>>> one is begun to process, which means that other parts of the config
>>>>> have actually acted on a message, which means we cannot undo that
>>>>> state. If is needed to act on the "real" state, we need to process
>>>>> batches of one. There simply is no other solution possible. So let's
>>>>> stay with batches of many. Once the batch is completed, the temporary
>>>>> strings are passed to the output's commitTransaction entry point. It
>>>>> is currently assumed that the output does inteligently handle the
>>>>> case, and as it looks most do this well (no regression tests failed,
>>>>> no bad reports since we introduced this with the initial v8 version
>>>>> (15+ month ago). If the action fails, status is reflected accordingly.
>>>>> Also, the core engine *should* (but obviously does not support this
>>>>> right now) write an error file with the messages in error (just like
>>>>> omelasticsearch does, which has proven to be quite good).
>>>>>
>>>>>
>>>> well, it's not in the specs that modules should have local error files
>>>> the
>>>> way omelasticsearch does.
>>>>
>>>> What I would expect if there is a problem when commiting the batch is
>>>> that
>>>> the entire batch would be marked as not delivered and try again with
>>>> half
>>>> of it.
>>>>
>>>> This will cause messages to other outputs working from the same queue to
>>>> get duplciate messages, but it's better than loosing an entire batch of
>>>> messages because one is corrupt in some way. or there is a temporary
>>>> problem with the output.
>>>>
>>>> David Lang
>>>>
>>>> I have managed to look at least briefly at the commitTransaction core
>>>>
>>>>> engine code. I think we can even improve it and may get some ability
>>>>> to do retries if the action itself does not handle that. At the
>>>>> minimum, it should have the ability (which was intended originally,
>>>>> but pushed away by more urgent work) to write that error file.
>>>>>
>>>>> I am very grateful that Kane is working on a repro with the standard
>>>>> modules. Once we have that, I can go deeper into the code and check
>>>>> what I am thinking about. It might be possible that we cannot manage
>>>>> to repro this with the standard modules. Then we need to think about a
>>>>> way via the testing module to do the same thing.
>>>>>
>>>>> I will try to address this within the 8.18 timeframe, but cannot
>>>>> totally commit to it. There has a lot of work been going on, and
>>>>> especially all that CI testing has taken up lots of works.
>>>>> Nevertheless, I've been able to iron out quite some nits, so it's
>>>>> probably worth it. I would still like to continue to get the CI
>>>>> environment into better shape, because now I know what I am doing. If
>>>>> I stop again and let it go for a couple of weeks, I would lose a lot
>>>>> of momentum. I have also do to some work for support customers, which
>>>>> keeps me able to work on rsyslog in general, so this, too, has
>>>>> priority (and validly has so because without support customers I would
>>>>> be totally unable to work on rsyslog other than as a hobby)
>>>>>
>>>>> I hope this explains the current state.
>>>>> Rainer
>>>>>
>>>>>
>>>>> Kane, can you try and duplicate your problem using standard modules?
>>>>>> I'm
>>>>>> thinking that if you define a omrelp action that doesn't go anywhere,
>>>>>> it
>>>>>> should return errors similar to what you are trying to do, so you
>>>>>> should
>>>>>> be
>>>>>> able to duplicate the problem that way.
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 23 Feb 2016, Kane Kim wrote:
>>>>>>
>>>>>> Looking at omkafka module source code it seems that it relies on
>>>>>> rsyslog
>>>>>>
>>>>>>> retries in DoAction, returning RS_RET_SUSPENDED:
>>>>>>>
>>>>>>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>>>>>>> if(iRet != RS_RET_OK) {
>>>>>>> iRet = RS_RET_SUSPENDED;
>>>>>>> }
>>>>>>>
>>>>>>> I've tried similar code in DoAction, it seems that action processing
>>>>>>> is suspended at that point and messages are not retried to module.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks for your help Rainer, I'll try to debug what's going on, so
>>>>>>> far
>>>>>>>
>>>>>>>> it
>>>>>>>> seems rsyslog doesn't retry even with batch size 1. It doesn't retry
>>>>>>>> if I
>>>>>>>> return RS_RET_SUSPENDED from DoAction as well.
>>>>>>>>
>>>>>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards
>>>>>>>> <[email protected]
>>>>>>>>
>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>>>>>>
>>>>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some
>>>>>>>>>>>> context:
>>>>>>>>>>>> I
>>>>>>>>>>>> want
>>>>>>>>>>>> to write module that both using batching and also can't loose
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> messages in
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> any circumstances. Are you saying it is by design that rsyslog
>>>>>>>>>>>> can't
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> do
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> that together? According to documentation rsyslog will retry if
>>>>>>>>>>>> module
>>>>>>>>>>>> returns any error. Do you plan to fix this in rsyslog or update
>>>>>>>>>>>> documentation to say batching and retries don't work?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It depends on many things. In almost all cases, the retry should
>>>>>>>>>>> work
>>>>>>>>>>> well (and does so in practice). Unfortunately, I am pretty
>>>>>>>>>>> swamped.
>>>>>>>>>>> I
>>>>>>>>>>> need to go to a conference tomorrow and have had quite some
>>>>>>>>>>> unexpected
>>>>>>>>>>> work today. It would probably be good if you could ping me next
>>>>>>>>>>> week
>>>>>>>>>>> to see if we can look into more details what is causing you pain.
>>>>>>>>>>> But
>>>>>>>>>>> I can't guarantee that I will be available early next week.
>>>>>>>>>>>
>>>>>>>>>>> In general, we cannot handle a fatal error here from an engine
>>>>>>>>>>> PoV,
>>>>>>>>>>> because everything is already processed and we do no longer have
>>>>>>>>>>> the
>>>>>>>>>>> original messages. This is simply needed if you want to process
>>>>>>>>>>> messages one after another through the full config (a goal for v8
>>>>>>>>>>> that
>>>>>>>>>>> was muuuuch requested). As I said, the solution is to use
>>>>>>>>>>> batches of
>>>>>>>>>>> one, because otherwise we would really need to turn back time and
>>>>>>>>>>> undo
>>>>>>>>>>> everything that was already done on the messages in question by
>>>>>>>>>>> other
>>>>>>>>>>> modules (including state advances).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I thought that if a batch failed, it pushed all the messages back
>>>>>>>>>> on
>>>>>>>>>> the
>>>>>>>>>> queue and retried with a half size batch until it got to the one
>>>>>>>>>> message
>>>>>>>>>> that could not be processed and only did a fatal fail on that
>>>>>>>>>> message.
>>>>>>>>>>
>>>>>>>>>> Now, there is a big difference between a module giving a hard
>>>>>>>>>> error
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> "this
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> message is never going to be able to be processed no matter how
>>>>>>>>>> many
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> times
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> it's retried" vs a soft error "there is a problem delivering
>>>>>>>>>> things
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> this
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> destination right now, retry later". I thought the batch
>>>>>>>>>> processing
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> handled
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> these differently.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That's no longer possible with v8, at least generically. As I said
>>>>>>>>> above, we would need to turn back time.
>>>>>>>>>
>>>>>>>>> But I really run out of time now...
>>>>>>>>>
>>>>>>>>> Rainer
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> David Lang
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> rsyslog mailing list
>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> myriad of
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>>>>>>>>>> you
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> DON'T
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> LIKE THAT.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>> myriad
>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>>>>>>>>> you
>>>>>>>>> DON'T LIKE THAT.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>>
>>>>>>> rsyslog mailing list
>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>> myriad
>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>>>>>>> you
>>>>>>> DON'T
>>>>>>> LIKE THAT.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>>
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>> myriad
>>>>>> of
>>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>>> DON'T
>>>>>> LIKE THAT.
>>>>>>
>>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>> myriad
>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> DON'T LIKE THAT.
>>>>>
>>>>> _______________________________________________
>>>>>
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>> DON'T LIKE THAT.
>>>>
>>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>> DON'T LIKE THAT.
>>>
>>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>> DON'T LIKE THAT.
>>
>
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.