From: Kane Kim <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] retry if output module returns RS_RET_SUSPENDED
Updated config to:
module(load="omrelp")
local0.* action(type="omrelp"
target="10.0.0.1"
port="2514"
name="debug_relp"
queue.filename="relp_queue"
queue.maxdiskspace="1G"
queue.saveonshutdown="on"
queue.spoolDirectory="/var/spool/rsyslog"
queue.type="Disk"
action.resumeRetryCount="-1"
action.resumeInterval="1"
)
Still message is lost:
3489.658349703:imuxsock.c : Message from UNIX socket: #3
3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size now
log 1, phys 1 entries
3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker start
3489.658374985:imuxsock.c : --------imuxsock calling select, active
file descriptors (max 3): 3
3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new
sizes: log 1, phys 1
3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
processed
3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29
17:44:49 logger: test
3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*'
3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X FF X X X X X X X X X
3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658414686:main Q:Reg/w0 : ACTION 0 [omrelp:action(type="omrelp"
...)]
3489.658417648:main Q:Reg/w0 : executing action 0
3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging to
omrelp (susp 0/0, direct q 0)
3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue)
flush,
buflen 410
3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len 410
3489.658484243:main Q:Reg/w0 : file
'/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384
3489.658487347:main Q:Reg/w0 : strm 0x23b74c0: opened file
'/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5
3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote 410
bytes
3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410 octets
to
disk, queue disk size now 410 octets, EnqOnly:0
3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry added,
size now log 1, phys 1 entries
3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity -
starting 1 additional worker thread(s).
3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with state
0,
num workers now 1
3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised
worker
start
3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended state
to
0
3489.658533672:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658553393:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3489.658556301:main Q:Reg/w0 : executing action 1
3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended state
to 0
3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X X
FF
X X X X X X X X X X X X X X X
3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*'
3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X
X X
X X X X X X X X X X X X X X X
3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*'
3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X
FF X
X X X X X X X X X X X X X X X
3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg'
3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7
X X
X X X X X X X X X X X X X X X
3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*'
3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X X X X X X X X FF X X
3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658675686:main Q:Reg/w0 : END batch execution phase, entering to
commit phase [processed 1 of 1 messages]
3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr
to
commit 0 isTransactional 1
3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x23c8ce0
3489.658680668:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0 logger:
test
3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 42
3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480,
len 42
3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote
42
bytes
3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has
been
processed
3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0, szlog
0
sz phys 1
3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new
sizes: log 0, phys 0
3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4, szlog
0
sz phys 0
3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
work.
3489.660013052:7f1efbb14700: thread created, tid 30310, name
'rs:debug_relp
queue:Reg'
3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker starting
3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted
0
objects and enqueued 0 objects
3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
store, new sizes: log 1, phys 1
3489.660947418:debug_relp queue:Reg/w0: file
'/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384
3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file
'/var/spool/rsyslog/relp_queue.00000001' for READ as 6
3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read 410
bytes
3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is
transactional - executing in commit phase
3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to create
a
new action worker instance for action 0
3489.661789583:debug_relp queue:Reg/w0: relp engine created new client
0x7f1ef00027e0
3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action
worker instance 1 for action 0
3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction
^C3497.710058579:main thread : DoDie called.
3497.710272055:main thread : exiting on signal 2
3497.710576430:main thread : Terminating input threads...
3497.710765753:main thread : request term via SIGTTIN for input thread
'imuxsock' 0xfcb16700
3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
0x7f1efcb16700 returned with iRet 0, exiting now.
3497.711447372:main thread : non-cancel input thread termination
succeeded for thread imuxsock 0xfcb16700
3497.713028584:imuxsock.c : destructor for debug call stack
0x7f1eec0008c0 called
3497.713390963:main thread : imuxsock: unlinking unix socket file[0]
/dev/log
3497.713734513:main thread : main Q: qqueueAdd: entry added, size now
log 1, phys 1 entries
3497.713906635:main thread : main Q: EnqueueMsg advised worker start
3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new
sizes: log 1, phys 1
3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements must be
processed
3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin
software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info="
http://www.rsyslog.com"] exiting on signal 2.
3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*'
3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X FF X X X X X X X X X
3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716337713:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1
3497.716429241:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3497.716439928:main Q:Reg/w0 : executing action 1
3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended state
to 0
3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X X
FF
X X X X X X X X X X X X X X X
3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*'
3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X
X X
X X X X X X X X X X X X X X X
3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*'
3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X
FF X
X X X X X X X X X X X X X X X
3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg'
3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7
X X
X X X X X X X X X X X X X X X
3497.718043828:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*'
3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X X X X X X X X FF X X
3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718842803:main Q:Reg/w0 : END batch execution phase, entering to
commit phase [processed 1 of 1 messages]
3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr
to
commit 0 isTransactional 1
3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction IF,
using that, pWrkrInfo 0x23c8ce0
3497.718857172:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0
rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
x-pid="30306" x-info="http://www.rs
3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 161
3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480, len
161
3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote
161
bytes
3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has
been
processed
3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0, szlog
0
sz phys 1
3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new
sizes: log 0, phys 0
3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4, szlog
0
sz phys 0
3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
work.
3497.718942403:main thread : Terminating main queue...
3497.718946280:main thread : main Q: initiating worker thread shutdown
sequence
3497.718950170:main thread : main Q: trying shutdown of regular
workers
3497.718957067:main thread : sent SIGTTIN to worker thread 0xfc315700
3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker
thread
termination, 1 still running
3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new
sizes: log 0, phys 0
3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4, szlog
0
sz phys 0
3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker
terminateRet=5, bInactivityTOOccured=0
3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup
action
instances
3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil)
3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr
0x7f1ef40022a0
3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil)
3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil)
3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil)
3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil)
3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil)
3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil)
3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting
3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0,
terminated, num workers now 0
3497.719090631:main thread : main Q: regular queue workers shut down.
3497.719097597:main thread : main Q: checking to see if we need to
cancel any worker threads of the primary queue
3497.719101335:main thread : main Q: worker threads terminated,
remaining queue size log 0, phys 0.
3497.719111115:main thread : main Q: queue (type 0) will lose 0
messages, destroying...
3497.719501847:main Q:Reg/w0 : destructor for debug call stack
0x7f1ef40008c0 called
3497.719547524:main thread : Terminating outputs...
3497.719553777:main thread : destructing ruleset 0x23ab150, name
0x23ab190
3497.719558276:main thread : debug_relp queue: initiating worker
thread
shutdown sequence
3497.719561940:main thread : debug_relp queue: trying shutdown of
regular workers
3497.719568068:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on
worker
thread termination, 1 still running
3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to state:
rtry
3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0,
currIParam
1
3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter
loop, iRetries=0
3497.719677407:main thread : debug_relp queue:Reg: timeout waiting on
worker thread termination
3497.719677554:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719677725:main thread : debug_relp queue: regular shutdown timed
out on primary queue (this is OK)
3497.719677897:main thread : debug_relp queue: trying to shutdown
workers within Action Timeoutsetting EnqOnly mode
3497.719678191:main thread : debug_relp queue: trying immediate
shutdown
of regular workers (if any)
3497.719678362:main thread : sent SIGTTIN to worker thread 0xfbb14700
3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms on
worker thread termination, 1 still running
3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
action->tryResume returned -2007
3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check
for
max retries, iResumeRetryCount -1, iRetries 0
3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action 0x23bebf0
state: rtry, next retry (if applicable): 0 [now 1459273498]
3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop, iRet
-2007
3498.721034552:debug_relp queue:Reg/w0: regular consumer finished,
iret=-2007, szlog 0 sz phys 1
3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we deleted
1
objects and enqueued 0 objects
3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
store, new sizes: log 0, phys 0
3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating worker
because of TERMINATE_NOW mode, del iRet 0
3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker
cleanup
action instances
3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr
0x7f1ef00020d0
3498.722431811:debug_relp queue:Reg/w0: relp engine destructing client
0x7f1ef00027e0
3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr
(nil)
3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr
(nil)
3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr
(nil)
3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr
(nil)
3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr
(nil)
3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr
(nil)
3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr
(nil)
3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting
3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
thread
23c78b0, terminated, num workers now 0
3498.724243673:debug_relp queue:Reg/w0: destructor for debug call stack
0x7f1ef00008c0 called
3498.724371931:main thread : debug_relp queue: checking to see if we
need to cancel any worker threads of the primary queue
3498.724379712:main thread : debug_relp queue: worker threads
terminated, remaining queue size log 0, phys 0.
3498.724386288:main thread : debug_relp queue: persisting queue to
disk,
0 entries...
3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue)
closing
3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue) flush,
buflen 0 (no need to flush)
3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue)
closing
3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue)
closing
3498.724423956:main thread : strmCloseFile: deleting
'/var/spool/rsyslog/relp_queue.00000001'
3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages)
closing
3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 0 (no need to flush)
3498.724731690:main thread : all primary multi-thread sources have
been
terminated - now doing aux cleanup...
3498.724737248:main thread : destructing current config...
3498.724741760:main thread : calling freeCnf(0x23a91e0) for module
'builtin:omfile'
3498.724745773:main thread : calling freeCnf(0x23a93a0) for module
'builtin:ompipe'
3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module
'builtin:omfwd'
3498.724754588:main thread : calling freeCnf(0x23b9b00) for module
'imuxsock'
3498.724758956:main thread : calling freeCnf(0x23bac70) for module
'imklog'
3498.724807014:main thread : destructing parser 'rsyslog.rfc5424'
3498.724871141:main thread : destructing parser 'rsyslog.rfc3164'
3498.724876440:main thread : pmrfc3164: free parser instance 0x23aa8b0
3498.724887558:main thread : file rsyslogd.c released module 'lmnet',
reference count now 2
3498.724892004:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724895745:main thread : Unloading module builtin:omfile
3498.724900191:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724904194:main thread : Unloading module builtin:ompipe
3498.724908035:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724911707:main thread : Unloading module builtin-shell
3498.725044125:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.725050401:main thread : Unloading module builtin:omdiscard
3498.725054843:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.725058772:main thread : Unloading module builtin:omfwd
3498.725064464:main thread : file omfwd.c released module 'lmnet',
reference count now 1
3498.725068747:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725072460:main thread : Unloading module builtin:omusrmsg
3498.725076324:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725080023:main thread : Unloading module builtin:pmrfc5424
3498.725084039:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725087646:main thread : Unloading module builtin:pmrfc3164
3498.725091651:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725095255:main thread : Unloading module builtin:smfile
3498.725099010:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725102601:main thread : Unloading module builtin:smtradfile
3498.725106419:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725110003:main thread : Unloading module builtin:smfwd
3498.725113689:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725117357:main thread : Unloading module builtin:smtradfwd
3498.725121132:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725124782:main thread : Unloading module imuxsock
3498.725157459:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725162797:main thread : Unloading module imklog
3498.725179208:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.726112994:main thread : Unloading module omrelp
3498.726184405:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.726189821:main thread : module lmregexp NOT unloaded because it
still has a refcount of 1
3498.726195161:main thread : file conf.c released module 'lmnet',
reference count now 0
3498.726198620:main thread : module 'lmnet' has zero reference count,
unloading...
3498.726201936:main thread : Unloading module lmnet
3498.726218655:main thread : rsyslog runtime de-initialized, current
users 0
3498.726222660:main thread : Clean shutdown completed, bye
On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]> wrote:
Hello Reiner,
I'm seeing the same behavior with omrelp module. I've configured omrelp
to
non-routable ip address:
module(load="omrelp")
local0.* action(type="omrelp"
target="10.0.0.1"
port="2514"
name="debug_relp"
queue.filename="relp_queue"
queue.maxdiskspace="1G"
queue.saveonshutdown="on"
queue.spoolDirectory="/var/spool/rsyslog"
queue.type="Disk"
)
Then I try to log a line:
[root@314959ac8cb0 prism]# logger -p local0.info test
[root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
relp_queue.00000001
# I shutdown rsyslog at this point
[root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
# after shutdown queue is deleted and message is lost
[root@314959ac8cb0 prism]#
Here is rsyslog log, so omrelp starts transaction and then I shutdown
rsyslog, it seems at some point queue is deleted:
8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction
^C8194.245122305:main thread : DoDie called.
8194.245301525:main thread : exiting on signal 2
8194.245532709:main thread : Terminating input threads...
8194.245873683:main thread : request term via SIGTTIN for input
thread
'imuxsock' 0xcb767700
8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
0x7f9bcb767700 returned with iRet 0, exiting now.
8194.247556671:main thread : non-cancel input thread termination
succeeded for thread imuxsock 0xcb767700
8194.249357547:imuxsock.c : destructor for debug call stack
0x7f9bbc0008c0 called
8194.249784271:main thread : imuxsock: unlinking unix socket file[0]
/dev/log
8194.250043368:main thread : main Q: qqueueAdd: entry added, size now
log 1, phys 1 entries
8194.250321193:main thread : main Q: EnqueueMsg advised worker start
8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
processing
8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new sizes: log 1, phys 1
8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements must
be
processed
8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin
software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info="
http://www.rsyslog.com"] exiting on signal 2.
8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*'
8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X X X
X X X X X X FF X X X X X X X X X
8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0
8194.252902083:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F X
X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1
8194.252959849:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
8194.252967510:main Q:Reg/w0 : executing action 1
8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx
8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended state
to 0
8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*'
8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X X X
FF X X X X X X X X X X X X X X X
8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*'
8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X X X
X X X X X X X X X X X X X X X X
8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*'
8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X X FF
X X X X X X X X X X X X X X X X
8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg'
8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7 7 X
X X X X X X X X X X X X X X X X
8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*'
8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X X X
X X X X X X X X X X X X X FF X X
8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0
8194.254450651:main Q:Reg/w0 : END batch execution phase, entering to
commit phase [processed 1 of 1 messages]
8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr
to
commit 0 isTransactional 1
8194.254461646:main Q:Reg/w0 : doTransaction: have commitTransaction
IF,
using that, pWrkrInfo 0x1c13cf0
8194.254466238:main Q:Reg/w0 : entering actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
8194.254471876:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0
rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
x-pid="30298" x-info="http://www.rs
8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4(messages)
flush, buflen 161
8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream 0x7f9bc4002480,
len
161
8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write wrote
161 bytes
8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy
8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx
8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy
8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements has
been
processed
8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0,
szlog 0
sz phys 1
8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new sizes: log 0, phys 0
8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog 0
sz phys 0
8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
work.
8194.254556418:main thread : Terminating main queue...
8194.254560776:main thread : main Q: initiating worker thread
shutdown
sequence
8194.254564653:main thread : main Q: trying shutdown of regular
workers
8194.254572961:main thread : sent SIGTTIN to worker thread 0xcaf66700
8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker
thread termination, 1 still running
8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
processing
8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
new sizes: log 0, phys 0
8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog 0
sz phys 0
8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker
terminateRet=5, bInactivityTOOccured=0
8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup
action
instances
8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil)
8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr
0x7f9bc40022a0
8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil)
8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil)
8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil)
8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil)
8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil)
8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil)
8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting
8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0,
terminated, num workers now 0
8194.254656279:main thread : main Q: regular queue workers shut down.
8194.254656402:main thread : main Q: checking to see if we need to
cancel any worker threads of the primary queue
8194.254656524:main thread : main Q: worker threads terminated,
remaining queue size log 0, phys 0.
8194.254656647:main thread : main Q: queue (type 0) will lose 0
messages, destroying...
8194.255021761:main Q:Reg/w0 : destructor for debug call stack
0x7f9bc40008c0 called
8194.255069521:main thread : Terminating outputs...
8194.255084818:main thread : destructing ruleset 0x1bf6150, name
0x1bf6190
8194.255089525:main thread : debug_relp queue: initiating worker
thread
shutdown sequence
8194.255093230:main thread : debug_relp queue: trying shutdown of
regular workers
8194.255099210:main thread : sent SIGTTIN to worker thread 0xca765700
8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on
worker thread termination, 1 still running
8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to state:
rtry
8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0,
currIParam 1
8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter
loop, iRetries=0
8194.255210939:main thread : debug_relp queue:Reg: timeout waiting on
worker thread termination
8194.255211086:main thread : sent SIGTTIN to worker thread 0xca765700
8194.255211257:main thread : debug_relp queue: regular shutdown timed
out on primary queue (this is OK)
8194.255211429:main thread : debug_relp queue: trying to shutdown
workers within Action Timeoutsetting EnqOnly mode
8194.255215492:main thread : debug_relp queue: trying immediate
shutdown of regular workers (if any)
8194.255221059:main thread : sent SIGTTIN to worker thread 0xca765700
8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms on
worker thread termination, 1 still running
8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
action->tryResume returned -2007
8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check
for max retries, iResumeRetryCount 0, iRetries 0
8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to state:
susp
8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action
'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016
8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not submit
message - queue does (no longer?) exist - ignored
8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp' suspended,
earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration 30
8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action
0x1c09af0
state: susp, next retry (if applicable): 1459208224 [now 1459208194]
8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop,
iRet
-2123
8194.255282409:debug_relp queue:Reg/w0: regular consumer finished,
iret=-2123, szlog 0 sz phys 1
8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
deleted 1
objects and enqueued 0 objects
8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch from
store, new sizes: log 0, phys 0
8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating
worker
because of TERMINATE_NOW mode, del iRet 0
8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker
cleanup action instances
8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr
0x7f9bc00020d0
8194.255305929:debug_relp queue:Reg/w0: relp engine destructing client
0x7f9bc00027e0
8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr
(nil)
8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr
(nil)
8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr
(nil)
8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr
(nil)
8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr
(nil)
8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr
(nil)
8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr
(nil)
8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker exiting
8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
thread 1c12880, terminated, num workers now 0
8194.255344540:debug_relp queue:Reg/w0: destructor for debug call stack
0x7f9bc00008c0 called
8194.255371534:main thread : debug_relp queue: checking to see if we
need to cancel any worker threads of the primary queue
8194.255376798:main thread : debug_relp queue: worker threads
terminated, remaining queue size log 0, phys 0.
8194.255381256:main thread : debug_relp queue: persisting queue to
disk, 0 entries...
8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue)
closing
8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue)
flush,
buflen 0 (no need to flush)
8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue)
closing
8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue)
closing
8194.255409145:main thread : strmCloseFile: deleting
'/var/spool/rsyslog/relp_queue.00000001'
8194.255460812:main thread : strm 0x7f9bc4002480: file 4(messages)
closing
8194.255466309:main thread : strm 0x7f9bc4002480: file 4(messages)
flush, buflen 0 (no need to flush)
8194.255479325:main thread : all primary multi-thread sources have
been
terminated - now doing aux cleanup...
8194.255482809:main thread : destructing current config...
8194.255486662:main thread : calling freeCnf(0x1bf41e0) for module
'builtin:omfile'
8194.255490098:main thread : calling freeCnf(0x1bf43a0) for module
'builtin:ompipe'
8194.255493809:main thread : calling freeCnf(0x1bf53a0) for module
'builtin:omfwd'
8194.255497793:main thread : calling freeCnf(0x1c04b00) for module
'imuxsock'
8194.255501565:main thread : calling freeCnf(0x1c05c70) for module
'imklog'
8194.255543579:main thread : destructing parser 'rsyslog.rfc5424'
8194.255547468:main thread : destructing parser 'rsyslog.rfc3164'
8194.255550784:main thread : pmrfc3164: free parser instance
0x1bf58b0
8194.255559659:main thread : file rsyslogd.c released module 'lmnet',
reference count now 2
8194.255563470:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255566676:main thread : Unloading module builtin:omfile
8194.255570420:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255573674:main thread : Unloading module builtin:ompipe
8194.255576857:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255579938:main thread : Unloading module builtin-shell
8194.255583060:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255586144:main thread : Unloading module builtin:omdiscard
8194.255589509:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255592658:main thread : Unloading module builtin:omfwd
8194.255596441:main thread : file omfwd.c released module 'lmnet',
reference count now 1
8194.255599981:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255603068:main thread : Unloading module builtin:omusrmsg
8194.255606240:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255609287:main thread : Unloading module builtin:pmrfc5424
8194.255612755:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255615826:main thread : Unloading module builtin:pmrfc3164
8194.255619252:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255622315:main thread : Unloading module builtin:smfile
8194.255625406:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255628486:main thread : Unloading module builtin:smtradfile
8194.255631753:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255634819:main thread : Unloading module builtin:smfwd
8194.255637916:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255640987:main thread : Unloading module builtin:smtradfwd
8194.255644187:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255647267:main thread : Unloading module imuxsock
8194.255676232:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255680751:main thread : Unloading module imklog
8194.255694428:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255698085:main thread : Unloading module omrelp
8194.255750953:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255755236:main thread : module lmregexp NOT unloaded because it
still has a refcount of 1
8194.255759705:main thread : file conf.c released module 'lmnet',
reference count now 0
8194.255762875:main thread : module 'lmnet' has zero reference count,
unloading...
8194.255765901:main thread : Unloading module lmnet
8194.255780970:main thread : rsyslog runtime de-initialized, current
users 0
8194.255784704:main thread : Clean shutdown completed, bye
On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote:
On Sun, 13 Mar 2016, Kane Kim wrote:
+1to David, having module handle the errors is not 100% reliable, what
if
module dies before having a chance to save error log, those messages
will
be lost.
well, the module is a thread of rsyslog, so if it dies, rsyslog has
died.
But if you are doing large batches (hundreds or thousands), putting all
of them in a 'bad messages' file because one has a problem seems like
the
wrong thing to do.
given that the error is not detected until the batch is being flushed
for
the output, any outputs that have already flushed their batches can't
be
changed, so the question is if it's worse to have messages resent to
those
outputs as the batch gets re-processed, or having to have the output
go to
a "couldn't deliver" file.
It's bad enough if the problem is a corrupt message that can never be
delivered and the error is a permanent one, but if the problem is
instead a
temporary error (server is down), you will have a LOT of messages going
into this 'couldn't deliver' bucket, when what should really be
happening
is that rsyslog should be pausing in the processing of messages,
retrying
after the suspended timeout.
David Lang
On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:
On Sun, 13 Mar 2016, Rainer Gerhards wrote:
Rainer, can you explain what happens when a batch fails on an action?
It depends on when what happens and if we have batches of one or
batches of many. For batches of one, the failure is immediately
detected and acted upon. For batches of many,all action strings or
stored temporarily. During that time, it is assume that the messages
could be processed. There is no way around this assumption, because
we
now run each action completely through the rule engine before the
next
one is begun to process, which means that other parts of the config
have actually acted on a message, which means we cannot undo that
state. If is needed to act on the "real" state, we need to process
batches of one. There simply is no other solution possible. So let's
stay with batches of many. Once the batch is completed, the
temporary
strings are passed to the output's commitTransaction entry point. It
is currently assumed that the output does inteligently handle the
case, and as it looks most do this well (no regression tests failed,
no bad reports since we introduced this with the initial v8 version
(15+ month ago). If the action fails, status is reflected
accordingly.
Also, the core engine *should* (but obviously does not support this
right now) write an error file with the messages in error (just like
omelasticsearch does, which has proven to be quite good).
well, it's not in the specs that modules should have local error
files
the
way omelasticsearch does.
What I would expect if there is a problem when commiting the batch is
that
the entire batch would be marked as not delivered and try again with
half
of it.
This will cause messages to other outputs working from the same
queue to
get duplciate messages, but it's better than loosing an entire batch
of
messages because one is corrupt in some way. or there is a temporary
problem with the output.
David Lang
I have managed to look at least briefly at the commitTransaction core
engine code. I think we can even improve it and may get some ability
to do retries if the action itself does not handle that. At the
minimum, it should have the ability (which was intended originally,
but pushed away by more urgent work) to write that error file.
I am very grateful that Kane is working on a repro with the standard
modules. Once we have that, I can go deeper into the code and check
what I am thinking about. It might be possible that we cannot manage
to repro this with the standard modules. Then we need to think
about a
way via the testing module to do the same thing.
I will try to address this within the 8.18 timeframe, but cannot
totally commit to it. There has a lot of work been going on, and
especially all that CI testing has taken up lots of works.
Nevertheless, I've been able to iron out quite some nits, so it's
probably worth it. I would still like to continue to get the CI
environment into better shape, because now I know what I am doing.
If
I stop again and let it go for a couple of weeks, I would lose a lot
of momentum. I have also do to some work for support customers,
which
keeps me able to work on rsyslog in general, so this, too, has
priority (and validly has so because without support customers I
would
be totally unable to work on rsyslog other than as a hobby)
I hope this explains the current state.
Rainer
Kane, can you try and duplicate your problem using standard modules?
I'm
thinking that if you define a omrelp action that doesn't go
anywhere,
it
should return errors similar to what you are trying to do, so you
should
be
able to duplicate the problem that way.
David Lang
On Tue, 23 Feb 2016, Kane Kim wrote:
Looking at omkafka module source code it seems that it relies on
rsyslog
retries in DoAction, returning RS_RET_SUSPENDED:
DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
if(iRet != RS_RET_OK) {
iRet = RS_RET_SUSPENDED;
}
I've tried similar code in DoAction, it seems that action
processing
is suspended at that point and messages are not retried to module.
On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]>
wrote:
Thanks for your help Rainer, I'll try to debug what's going on, so
far
it
seems rsyslog doesn't retry even with batch size 1. It doesn't
retry
if I
return RS_RET_SUSPENDED from DoAction as well.
On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards
<[email protected]
wrote:
2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
On Tue, 23 Feb 2016, Rainer Gerhards wrote:
2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
Hello Rainer, thanks for the prompt reply! To give you some
context:
I
want
to write module that both using batching and also can't loose
messages in
any circumstances. Are you saying it is by design that rsyslog
can't
do
that together? According to documentation rsyslog will retry if
module
returns any error. Do you plan to fix this in rsyslog or
update
documentation to say batching and retries don't work?
It depends on many things. In almost all cases, the retry
should
work
well (and does so in practice). Unfortunately, I am pretty
swamped.
I
need to go to a conference tomorrow and have had quite some
unexpected
work today. It would probably be good if you could ping me
next
week
to see if we can look into more details what is causing you
pain.
But
I can't guarantee that I will be available early next week.
In general, we cannot handle a fatal error here from an engine
PoV,
because everything is already processed and we do no longer
have
the
original messages. This is simply needed if you want to
process
messages one after another through the full config (a goal
for v8
that
was muuuuch requested). As I said, the solution is to use
batches of
one, because otherwise we would really need to turn back time
and
undo
everything that was already done on the messages in question
by
other
modules (including state advances).
I thought that if a batch failed, it pushed all the messages
back
on
the
queue and retried with a half size batch until it got to the
one
message
that could not be processed and only did a fatal fail on that
message.
Now, there is a big difference between a module giving a hard
error
"this
message is never going to be able to be processed no matter how
many
times
it's retried" vs a soft error "there is a problem delivering
things
to
this
destination right now, retry later". I thought the batch
processing
handled
these differently.
That's no longer possible with v8, at least generically. As I
said
above, we would need to turn back time.
But I really run out of time now...
Rainer
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
by a
myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by
a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
_______________________________________________