2016-04-21 3:22 GMT+02:00 David Lang <[email protected]>:
Thanks, I didn't know that you had been able to take any time to think
about
this issue yet.
Sorry for not being more explicit about it. But there was not yet
anything that could be posted. I did this as a side-activity all since
this opened. The plan acutally is:
1. get a testbench test to expose that problem -- this greatly eases
debugging
line of thought is to update omfile, which is needed anyhow, so
that we can induce errors via it
2. add missing functionality to write error file
this does not directly address the issue, but is useful and
ensures we are on the right path
3. check the end transaction endpoint. After review of old notes and
some thinking, I think that
part of the "not yet implemented" comment is actually right. It's
not as easy as suggested by it,
but we probably can do better.
It sounds like you are tenitively agreeing with my speculation that the
end
of the block processing is happening just a little sooner than it should,
preventing the chance to detect output errors.
No, I don't think so. I think it happens as it should, but we are not
handling the full situation. Originally, it was planned to write an
error file (point 2 above) when we cannot commit at end of
transaction. But we probably can give it some retries at that point as
well. In v8, we **need** to buffer transactional data until the end of
batch. If we wouldn't do that, we would always need to commit after
each and every transaction (because we need processing "completed"
when we advance to the next action). The, at end of batch, the
transaction commit processing happens. Right now, we have one try, and
then give up (unfortunately without any error indication). So we need
to go into that part of processing and enhance/fix it.
I hope this clarifies.
Rainer
David Lang
On Wed, 20 Apr 2016, Rainer Gerhards wrote:
I should also say that it had a release goal, so it is on my active
to-do
list.
Sent from phone, thus brief.
Am 20.04.2016 19:40 schrieb "Rainer Gerhards" <[email protected]
:
Yup... Look aty work Log - still no time... Was meant for last release
but
then came libfastjson work.
Sent from phone, thus brief.
Am 20.04.2016 19:39 schrieb "David Lang" <[email protected]>:
Rainer, ping on this issue.
David Lang
On Tue, 29 Mar 2016, Kane Kim wrote:
Date: Tue, 29 Mar 2016 10:46:49 -0700
From: Kane Kim <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] retry if output module returns
RS_RET_SUSPENDED
Updated config to:
module(load="omrelp")
local0.* action(type="omrelp"
target="10.0.0.1"
port="2514"
name="debug_relp"
queue.filename="relp_queue"
queue.maxdiskspace="1G"
queue.saveonshutdown="on"
queue.spoolDirectory="/var/spool/rsyslog"
queue.type="Disk"
action.resumeRetryCount="-1"
action.resumeInterval="1"
)
Still message is lost:
3489.658349703:imuxsock.c : Message from UNIX socket: #3
3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size
now
log 1, phys 1 entries
3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker
start
3489.658374985:imuxsock.c : --------imuxsock calling select,
active
file descriptors (max 3): 3
3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new
sizes: log 1, phys 1
3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements
must
be
processed
3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29
17:44:49 logger: test
3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*'
3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X FF X X X X X X X X X
3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658414686:main Q:Reg/w0 : ACTION 0
[omrelp:action(type="omrelp"
...)]
3489.658417648:main Q:Reg/w0 : executing action 0
3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging
to
omrelp (susp 0/0, direct q 0)
3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue)
flush,
buflen 410
3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len
410
3489.658484243:main Q:Reg/w0 : file
'/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384
3489.658487347:main Q:Reg/w0 : strm 0x23b74c0: opened file
'/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5
3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote
410
bytes
3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410
octets
to
disk, queue disk size now 410 octets, EnqOnly:0
3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry
added,
size now log 1, phys 1 entries
3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity -
starting 1 additional worker thread(s).
3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with
state
0,
num workers now 1
3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised
worker
start
3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended
state
to
0
3489.658533672:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1
3489.658553393:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3489.658556301:main Q:Reg/w0 : executing action 1
3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended
state
to 0
3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X
X
FF
X X X X X X X X X X X X X X X
3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*'
3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X
X X
X X X X X X X X X X X X X X X
3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*'
3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X
FF X
X X X X X X X X X X X X X X X
3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg'
3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7
X X
X X X X X X X X X X X X X X X
3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*'
3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X X X X X X X X FF X X
3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0
3489.658675686:main Q:Reg/w0 : END batch execution phase, entering
to
commit phase [processed 1 of 1 messages]
3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1,
nbr
to
commit 0 isTransactional 1
3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction
IF,
using that, pWrkrInfo 0x23c8ce0
3489.658680668:main Q:Reg/w0 : entering
actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0
logger:
test
3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 42
3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480,
len 42
3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write
wrote
42
bytes
3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx
3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has
been
processed
3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0,
szlog
0
sz phys 1
3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new
sizes: log 0, phys 0
3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog
0
sz phys 0
3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting
for
work.
3489.660013052:7f1efbb14700: thread created, tid 30310, name
'rs:debug_relp
queue:Reg'
3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker
starting
3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
deleted
0
objects and enqueued 0 objects
3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
from
store, new sizes: log 1, phys 1
3489.660947418:debug_relp queue:Reg/w0: file
'/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384
3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file
'/var/spool/rsyslog/relp_queue.00000001' for READ as 6
3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read
410
bytes
3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is
transactional - executing in commit phase
3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to
create
a
new action worker instance for action 0
3489.661789583:debug_relp queue:Reg/w0: relp engine created new
client
0x7f1ef00027e0
3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action
worker instance 1 for action 0
3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction
^C3497.710058579:main thread : DoDie called.
3497.710272055:main thread : exiting on signal 2
3497.710576430:main thread : Terminating input threads...
3497.710765753:main thread : request term via SIGTTIN for input
thread
'imuxsock' 0xfcb16700
3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
0x7f1efcb16700 returned with iRet 0, exiting now.
3497.711447372:main thread : non-cancel input thread termination
succeeded for thread imuxsock 0xfcb16700
3497.713028584:imuxsock.c : destructor for debug call stack
0x7f1eec0008c0 called
3497.713390963:main thread : imuxsock: unlinking unix socket
file[0]
/dev/log
3497.713734513:main thread : main Q: qqueueAdd: entry added, size
now
log 1, phys 1 entries
3497.713906635:main thread : main Q: EnqueueMsg advised worker
start
3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new
sizes: log 1, phys 1
3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements
must
be
processed
3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin
software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info="
http://www.rsyslog.com"] exiting on signal 2.
3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*'
3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X FF X X X X X X X X X
3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716337713:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
X X
7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1
3497.716429241:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
3497.716439928:main Q:Reg/w0 : executing action 1
3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to
builtin:omfile (susp 0/0, direct q 1)
3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional -
executing in commit phase
3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended
state
to 0
3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*'
3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X
X
FF
X X X X X X X X X X X X X X X
3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0
3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*'
3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X
X X
X X X X X X X X X X X X X X X
3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*'
3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X
FF X
X X X X X X X X X X X X X X X
3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg'
3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0
3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7
X X
X X X X X X X X X X X X X X X
3497.718043828:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*'
3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X
X X
X X X X X X X X X X X X FF X X
3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0
3497.718842803:main Q:Reg/w0 : END batch execution phase, entering
to
commit phase [processed 1 of 1 messages]
3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1,
nbr
to
commit 0 isTransactional 1
3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction
IF,
using that, pWrkrInfo 0x23c8ce0
3497.718857172:main Q:Reg/w0 : entering
actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0
rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
x-pid="30306" x-info="http://www.rs
3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 161
3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480,
len
161
3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write
wrote
161
bytes
3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx
3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy
3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has
been
processed
3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0,
szlog
0
sz phys 1
3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new
sizes: log 0, phys 0
3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog
0
sz phys 0
3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting
for
work.
3497.718942403:main thread : Terminating main queue...
3497.718946280:main thread : main Q: initiating worker thread
shutdown
sequence
3497.718950170:main thread : main Q: trying shutdown of regular
workers
3497.718957067:main thread : sent SIGTTIN to worker thread
0xfc315700
3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker
thread
termination, 1 still running
3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
processing
3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new
sizes: log 0, phys 0
3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog
0
sz phys 0
3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker
terminateRet=5, bInactivityTOOccured=0
3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup
action
instances
3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil)
3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr
0x7f1ef40022a0
3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil)
3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil)
3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil)
3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil)
3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil)
3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil)
3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting
3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0,
terminated, num workers now 0
3497.719090631:main thread : main Q: regular queue workers shut
down.
3497.719097597:main thread : main Q: checking to see if we need to
cancel any worker threads of the primary queue
3497.719101335:main thread : main Q: worker threads terminated,
remaining queue size log 0, phys 0.
3497.719111115:main thread : main Q: queue (type 0) will lose 0
messages, destroying...
3497.719501847:main Q:Reg/w0 : destructor for debug call stack
0x7f1ef40008c0 called
3497.719547524:main thread : Terminating outputs...
3497.719553777:main thread : destructing ruleset 0x23ab150, name
0x23ab190
3497.719558276:main thread : debug_relp queue: initiating worker
thread
shutdown sequence
3497.719561940:main thread : debug_relp queue: trying shutdown of
regular workers
3497.719568068:main thread : sent SIGTTIN to worker thread
0xfbb14700
3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on
worker
thread termination, 1 still running
3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet
10014
3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to
state:
rtry
3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0,
currIParam
1
3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
enter
loop, iRetries=0
3497.719677407:main thread : debug_relp queue:Reg: timeout waiting
on
worker thread termination
3497.719677554:main thread : sent SIGTTIN to worker thread
0xfbb14700
3497.719677725:main thread : debug_relp queue: regular shutdown
timed
out on primary queue (this is OK)
3497.719677897:main thread : debug_relp queue: trying to shutdown
workers within Action Timeoutsetting EnqOnly mode
3497.719678191:main thread : debug_relp queue: trying immediate
shutdown
of regular workers (if any)
3497.719678362:main thread : sent SIGTTIN to worker thread
0xfbb14700
3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms
on
worker thread termination, 1 still running
3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet
10014
3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
action->tryResume returned -2007
3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
check
for
max retries, iResumeRetryCount -1, iRetries 0
3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action
0x23bebf0
state: rtry, next retry (if applicable): 0 [now 1459273498]
3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop,
iRet
-2007
3498.721034552:debug_relp queue:Reg/w0: regular consumer finished,
iret=-2007, szlog 0 sz phys 1
3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
deleted
1
objects and enqueued 0 objects
3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
from
store, new sizes: log 0, phys 0
3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating
worker
because of TERMINATE_NOW mode, del iRet 0
3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker
cleanup
action instances
3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr
0x7f1ef00020d0
3498.722431811:debug_relp queue:Reg/w0: relp engine destructing
client
0x7f1ef00027e0
3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr
(nil)
3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr
(nil)
3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr
(nil)
3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr
(nil)
3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr
(nil)
3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr
(nil)
3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr
(nil)
3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting
3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
thread
23c78b0, terminated, num workers now 0
3498.724243673:debug_relp queue:Reg/w0: destructor for debug call
stack
0x7f1ef00008c0 called
3498.724371931:main thread : debug_relp queue: checking to see if
we
need to cancel any worker threads of the primary queue
3498.724379712:main thread : debug_relp queue: worker threads
terminated, remaining queue size log 0, phys 0.
3498.724386288:main thread : debug_relp queue: persisting queue to
disk,
0 entries...
3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue)
closing
3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue)
flush,
buflen 0 (no need to flush)
3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue)
closing
3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue)
closing
3498.724423956:main thread : strmCloseFile: deleting
'/var/spool/rsyslog/relp_queue.00000001'
3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages)
closing
3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages)
flush, buflen 0 (no need to flush)
3498.724731690:main thread : all primary multi-thread sources have
been
terminated - now doing aux cleanup...
3498.724737248:main thread : destructing current config...
3498.724741760:main thread : calling freeCnf(0x23a91e0) for module
'builtin:omfile'
3498.724745773:main thread : calling freeCnf(0x23a93a0) for module
'builtin:ompipe'
3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module
'builtin:omfwd'
3498.724754588:main thread : calling freeCnf(0x23b9b00) for module
'imuxsock'
3498.724758956:main thread : calling freeCnf(0x23bac70) for module
'imklog'
3498.724807014:main thread : destructing parser 'rsyslog.rfc5424'
3498.724871141:main thread : destructing parser 'rsyslog.rfc3164'
3498.724876440:main thread : pmrfc3164: free parser instance
0x23aa8b0
3498.724887558:main thread : file rsyslogd.c released module
'lmnet',
reference count now 2
3498.724892004:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724895745:main thread : Unloading module builtin:omfile
3498.724900191:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724904194:main thread : Unloading module builtin:ompipe
3498.724908035:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.724911707:main thread : Unloading module builtin-shell
3498.725044125:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.725050401:main thread : Unloading module builtin:omdiscard
3498.725054843:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
3498.725058772:main thread : Unloading module builtin:omfwd
3498.725064464:main thread : file omfwd.c released module 'lmnet',
reference count now 1
3498.725068747:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725072460:main thread : Unloading module builtin:omusrmsg
3498.725076324:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725080023:main thread : Unloading module builtin:pmrfc5424
3498.725084039:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725087646:main thread : Unloading module builtin:pmrfc3164
3498.725091651:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725095255:main thread : Unloading module builtin:smfile
3498.725099010:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725102601:main thread : Unloading module builtin:smtradfile
3498.725106419:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725110003:main thread : Unloading module builtin:smfwd
3498.725113689:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725117357:main thread : Unloading module builtin:smtradfwd
3498.725121132:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725124782:main thread : Unloading module imuxsock
3498.725157459:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.725162797:main thread : Unloading module imklog
3498.725179208:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.726112994:main thread : Unloading module omrelp
3498.726184405:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
3498.726189821:main thread : module lmregexp NOT unloaded because
it
still has a refcount of 1
3498.726195161:main thread : file conf.c released module 'lmnet',
reference count now 0
3498.726198620:main thread : module 'lmnet' has zero reference
count,
unloading...
3498.726201936:main thread : Unloading module lmnet
3498.726218655:main thread : rsyslog runtime de-initialized,
current
users 0
3498.726222660:main thread : Clean shutdown completed, bye
On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]>
wrote:
Hello Reiner,
I'm seeing the same behavior with omrelp module. I've configured
omrelp
to
non-routable ip address:
module(load="omrelp")
local0.* action(type="omrelp"
target="10.0.0.1"
port="2514"
name="debug_relp"
queue.filename="relp_queue"
queue.maxdiskspace="1G"
queue.saveonshutdown="on"
queue.spoolDirectory="/var/spool/rsyslog"
queue.type="Disk"
)
Then I try to log a line:
[root@314959ac8cb0 prism]# logger -p local0.info test
[root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
relp_queue.00000001
# I shutdown rsyslog at this point
[root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
# after shutdown queue is deleted and message is lost
[root@314959ac8cb0 prism]#
Here is rsyslog log, so omrelp starts transaction and then I
shutdown
rsyslog, it seems at some point queue is deleted:
8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction
^C8194.245122305:main thread : DoDie called.
8194.245301525:main thread : exiting on signal 2
8194.245532709:main thread : Terminating input threads...
8194.245873683:main thread : request term via SIGTTIN for input
thread
'imuxsock' 0xcb767700
8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
0x7f9bcb767700 returned with iRet 0, exiting now.
8194.247556671:main thread : non-cancel input thread termination
succeeded for thread imuxsock 0xcb767700
8194.249357547:imuxsock.c : destructor for debug call stack
0x7f9bbc0008c0 called
8194.249784271:main thread : imuxsock: unlinking unix socket
file[0]
/dev/log
8194.250043368:main thread : main Q: qqueueAdd: entry added, size
now
log 1, phys 1 entries
8194.250321193:main thread : main Q: EnqueueMsg advised worker
start
8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from
idle
processing
8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new sizes: log 1, phys 1
8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements
must
be
processed
8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin
software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info="
http://www.rsyslog.com"] exiting on signal 2.
8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*'
8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X
X
X
X X X X X X FF X X X X X X X X X
8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0
8194.252902083:main Q:Reg/w0 : PRIFILT
'*.info;mail.none;authpriv.none;cron.none'
8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F
7F
X
X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1
8194.252959849:main Q:Reg/w0 : ACTION 1
[builtin:omfile:/var/log/messages]
8194.252967510:main Q:Reg/w0 : executing action 1
8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging
to
builtin:omfile (susp 0/0, direct q 1)
8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional
-
executing in commit phase
8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx
8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended
state
to 0
8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*'
8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X
X
X
FF X X X X X X X X X X X X X X X
8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*'
8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X
X
X
X X X X X X X X X X X X X X X X
8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*'
8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X
X
FF
X X X X X X X X X X X X X X X X
8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg'
8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1
1
1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7
7
X
X X X X X X X X X X X X X X X X
8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0
8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*'
8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X
X
X
X X X X X X X X X X X X X FF X X
8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0
8194.254450651:main Q:Reg/w0 : END batch execution phase, entering
to
commit phase [processed 1 of 1 messages]
8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1,
nbr
to
commit 0 isTransactional 1
8194.254461646:main Q:Reg/w0 : doTransaction: have
commitTransaction
IF,
using that, pWrkrInfo 0x1c13cf0
8194.254466238:main Q:Reg/w0 : entering
actionCallCommitTransaction(),
state: itx, actionNbr 1, nMsgs 1
8194.254471876:main Q:Reg/w0 : omfile: write to stream,
pData->pStrm
0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0
rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
x-pid="30298" x-info="http://www.rs
8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file
4(messages)
flush, buflen 161
8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream
0x7f9bc4002480,
len
161
8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write
wrote
161 bytes
8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy
8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx
8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy
8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements
has
been
processed
8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0,
szlog 0
sz phys 1
8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
objects
and enqueued 0 objects
8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new sizes: log 0, phys 0
8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog 0
sz phys 0
8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting
for
work.
8194.254556418:main thread : Terminating main queue...
8194.254560776:main thread : main Q: initiating worker thread
shutdown
sequence
8194.254564653:main thread : main Q: trying shutdown of regular
workers
8194.254572961:main thread : sent SIGTTIN to worker thread
0xcaf66700
8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker
thread termination, 1 still running
8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from
idle
processing
8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
objects
and enqueued 0 objects
8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from
store,
new sizes: log 0, phys 0
8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4,
szlog 0
sz phys 0
8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker
terminateRet=5, bInactivityTOOccured=0
8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup
action
instances
8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil)
8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr
0x7f9bc40022a0
8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil)
8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil)
8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil)
8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil)
8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil)
8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil)
8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting
8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0,
terminated, num workers now 0
8194.254656279:main thread : main Q: regular queue workers shut
down.
8194.254656402:main thread : main Q: checking to see if we need
to
cancel any worker threads of the primary queue
8194.254656524:main thread : main Q: worker threads terminated,
remaining queue size log 0, phys 0.
8194.254656647:main thread : main Q: queue (type 0) will lose 0
messages, destroying...
8194.255021761:main Q:Reg/w0 : destructor for debug call stack
0x7f9bc40008c0 called
8194.255069521:main thread : Terminating outputs...
8194.255084818:main thread : destructing ruleset 0x1bf6150, name
0x1bf6190
8194.255089525:main thread : debug_relp queue: initiating worker
thread
shutdown sequence
8194.255093230:main thread : debug_relp queue: trying shutdown of
regular workers
8194.255099210:main thread : sent SIGTTIN to worker thread
0xca765700
8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on
worker thread termination, 1 still running
8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet
10014
8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to
state:
rtry
8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0,
currIParam 1
8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
enter
loop, iRetries=0
8194.255210939:main thread : debug_relp queue:Reg: timeout
waiting
on
worker thread termination
8194.255211086:main thread : sent SIGTTIN to worker thread
0xca765700
8194.255211257:main thread : debug_relp queue: regular shutdown
timed
out on primary queue (this is OK)
8194.255211429:main thread : debug_relp queue: trying to shutdown
workers within Action Timeoutsetting EnqOnly mode
8194.255215492:main thread : debug_relp queue: trying immediate
shutdown of regular workers (if any)
8194.255221059:main thread : sent SIGTTIN to worker thread
0xca765700
8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms
on
worker thread termination, 1 still running
8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet
10014
8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
action->tryResume returned -2007
8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
check
for max retries, iResumeRetryCount 0, iRetries 0
8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to
state:
susp
8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action
'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016
8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not
submit
message - queue does (no longer?) exist - ignored
8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp'
suspended,
earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration
30
8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action
0x1c09af0
state: susp, next retry (if applicable): 1459208224 [now 1459208194]
8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop,
iRet
-2123
8194.255282409:debug_relp queue:Reg/w0: regular consumer finished,
iret=-2123, szlog 0 sz phys 1
8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
deleted 1
objects and enqueued 0 objects
8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
from
store, new sizes: log 0, phys 0
8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating
worker
because of TERMINATE_NOW mode, del iRet 0
8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker
cleanup action instances
8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr
0x7f9bc00020d0
8194.255305929:debug_relp queue:Reg/w0: relp engine destructing
client
0x7f9bc00027e0
8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr
(nil)
8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr
(nil)
8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr
(nil)
8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr
(nil)
8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr
(nil)
8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr
(nil)
8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr
(nil)
8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker
exiting
8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
thread 1c12880, terminated, num workers now 0
8194.255344540:debug_relp queue:Reg/w0: destructor for debug call
stack
0x7f9bc00008c0 called
8194.255371534:main thread : debug_relp queue: checking to see if
we
need to cancel any worker threads of the primary queue
8194.255376798:main thread : debug_relp queue: worker threads
terminated, remaining queue size log 0, phys 0.
8194.255381256:main thread : debug_relp queue: persisting queue
to
disk, 0 entries...
8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue)
closing
8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue)
flush,
buflen 0 (no need to flush)
8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue)
closing
8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue)
closing
8194.255409145:main thread : strmCloseFile: deleting
'/var/spool/rsyslog/relp_queue.00000001'
8194.255460812:main thread : strm 0x7f9bc4002480: file
4(messages)
closing
8194.255466309:main thread : strm 0x7f9bc4002480: file
4(messages)
flush, buflen 0 (no need to flush)
8194.255479325:main thread : all primary multi-thread sources
have
been
terminated - now doing aux cleanup...
8194.255482809:main thread : destructing current config...
8194.255486662:main thread : calling freeCnf(0x1bf41e0) for
module
'builtin:omfile'
8194.255490098:main thread : calling freeCnf(0x1bf43a0) for
module
'builtin:ompipe'
8194.255493809:main thread : calling freeCnf(0x1bf53a0) for
module
'builtin:omfwd'
8194.255497793:main thread : calling freeCnf(0x1c04b00) for
module
'imuxsock'
8194.255501565:main thread : calling freeCnf(0x1c05c70) for
module
'imklog'
8194.255543579:main thread : destructing parser 'rsyslog.rfc5424'
8194.255547468:main thread : destructing parser 'rsyslog.rfc3164'
8194.255550784:main thread : pmrfc3164: free parser instance
0x1bf58b0
8194.255559659:main thread : file rsyslogd.c released module
'lmnet',
reference count now 2
8194.255563470:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255566676:main thread : Unloading module builtin:omfile
8194.255570420:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255573674:main thread : Unloading module builtin:ompipe
8194.255576857:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255579938:main thread : Unloading module builtin-shell
8194.255583060:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255586144:main thread : Unloading module builtin:omdiscard
8194.255589509:main thread : module lmnet NOT unloaded because it
still
has a refcount of 2
8194.255592658:main thread : Unloading module builtin:omfwd
8194.255596441:main thread : file omfwd.c released module
'lmnet',
reference count now 1
8194.255599981:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255603068:main thread : Unloading module builtin:omusrmsg
8194.255606240:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255609287:main thread : Unloading module builtin:pmrfc5424
8194.255612755:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255615826:main thread : Unloading module builtin:pmrfc3164
8194.255619252:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255622315:main thread : Unloading module builtin:smfile
8194.255625406:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255628486:main thread : Unloading module builtin:smtradfile
8194.255631753:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255634819:main thread : Unloading module builtin:smfwd
8194.255637916:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255640987:main thread : Unloading module builtin:smtradfwd
8194.255644187:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255647267:main thread : Unloading module imuxsock
8194.255676232:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255680751:main thread : Unloading module imklog
8194.255694428:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255698085:main thread : Unloading module omrelp
8194.255750953:main thread : module lmnet NOT unloaded because it
still
has a refcount of 1
8194.255755236:main thread : module lmregexp NOT unloaded because
it
still has a refcount of 1
8194.255759705:main thread : file conf.c released module 'lmnet',
reference count now 0
8194.255762875:main thread : module 'lmnet' has zero reference
count,
unloading...
8194.255765901:main thread : Unloading module lmnet
8194.255780970:main thread : rsyslog runtime de-initialized,
current
users 0
8194.255784704:main thread : Clean shutdown completed, bye
On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote:
On Sun, 13 Mar 2016, Kane Kim wrote:
+1to David, having module handle the errors is not 100% reliable,
what
if
module dies before having a chance to save error log, those
messages
will
be lost.
well, the module is a thread of rsyslog, so if it dies, rsyslog has
died.
But if you are doing large batches (hundreds or thousands), putting
all
of them in a 'bad messages' file because one has a problem seems
like
the
wrong thing to do.
given that the error is not detected until the batch is being
flushed
for
the output, any outputs that have already flushed their batches
can't
be
changed, so the question is if it's worse to have messages resent
to
those
outputs as the batch gets re-processed, or having to have the
output
go to
a "couldn't deliver" file.
It's bad enough if the problem is a corrupt message that can never
be
delivered and the error is a permanent one, but if the problem is
instead a
temporary error (server is down), you will have a LOT of messages
going
into this 'couldn't deliver' bucket, when what should really be
happening
is that rsyslog should be pausing in the processing of messages,
retrying
after the suspended timeout.
David Lang
On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:
On Sun, 13 Mar 2016, Rainer Gerhards wrote:
Rainer, can you explain what happens when a batch fails on an
action?
It depends on when what happens and if we have batches of one
or
batches of many. For batches of one, the failure is immediately
detected and acted upon. For batches of many,all action strings
or
stored temporarily. During that time, it is assume that the
messages
could be processed. There is no way around this assumption,
because
we
now run each action completely through the rule engine before
the
next
one is begun to process, which means that other parts of the
config
have actually acted on a message, which means we cannot undo
that
state. If is needed to act on the "real" state, we need to
process
batches of one. There simply is no other solution possible. So
let's
stay with batches of many. Once the batch is completed, the
temporary
strings are passed to the output's commitTransaction entry
point.
It
is currently assumed that the output does inteligently handle
the
case, and as it looks most do this well (no regression tests
failed,
no bad reports since we introduced this with the initial v8
version
(15+ month ago). If the action fails, status is reflected
accordingly.
Also, the core engine *should* (but obviously does not support
this
right now) write an error file with the messages in error (just
like
omelasticsearch does, which has proven to be quite good).
well, it's not in the specs that modules should have local error
files
the
way omelasticsearch does.
What I would expect if there is a problem when commiting the
batch
is
that
the entire batch would be marked as not delivered and try again
with
half
of it.
This will cause messages to other outputs working from the same
queue to
get duplciate messages, but it's better than loosing an entire
batch
of
messages because one is corrupt in some way. or there is a
temporary
problem with the output.
David Lang
I have managed to look at least briefly at the commitTransaction
core
engine code. I think we can even improve it and may get some
ability
to do retries if the action itself does not handle that. At the
minimum, it should have the ability (which was intended
originally,
but pushed away by more urgent work) to write that error file.
I am very grateful that Kane is working on a repro with the
standard
modules. Once we have that, I can go deeper into the code and
check
what I am thinking about. It might be possible that we cannot
manage
to repro this with the standard modules. Then we need to think
about a
way via the testing module to do the same thing.
I will try to address this within the 8.18 timeframe, but cannot
totally commit to it. There has a lot of work been going on, and
especially all that CI testing has taken up lots of works.
Nevertheless, I've been able to iron out quite some nits, so
it's
probably worth it. I would still like to continue to get the CI
environment into better shape, because now I know what I am
doing.
If
I stop again and let it go for a couple of weeks, I would lose a
lot
of momentum. I have also do to some work for support customers,
which
keeps me able to work on rsyslog in general, so this, too, has
priority (and validly has so because without support customers I
would
be totally unable to work on rsyslog other than as a hobby)
I hope this explains the current state.
Rainer
Kane, can you try and duplicate your problem using standard
modules?
I'm
thinking that if you define a omrelp action that doesn't go
anywhere,
it
should return errors similar to what you are trying to do, so
you
should
be
able to duplicate the problem that way.
David Lang
On Tue, 23 Feb 2016, Kane Kim wrote:
Looking at omkafka module source code it seems that it relies
on
rsyslog
retries in DoAction, returning RS_RET_SUSPENDED:
DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
if(iRet != RS_RET_OK) {
iRet = RS_RET_SUSPENDED;
}
I've tried similar code in DoAction, it seems that action
processing
is suspended at that point and messages are not retried to
module.
On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim
<[email protected]>
wrote:
Thanks for your help Rainer, I'll try to debug what's going
on,
so
far
it
seems rsyslog doesn't retry even with batch size 1. It
doesn't
retry
if I
return RS_RET_SUSPENDED from DoAction as well.
On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards
<[email protected]
wrote:
2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
On Tue, 23 Feb 2016, Rainer Gerhards wrote:
2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]
:
Hello Rainer, thanks for the prompt reply! To give you
some
context:
I
want
to write module that both using batching and also can't
loose
messages in
any circumstances. Are you saying it is by design that
rsyslog
can't
do
that together? According to documentation rsyslog will retry
if
module
returns any error. Do you plan to fix this in rsyslog or
update
documentation to say batching and retries don't work?
It depends on many things. In almost all cases, the retry
should
work
well (and does so in practice). Unfortunately, I am pretty
swamped.
I
need to go to a conference tomorrow and have had quite
some
unexpected
work today. It would probably be good if you could ping me
next
week
to see if we can look into more details what is causing
you
pain.
But
I can't guarantee that I will be available early next
week.
In general, we cannot handle a fatal error here from an
engine
PoV,
because everything is already processed and we do no
longer
have
the
original messages. This is simply needed if you want to
process
messages one after another through the full config (a goal
for v8
that
was muuuuch requested). As I said, the solution is to use
batches of
one, because otherwise we would really need to turn back
time
and
undo
everything that was already done on the messages in
question
by
other
modules (including state advances).
I thought that if a batch failed, it pushed all the
messages
back
on
the
queue and retried with a half size batch until it got to
the
one
message
that could not be processed and only did a fatal fail on
that
message.
Now, there is a big difference between a module giving a
hard
error
"this
message is never going to be able to be processed no matter
how
many
times
it's retried" vs a soft error "there is a problem delivering
things
to
this
destination right now, retry later". I thought the batch
processing
handled
these differently.
That's no longer possible with v8, at least generically. As
I
said
above, we would need to turn back time.
But I really run out of time now...
Rainer
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow
https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are
ARCHIVED
by a
myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow
https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
by
a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
POST
if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
by
a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
POST
if
you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
by a
myriad
of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by
a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.