2016-04-21 3:22 GMT+02:00 David Lang <[email protected]>:
> Thanks, I didn't know that you had been able to take any time to think about
> this issue yet.
Sorry for not being more explicit about it, but there was not yet
anything that could be posted. I have been working on this as a
side activity ever since the issue was opened. The actual plan is:
1. Get a testbench test that exposes the problem -- this greatly eases
   debugging. The line of thought is to update omfile, which is needed
   anyhow, so that we can induce errors via it.
2. Add the missing functionality to write an error file. This does not
   directly address the issue, but it is useful and ensures we are on
   the right path.
3. Check the end-of-transaction handling. After reviewing old notes and
   some thinking, I believe part of the "not yet implemented" comment is
   actually right. It is not as easy as the comment suggests, but we can
   probably do better.
>
> It sounds like you are tentatively agreeing with my speculation that the end
> of the block processing is happening just a little sooner than it should,
> preventing the chance to detect output errors.
No, I don't think so. I think it happens as it should, but we are not
handling the full situation. Originally, the plan was to write an
error file (point 2 above) when we cannot commit at end of
transaction. But we can probably also give it some retries at that
point. In v8, we **need** to buffer transactional data until the end
of the batch. If we didn't do that, we would always need to commit
after each and every transaction (because we need processing
"completed" when we advance to the next action). Then, at end of
batch, the transaction commit processing happens. Right now, we have
one try and then give up (unfortunately without any error indication).
So we need to go into that part of the processing and enhance/fix it.
I hope this clarifies.
Rainer
>
> David Lang
>
>
> On Wed, 20 Apr 2016, Rainer Gerhards wrote:
>
>> I should also say that it had a release goal, so it is on my active to-do
>> list.
>>
>> Sent from phone, thus brief.
>> Am 20.04.2016 19:40 schrieb "Rainer Gerhards" <[email protected]>:
>>
>>> Yup... Look at my work log - still no time... It was meant for the last
>>> release, but then came the libfastjson work.
>>>
>>> Sent from phone, thus brief.
>>> Am 20.04.2016 19:39 schrieb "David Lang" <[email protected]>:
>>>
>>>> Rainer, ping on this issue.
>>>>
>>>> David Lang
>>>>
>>>> On Tue, 29 Mar 2016, Kane Kim wrote:
>>>>
>>>> Date: Tue, 29 Mar 2016 10:46:49 -0700
>>>>>
>>>>> From: Kane Kim <[email protected]>
>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>> To: rsyslog-users <[email protected]>
>>>>> Subject: Re: [rsyslog] retry if output module returns RS_RET_SUSPENDED
>>>>>
>>>>> Updated config to:
>>>>> module(load="omrelp")
>>>>> local0.* action(type="omrelp"
>>>>> target="10.0.0.1"
>>>>> port="2514"
>>>>> name="debug_relp"
>>>>> queue.filename="relp_queue"
>>>>> queue.maxdiskspace="1G"
>>>>> queue.saveonshutdown="on"
>>>>> queue.spoolDirectory="/var/spool/rsyslog"
>>>>> queue.type="Disk"
>>>>> action.resumeRetryCount="-1"
>>>>> action.resumeInterval="1"
>>>>> )
>>>>>
>>>>> Still message is lost:
>>>>>
>>>>> 3489.658349703:imuxsock.c : Message from UNIX socket: #3
>>>>> 3489.658367582:imuxsock.c : main Q: qqueueAdd: entry added, size
>>>>> now
>>>>> log 1, phys 1 entries
>>>>> 3489.658372561:imuxsock.c : main Q: EnqueueMsg advised worker start
>>>>> 3489.658374985:imuxsock.c : --------imuxsock calling select, active
>>>>> file descriptors (max 3): 3
>>>>> 3489.658384201:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
>>>>> processing
>>>>> 3489.658386792:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
>>>>> objects
>>>>> and enqueued 0 objects
>>>>> 3489.658388548:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
>>>>> new
>>>>> sizes: log 1, phys 1
>>>>> 3489.658391356:main Q:Reg/w0 : processBATCH: batch of 1 elements must
>>>>> be
>>>>> processed
>>>>> 3489.658392786:main Q:Reg/w0 : processBATCH: next msg 0: <134>Mar 29
>>>>> 17:44:49 logger: test
>>>>> 3489.658395027:main Q:Reg/w0 : PRIFILT 'local0.*'
>>>>> 3489.658397540:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X X
>>>>> X X X X X FF X X X X X X X X X
>>>>> 3489.658413235:main Q:Reg/w0 : PRIFILT condition result is 1
>>>>> 3489.658414686:main Q:Reg/w0 : ACTION 0
>>>>> [omrelp:action(type="omrelp"
>>>>> ...)]
>>>>> 3489.658417648:main Q:Reg/w0 : executing action 0
>>>>> 3489.658419520:main Q:Reg/w0 : action 'debug_relp': called, logging to
>>>>> omrelp (susp 0/0, direct q 0)
>>>>> 3489.658433462:main Q:Reg/w0 : strm 0x23b74c0: file -1(relp_queue)
>>>>> flush,
>>>>> buflen 410
>>>>> 3489.658435496:main Q:Reg/w0 : strmPhysWrite, stream 0x23b74c0, len
>>>>> 410
>>>>> 3489.658484243:main Q:Reg/w0 : file
>>>>> '/var/spool/rsyslog/relp_queue.00000001' opened as #5 with mode 384
>>>>> 3489.658487347:main Q:Reg/w0 : strm 0x23b74c0: opened file
>>>>> '/var/spool/rsyslog/relp_queue.00000001' for WRITE as 5
>>>>> 3489.658500498:main Q:Reg/w0 : strm 0x23b74c0: file 5 write wrote 410
>>>>> bytes
>>>>> 3489.658502751:main Q:Reg/w0 : debug_relp queue: write wrote 410
>>>>> octets
>>>>> to
>>>>> disk, queue disk size now 410 octets, EnqOnly:0
>>>>> 3489.658504568:main Q:Reg/w0 : debug_relp queue: qqueueAdd: entry
>>>>> added,
>>>>> size now log 1, phys 1 entries
>>>>> 3489.658506705:main Q:Reg/w0 : debug_relp queue:Reg: high activity -
>>>>> starting 1 additional worker thread(s).
>>>>> 3489.658528951:main Q:Reg/w0 : debug_relp queue:Reg: started with
>>>>> state
>>>>> 0,
>>>>> num workers now 1
>>>>> 3489.658530689:main Q:Reg/w0 : debug_relp queue: EnqueueMsg advised
>>>>> worker
>>>>> start
>>>>> 3489.658532183:main Q:Reg/w0 : action 'debug_relp': set suspended
>>>>> state
>>>>> to
>>>>> 0
>>>>> 3489.658533672:main Q:Reg/w0 : PRIFILT
>>>>> '*.info;mail.none;authpriv.none;cron.none'
>>>>> 3489.658536164:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
>>>>> X X
>>>>> 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
>>>>> 3489.658552240:main Q:Reg/w0 : PRIFILT condition result is 1
>>>>> 3489.658553393:main Q:Reg/w0 : ACTION 1
>>>>> [builtin:omfile:/var/log/messages]
>>>>> 3489.658556301:main Q:Reg/w0 : executing action 1
>>>>> 3489.658557878:main Q:Reg/w0 : action 'action 1': called, logging to
>>>>> builtin:omfile (susp 0/0, direct q 1)
>>>>> 3489.658561265:main Q:Reg/w0 : action 'action 1': is transactional -
>>>>> executing in commit phase
>>>>> 3489.658563696:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>> 3489.658564902:main Q:Reg/w0 : action 'action 1': set suspended state
>>>>> to 0
>>>>> 3489.658566053:main Q:Reg/w0 : PRIFILT 'authpriv.*'
>>>>> 3489.658568233:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X
>>>>> FF
>>>>> X X X X X X X X X X X X X X X
>>>>> 3489.658582888:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658584058:main Q:Reg/w0 : PRIFILT 'mail.*'
>>>>> 3489.658586243:main Q:Reg/w0 : pmask: X X FF X X X X X X
>>>>> X X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3489.658600961:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658602136:main Q:Reg/w0 : PRIFILT 'cron.*'
>>>>> 3489.658604562:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> FF X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3489.658619159:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658620352:main Q:Reg/w0 : PRIFILT '*.emerg'
>>>>> 3489.658622551:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
>>>>> 1 1
>>>>> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>>> 3489.658638435:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658639529:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
>>>>> 3489.658641711:main Q:Reg/w0 : pmask: X X X X X X X 7 7
>>>>> X X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3489.658656380:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658657532:main Q:Reg/w0 : PRIFILT 'local7.*'
>>>>> 3489.658659794:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X X
>>>>> X X X X X X X X X X X X FF X X
>>>>> 3489.658674346:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3489.658675686:main Q:Reg/w0 : END batch execution phase, entering to
>>>>> commit phase [processed 1 of 1 messages]
>>>>> 3489.658677400:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr
>>>>> to
>>>>> commit 0 isTransactional 1
>>>>> 3489.658679044:main Q:Reg/w0 : doTransaction: have commitTransaction
>>>>> IF,
>>>>> using that, pWrkrInfo 0x23c8ce0
>>>>> 3489.658680668:main Q:Reg/w0 : entering actionCallCommitTransaction(),
>>>>> state: itx, actionNbr 1, nMsgs 1
>>>>> 3489.658682704:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
>>>>> 0x7f1ef4002480, lenBuf 42, strt data Mar 29 17:44:49 314959ac8cb0
>>>>> logger:
>>>>> test
>>>>>
>>>>> 3489.658685296:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
>>>>> flush, buflen 42
>>>>> 3489.658686866:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480,
>>>>> len 42
>>>>> 3489.658696046:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote
>>>>> 42
>>>>> bytes
>>>>> 3489.658697884:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>> 3489.658699301:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>> 3489.658700575:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>> 3489.658701773:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
>>>>> 3489.658703176:main Q:Reg/w0 : processBATCH: batch of 1 elements has
>>>>> been
>>>>> processed
>>>>> 3489.658704989:main Q:Reg/w0 : regular consumer finished, iret=0,
>>>>> szlog
>>>>> 0
>>>>> sz phys 1
>>>>> 3489.658707214:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
>>>>> objects
>>>>> and enqueued 0 objects
>>>>> 3489.658708626:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
>>>>> new
>>>>> sizes: log 0, phys 0
>>>>> 3489.658710186:main Q:Reg/w0 : regular consumer finished, iret=4,
>>>>> szlog
>>>>> 0
>>>>> sz phys 0
>>>>> 3489.658711550:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
>>>>> work.
>>>>> 3489.660013052:7f1efbb14700: thread created, tid 30310, name
>>>>> 'rs:debug_relp
>>>>> queue:Reg'
>>>>> 3489.660298870:debug_relp queue:Reg/w0: wti 0x23c78b0: worker starting
>>>>> 3489.660645336:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
>>>>> deleted
>>>>> 0
>>>>> objects and enqueued 0 objects
>>>>> 3489.660922752:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
>>>>> from
>>>>> store, new sizes: log 1, phys 1
>>>>> 3489.660947418:debug_relp queue:Reg/w0: file
>>>>> '/var/spool/rsyslog/relp_queue.00000001' opened as #6 with mode 384
>>>>> 3489.660959920:debug_relp queue:Reg/w0: strm 0x23b7760: opened file
>>>>> '/var/spool/rsyslog/relp_queue.00000001' for READ as 6
>>>>> 3489.660967370:debug_relp queue:Reg/w0: strm 0x23b7760: file 6 read 410
>>>>> bytes
>>>>> 3489.661768120:debug_relp queue:Reg/w0: action 'debug_relp': is
>>>>> transactional - executing in commit phase
>>>>> 3489.661782408:debug_relp queue:Reg/w0: wti 0x23c78b0: we need to
>>>>> create
>>>>> a
>>>>> new action worker instance for action 0
>>>>> 3489.661789583:debug_relp queue:Reg/w0: relp engine created new client
>>>>> 0x7f1ef00027e0
>>>>> 3489.661794073:debug_relp queue:Reg/w0: wti 0x23c78b0: created action
>>>>> worker instance 1 for action 0
>>>>> 3489.661798129:debug_relp queue:Reg/w0: omrelp: beginTransaction
>>>>> ^C3497.710058579:main thread : DoDie called.
>>>>> 3497.710272055:main thread : exiting on signal 2
>>>>> 3497.710576430:main thread : Terminating input threads...
>>>>> 3497.710765753:main thread : request term via SIGTTIN for input
>>>>> thread
>>>>> 'imuxsock' 0xfcb16700
>>>>> 3497.711033925:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
>>>>> 0x7f1efcb16700 returned with iRet 0, exiting now.
>>>>> 3497.711447372:main thread : non-cancel input thread termination
>>>>> succeeded for thread imuxsock 0xfcb16700
>>>>> 3497.713028584:imuxsock.c : destructor for debug call stack
>>>>> 0x7f1eec0008c0 called
>>>>> 3497.713390963:main thread : imuxsock: unlinking unix socket file[0]
>>>>> /dev/log
>>>>> 3497.713734513:main thread : main Q: qqueueAdd: entry added, size
>>>>> now
>>>>> log 1, phys 1 entries
>>>>> 3497.713906635:main thread : main Q: EnqueueMsg advised worker start
>>>>> 3497.714257550:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
>>>>> processing
>>>>> 3497.714544345:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
>>>>> objects
>>>>> and enqueued 0 objects
>>>>> 3497.715372923:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
>>>>> new
>>>>> sizes: log 1, phys 1
>>>>> 3497.715611163:main Q:Reg/w0 : processBATCH: batch of 1 elements must
>>>>> be
>>>>> processed
>>>>> 3497.715762447:main Q:Reg/w0 : processBATCH: next msg 0: [origin
>>>>> software="rsyslogd" swVersion="8.18.0.master" x-pid="30306" x-info="
>>>>> http://www.rsyslog.com"] exiting on signal 2.
>>>>> 3497.715965911:main Q:Reg/w0 : PRIFILT 'local0.*'
>>>>> 3497.715976154:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X X
>>>>> X X X X X FF X X X X X X X X X
>>>>> 3497.716332698:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.716337713:main Q:Reg/w0 : PRIFILT
>>>>> '*.info;mail.none;authpriv.none;cron.none'
>>>>> 3497.716347729:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
>>>>> X X
>>>>> 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
>>>>> 3497.716424684:main Q:Reg/w0 : PRIFILT condition result is 1
>>>>> 3497.716429241:main Q:Reg/w0 : ACTION 1
>>>>> [builtin:omfile:/var/log/messages]
>>>>> 3497.716439928:main Q:Reg/w0 : executing action 1
>>>>> 3497.716445132:main Q:Reg/w0 : action 'action 1': called, logging to
>>>>> builtin:omfile (susp 0/0, direct q 1)
>>>>> 3497.716452142:main Q:Reg/w0 : action 'action 1': is transactional -
>>>>> executing in commit phase
>>>>> 3497.716457817:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>> 3497.716462371:main Q:Reg/w0 : action 'action 1': set suspended state
>>>>> to 0
>>>>> 3497.716466771:main Q:Reg/w0 : PRIFILT 'authpriv.*'
>>>>> 3497.716476330:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X
>>>>> FF
>>>>> X X X X X X X X X X X X X X X
>>>>> 3497.716796250:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.716802372:main Q:Reg/w0 : PRIFILT 'mail.*'
>>>>> 3497.716811926:main Q:Reg/w0 : pmask: X X FF X X X X X X
>>>>> X X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3497.717436983:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.717444114:main Q:Reg/w0 : PRIFILT 'cron.*'
>>>>> 3497.717454023:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> FF X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3497.717823841:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.717829094:main Q:Reg/w0 : PRIFILT '*.emerg'
>>>>> 3497.717838767:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
>>>>> 1 1
>>>>> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>>> 3497.717916499:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.717920817:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
>>>>> 3497.717930304:main Q:Reg/w0 : pmask: X X X X X X X 7 7
>>>>> X X
>>>>> X X X X X X X X X X X X X X X
>>>>> 3497.718043828:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.718044097:main Q:Reg/w0 : PRIFILT 'local7.*'
>>>>> 3497.718055458:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>> X X
>>>>> X X X X X X X X X X X X FF X X
>>>>> 3497.718838257:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>> 3497.718842803:main Q:Reg/w0 : END batch execution phase, entering to
>>>>> commit phase [processed 1 of 1 messages]
>>>>> 3497.718847686:main Q:Reg/w0 : actionCommitAll: action 1, state 1, nbr
>>>>> to
>>>>> commit 0 isTransactional 1
>>>>> 3497.718852750:main Q:Reg/w0 : doTransaction: have commitTransaction
>>>>> IF,
>>>>> using that, pWrkrInfo 0x23c8ce0
>>>>> 3497.718857172:main Q:Reg/w0 : entering actionCallCommitTransaction(),
>>>>> state: itx, actionNbr 1, nMsgs 1
>>>>> 3497.718862516:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
>>>>> 0x7f1ef4002480, lenBuf 161, strt data Mar 29 17:44:57 314959ac8cb0
>>>>> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
>>>>> x-pid="30306" x-info="http://www.rs
>>>>> 3497.718868427:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4(messages)
>>>>> flush, buflen 161
>>>>> 3497.718873173:main Q:Reg/w0 : strmPhysWrite, stream 0x7f1ef4002480,
>>>>> len
>>>>> 161
>>>>> 3497.718890732:main Q:Reg/w0 : strm 0x7f1ef4002480: file 4 write wrote
>>>>> 161
>>>>> bytes
>>>>> 3497.718896519:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>> 3497.718901151:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>> 3497.718905316:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>> 3497.718909316:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
>>>>> 3497.718913541:main Q:Reg/w0 : processBATCH: batch of 1 elements has
>>>>> been
>>>>> processed
>>>>> 3497.718918158:main Q:Reg/w0 : regular consumer finished, iret=0,
>>>>> szlog
>>>>> 0
>>>>> sz phys 1
>>>>> 3497.718923739:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
>>>>> objects
>>>>> and enqueued 0 objects
>>>>> 3497.718928090:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
>>>>> new
>>>>> sizes: log 0, phys 0
>>>>> 3497.718932606:main Q:Reg/w0 : regular consumer finished, iret=4,
>>>>> szlog
>>>>> 0
>>>>> sz phys 0
>>>>> 3497.718936888:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for
>>>>> work.
>>>>> 3497.718942403:main thread : Terminating main queue...
>>>>> 3497.718946280:main thread : main Q: initiating worker thread
>>>>> shutdown
>>>>> sequence
>>>>> 3497.718950170:main thread : main Q: trying shutdown of regular
>>>>> workers
>>>>> 3497.718957067:main thread : sent SIGTTIN to worker thread
>>>>> 0xfc315700
>>>>> 3497.718961130:main thread : main Q:Reg: waiting 1500ms on worker
>>>>> thread
>>>>> termination, 1 still running
>>>>> 3497.718968288:main Q:Reg/w0 : wti 0x23c8ba0: worker awoke from idle
>>>>> processing
>>>>> 3497.718972108:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
>>>>> objects
>>>>> and enqueued 0 objects
>>>>> 3497.718975498:main Q:Reg/w0 : doDeleteBatch: delete batch from store,
>>>>> new
>>>>> sizes: log 0, phys 0
>>>>> 3497.718979083:main Q:Reg/w0 : regular consumer finished, iret=4,
>>>>> szlog
>>>>> 0
>>>>> sz phys 0
>>>>> 3497.718982620:main Q:Reg/w0 : wti 0x23c8ba0: terminating worker
>>>>> terminateRet=5, bInactivityTOOccured=0
>>>>> 3497.718986378:main Q:Reg/w0 : DDDD: wti 0x23c8ba0: worker cleanup
>>>>> action
>>>>> instances
>>>>> 3497.718990027:main Q:Reg/w0 : wti 0x23c8ba0, action 0, ptr (nil)
>>>>> 3497.718993327:main Q:Reg/w0 : wti 0x23c8ba0, action 1, ptr
>>>>> 0x7f1ef40022a0
>>>>> 3497.718997548:main Q:Reg/w0 : wti 0x23c8ba0, action 2, ptr (nil)
>>>>> 3497.719000789:main Q:Reg/w0 : wti 0x23c8ba0, action 3, ptr (nil)
>>>>> 3497.719003996:main Q:Reg/w0 : wti 0x23c8ba0, action 4, ptr (nil)
>>>>> 3497.719007201:main Q:Reg/w0 : wti 0x23c8ba0, action 5, ptr (nil)
>>>>> 3497.719010409:main Q:Reg/w0 : wti 0x23c8ba0, action 6, ptr (nil)
>>>>> 3497.719013604:main Q:Reg/w0 : wti 0x23c8ba0, action 7, ptr (nil)
>>>>> 3497.719016916:main Q:Reg/w0 : wti 0x23c8ba0: worker exiting
>>>>> 3497.719020757:main Q:Reg/w0 : main Q:Reg: Worker thread 23c8ba0,
>>>>> terminated, num workers now 0
>>>>> 3497.719090631:main thread : main Q: regular queue workers shut
>>>>> down.
>>>>> 3497.719097597:main thread : main Q: checking to see if we need to
>>>>> cancel any worker threads of the primary queue
>>>>> 3497.719101335:main thread : main Q: worker threads terminated,
>>>>> remaining queue size log 0, phys 0.
>>>>> 3497.719111115:main thread : main Q: queue (type 0) will lose 0
>>>>> messages, destroying...
>>>>> 3497.719501847:main Q:Reg/w0 : destructor for debug call stack
>>>>> 0x7f1ef40008c0 called
>>>>> 3497.719547524:main thread : Terminating outputs...
>>>>> 3497.719553777:main thread : destructing ruleset 0x23ab150, name
>>>>> 0x23ab190
>>>>> 3497.719558276:main thread : debug_relp queue: initiating worker
>>>>> thread
>>>>> shutdown sequence
>>>>> 3497.719561940:main thread : debug_relp queue: trying shutdown of
>>>>> regular workers
>>>>> 3497.719568068:main thread : sent SIGTTIN to worker thread
>>>>> 0xfbb14700
>>>>> 3497.719587478:main thread : debug_relp queue:Reg: waiting 0ms on
>>>>> worker
>>>>> thread termination, 1 still running
>>>>> 3497.719593945:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
>>>>> 3497.719599684:debug_relp queue:Reg/w0: Action 0 transitioned to state:
>>>>> rtry
>>>>> 3497.719603211:debug_relp queue:Reg/w0: doTransaction: action 0,
>>>>> currIParam
>>>>> 1
>>>>> 3497.719607186:debug_relp queue:Reg/w0: actionDoRetry: debug_relp enter
>>>>> loop, iRetries=0
>>>>> 3497.719677407:main thread : debug_relp queue:Reg: timeout waiting
>>>>> on
>>>>> worker thread termination
>>>>> 3497.719677554:main thread : sent SIGTTIN to worker thread
>>>>> 0xfbb14700
>>>>> 3497.719677725:main thread : debug_relp queue: regular shutdown
>>>>> timed
>>>>> out on primary queue (this is OK)
>>>>> 3497.719677897:main thread : debug_relp queue: trying to shutdown
>>>>> workers within Action Timeoutsetting EnqOnly mode
>>>>> 3497.719678191:main thread : debug_relp queue: trying immediate
>>>>> shutdown
>>>>> of regular workers (if any)
>>>>> 3497.719678362:main thread : sent SIGTTIN to worker thread
>>>>> 0xfbb14700
>>>>> 3497.719678558:main thread : debug_relp queue:Reg: waiting 1000ms on
>>>>> worker thread termination, 1 still running
>>>>> 3497.719678754:debug_relp queue:Reg/w0: end relpSessConnect, iRet 10014
>>>>> 3497.719678877:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
>>>>> action->tryResume returned -2007
>>>>> 3497.719678999:debug_relp queue:Reg/w0: actionDoRetry: debug_relp check
>>>>> for
>>>>> max retries, iResumeRetryCount -1, iRetries 0
>>>>> 3498.720321005:debug_relp queue:Reg/w0: actionTryResume: action
>>>>> 0x23bebf0
>>>>> state: rtry, next retry (if applicable): 0 [now 1459273498]
>>>>> 3498.720769277:debug_relp queue:Reg/w0: actionCommit, in retry loop,
>>>>> iRet
>>>>> -2007
>>>>> 3498.721034552:debug_relp queue:Reg/w0: regular consumer finished,
>>>>> iret=-2007, szlog 0 sz phys 1
>>>>> 3498.721240322:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
>>>>> deleted
>>>>> 1
>>>>> objects and enqueued 0 objects
>>>>> 3498.721396664:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
>>>>> from
>>>>> store, new sizes: log 0, phys 0
>>>>> 3498.721831840:debug_relp queue:Reg/w0: wti 0x23c78b0: terminating
>>>>> worker
>>>>> because of TERMINATE_NOW mode, del iRet 0
>>>>> 3498.721991265:debug_relp queue:Reg/w0: DDDD: wti 0x23c78b0: worker
>>>>> cleanup
>>>>> action instances
>>>>> 3498.722274489:debug_relp queue:Reg/w0: wti 0x23c78b0, action 0, ptr
>>>>> 0x7f1ef00020d0
>>>>> 3498.722431811:debug_relp queue:Reg/w0: relp engine destructing client
>>>>> 0x7f1ef00027e0
>>>>> 3498.723389457:debug_relp queue:Reg/w0: wti 0x23c78b0, action 1, ptr
>>>>> (nil)
>>>>> 3498.723559462:debug_relp queue:Reg/w0: wti 0x23c78b0, action 2, ptr
>>>>> (nil)
>>>>> 3498.723794716:debug_relp queue:Reg/w0: wti 0x23c78b0, action 3, ptr
>>>>> (nil)
>>>>> 3498.723972736:debug_relp queue:Reg/w0: wti 0x23c78b0, action 4, ptr
>>>>> (nil)
>>>>> 3498.724207978:debug_relp queue:Reg/w0: wti 0x23c78b0, action 5, ptr
>>>>> (nil)
>>>>> 3498.724215660:debug_relp queue:Reg/w0: wti 0x23c78b0, action 6, ptr
>>>>> (nil)
>>>>> 3498.724219766:debug_relp queue:Reg/w0: wti 0x23c78b0, action 7, ptr
>>>>> (nil)
>>>>> 3498.724223775:debug_relp queue:Reg/w0: wti 0x23c78b0: worker exiting
>>>>> 3498.724228430:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
>>>>> thread
>>>>> 23c78b0, terminated, num workers now 0
>>>>> 3498.724243673:debug_relp queue:Reg/w0: destructor for debug call stack
>>>>> 0x7f1ef00008c0 called
>>>>> 3498.724371931:main thread : debug_relp queue: checking to see if we
>>>>> need to cancel any worker threads of the primary queue
>>>>> 3498.724379712:main thread : debug_relp queue: worker threads
>>>>> terminated, remaining queue size log 0, phys 0.
>>>>> 3498.724386288:main thread : debug_relp queue: persisting queue to
>>>>> disk,
>>>>> 0 entries...
>>>>> 3498.724391611:main thread : strm 0x23b74c0: file 5(relp_queue)
>>>>> closing
>>>>> 3498.724396110:main thread : strm 0x23b74c0: file 5(relp_queue)
>>>>> flush,
>>>>> buflen 0 (no need to flush)
>>>>> 3498.724411574:main thread : strm 0x23b7760: file 6(relp_queue)
>>>>> closing
>>>>> 3498.724418797:main thread : strm 0x23b7a00: file -1(relp_queue)
>>>>> closing
>>>>> 3498.724423956:main thread : strmCloseFile: deleting
>>>>> '/var/spool/rsyslog/relp_queue.00000001'
>>>>> 3498.724490018:main thread : strm 0x7f1ef4002480: file 4(messages)
>>>>> closing
>>>>> 3498.724607874:main thread : strm 0x7f1ef4002480: file 4(messages)
>>>>> flush, buflen 0 (no need to flush)
>>>>> 3498.724731690:main thread : all primary multi-thread sources have
>>>>> been
>>>>> terminated - now doing aux cleanup...
>>>>> 3498.724737248:main thread : destructing current config...
>>>>> 3498.724741760:main thread : calling freeCnf(0x23a91e0) for module
>>>>> 'builtin:omfile'
>>>>> 3498.724745773:main thread : calling freeCnf(0x23a93a0) for module
>>>>> 'builtin:ompipe'
>>>>> 3498.724749977:main thread : calling freeCnf(0x23aa3a0) for module
>>>>> 'builtin:omfwd'
>>>>> 3498.724754588:main thread : calling freeCnf(0x23b9b00) for module
>>>>> 'imuxsock'
>>>>> 3498.724758956:main thread : calling freeCnf(0x23bac70) for module
>>>>> 'imklog'
>>>>> 3498.724807014:main thread : destructing parser 'rsyslog.rfc5424'
>>>>> 3498.724871141:main thread : destructing parser 'rsyslog.rfc3164'
>>>>> 3498.724876440:main thread : pmrfc3164: free parser instance
>>>>> 0x23aa8b0
>>>>> 3498.724887558:main thread : file rsyslogd.c released module
>>>>> 'lmnet',
>>>>> reference count now 2
>>>>> 3498.724892004:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 2
>>>>> 3498.724895745:main thread : Unloading module builtin:omfile
>>>>> 3498.724900191:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 2
>>>>> 3498.724904194:main thread : Unloading module builtin:ompipe
>>>>> 3498.724908035:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 2
>>>>> 3498.724911707:main thread : Unloading module builtin-shell
>>>>> 3498.725044125:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 2
>>>>> 3498.725050401:main thread : Unloading module builtin:omdiscard
>>>>> 3498.725054843:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 2
>>>>> 3498.725058772:main thread : Unloading module builtin:omfwd
>>>>> 3498.725064464:main thread : file omfwd.c released module 'lmnet',
>>>>> reference count now 1
>>>>> 3498.725068747:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725072460:main thread : Unloading module builtin:omusrmsg
>>>>> 3498.725076324:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725080023:main thread : Unloading module builtin:pmrfc5424
>>>>> 3498.725084039:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725087646:main thread : Unloading module builtin:pmrfc3164
>>>>> 3498.725091651:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725095255:main thread : Unloading module builtin:smfile
>>>>> 3498.725099010:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725102601:main thread : Unloading module builtin:smtradfile
>>>>> 3498.725106419:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725110003:main thread : Unloading module builtin:smfwd
>>>>> 3498.725113689:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725117357:main thread : Unloading module builtin:smtradfwd
>>>>> 3498.725121132:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725124782:main thread : Unloading module imuxsock
>>>>> 3498.725157459:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.725162797:main thread : Unloading module imklog
>>>>> 3498.725179208:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.726112994:main thread : Unloading module omrelp
>>>>> 3498.726184405:main thread : module lmnet NOT unloaded because it
>>>>> still
>>>>> has a refcount of 1
>>>>> 3498.726189821:main thread : module lmregexp NOT unloaded because it
>>>>> still has a refcount of 1
>>>>> 3498.726195161:main thread : file conf.c released module 'lmnet',
>>>>> reference count now 0
>>>>> 3498.726198620:main thread : module 'lmnet' has zero reference
>>>>> count,
>>>>> unloading...
>>>>> 3498.726201936:main thread : Unloading module lmnet
>>>>> 3498.726218655:main thread : rsyslog runtime de-initialized, current
>>>>> users 0
>>>>> 3498.726222660:main thread : Clean shutdown completed, bye
>>>>>
>>>>>
>>>>> On Mon, Mar 28, 2016 at 4:41 PM, Kane Kim <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hello Rainer,
>>>>>>
>>>>>>
>>>>>> I'm seeing the same behavior with omrelp module. I've configured
>>>>>> omrelp
>>>>>> to
>>>>>> non-routable ip address:
>>>>>> module(load="omrelp")
>>>>>> local0.* action(type="omrelp"
>>>>>> target="10.0.0.1"
>>>>>> port="2514"
>>>>>> name="debug_relp"
>>>>>> queue.filename="relp_queue"
>>>>>> queue.maxdiskspace="1G"
>>>>>> queue.saveonshutdown="on"
>>>>>> queue.spoolDirectory="/var/spool/rsyslog"
>>>>>> queue.type="Disk"
>>>>>> )
>>>>>> Then I try to log a line:
>>>>>> [root@314959ac8cb0 prism]# logger -p local0.info test
>>>>>> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
>>>>>> relp_queue.00000001
>>>>>> # I shutdown rsyslog at this point
>>>>>> [root@314959ac8cb0 prism]# ls /var/spool/rsyslog/
>>>>>> # after shutdown queue is deleted and message is lost
>>>>>> [root@314959ac8cb0 prism]#
>>>>>>
>>>>>> Here is rsyslog log, so omrelp starts transaction and then I shutdown
>>>>>> rsyslog, it seems at some point queue is deleted:
>>>>>>
>>>>>> 8184.798028612:debug_relp queue:Reg/w0: omrelp: beginTransaction
>>>>>> ^C8194.245122305:main thread : DoDie called.
>>>>>> 8194.245301525:main thread : exiting on signal 2
>>>>>> 8194.245532709:main thread : Terminating input threads...
>>>>>> 8194.245873683:main thread : request term via SIGTTIN for input
>>>>>> thread
>>>>>> 'imuxsock' 0xcb767700
>>>>>> 8194.246079748:imuxsock.c : thrdStarter: usrThrdMain imuxsock -
>>>>>> 0x7f9bcb767700 returned with iRet 0, exiting now.
>>>>>> 8194.247556671:main thread : non-cancel input thread termination
>>>>>> succeeded for thread imuxsock 0xcb767700
>>>>>> 8194.249357547:imuxsock.c : destructor for debug call stack
>>>>>> 0x7f9bbc0008c0 called
>>>>>> 8194.249784271:main thread : imuxsock: unlinking unix socket
>>>>>> file[0]
>>>>>> /dev/log
>>>>>> 8194.250043368:main thread : main Q: qqueueAdd: entry added, size
>>>>>> now
>>>>>> log 1, phys 1 entries
>>>>>> 8194.250321193:main thread : main Q: EnqueueMsg advised worker
>>>>>> start
>>>>>> 8194.250493505:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
>>>>>> processing
>>>>>> 8194.250735988:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
>>>>>> objects
>>>>>> and enqueued 0 objects
>>>>>> 8194.250959294:main Q:Reg/w0 : doDeleteBatch: delete batch from
>>>>>> store,
>>>>>> new sizes: log 1, phys 1
>>>>>> 8194.251913535:main Q:Reg/w0 : processBATCH: batch of 1 elements must
>>>>>> be
>>>>>> processed
>>>>>> 8194.252193683:main Q:Reg/w0 : processBATCH: next msg 0: [origin
>>>>>> software="rsyslogd" swVersion="8.18.0.master" x-pid="30298" x-info="
>>>>>> http://www.rsyslog.com"] exiting on signal 2.
>>>>>> 8194.252377044:main Q:Reg/w0 : PRIFILT 'local0.*'
>>>>>> 8194.252669683:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>>> X
>>>>>> X X X X X X FF X X X X X X X X X
>>>>>> 8194.252898063:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.252902083:main Q:Reg/w0 : PRIFILT
>>>>>> '*.info;mail.none;authpriv.none;cron.none'
>>>>>> 8194.252908847:main Q:Reg/w0 : pmask: 7F 7F X 7F 7F 7F 7F 7F 7F
>>>>>> X
>>>>>> X 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F 7F
>>>>>> 8194.252956209:main Q:Reg/w0 : PRIFILT condition result is 1
>>>>>> 8194.252959849:main Q:Reg/w0 : ACTION 1
>>>>>> [builtin:omfile:/var/log/messages]
>>>>>> 8194.252967510:main Q:Reg/w0 : executing action 1
>>>>>> 8194.252971541:main Q:Reg/w0 : action 'action 1': called, logging to
>>>>>> builtin:omfile (susp 0/0, direct q 1)
>>>>>> 8194.252977583:main Q:Reg/w0 : action 'action 1': is transactional -
>>>>>> executing in commit phase
>>>>>> 8194.252982638:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>>> 8194.252986300:main Q:Reg/w0 : action 'action 1': set suspended state
>>>>>> to 0
>>>>>> 8194.252989729:main Q:Reg/w0 : PRIFILT 'authpriv.*'
>>>>>> 8194.252996951:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>>> X
>>>>>> FF X X X X X X X X X X X X X X X
>>>>>> 8194.253051858:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.253056228:main Q:Reg/w0 : PRIFILT 'mail.*'
>>>>>> 8194.253065299:main Q:Reg/w0 : pmask: X X FF X X X X X X
>>>>>> X
>>>>>> X X X X X X X X X X X X X X X X
>>>>>> 8194.253407472:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.253412679:main Q:Reg/w0 : PRIFILT 'cron.*'
>>>>>> 8194.253422055:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>>> FF
>>>>>> X X X X X X X X X X X X X X X X
>>>>>> 8194.253493551:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.253498038:main Q:Reg/w0 : PRIFILT '*.emerg'
>>>>>> 8194.253507065:main Q:Reg/w0 : pmask: 1 1 1 1 1 1 1 1 1
>>>>>> 1
>>>>>> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>>>> 8194.253580307:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.253584400:main Q:Reg/w0 : PRIFILT 'uucp,news.crit'
>>>>>> 8194.253593442:main Q:Reg/w0 : pmask: X X X X X X X 7 7
>>>>>> X
>>>>>> X X X X X X X X X X X X X X X X
>>>>>> 8194.253664984:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.253669184:main Q:Reg/w0 : PRIFILT 'local7.*'
>>>>>> 8194.253678130:main Q:Reg/w0 : pmask: X X X X X X X X X
>>>>>> X
>>>>>> X X X X X X X X X X X X X FF X X
>>>>>> 8194.254445021:main Q:Reg/w0 : PRIFILT condition result is 0
>>>>>> 8194.254450651:main Q:Reg/w0 : END batch execution phase, entering to
>>>>>> commit phase [processed 1 of 1 messages]
>>>>>> 8194.254456257:main Q:Reg/w0 : actionCommitAll: action 1, state 1,
>>>>>> nbr
>>>>>> to
>>>>>> commit 0 isTransactional 1
>>>>>> 8194.254461646:main Q:Reg/w0 : doTransaction: have commitTransaction
>>>>>> IF,
>>>>>> using that, pWrkrInfo 0x1c13cf0
>>>>>> 8194.254466238:main Q:Reg/w0 : entering
>>>>>> actionCallCommitTransaction(),
>>>>>> state: itx, actionNbr 1, nMsgs 1
>>>>>> 8194.254471876:main Q:Reg/w0 : omfile: write to stream, pData->pStrm
>>>>>> 0x7f9bc4002480, lenBuf 161, strt data Mar 28 23:36:34 314959ac8cb0
>>>>>> rsyslogd: [origin software="rsyslogd" swVersion="8.18.0.master"
>>>>>> x-pid="30298" x-info="http://www.rs
>>>>>> 8194.254478237:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4(messages)
>>>>>> flush, buflen 161
>>>>>> 8194.254483210:main Q:Reg/w0 : strmPhysWrite, stream 0x7f9bc4002480,
>>>>>> len
>>>>>> 161
>>>>>> 8194.254500329:main Q:Reg/w0 : strm 0x7f9bc4002480: file 4 write
>>>>>> wrote
>>>>>> 161 bytes
>>>>>> 8194.254506254:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>>> 8194.254511506:main Q:Reg/w0 : Action 1 transitioned to state: itx
>>>>>> 8194.254515961:main Q:Reg/w0 : Action 1 transitioned to state: rdy
>>>>>> 8194.254520243:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
>>>>>> 8194.254524679:main Q:Reg/w0 : processBATCH: batch of 1 elements has
>>>>>> been
>>>>>> processed
>>>>>> 8194.254529731:main Q:Reg/w0 : regular consumer finished, iret=0,
>>>>>> szlog 0
>>>>>> sz phys 1
>>>>>> 8194.254536104:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1
>>>>>> objects
>>>>>> and enqueued 0 objects
>>>>>> 8194.254540861:main Q:Reg/w0 : doDeleteBatch: delete batch from
>>>>>> store,
>>>>>> new sizes: log 0, phys 0
>>>>>> 8194.254545792:main Q:Reg/w0 : regular consumer finished, iret=4,
>>>>>> szlog 0
>>>>>> sz phys 0
>>>>>> 8194.254550314:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting
>>>>>> for
>>>>>> work.
>>>>>> 8194.254556418:main thread : Terminating main queue...
>>>>>> 8194.254560776:main thread : main Q: initiating worker thread
>>>>>> shutdown
>>>>>> sequence
>>>>>> 8194.254564653:main thread : main Q: trying shutdown of regular
>>>>>> workers
>>>>>> 8194.254572961:main thread : sent SIGTTIN to worker thread
>>>>>> 0xcaf66700
>>>>>> 8194.254577260:main thread : main Q:Reg: waiting 1499ms on worker
>>>>>> thread termination, 1 still running
>>>>>> 8194.254585507:main Q:Reg/w0 : wti 0x1c13bb0: worker awoke from idle
>>>>>> processing
>>>>>> 8194.254589465:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0
>>>>>> objects
>>>>>> and enqueued 0 objects
>>>>>> 8194.254593070:main Q:Reg/w0 : doDeleteBatch: delete batch from
>>>>>> store,
>>>>>> new sizes: log 0, phys 0
>>>>>> 8194.254596813:main Q:Reg/w0 : regular consumer finished, iret=4,
>>>>>> szlog 0
>>>>>> sz phys 0
>>>>>> 8194.254600412:main Q:Reg/w0 : wti 0x1c13bb0: terminating worker
>>>>>> terminateRet=5, bInactivityTOOccured=0
>>>>>> 8194.254604469:main Q:Reg/w0 : DDDD: wti 0x1c13bb0: worker cleanup
>>>>>> action
>>>>>> instances
>>>>>> 8194.254608268:main Q:Reg/w0 : wti 0x1c13bb0, action 0, ptr (nil)
>>>>>> 8194.254611752:main Q:Reg/w0 : wti 0x1c13bb0, action 1, ptr
>>>>>> 0x7f9bc40022a0
>>>>>> 8194.254616114:main Q:Reg/w0 : wti 0x1c13bb0, action 2, ptr (nil)
>>>>>> 8194.254619450:main Q:Reg/w0 : wti 0x1c13bb0, action 3, ptr (nil)
>>>>>> 8194.254622767:main Q:Reg/w0 : wti 0x1c13bb0, action 4, ptr (nil)
>>>>>> 8194.254626075:main Q:Reg/w0 : wti 0x1c13bb0, action 5, ptr (nil)
>>>>>> 8194.254629392:main Q:Reg/w0 : wti 0x1c13bb0, action 6, ptr (nil)
>>>>>> 8194.254632707:main Q:Reg/w0 : wti 0x1c13bb0, action 7, ptr (nil)
>>>>>> 8194.254636124:main Q:Reg/w0 : wti 0x1c13bb0: worker exiting
>>>>>> 8194.254639920:main Q:Reg/w0 : main Q:Reg: Worker thread 1c13bb0,
>>>>>> terminated, num workers now 0
>>>>>> 8194.254656279:main thread : main Q: regular queue workers shut
>>>>>> down.
>>>>>> 8194.254656402:main thread : main Q: checking to see if we need to
>>>>>> cancel any worker threads of the primary queue
>>>>>> 8194.254656524:main thread : main Q: worker threads terminated,
>>>>>> remaining queue size log 0, phys 0.
>>>>>> 8194.254656647:main thread : main Q: queue (type 0) will lose 0
>>>>>> messages, destroying...
>>>>>> 8194.255021761:main Q:Reg/w0 : destructor for debug call stack
>>>>>> 0x7f9bc40008c0 called
>>>>>> 8194.255069521:main thread : Terminating outputs...
>>>>>> 8194.255084818:main thread : destructing ruleset 0x1bf6150, name
>>>>>> 0x1bf6190
>>>>>> 8194.255089525:main thread : debug_relp queue: initiating worker
>>>>>> thread
>>>>>> shutdown sequence
>>>>>> 8194.255093230:main thread : debug_relp queue: trying shutdown of
>>>>>> regular workers
>>>>>> 8194.255099210:main thread : sent SIGTTIN to worker thread
>>>>>> 0xca765700
>>>>>> 8194.255119598:main thread : debug_relp queue:Reg: waiting 0ms on
>>>>>> worker thread termination, 1 still running
>>>>>> 8194.255126357:debug_relp queue:Reg/w0: end relpSessConnect, iRet
>>>>>> 10014
>>>>>> 8194.255132022:debug_relp queue:Reg/w0: Action 0 transitioned to
>>>>>> state:
>>>>>> rtry
>>>>>> 8194.255135551:debug_relp queue:Reg/w0: doTransaction: action 0,
>>>>>> currIParam 1
>>>>>> 8194.255139603:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
>>>>>> enter
>>>>>> loop, iRetries=0
>>>>>> 8194.255210939:main thread : debug_relp queue:Reg: timeout waiting
>>>>>> on
>>>>>> worker thread termination
>>>>>> 8194.255211086:main thread : sent SIGTTIN to worker thread
>>>>>> 0xca765700
>>>>>> 8194.255211257:main thread : debug_relp queue: regular shutdown
>>>>>> timed
>>>>>> out on primary queue (this is OK)
>>>>>> 8194.255211429:main thread : debug_relp queue: trying to shutdown
>>>>>> workers within Action Timeoutsetting EnqOnly mode
>>>>>> 8194.255215492:main thread : debug_relp queue: trying immediate
>>>>>> shutdown of regular workers (if any)
>>>>>> 8194.255221059:main thread : sent SIGTTIN to worker thread
>>>>>> 0xca765700
>>>>>> 8194.255226792:main thread : debug_relp queue:Reg: waiting 1000ms
>>>>>> on
>>>>>> worker thread termination, 1 still running
>>>>>> 8194.255232713:debug_relp queue:Reg/w0: end relpSessConnect, iRet
>>>>>> 10014
>>>>>> 8194.255237080:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
>>>>>> action->tryResume returned -2007
>>>>>> 8194.255240684:debug_relp queue:Reg/w0: actionDoRetry: debug_relp
>>>>>> check
>>>>>> for max retries, iResumeRetryCount 0, iRetries 0
>>>>>> 8194.255244459:debug_relp queue:Reg/w0: Action 0 transitioned to
>>>>>> state:
>>>>>> susp
>>>>>> 8194.255257975:debug_relp queue:Reg/w0: Called LogMsg, msg: action
>>>>>> 'debug_relp' suspended, next retry is Mon Mar 28 23:37:04 2016
>>>>>> 8194.255266857:debug_relp queue:Reg/w0: submitMsg2() could not submit
>>>>>> message - queue does (no longer?) exist - ignored
>>>>>> 8194.255271065:debug_relp queue:Reg/w0: action 'debug_relp' suspended,
>>>>>> earliest retry=1459208224 (now 1459208194), iNbrResRtry 0, duration 30
>>>>>> 8194.255275244:debug_relp queue:Reg/w0: actionTryResume: action
>>>>>> 0x1c09af0
>>>>>> state: susp, next retry (if applicable): 1459208224 [now 1459208194]
>>>>>> 8194.255278825:debug_relp queue:Reg/w0: actionCommit, in retry loop,
>>>>>> iRet
>>>>>> -2123
>>>>>> 8194.255282409:debug_relp queue:Reg/w0: regular consumer finished,
>>>>>> iret=-2123, szlog 0 sz phys 1
>>>>>> 8194.255287286:debug_relp queue:Reg/w0: DeleteProcessedBatch: we
>>>>>> deleted 1
>>>>>> objects and enqueued 0 objects
>>>>>> 8194.255290952:debug_relp queue:Reg/w0: doDeleteBatch: delete batch
>>>>>> from
>>>>>> store, new sizes: log 0, phys 0
>>>>>> 8194.255295204:debug_relp queue:Reg/w0: wti 0x1c12880: terminating
>>>>>> worker
>>>>>> because of TERMINATE_NOW mode, del iRet 0
>>>>>> 8194.255298822:debug_relp queue:Reg/w0: DDDD: wti 0x1c12880: worker
>>>>>> cleanup action instances
>>>>>> 8194.255302306:debug_relp queue:Reg/w0: wti 0x1c12880, action 0, ptr
>>>>>> 0x7f9bc00020d0
>>>>>> 8194.255305929:debug_relp queue:Reg/w0: relp engine destructing client
>>>>>> 0x7f9bc00027e0
>>>>>> 8194.255309843:debug_relp queue:Reg/w0: wti 0x1c12880, action 1, ptr
>>>>>> (nil)
>>>>>> 8194.255313226:debug_relp queue:Reg/w0: wti 0x1c12880, action 2, ptr
>>>>>> (nil)
>>>>>> 8194.255316479:debug_relp queue:Reg/w0: wti 0x1c12880, action 3, ptr
>>>>>> (nil)
>>>>>> 8194.255319704:debug_relp queue:Reg/w0: wti 0x1c12880, action 4, ptr
>>>>>> (nil)
>>>>>> 8194.255322934:debug_relp queue:Reg/w0: wti 0x1c12880, action 5, ptr
>>>>>> (nil)
>>>>>> 8194.255326175:debug_relp queue:Reg/w0: wti 0x1c12880, action 6, ptr
>>>>>> (nil)
>>>>>> 8194.255329409:debug_relp queue:Reg/w0: wti 0x1c12880, action 7, ptr
>>>>>> (nil)
>>>>>> 8194.255332655:debug_relp queue:Reg/w0: wti 0x1c12880: worker exiting
>>>>>> 8194.255336231:debug_relp queue:Reg/w0: debug_relp queue:Reg: Worker
>>>>>> thread 1c12880, terminated, num workers now 0
>>>>>> 8194.255344540:debug_relp queue:Reg/w0: destructor for debug call
>>>>>> stack
>>>>>> 0x7f9bc00008c0 called
>>>>>> 8194.255371534:main thread : debug_relp queue: checking to see if
>>>>>> we
>>>>>> need to cancel any worker threads of the primary queue
>>>>>> 8194.255376798:main thread : debug_relp queue: worker threads
>>>>>> terminated, remaining queue size log 0, phys 0.
>>>>>> 8194.255381256:main thread : debug_relp queue: persisting queue to
>>>>>> disk, 0 entries...
>>>>>> 8194.255385525:main thread : strm 0x1c024c0: file 5(relp_queue)
>>>>>> closing
>>>>>> 8194.255389222:main thread : strm 0x1c024c0: file 5(relp_queue)
>>>>>> flush,
>>>>>> buflen 0 (no need to flush)
>>>>>> 8194.255398716:main thread : strm 0x1c02760: file 6(relp_queue)
>>>>>> closing
>>>>>> 8194.255404785:main thread : strm 0x1c02a00: file -1(relp_queue)
>>>>>> closing
>>>>>> 8194.255409145:main thread : strmCloseFile: deleting
>>>>>> '/var/spool/rsyslog/relp_queue.00000001'
>>>>>> 8194.255460812:main thread : strm 0x7f9bc4002480: file 4(messages)
>>>>>> closing
>>>>>> 8194.255466309:main thread : strm 0x7f9bc4002480: file 4(messages)
>>>>>> flush, buflen 0 (no need to flush)
>>>>>> 8194.255479325:main thread : all primary multi-thread sources have
>>>>>> been
>>>>>> terminated - now doing aux cleanup...
>>>>>> 8194.255482809:main thread : destructing current config...
>>>>>> 8194.255486662:main thread : calling freeCnf(0x1bf41e0) for module
>>>>>> 'builtin:omfile'
>>>>>> 8194.255490098:main thread : calling freeCnf(0x1bf43a0) for module
>>>>>> 'builtin:ompipe'
>>>>>> 8194.255493809:main thread : calling freeCnf(0x1bf53a0) for module
>>>>>> 'builtin:omfwd'
>>>>>> 8194.255497793:main thread : calling freeCnf(0x1c04b00) for module
>>>>>> 'imuxsock'
>>>>>> 8194.255501565:main thread : calling freeCnf(0x1c05c70) for module
>>>>>> 'imklog'
>>>>>> 8194.255543579:main thread : destructing parser 'rsyslog.rfc5424'
>>>>>> 8194.255547468:main thread : destructing parser 'rsyslog.rfc3164'
>>>>>> 8194.255550784:main thread : pmrfc3164: free parser instance
>>>>>> 0x1bf58b0
>>>>>> 8194.255559659:main thread : file rsyslogd.c released module
>>>>>> 'lmnet',
>>>>>> reference count now 2
>>>>>> 8194.255563470:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 2
>>>>>> 8194.255566676:main thread : Unloading module builtin:omfile
>>>>>> 8194.255570420:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 2
>>>>>> 8194.255573674:main thread : Unloading module builtin:ompipe
>>>>>> 8194.255576857:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 2
>>>>>> 8194.255579938:main thread : Unloading module builtin-shell
>>>>>> 8194.255583060:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 2
>>>>>> 8194.255586144:main thread : Unloading module builtin:omdiscard
>>>>>> 8194.255589509:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 2
>>>>>> 8194.255592658:main thread : Unloading module builtin:omfwd
>>>>>> 8194.255596441:main thread : file omfwd.c released module 'lmnet',
>>>>>> reference count now 1
>>>>>> 8194.255599981:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255603068:main thread : Unloading module builtin:omusrmsg
>>>>>> 8194.255606240:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255609287:main thread : Unloading module builtin:pmrfc5424
>>>>>> 8194.255612755:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255615826:main thread : Unloading module builtin:pmrfc3164
>>>>>> 8194.255619252:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255622315:main thread : Unloading module builtin:smfile
>>>>>> 8194.255625406:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255628486:main thread : Unloading module builtin:smtradfile
>>>>>> 8194.255631753:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255634819:main thread : Unloading module builtin:smfwd
>>>>>> 8194.255637916:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255640987:main thread : Unloading module builtin:smtradfwd
>>>>>> 8194.255644187:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255647267:main thread : Unloading module imuxsock
>>>>>> 8194.255676232:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255680751:main thread : Unloading module imklog
>>>>>> 8194.255694428:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255698085:main thread : Unloading module omrelp
>>>>>> 8194.255750953:main thread : module lmnet NOT unloaded because it
>>>>>> still
>>>>>> has a refcount of 1
>>>>>> 8194.255755236:main thread : module lmregexp NOT unloaded because
>>>>>> it
>>>>>> still has a refcount of 1
>>>>>> 8194.255759705:main thread : file conf.c released module 'lmnet',
>>>>>> reference count now 0
>>>>>> 8194.255762875:main thread : module 'lmnet' has zero reference
>>>>>> count,
>>>>>> unloading...
>>>>>> 8194.255765901:main thread : Unloading module lmnet
>>>>>> 8194.255780970:main thread : rsyslog runtime de-initialized,
>>>>>> current
>>>>>> users 0
>>>>>> 8194.255784704:main thread : Clean shutdown completed, bye
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 13, 2016 at 9:30 PM, David Lang <[email protected]> wrote:
>>>>>>
>>>>>> On Sun, 13 Mar 2016, Kane Kim wrote:
>>>>>>>
>>>>>>>
>>>>>>>> +1 to David. Having the module handle the errors is not 100%
>>>>>>>> reliable; what if the module dies before having a chance to save
>>>>>>>> the error log? Those messages will be lost.
>>>>>>>>
>>>>>>>>
>>>>>>> well, the module is a thread of rsyslog, so if it dies, rsyslog has
>>>>>>> died.
>>>>>>>
>>>>>>> But if you are doing large batches (hundreds or thousands), putting
>>>>>>> all
>>>>>>> of them in a 'bad messages' file because one has a problem seems like
>>>>>>> the
>>>>>>> wrong thing to do.
>>>>>>>
>>>>>>> given that the error is not detected until the batch is being flushed
>>>>>>> for
>>>>>>> the output, any outputs that have already flushed their batches can't
>>>>>>> be
>>>>>>> changed, so the question is if it's worse to have messages resent to
>>>>>>> those
>>>>>>> outputs as the batch gets re-processed, or having to have the output
>>>>>>> go to
>>>>>>> a "couldn't deliver" file.
>>>>>>>
>>>>>>> It's bad enough if the problem is a corrupt message that can never be
>>>>>>> delivered and the error is a permanent one, but if the problem is
>>>>>>> instead a
>>>>>>> temporary error (server is down), you will have a LOT of messages
>>>>>>> going
>>>>>>> into this 'couldn't deliver' bucket, when what should really be
>>>>>>> happening
>>>>>>> is that rsyslog should be pausing in the processing of messages,
>>>>>>> retrying
>>>>>>> after the suspended timeout.
>>>>>>>
>>>>>>> David Lang
>>>>>>>
>>>>>>>
>>>>>>> On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, 13 Mar 2016, Rainer Gerhards wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Rainer, can you explain what happens when a batch fails on an
>>>>>>>>> action?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It depends on what happens when, and on whether we have
>>>>>>>>>> batches of one or batches of many. For batches of one, the
>>>>>>>>>> failure is immediately detected and acted upon. For batches of
>>>>>>>>>> many, all action strings are stored temporarily. During that
>>>>>>>>>> time, it is assumed that the messages could be processed. There
>>>>>>>>>> is no way around this assumption, because we now run each
>>>>>>>>>> action completely through the rule engine before the next one
>>>>>>>>>> begins processing, which means that other parts of the config
>>>>>>>>>> have actually acted on a message, and we cannot undo that
>>>>>>>>>> state. If it is necessary to act on the "real" state, we need
>>>>>>>>>> to process batches of one. There simply is no other solution
>>>>>>>>>> possible. So let's stay with batches of many. Once the batch is
>>>>>>>>>> completed, the temporary strings are passed to the output's
>>>>>>>>>> commitTransaction entry point. It is currently assumed that the
>>>>>>>>>> output handles this case intelligently, and as it looks most do
>>>>>>>>>> this well (no regression tests failed, no bad reports since we
>>>>>>>>>> introduced this with the initial v8 version, 15+ months ago).
>>>>>>>>>> If the action fails, status is reflected accordingly. Also, the
>>>>>>>>>> core engine *should* (but obviously does not support this right
>>>>>>>>>> now) write an error file with the messages in error (just like
>>>>>>>>>> omelasticsearch does, which has proven to be quite good).
>>>>>>>>>>
>>>>>>>>>>
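To make that flow concrete, here is a minimal, self-contained C sketch of the v8 scheme described above (all names are hypothetical illustrations, not rsyslog source): doAction only buffers the rendered action string, and the buffered strings are handed to the output in a single commit at end of batch, with exactly one try.

```c
/* Hypothetical sketch of v8 batch commit; names are illustrative,
 * this is NOT rsyslog source code. */
#define MAX_BATCH 64
#define RS_RET_OK 0
#define RS_RET_SUSPENDED -2007  /* matches the -2007 seen in the debug log */

typedef struct {
    const char *msgs[MAX_BATCH]; /* action strings buffered during the batch */
    int nMsgs;
} action_t;

void action_init(action_t *a) { a->nMsgs = 0; }

/* batch phase: doAction only buffers the rendered action string */
int action_doAction(action_t *a, const char *rendered) {
    if (a->nMsgs >= MAX_BATCH)
        return RS_RET_SUSPENDED;
    a->msgs[a->nMsgs++] = rendered;
    return RS_RET_OK;
}

/* end-of-batch phase: all buffered strings go to the output in one
 * commit. In the scheme discussed here there is exactly one try; on
 * failure the engine currently just gives up (writing an error file
 * is the intended, not-yet-implemented improvement). */
int action_commit(action_t *a, int output_ok) {
    int ret = output_ok ? RS_RET_OK : RS_RET_SUSPENDED;
    a->nMsgs = 0; /* buffered strings are discarded either way */
    return ret;
}
```

The key point the sketch makes visible: by the time the commit fails, the original messages are gone, which is exactly why the engine cannot generically roll back.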
>>>>>>>>> Well, it's not in the specs that modules should have local
>>>>>>>>> error files the way omelasticsearch does.
>>>>>>>>>
>>>>>>>>> What I would expect, if there is a problem when committing the
>>>>>>>>> batch, is that the entire batch would be marked as not delivered
>>>>>>>>> and retried with half of it.
>>>>>>>>>
>>>>>>>>> This will cause messages to other outputs working from the same
>>>>>>>>> queue to get duplicate messages, but that's better than losing
>>>>>>>>> an entire batch of messages because one is corrupt in some way,
>>>>>>>>> or because there is a temporary problem with the output.
>>>>>>>>>
>>>>>>>>> David Lang
>>>>>>>>>
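The halving strategy David describes can be sketched as a recursive binary split (purely illustrative; rsyslog v8 does not implement this): on a failed send, split the batch in two and retry each half until any failing single message is isolated.

```c
/* Illustrative retry-by-halving, NOT current rsyslog behavior.
 * send_fn stands in for an output; it fails if the whole range
 * cannot be delivered. */
typedef int (*send_fn)(const int *msgs, int n);

/* toy output: a negative "message" is corrupt and fails the send */
int toy_send(const int *msgs, int n) {
    for (int i = 0; i < n; i++)
        if (msgs[i] < 0)
            return -1;
    return 0;
}

/* Returns the number of messages that could not be delivered.
 * Note the trade-off raised in the thread: good messages in a
 * failing range are re-sent when the range is split. */
int deliver_with_split(const int *msgs, int n, send_fn send) {
    if (n == 0)
        return 0;
    if (send(msgs, n) == 0)
        return 0;               /* whole range delivered */
    if (n == 1)
        return 1;               /* isolated the undeliverable message */
    int half = n / 2;
    return deliver_with_split(msgs, half, send)
         + deliver_with_split(msgs + half, n - half, send);
}
```

This isolates a permanently bad message in O(log n) extra sends, but, as noted above, it cannot distinguish a corrupt message from a temporarily down destination by itself.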
>>>>>>>>>> I have managed to look at least briefly at the
>>>>>>>>>> commitTransaction core engine code. I think we can even improve
>>>>>>>>>> it and may get some ability to do retries if the action itself
>>>>>>>>>> does not handle that. At the minimum, it should have the
>>>>>>>>>> ability (which was intended originally, but pushed away by more
>>>>>>>>>> urgent work) to write that error file.
>>>>>>>>>>
>>>>>>>>>> I am very grateful that Kane is working on a repro with the
>>>>>>>>>> standard
>>>>>>>>>> modules. Once we have that, I can go deeper into the code and
>>>>>>>>>> check
>>>>>>>>>> what I am thinking about. It might be possible that we cannot
>>>>>>>>>> manage
>>>>>>>>>> to repro this with the standard modules. Then we need to think
>>>>>>>>>> about a
>>>>>>>>>> way via the testing module to do the same thing.
>>>>>>>>>>
>>>>>>>>>> I will try to address this within the 8.18 timeframe, but
>>>>>>>>>> cannot totally commit to it. A lot of work has been going on,
>>>>>>>>>> and especially all the CI testing has taken up lots of work.
>>>>>>>>>> Nevertheless, I've been able to iron out quite some nits, so
>>>>>>>>>> it's probably worth it. I would still like to continue to get
>>>>>>>>>> the CI environment into better shape, because now I know what I
>>>>>>>>>> am doing. If I stop again and let it go for a couple of weeks,
>>>>>>>>>> I would lose a lot of momentum. I also have to do some work for
>>>>>>>>>> support customers, which keeps me able to work on rsyslog in
>>>>>>>>>> general, so this, too, has priority (and validly so, because
>>>>>>>>>> without support customers I would be totally unable to work on
>>>>>>>>>> rsyslog other than as a hobby).
>>>>>>>>>>
>>>>>>>>>> I hope this explains the current state.
>>>>>>>>>> Rainer
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Kane, can you try to duplicate your problem using standard
>>>>>>>>>>> modules? I'm thinking that if you define an omrelp action
>>>>>>>>>>> that doesn't go anywhere, it should return errors similar to
>>>>>>>>>>> what you are trying to do, so you should be able to duplicate
>>>>>>>>>>> the problem that way.
>>>>>>>>>>>
>>>>>>>>>>> David Lang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 23 Feb 2016, Kane Kim wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Looking at the omkafka module source code, it seems that it
>>>>>>>>>>>> relies on rsyslog retries in DoAction, returning
>>>>>>>>>>>> RS_RET_SUSPENDED:
>>>>>>>>>>>>
>>>>>>>>>>>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>>>>>>>>>>>> if(iRet != RS_RET_OK) {
>>>>>>>>>>>>         iRet = RS_RET_SUSPENDED;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> I've tried similar code in DoAction; it seems that action
>>>>>>>>>>>> processing is suspended at that point and messages are not
>>>>>>>>>>>> retried to the module.
>>>>>>>>>>>>
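For contrast, the retry behavior Kane expected from the documentation can be sketched engine-side like this (a hypothetical loop with invented names, not rsyslog source): on RS_RET_SUSPENDED, the engine re-calls the action up to a retry limit before giving up.

```c
/* Hypothetical engine-side retry loop for RS_RET_SUSPENDED; this is
 * what the documentation's retry promise would look like, not what
 * the thread reports v8 actually doing for batched commits. */
#define RS_RET_OK 0
#define RS_RET_SUSPENDED -2007

int engine_call_action(int (*doAction)(void *ctx), void *ctx, int maxRetries) {
    int ret = doAction(ctx);
    int tries = 0;
    /* on suspension, retry up to maxRetries times before giving up */
    while (ret == RS_RET_SUSPENDED && tries++ < maxRetries)
        ret = doAction(ctx);
    return ret;
}

/* toy action that succeeds only after a number of failed attempts,
 * e.g. a destination that comes back up */
typedef struct { int failuresLeft; } toy_ctx;

int toy_action(void *vctx) {
    toy_ctx *c = (toy_ctx *)vctx;
    if (c->failuresLeft > 0) {
        c->failuresLeft--;
        return RS_RET_SUSPENDED;
    }
    return RS_RET_OK;
}
```

In practice rsyslog also spaces retries out over time (the suspend/resume timestamps visible in the debug log above), which this sketch omits.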
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim
>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help, Rainer. I'll try to debug what's
>>>>>>>>>>>>> going on; so far it seems rsyslog doesn't retry even with
>>>>>>>>>>>>> batch size 1. It doesn't retry if I return RS_RET_SUSPENDED
>>>>>>>>>>>>> from DoAction either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards
>>>>>>>>>>>>> <[email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you
>>>>>>>>>>>>>>>>> some context: I want to write a module that both uses
>>>>>>>>>>>>>>>>> batching and also can't lose messages under any
>>>>>>>>>>>>>>>>> circumstances. Are you saying it is by design that
>>>>>>>>>>>>>>>>> rsyslog can't do both together? According to the
>>>>>>>>>>>>>>>>> documentation, rsyslog will retry if the module returns
>>>>>>>>>>>>>>>>> any error. Do you plan to fix this in rsyslog, or to
>>>>>>>>>>>>>>>>> update the documentation to say batching and retries
>>>>>>>>>>>>>>>>> don't work?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It depends on many things. In almost all cases, the
>>>>>>>>>>>>>>>> retry should work well (and does so in practice).
>>>>>>>>>>>>>>>> Unfortunately, I am pretty swamped. I need to go to a
>>>>>>>>>>>>>>>> conference tomorrow and have had quite some unexpected
>>>>>>>>>>>>>>>> work today. It would probably be good if you could ping
>>>>>>>>>>>>>>>> me next week to see if we can look in more detail at
>>>>>>>>>>>>>>>> what is causing you pain. But I can't guarantee that I
>>>>>>>>>>>>>>>> will be available early next week.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In general, we cannot handle a fatal error here from an
>>>>>>>>>>>>>>>> engine PoV, because everything is already processed and
>>>>>>>>>>>>>>>> we no longer have the original messages. This is simply
>>>>>>>>>>>>>>>> needed if you want to process messages one after another
>>>>>>>>>>>>>>>> through the full config (a goal for v8 that was muuuuch
>>>>>>>>>>>>>>>> requested). As I said, the solution is to use batches of
>>>>>>>>>>>>>>>> one, because otherwise we would really need to turn back
>>>>>>>>>>>>>>>> time and undo everything that was already done on the
>>>>>>>>>>>>>>>> messages in question by other modules (including state
>>>>>>>>>>>>>>>> advances).
>>>>>>>>>>>>>>>>
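The "batches of one" workaround can be expressed in configuration by giving the affected action its own queue with a dequeue batch size of 1 (an illustrative fragment assuming the standard omrelp and queue parameters; the target address is a placeholder):

```
action(type="omrelp" target="collector.example.com" port="2514"
       queue.type="LinkedList"
       queue.dequeueBatchSize="1")
```

With a batch of one, a commit failure maps cleanly onto a single message, at the cost of per-message commit overhead.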
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I thought that if a batch failed, it pushed all the
>>>>>>>>>>>>>>> messages back on the queue and retried with a half-size
>>>>>>>>>>>>>>> batch until it got to the one message that could not be
>>>>>>>>>>>>>>> processed, and only did a fatal fail on that message.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, there is a big difference between a module giving a
>>>>>>>>>>>>>>> hard error ("this message is never going to be able to be
>>>>>>>>>>>>>>> processed, no matter how many times it's retried") and a
>>>>>>>>>>>>>>> soft error ("there is a problem delivering things to this
>>>>>>>>>>>>>>> destination right now, retry later"). I thought the batch
>>>>>>>>>>>>>>> processing handled these differently.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's no longer possible with v8, at least generically.
>>>>>>>>>>>>>> As I said above, we would need to turn back time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I have really run out of time now...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rainer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David Lang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
>>>>>>>>>>>>>>> by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE
>>>>>>>>>>>>>>> and DO NOT POST if you DON'T LIKE THAT.
>>>>>>> you
>>>>>>> DON'T LIKE THAT.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>> myriad
>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> DON'T LIKE THAT.
>>>>>
>>>>> _______________________________________________
>>>>
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>> DON'T LIKE THAT.
>>>>
>>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>> LIKE THAT.
>>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.