I'm trying to post to Kafka reliably from rsyslog. The omkafka module uses doAction to post to Kafka. I've changed doAction to always return RS_RET_SUSPENDED, which is what would happen on any error posting to Kafka. There are a couple of use cases where I want to make sure we are not losing messages:

1. Log a couple of lines, then restart rsyslog, and make sure those lines are not lost (on a graceful restart rsyslog always loses the message in flight).
2. The same as above, but kill -9 rsyslog (it seems that rsyslog loses all uncommitted messages).
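For illustration, the retry behaviour expected in both use cases can be sketched as a small self-contained toy model. This is not rsyslog's engine code: every name here (`toy_ret`, `toy_output`, `toy_try_resume`, `toy_do_action`, `toy_deliver_all`) is hypothetical, and the return codes only mirror the roles of RS_RET_OK and RS_RET_SUSPENDED. The point it tries to show is that a message leaves the queue only after doAction reports success, even when resume succeeds and the following delivery fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this toy model. They only mirror the
 * roles of RS_RET_OK and RS_RET_SUSPENDED, not their real values. */
typedef enum { TOY_OK = 0, TOY_SUSPENDED = 1 } toy_ret;

/* Hypothetical output: fails a fixed number of times, then accepts. */
typedef struct {
    int failures_left; /* doAction calls that will still fail */
    int delivered;     /* messages accepted so far */
    int attempts;      /* total doAction calls */
} toy_output;

/* Always reports "ready", even if the next doAction is going to fail.
 * This models the case where resume succeeds and delivery then fails. */
static toy_ret toy_try_resume(toy_output *out)
{
    (void)out;
    return TOY_OK;
}

/* Try to deliver one message; report suspension on failure. */
static toy_ret toy_do_action(toy_output *out, const char *msg)
{
    (void)msg;
    out->attempts++;
    if (out->failures_left > 0) {
        out->failures_left--;
        return TOY_SUSPENDED;
    }
    out->delivered++;
    return TOY_OK;
}

/* Deliver msgs[0..n-1] in order. The index only advances after
 * toy_do_action returns TOY_OK; on TOY_SUSPENDED the SAME message is
 * offered again once resume reports success. max_attempts bounds the
 * loop so the sketch always terminates. Returns the number of
 * messages delivered. */
static size_t toy_deliver_all(toy_output *out, const char *const *msgs,
                              size_t n, int max_attempts)
{
    size_t next = 0;

    assert(out != NULL && msgs != NULL);
    while (next < n && out->attempts < max_attempts) {
        if (toy_do_action(out, msgs[next]) == TOY_OK) {
            next++; /* only now is the message considered done */
            continue;
        }
        while (toy_try_resume(out) != TOY_OK) {
            /* a real engine would sleep for its resume interval here */
        }
    }
    return next;
}
```

In this model a failed delivery never advances `next`, so nothing is dropped while the output is unavailable. Whether the message also survives a restart would depend on the queue persisting it until that point, which the sketch does not model.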

So far I can't get it working; some messages are always lost. Also, rsyslog's retry handling seems to rely on tryResume blocking before doAction is called: if tryResume can't reach Kafka, it blocks until Kafka is reachable again, and doAction will *probably* work. The issue is the case where tryResume returns success and then doAction fails; in that case rsyslog never retries that message.

On Wed, Feb 24, 2016 at 5:37 PM, David Lang wrote:

> I saw something in a changelog recently about how the first message could
> be lost under some conditions. Try the current nightly build (I think this
> was in something post 8.16) and see if the same thing happens.
>
> David Lang
>
> On Wed, 24 Feb 2016, Kane Kim wrote:
>
>> Date: Wed, 24 Feb 2016 17:27:27 -0800
>> From: Kane Kim
>> Reply-To: rsyslog-users
>> To: rsyslog-users
>> Subject: Re: [rsyslog] retry if output module returns RS_RET_SUSPENDED
>>
>> BTW, another question, I've created dummy plugin with all endpoints
>> returning RS_RET_SUSPENDED
>> (BeginTransaction/TryResume/DoAction/EndTransaction).
>> I'm logging 3 lines: 1, 2 and 3, queue.saveonshutdown is set to
>> "on", action.resumeRetryCount="-1"
>> action.reportSuspensionContinuation="on"
>> action.resumeInterval="1" queue.dequeuebatchsize="1" queue.type="Disk".
>> So rsyslog is retrying line "1" indefinitely, then I stop and restart
>> rsyslog, it starts to retry line "2" and after another restart it retries
>> line "3". After 3rd restart queue is deleted and nothing is retried.
>> Apparently none of the lines were committed successfully by the output
>> module and they are all lost.
>>
>> Is it expected behavior?
>>
>> On Tue, Feb 23, 2016 at 9:11 AM, Kane Kim wrote:
>>
>>> Looking at omkafka module source code it seems that it relies on rsyslog
>>> retries in DoAction, returning RS_RET_SUSPENDED:
>>>
>>>     DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>>>     if(iRet != RS_RET_OK) {
>>>         iRet = RS_RET_SUSPENDED;
>>>     }
>>>
>>> I've tried similar code in DoAction, it seems that action processing is
>>> suspended at that point and messages are not retried to module.
>>>
>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim wrote:
>>>
>>>> Thanks for your help Rainer, I'll try to debug what's going on, so far it
>>>> seems rsyslog doesn't retry even with batch size 1. It doesn't retry if I
>>>> return RS_RET_SUSPENDED from DoAction as well.
>>>>
>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards wrote:
>>>>
>>>>> 2016-02-23 18:03 GMT+01:00 David Lang:
>>>>>
>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>>
>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim:
>>>>>>>
>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some context: I want
>>>>>>>> to write a module that both uses batching and also can't lose messages in
>>>>>>>> any circumstances. Are you saying it is by design that rsyslog can't do
>>>>>>>> that together? According to documentation rsyslog will retry if module
>>>>>>>> returns any error. Do you plan to fix this in rsyslog or update
>>>>>>>> documentation to say batching and retries don't work?
>>>>>>>
>>>>>>> It depends on many things. In almost all cases, the retry should work
>>>>>>> well (and does so in practice). Unfortunately, I am pretty swamped. I
>>>>>>> need to go to a conference tomorrow and have had quite some unexpected
>>>>>>> work today.
>>>>>>> It would probably be good if you could ping me next week
>>>>>>> to see if we can look into more details what is causing you pain. But
>>>>>>> I can't guarantee that I will be available early next week.
>>>>>>>
>>>>>>> In general, we cannot handle a fatal error here from an engine PoV,
>>>>>>> because everything is already processed and we do no longer have the
>>>>>>> original messages. This is simply needed if you want to process
>>>>>>> messages one after another through the full config (a goal for v8 that
>>>>>>> was muuuuch requested). As I said, the solution is to use batches of
>>>>>>> one, because otherwise we would really need to turn back time and undo
>>>>>>> everything that was already done on the messages in question by other
>>>>>>> modules (including state advances).
>>>>>>
>>>>>> I thought that if a batch failed, it pushed all the messages back on the
>>>>>> queue and retried with a half size batch until it got to the one message
>>>>>> that could not be processed and only did a fatal fail on that message.
>>>>>>
>>>>>> Now, there is a big difference between a module giving a hard error "this
>>>>>> message is never going to be able to be processed no matter how many times
>>>>>> it's retried" vs a soft error "there is a problem delivering things to this
>>>>>> destination right now, retry later". I thought the batch processing handled
>>>>>> these differently.
>>>>>
>>>>> That's no longer possible with v8, at least generically. As I said
>>>>> above, we would need to turn back time.
>>>>>
>>>>> But I really run out of time now...
>>>>>
>>>>> Rainer
>>>>>
>>>>>> David Lang
>>>>>>
>>>>>> _______________________________________________
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
>>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>>>>>> LIKE THAT.

