+1 to David. Having the module handle the errors is not 100% reliable: what
if the module dies before it has a chance to save the error log? Those
messages will be lost.
On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:

> On Sun, 13 Mar 2016, Rainer Gerhards wrote:
>
> Rainer, can you explain what happens when a batch fails on an action?
>>>
>>
>> It depends on what happens when, and on whether we have batches of
>> one or batches of many. For batches of one, the failure is
>> immediately detected and acted upon. For batches of many, all action
>> strings are stored temporarily. During that time, it is assumed that
>> the messages could be processed. There is no way around this
>> assumption, because we now run each action completely through the
>> rule engine before processing of the next one begins, which means
>> that other parts of the config have actually acted on a message,
>> which means we cannot undo that state. If it is necessary to act on
>> the "real" state, we need to process batches of one. There simply is
>> no other solution possible. So let's stay with batches of many. Once
>> the batch is completed, the temporary strings are passed to the
>> output's commitTransaction entry point. It is currently assumed that
>> the output handles this case intelligently, and as it looks most do
>> this well (no regression tests failed, and no bad reports since we
>> introduced this with the initial v8 version 15+ months ago). If the
>> action fails, status is reflected accordingly. Also, the core engine
>> *should* (but obviously does not support this right now) write an
>> error file with the messages in error (just like omelasticsearch
>> does, which has proven to be quite good).
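The error file idea above can be sketched as follows. This is only a minimal illustration in C, not rsyslog core code: write_error_file and its parameters are hypothetical names, and it assumes that one failed message per line is an acceptable file format.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of rsyslog): append each failed message
 * to an error file, one message per line.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int
write_error_file(const char *path, const char *const *msgs, size_t n)
{
    FILE *fp = fopen(path, "a"); /* append, so earlier errors are kept */
    if (fp == NULL)
        return -1;

    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        if (fprintf(fp, "%s\n", msgs[i]) < 0) {
            rc = -1;
            break;
        }
    }

    /* fclose flushes the stdio buffer; a failure here means data was lost */
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

Opening and closing the file for every failed batch is slow, but it means a module that dies later does not take buffered error records with it.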
>>
>
> well, it's not in the specs that modules should have local error files the
> way omelasticsearch does.
>
> What I would expect if there is a problem when committing the batch is that
> the entire batch would be marked as not delivered and retried with half
> of it.
>
> This will cause other outputs working from the same queue to get
> duplicate messages, but it's better than losing an entire batch of
> messages because one is corrupt in some way, or because there is a
> temporary problem with the output.
>
> David Lang
>
>> I have managed to look at least briefly at the commitTransaction core
>> engine code. I think we can even improve it and may get some ability
>> to do retries if the action itself does not handle that. At the
>> minimum, it should have the ability (which was intended originally,
>> but pushed away by more urgent work) to write that error file.
>>
>> I am very grateful that Kane is working on a repro with the standard
>> modules. Once we have that, I can go deeper into the code and check
>> what I am thinking about. It might be possible that we cannot manage
>> to repro this with the standard modules. Then we need to think about a
>> way via the testing module to do the same thing.
>>
>> I will try to address this within the 8.18 timeframe, but cannot
>> totally commit to it. There has been a lot of work going on, and
>> especially all that CI testing has taken up a lot of work.
>> Nevertheless, I've been able to iron out quite some nits, so it's
>> probably worth it. I would still like to continue to get the CI
>> environment into better shape, because now I know what I am doing. If
>> I stop again and let it go for a couple of weeks, I would lose a lot
>> of momentum. I also have to do some work for support customers, which
>> keeps me able to work on rsyslog in general, so this, too, has
>> priority (and validly so, because without support customers I would
>> be totally unable to work on rsyslog other than as a hobby).
>>
>> I hope this explains the current state.
>> Rainer
>>
>>
>>> Kane, can you try and duplicate your problem using standard modules? I'm
>>> thinking that if you define an omrelp action that doesn't go anywhere, it
>>> should return errors similar to what you are trying to do, so you should
>>> be able to duplicate the problem that way.
>>>
>>> David Lang
>>>
>>>
>>>
>>>
>>> On Tue, 23 Feb 2016, Kane Kim wrote:
>>>
>>>> Looking at the omkafka module source code, it seems that it relies on
>>>> rsyslog retries in DoAction, returning RS_RET_SUSPENDED:
>>>>
>>>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>>>> if(iRet != RS_RET_OK) {
>>>>   iRet = RS_RET_SUSPENDED;
>>>> }
>>>>
>>>> I've tried similar code in DoAction; it seems that action processing
>>>> is suspended at that point and the messages are not resubmitted to
>>>> the module.
>>>>
>>>>
>>>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for your help Rainer, I'll try to debug what's going on. So far
>>>>> it seems rsyslog doesn't retry even with batch size 1. It doesn't retry
>>>>> if I return RS_RET_SUSPENDED from DoAction as well.
>>>>>
>>>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
>>>>>>
>>>>>>>
>>>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>>>
>>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some
>>>>>>>>> context: I want to write a module that both uses batching and
>>>>>>>>> can't lose messages under any circumstances. Are you saying it
>>>>>>>>> is by design that rsyslog can't do both together? According to
>>>>>>>>> the documentation, rsyslog will retry if a module returns any
>>>>>>>>> error. Do you plan to fix this in rsyslog, or to update the
>>>>>>>>> documentation to say that batching and retries don't work
>>>>>>>>> together?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It depends on many things. In almost all cases, the retry should
>>>>>>>> work well (and does so in practice). Unfortunately, I am pretty
>>>>>>>> swamped. I need to go to a conference tomorrow and have had quite
>>>>>>>> some unexpected work today. It would probably be good if you could
>>>>>>>> ping me next week to see if we can look in more detail at what is
>>>>>>>> causing you pain. But I can't guarantee that I will be available
>>>>>>>> early next week.
>>>>>>>>
>>>>>>>> In general, we cannot handle a fatal error here from an engine PoV,
>>>>>>>> because everything is already processed and we no longer have the
>>>>>>>> original messages. This is simply needed if you want to process
>>>>>>>> messages one after another through the full config (a goal for v8
>>>>>>>> that was muuuuch requested). As I said, the solution is to use
>>>>>>>> batches of one, because otherwise we would really need to turn back
>>>>>>>> time and undo everything that was already done on the messages in
>>>>>>>> question by other modules (including state advances).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I thought that if a batch failed, it pushed all the messages back
>>>>>>> on the queue and retried with a half size batch until it got to
>>>>>>> the one message that could not be processed and only did a fatal
>>>>>>> fail on that message.
>>>>>>>
>>>>>>> Now, there is a big difference between a module giving a hard
>>>>>>> error "this message is never going to be able to be processed no
>>>>>>> matter how many times it's retried" vs a soft error "there is a
>>>>>>> problem delivering things to this destination right now, retry
>>>>>>> later". I thought the batch processing handled these differently.
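The halving retry and the hard/soft distinction described above could look roughly like this. It is a minimal sketch in C under stated assumptions: the status codes, submit_bisect, and demo_send are hypothetical names for illustration, not rsyslog's API, and the sketch assumes that a hard error splits the batch while a soft error suspends delivery for a later retry.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not rsyslog's RS_RET_* values). */
typedef enum { SEND_OK, SEND_SOFT_ERROR, SEND_HARD_ERROR } send_status;

/* Hypothetical callback: tries to deliver msgs[0..n-1] as one transaction. */
typedef send_status (*send_fn)(const char *const *msgs, size_t n, void *ctx);

/*
 * Submit a batch. On a hard error, split the batch in half and retry each
 * half until the offending message is isolated in a batch of one.
 * On a soft error, set *suspended and stop; the caller is expected to retry
 * later (tracking which messages are still pending is left out here).
 * Returns the number of messages given up on as permanently failed.
 */
size_t
submit_bisect(const char *const *msgs, size_t n, send_fn deliver, void *ctx,
              int *suspended)
{
    if (n == 0 || *suspended)
        return 0;

    send_status st = deliver(msgs, n, ctx);
    if (st == SEND_OK)
        return 0;
    if (st == SEND_SOFT_ERROR) {
        *suspended = 1;
        return 0;
    }
    if (n == 1)
        return 1; /* hard error on a single message: this is the bad one */

    size_t half = n / 2;
    size_t failed = submit_bisect(msgs, half, deliver, ctx, suspended);
    failed += submit_bisect(msgs + half, n - half, deliver, ctx, suspended);
    return failed;
}

/*
 * Example callback for trying the sketch: treats any message starting with
 * '!' as corrupt and rejects the whole batch; otherwise it adds the batch
 * size to the size_t counter that ctx points to.
 */
send_status
demo_send(const char *const *msgs, size_t n, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (msgs[i][0] == '!')
            return SEND_HARD_ERROR;
    *(size_t *)ctx += n;
    return SEND_OK;
}
```

For the batch {"a", "b", "!bad", "c", "d"}, submit_bisect with demo_send reports one failed message and delivers the other four. The sketch does not address the duplicate delivery to other outputs working from the same queue.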
>>>>>>>
>>>>>>
>>>>>>
>>>>>> That's no longer possible with v8, at least generically. As I said
>>>>>> above, we would need to turn back time.
>>>>>>
>>>>>> But I am really running out of time now...
>>>>>>
>>>>>> Rainer
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> David Lang
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> rsyslog mailing list
>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>> POST if you DON'T LIKE THAT.
>>>>>>>
>>>>>>