2016-03-13 0:46 GMT+01:00 David Lang <[email protected]>:
> now that 8.17 is out and the high profile bugs found in the last couple of
> weeks have been dealt with, let's look at this again.

Yeah, but please bear with me a bit. I had to push quite a few things
aside in order to get to that release, and these other things really
demand some time (among them some final work needed to release v2 of
liblognorm).

>
> Rainer, can you explain what happens when a batch fails on an action?

It depends on when the failure happens and on whether we have batches
of one or batches of many. For batches of one, the failure is
immediately detected and acted upon. For batches of many, all action
strings are stored temporarily. During that time, it is assumed that
the messages could be processed. There is no way around this
assumption, because we now run each message completely through the
rule engine before the next one begins processing, which means that
other parts of the config have actually acted on a message, which
means we cannot undo that state. If it is necessary to act on the
"real" state, we need to process batches of one. There simply is no
other solution possible. So let's stay with batches of many. Once the
batch is completed, the temporary strings are passed to the output's
commitTransaction entry point. It is currently assumed that the output
handles the case intelligently, and as it looks most do this well (no
regression tests failed, no bad reports since we introduced this with
the initial v8 version 15+ months ago). If the action fails, status is
reflected accordingly. Also, the core engine *should* (but obviously
does not support this right now) write an error file with the messages
in error (just like omelasticsearch does, which has proven to be quite
good).
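
To make the batch-of-many flow a bit more concrete, here is a minimal
sketch in C. It is not rsyslog code: the names batch_add, batch_commit,
commit_fn, output_ok and output_down are all hypothetical and only
mirror the idea described above (strings are buffered while messages
run through the config, and a failure can only show up at commit time).

```c
/* Illustrative sketch only: NOT rsyslog source. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8

enum status { ST_OK = 0, ST_FAILED = 1 };

/* Holds the rendered action strings until the batch is committed. */
struct batch {
    const char *strings[BATCH_MAX];
    size_t count;
};

/* Stand-in for an output's commit entry point (assumed signature). */
typedef enum status (*commit_fn)(const char **strings, size_t count);

/* Called once per message while it runs through the config.
 * Nothing is delivered yet, so success has to be assumed here. */
static enum status batch_add(struct batch *b, const char *action_string)
{
    assert(b != NULL && action_string != NULL);
    if (b->count >= BATCH_MAX)
        return ST_FAILED;          /* batch full: commit first */
    b->strings[b->count++] = action_string;
    return ST_OK;
}

/* Called when the batch is complete. Only now can a delivery failure
 * become visible, after the rest of the config already acted. */
static enum status batch_commit(struct batch *b, commit_fn commit)
{
    enum status st = commit(b->strings, b->count);
    if (st == ST_OK)
        b->count = 0;              /* delivered: drop temporary strings */
    return st;                     /* on failure the strings are kept */
}

/* Two toy outputs: one accepts everything, one is "down". */
static enum status output_ok(const char **s, size_t n)
{
    (void)s; (void)n;
    return ST_OK;
}

static enum status output_down(const char **s, size_t n)
{
    (void)s; (void)n;
    return ST_FAILED;
}
```

In this sketch, committing against output_down leaves the buffered
strings in place, so a later commit against output_ok can still deliver
them. Whether and how the real engine keeps them is exactly the open
question in this thread.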

I have managed to look at least briefly at the commitTransaction core
engine code. I think we can even improve it and may get some ability
to do retries if the action itself does not handle that. At the
minimum, it should have the ability (which was intended originally,
but pushed away by more urgent work) to write that error file.
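
As a rough illustration of that idea (retry a few times, then fall back
to an error file), here is a small C sketch. Again, this is not the
actual core engine code: commit_with_retry, flaky_output, the return
codes and the max_tries parameter are assumptions made up for this
example.

```c
/* Illustrative sketch only: NOT rsyslog source. All names are hypothetical. */
#include <stddef.h>
#include <stdio.h>

/* Stand-in for an output's commit entry point: returns 0 on success. */
typedef int (*commit_fn)(const char **strings, size_t count, void *ctx);

/* Try to commit up to max_tries times. If every attempt fails, write
 * the messages to errfile so they are not silently lost.
 * Returns 0 if committed, 1 if written to the error file,
 * -1 if even writing the error file failed. */
static int commit_with_retry(commit_fn commit, void *ctx,
                             const char **strings, size_t count,
                             int max_tries, FILE *errfile)
{
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        if (commit(strings, count, ctx) == 0)
            return 0;
    }
    for (size_t i = 0; i < count; ++i) {
        if (fprintf(errfile, "%s\n", strings[i]) < 0)
            return -1;
    }
    return 1;
}

/* Toy output that fails as many times as the int behind ctx says. */
static int flaky_output(const char **strings, size_t count, void *ctx)
{
    int *failures_left = ctx;
    (void)strings; (void)count;
    if (*failures_left > 0) {
        --*failures_left;
        return -1;
    }
    return 0;
}
```

With two pending failures and max_tries set to 3, the third attempt
succeeds. With more pending failures than tries, the messages end up in
the error file instead.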

I am very grateful that Kane is working on a repro with the standard
modules. Once we have that, I can go deeper into the code and check
what I am thinking about. It might be possible that we cannot manage
to repro this with the standard modules. Then we need to think about a
way via the testing module to do the same thing.

I will try to address this within the 8.18 timeframe, but cannot
totally commit to it. There has been a lot of work going on, and
especially all that CI testing has taken up a lot of work.
Nevertheless, I've been able to iron out quite a few nits, so it's
probably worth it. I would still like to continue to get the CI
environment into better shape, because now I know what I am doing. If
I stop again and let it go for a couple of weeks, I would lose a lot
of momentum. I also have to do some work for support customers, which
keeps me able to work on rsyslog in general, so this, too, has
priority (and validly so, because without support customers I would
be totally unable to work on rsyslog other than as a hobby).

I hope this explains the current state.
Rainer

>
> Kane, can you try to duplicate your problem using standard modules? I'm
> thinking that if you define an omrelp action that doesn't go anywhere, it
> should return errors similar to what you are trying to do, so you should be
> able to duplicate the problem that way.
>
> David Lang
>
>
>
>
> On Tue, 23 Feb 2016, Kane Kim wrote:
>
>> Looking at the omkafka module source code, it seems that it relies on
>> rsyslog retries in DoAction, returning RS_RET_SUSPENDED:
>>
>> DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
>> if(iRet != RS_RET_OK) {
>>   iRet = RS_RET_SUSPENDED;
>> }
>>
>> I've tried similar code in DoAction. It seems that action processing
>> is suspended at that point and messages are not retried to the module.
>>
>>
>> On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]> wrote:
>>
>>> Thanks for your help Rainer, I'll try to debug what's going on. So far it
>>> seems rsyslog doesn't retry even with batch size 1. It doesn't retry if I
>>> return RS_RET_SUSPENDED from DoAction either.
>>>
>>> On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards <[email protected]> wrote:
>>>
>>>> 2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:
>>>>>
>>>>> On Tue, 23 Feb 2016, Rainer Gerhards wrote:
>>>>>
>>>>>> 2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:
>>>>>>>
>>>>>>>
>>>>>>> Hello Rainer, thanks for the prompt reply! To give you some context:
>>>>>>> I want to write a module that both uses batching and can't lose
>>>>>>> messages under any circumstances. Are you saying it is by design that
>>>>>>> rsyslog can't do both together? According to the documentation,
>>>>>>> rsyslog will retry if a module returns any error. Do you plan to fix
>>>>>>> this in rsyslog or update the documentation to say batching and
>>>>>>> retries don't work?
>>>>>>
>>>>>>
>>>>>>
>>>>>> It depends on many things. In almost all cases, the retry should work
>>>>>> well (and does so in practice). Unfortunately, I am pretty swamped. I
>>>>>> need to go to a conference tomorrow and have had quite some unexpected
>>>>>> work today. It would probably be good if you could ping me next week
>>>>>> to see if we can look in more detail into what is causing you pain. But
>>>>>> I can't guarantee that I will be available early next week.
>>>>>>
>>>>>> In general, we cannot handle a fatal error here from an engine PoV,
>>>>>> because everything is already processed and we no longer have the
>>>>>> original messages. This is simply needed if you want to process
>>>>>> messages one after another through the full config (a goal for v8 that
>>>>>> was muuuuch requested). As I said, the solution is to use batches of
>>>>>> one, because otherwise we would really need to turn back time and undo
>>>>>> everything that was already done on the messages in question by other
>>>>>> modules (including state advances).
>>>>>
>>>>>
>>>>>
>>>>> I thought that if a batch failed, it pushed all the messages back on the
>>>>> queue and retried with a half size batch until it got to the one message
>>>>> that could not be processed and only did a fatal fail on that message.
>>>>>
>>>>> Now, there is a big difference between a module giving a hard error
>>>>> "this message is never going to be able to be processed no matter how
>>>>> many times it's retried" vs a soft error "there is a problem delivering
>>>>> things to this destination right now, retry later". I thought the batch
>>>>> processing handled these differently.
>>>>
>>>>
>>>> That's no longer possible with v8, at least generically. As I said
>>>> above, we would need to turn back time.
>>>>
>>>> But I am really running out of time now...
>>>>
>>>> Rainer
>>>>>
>>>>>
>>>>> David Lang
>>>>>
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> DON'T LIKE THAT.