On Sun, 13 Mar 2016, Kane Kim wrote:

+1 to David. Having the module handle the errors is not 100% reliable:
what if the module dies before it has a chance to save the error log?
Those messages will be lost.

Well, the module is a thread of rsyslog, so if it dies, rsyslog has died.

But if you are doing large batches (hundreds or thousands of messages), putting all of them in a 'bad messages' file because one of them has a problem seems like the wrong thing to do.

Given that the error is not detected until the batch is being flushed for the output, any outputs that have already flushed their batches can't be changed. So the question is whether it's worse to have messages resent to those outputs as the batch gets reprocessed, or to have the output go to a "couldn't deliver" file.

It's bad enough if the problem is a corrupt message that can never be delivered and the error is a permanent one. But if the problem is instead a temporary error (server is down), you will have a LOT of messages going into this 'couldn't deliver' bucket, when what should really happen is that rsyslog pauses processing messages and retries after the suspend timeout.
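To make the "pause and retry after the suspend timeout" idea concrete, here is a minimal C sketch of such a loop. It is not rsyslog's engine code: the `deliver_result` codes, the `deliver_fn` callback, `deliver_with_suspend()` and the `flaky_server` stub are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h> /* sleep() */

/* Hypothetical result codes for one delivery attempt (not rsyslog's RS_RET_* values). */
typedef enum {
    DELIVER_OK,
    DELIVER_TEMPORARY, /* e.g. server down: worth retrying later */
    DELIVER_PERMANENT  /* e.g. corrupt message: will never succeed */
} deliver_result;

/* Hypothetical output callback that tries to send one message. */
typedef deliver_result (*deliver_fn)(const char *msg, void *ctx);

/*
 * Deliver one message, pausing for suspend_secs after each temporary
 * failure instead of diverting the message to a "couldn't deliver" file.
 * Gives up after max_tries attempts (max_tries <= 0 means retry forever).
 * Returns the result of the last attempt.
 */
deliver_result deliver_with_suspend(deliver_fn fn, void *ctx, const char *msg,
                                    unsigned suspend_secs, int max_tries)
{
    assert(fn != NULL && msg != NULL);
    for (int attempt = 1; ; attempt++) {
        deliver_result r = fn(msg, ctx);
        if (r != DELIVER_TEMPORARY)
            return r; /* success, or a permanent error that retrying cannot fix */
        if (max_tries > 0 && attempt >= max_tries)
            return r; /* retry budget exhausted */
        sleep(suspend_secs); /* "suspended": stop processing, then try again */
    }
}

/* Simulated output for demonstration: fails temporarily fail_times times, then succeeds. */
typedef struct {
    int fail_times;
    int calls;
} flaky_server;

static deliver_result flaky_send(const char *msg, void *ctx)
{
    flaky_server *s = ctx;
    (void)msg;
    s->calls++;
    return (s->calls <= s->fail_times) ? DELIVER_TEMPORARY : DELIVER_OK;
}
```

For example, with a `flaky_server` that fails twice, `deliver_with_suspend(flaky_send, &s, "hello", 0, 0)` returns `DELIVER_OK` after three attempts. Only temporary errors cause a pause; a permanent error is returned immediately so the caller can decide what to do with that single message.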

David Lang

On Mar 13, 2016 9:07 PM, "David Lang" <[email protected]> wrote:

On Sun, 13 Mar 2016, Rainer Gerhards wrote:

Rainer, can you explain what happens when a batch fails on an action?


It depends on when what happens and whether we have batches of one or
batches of many. For batches of one, the failure is immediately
detected and acted upon. For batches of many, all action strings are
stored temporarily. During that time, it is assumed that the messages
could be processed. There is no way around this assumption, because we
now run each message completely through the rule engine before the next
one begins processing, which means that other parts of the config
have actually acted on a message, which means we cannot undo that
state. If we need to act on the "real" state, we need to process
batches of one. There simply is no other solution possible. So let's
stay with batches of many. Once the batch is completed, the temporary
strings are passed to the output's commitTransaction entry point. It
is currently assumed that the output handles this case intelligently,
and as it looks, most do this well (no regression tests failed, and
there have been no bad reports since we introduced this with the
initial v8 version 15+ months ago). If the action fails, the status is
reflected accordingly. Also, the core engine *should* (but obviously
does not support this right now) write an error file with the messages
in error (just like omelasticsearch does, which has proven to be quite
good).
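A rough C sketch of the flow described above: rendered strings are staged while the batch is processed, handed to the output in one commit, and written to an error file if that commit fails. The names (`staged_batch`, `batch_stage()`, `batch_commit()`, `commit_fn`, `reject_all()`, `BATCH_MAX`) are hypothetical and do not correspond to rsyslog's actual commitTransaction interface; the one-message-per-line error file format is also an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_MAX 64 /* hypothetical fixed batch capacity for this sketch */

/* Strings rendered for one action, held until the whole batch has been processed. */
typedef struct {
    char *msgs[BATCH_MAX];
    size_t n;
} staged_batch;

/* Copy one rendered string into the batch; returns 0 on success, -1 if full or out of memory. */
int batch_stage(staged_batch *b, const char *rendered)
{
    if (b->n == BATCH_MAX)
        return -1;
    size_t len = strlen(rendered) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, rendered, len);
    b->msgs[b->n++] = copy;
    return 0;
}

/* Hypothetical transaction callback: returns 0 if the output accepted the whole batch. */
typedef int (*commit_fn)(char *const *msgs, size_t n, void *ctx);

/*
 * Pass all staged strings to the output in one transaction. If the commit
 * fails, append each staged string to error_path, one per line (best effort:
 * if the file cannot be opened, the strings are lost). Always frees the
 * staged strings and empties the batch. Returns the commit callback's result.
 */
int batch_commit(staged_batch *b, commit_fn commit, void *ctx, const char *error_path)
{
    assert(b != NULL && commit != NULL && error_path != NULL);
    int rc = commit(b->msgs, b->n, ctx);
    if (rc != 0) {
        FILE *f = fopen(error_path, "a");
        if (f != NULL) {
            for (size_t i = 0; i < b->n; i++)
                fprintf(f, "%s\n", b->msgs[i]);
            fclose(f);
        }
    }
    for (size_t i = 0; i < b->n; i++)
        free(b->msgs[i]);
    b->n = 0;
    return rc;
}

/* Demo output that rejects every batch, e.g. because its server is down. */
static int reject_all(char *const *msgs, size_t n, void *ctx)
{
    (void)msgs;
    (void)n;
    (void)ctx;
    return -1;
}
```

Because the sketch spills the whole batch on any commit failure, it has exactly the drawback discussed in this thread: one bad message, or a temporarily unreachable server, sends every staged message to the error file.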


Well, it's not in the specs that modules should have local error files the
way omelasticsearch does.

What I would expect, if there is a problem when committing the batch, is
that the entire batch would be marked as not delivered and retried with
half of it.

This will cause other outputs working from the same queue to get
duplicate messages, but that's better than losing an entire batch of
messages because one is corrupt in some way, or because there is a
temporary problem with the output.

David Lang

I have managed to look at least briefly at the commitTransaction core
engine code. I think we can even improve it and may gain some ability
to do retries if the action itself does not handle them. At a
minimum, it should have the ability (which was originally intended,
but pushed aside by more urgent work) to write that error file.

I am very grateful that Kane is working on a repro with the standard
modules. Once we have that, I can go deeper into the code and check
what I am thinking about. It is possible that we cannot manage to
repro this with the standard modules; then we need to think about a
way to do the same thing via the testing module.

I will try to address this within the 8.18 timeframe, but cannot
totally commit to it. A lot of work has been going on, and especially
all that CI testing has taken up lots of work. Nevertheless, I've been
able to iron out quite a few nits, so it's probably worth it. I would
still like to continue getting the CI environment into better shape,
because now I know what I am doing. If I stop again and let it go for
a couple of weeks, I would lose a lot of momentum. I also have to do
some work for support customers, which keeps me able to work on
rsyslog in general, so this, too, has priority (and validly so,
because without support customers I would be totally unable to work on
rsyslog other than as a hobby).

I hope this explains the current state.
Rainer


Kane, can you try to duplicate your problem using standard modules? I'm
thinking that if you define an omrelp action that doesn't go anywhere, it
should return errors similar to what you are trying to do, so you should
be able to duplicate the problem that way.

David Lang




On Tue, 23 Feb 2016, Kane Kim wrote:

Looking at the omkafka module source code, it seems that it relies on
rsyslog retries in DoAction, returning RS_RET_SUSPENDED:

DBGPRINTF("omkafka: writeKafka returned %d\n", iRet);
/* any failure is reported to the core as "suspended", i.e. "retry later" */
if(iRet != RS_RET_OK) {
  iRet = RS_RET_SUSPENDED;
}

I've tried similar code in DoAction; it seems that action processing
is suspended at that point and the messages are not handed to the
module again.


On Tue, Feb 23, 2016 at 9:06 AM, Kane Kim <[email protected]> wrote:

Thanks for your help, Rainer. I'll try to debug what's going on. So far
it seems rsyslog doesn't retry even with batch size 1. It doesn't retry
if I return RS_RET_SUSPENDED from DoAction either.

On Tue, Feb 23, 2016 at 9:05 AM, Rainer Gerhards <[email protected]> wrote:

2016-02-23 18:03 GMT+01:00 David Lang <[email protected]>:


On Tue, 23 Feb 2016, Rainer Gerhards wrote:

2016-02-23 17:38 GMT+01:00 Kane Kim <[email protected]>:



Hello Rainer, thanks for the prompt reply! To give you some context: I
want to write a module that both uses batching and can't lose messages
under any circumstances. Are you saying it is by design that rsyslog
can't do both together? According to the documentation, rsyslog will
retry if the module returns any error. Do you plan to fix this in
rsyslog, or to update the documentation to say that batching and
retries don't work together?




It depends on many things. In almost all cases, the retry should work
well (and does so in practice). Unfortunately, I am pretty swamped. I
need to go to a conference tomorrow and have had quite some unexpected
work today. It would probably be good if you could ping me next week
to see if we can look in more detail at what is causing you pain. But
I can't guarantee that I will be available early next week.

In general, we cannot handle a fatal error here from an engine PoV,
because everything is already processed and we no longer have the
original messages. This is simply needed if you want to process
messages one after another through the full config (a goal for v8 that
was muuuuch requested). As I said, the solution is to use batches of
one, because otherwise we would really need to turn back time and undo
everything that was already done on the messages in question by other
modules (including state advances).




I thought that if a batch failed, it pushed all the messages back on
the queue and retried with a half-size batch until it got to the one
message that could not be processed, and only did a fatal fail on that
message.

Now, there is a big difference between a module giving a hard error
("this message is never going to be able to be processed no matter how
many times it's retried") and a soft error ("there is a problem
delivering things to this destination right now, retry later"). I
thought the batch processing handled these differently.
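A small C sketch of the hard versus soft distinction described above, expressed as a decision function for what to do after a failed commit. The enums and `classify()` are hypothetical, and the policy (split only on hard errors, suspend on soft ones) is David's expectation, not what rsyslog v8 implements (see Rainer's reply).

```c
#include <assert.h>

/* Hypothetical error categories an output could report (not rsyslog's RS_RET_* codes). */
typedef enum {
    ERR_NONE,
    ERR_SOFT, /* destination unavailable right now: retry later */
    ERR_HARD  /* this message can never be processed, no matter how often it is retried */
} out_error;

/* What to do next with the batch or message that was just committed. */
typedef enum {
    DISP_DONE,          /* delivered */
    DISP_SUSPEND_RETRY, /* keep the messages, pause the action, retry after the suspend interval */
    DISP_SPLIT_BATCH,   /* retry in smaller pieces to isolate the bad message */
    DISP_DISCARD_ONE    /* single bad message: record it (e.g. error file) and move on */
} disposition;

/* Decide how to react to the result of committing a batch of batch_size messages. */
disposition classify(out_error err, unsigned batch_size)
{
    assert(batch_size > 0);
    switch (err) {
    case ERR_NONE:
        return DISP_DONE;
    case ERR_SOFT:
        return DISP_SUSPEND_RETRY; /* splitting would not help: every message would fail */
    case ERR_HARD:
    default:
        return batch_size > 1 ? DISP_SPLIT_BATCH : DISP_DISCARD_ONE;
    }
}
```

Keeping the soft case out of the split path matters: splitting a batch because the server is down would only multiply failed attempts without isolating anything.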



That's no longer possible with v8, at least generically. As I said
above, we would need to turn back time.

But I've really run out of time now...

Rainer



David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

