On Thu, 17 May 2018, Rich Megginson wrote:
then you can mark the ones accepted as done and just retry the ones that
fail.
That's what I'm proposing.
But there's still no need for a separate ruleset and queue. In Rsyslog, if
an output cannot accept a message and there's reason to think that it will
in the future, then you suspend that output and try again later. If you
have reason to believe that the message is never going to be able to be
delivered, then you need to fail the message or you will be stuck forever.
This is what the error output was made for.
So how would that work on a per-record basis?
Would this be something different than using MsgConstruct -> set fields in
msg from original request -> ratelimitAddMsg for each record to resubmit?
Rainer, in a batch, is there any way to mark some of the messages as delivered
and others as failed as opposed to failing the entire batch?
If the entire batch is retried using the "index" (default) bulk type, the
records that were already accepted are indexed again as duplicates.
If using the "create" type (and you have assigned a unique _id to each
record), you will instead get back many 409 Duplicate errors.
This causes problems - we know because this is how the fluentd plugin used
to work, which is why we had to change it.
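To make the difference concrete, here is a minimal Python sketch (the index name and helper are illustrative, not from any plugin) of building a _bulk body with the "create" op type and an explicit _id, so that a re-sent record fails with a 409 rather than being silently indexed twice as "index" would do:

```python
import json

def bulk_create_payload(records):
    """Build an Elasticsearch _bulk body using the "create" op type.

    With "create" plus an explicit _id, re-sending a record that was
    already accepted yields a 409 Duplicate error instead of silently
    indexing a second copy (the behavior of the default "index" type).
    Each record is an action metadata line followed by a source line.
    """
    lines = []
    for rec in records:
        lines.append(json.dumps({"create": {"_index": "logs", "_id": rec["_id"]}}))
        lines.append(json.dumps(rec["doc"]))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

records = [
    {"_id": "a1", "doc": {"message": "first"}},
    {"_id": "a2", "doc": {"message": "second"}},
]
payload = bulk_create_payload(records)
```

Either way, retrying the whole batch is wrong: you must look at the per-item statuses in the bulk response.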
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section
"Bulk Rejections"
"It is much better to handle queuing in your application by gracefully
handling the back pressure from a full queue. When you receive bulk
rejections, you should take these steps:
1. Pause the import thread for 3–5 seconds.
2. Extract the rejected actions from the bulk response, since it is
probable that many of the actions were successful. The bulk response will
tell you which succeeded and which were rejected.
3. Send a new bulk request with just the rejected actions.
4. Repeat from step 1 if rejections are encountered again.
Using this procedure, your code naturally adapts to the load of your
cluster and naturally backs off.
"
Does it really accept some and reject some in a random manner? Or is it a
matter of accepting the first X and rejecting any after that point? The
first is easier to deal with.
It appears to be random. So you may get a failure from the first record in
the batch and the last record in the batch, and success for the others. Or
vice versa. There appear to be many, many factors in the tuning, hardware,
network, etc. that come into play.
There isn't an easy way to deal with this :P
Batch mode was created to process messages inserted into databases more
efficiently; we then found that the reduced queue congestion was a
significant advantage in itself.
But unless you have a queue just for the ES action,
That's what we had to do for the fluentd case - we have a separate "ES retry
queue". One of the tricky parts is that there may be multiple outputs - you
may want to send each log record to Elasticsearch _and_ a message bus _and_ a
remote rsyslog forwarder. But you only want to retry sending to Elasticsearch
to avoid duplication in the other outputs.
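A minimal Python sketch of that fan-out design (class and names hypothetical, not fluentd or rsyslog code): every record is offered to all outputs, but a failure only re-queues the record for the output that failed, so the message-bus and forwarder copies are not duplicated when Elasticsearch is retried:

```python
from collections import deque

class FanOut:
    """Fan a record out to several outputs with per-output retry queues.

    A record that one output (say, Elasticsearch) rejects is queued for
    retry against that output only; outputs that already accepted the
    record never see it again, avoiding duplicates on retry.
    """
    def __init__(self, outputs):
        self.outputs = outputs                        # name -> send callable
        self.retry = {name: deque() for name in outputs}

    def submit(self, record):
        for name, send in self.outputs.items():
            if not send(record):                      # send returns True on success
                self.retry[name].append(record)       # retry only this output

    def flush_retries(self):
        for name in self.outputs:
            pending, self.retry[name] = self.retry[name], deque()
            for record in pending:
                if not self.outputs[name](record):
                    self.retry[name].append(record)   # still failing; keep it
```

The cost, of course, is exactly what the next paragraph objects to: each output now carries its own implicit queue, with none of the tuning knobs an admin gets from an explicitly configured rsyslog queue.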
In Rsyslog, queues are explicitly configured by the admin (for various reasons,
including performance and reliability trade-offs). I really don't like the idea
of omelasticsearch creating its own queue without these options. Kafka does
this and it's an ongoing source of problems.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.