On Thu, 17 May 2018, Rich Megginson wrote:

then you can mark the ones accepted as done and just retry the ones that fail.

That's what I'm proposing.

But there's still no need for a separate ruleset and queue. In Rsyslog, if an output cannot accept a message and there's reason to think that it will be able to in the future, you suspend that output and retry later. If you have reason to believe the message will never be deliverable, you need to fail it, or you will be stuck forever. This is what the error output was made for.

So how would that work on a per-record basis?

Would this be something different than using MsgConstruct -> set fields in msg from original request -> ratelimitAddMsg for each record to resubmit?

Rainer, in a batch, is there any way to mark some of the messages as delivered and others as failed as opposed to failing the entire batch?


If using the "index" (default) bulk type, this causes duplicate records to be added. If using the "create" type (and you have assigned a unique _id), you will get back many 409 Duplicate errors. This causes problems - we know because this is how the fluentd plugin used to work, which is why we had to change it.
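As a sketch of the difference between the two bulk types: with the "create" op type and a content-derived _id (a hypothetical ID scheme for illustration, not necessarily what either plugin does), re-sending a record that was already indexed produces a 409 rather than a silent duplicate:

```python
import hashlib
import json

def record_id(record):
    # Hypothetical scheme: derive a stable _id from the record content,
    # so re-sending the same record maps to the same document.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def build_bulk_body(index, records):
    # NDJSON bulk body using the "create" op type: re-sending an already
    # indexed record yields a 409 instead of a duplicate document.
    lines = []
    for rec in records:
        lines.append(json.dumps({"create": {"_index": index, "_id": record_id(rec)}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"
```

With the default "index" type and no _id, the same resubmission would simply be indexed again under a fresh auto-generated id.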

https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section "Bulk Rejections" "It is much better to handle queuing in your application by gracefully handling the back pressure from a full queue. When you receive bulk rejections, you should take these steps:

    Pause the import thread for 3–5 seconds.
    Extract the rejected actions from the bulk response, since it is probable that many of the actions were successful. The bulk response will tell you which succeeded and which were rejected.
    Send a new bulk request with just the rejected actions.
    Repeat from step 1 if rejections are encountered again.

Using this procedure, your code naturally adapts to the load of your cluster and naturally backs off.
"
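The steps above can be sketched as a per-record split of the bulk response (a hypothetical helper, assuming the standard bulk response shape where "items" is position-aligned with the submitted actions and queue-full rejections carry HTTP status 429):

```python
def split_bulk_response(records, response):
    # 'items' in the bulk response is position-aligned with the actions
    # that were submitted, so zip pairs each record with its result.
    succeeded, rejected, failed = [], [], []
    for rec, item in zip(records, response["items"]):
        (_, result), = item.items()          # e.g. {"create": {...}}
        status = result.get("status", 500)
        if status == 429:                    # bulk queue full: worth retrying
            rejected.append(rec)
        elif "error" in result:              # hard error: do not retry
            failed.append(rec)
        else:
            succeeded.append(rec)
    return succeeded, rejected, failed
```

A retry loop would then pause briefly, resubmit only the rejected records, and route the hard failures to the error output.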

Does it really accept some and reject some in a random manner? Or is it a matter of accepting the first X and rejecting any after that point? The first is easier to deal with.

It appears to be random.  So you may get a failure from the first record in the batch and the last record in the batch, and success for the others.  Or vice versa.  There appear to be many, many factors in the tuning, hardware, network, etc. that come into play.

There isn't an easy way to deal with this :P



Batch mode was created to process messages inserted into databases more efficiently; we then found that the reduced queue congestion was a significant advantage in itself.

But unless you have a queue just for the ES action,

That's what we had to do for the fluentd case - we have a separate "ES retry queue".  One of the tricky parts is that there may be multiple outputs - you may want to send each log record to Elasticsearch _and_ a message bus _and_ a remote rsyslog forwarder. But you only want to retry sending to Elasticsearch to avoid duplication in the other outputs.
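A minimal sketch of that separation (hypothetical names; this does not mirror the actual fluentd plugin): records fan out to every output, but only Elasticsearch failures land in a dedicated retry queue, so a later retry never re-sends to the message bus or the forwarder:

```python
from collections import deque

# Dedicated retry queue for the Elasticsearch output only.
es_retry_queue = deque()

def deliver(record, outputs):
    # Fan the record out to every output; each 'send' returns True on success.
    for name, send in outputs.items():
        ok = send(record)
        if not ok and name == "elasticsearch":
            es_retry_queue.append(record)   # retry ES only, never the others

def drain_es_retries(send_to_es):
    # Re-attempt queued records; anything still failing stays queued.
    remaining = deque()
    while es_retry_queue:
        rec = es_retry_queue.popleft()
        if not send_to_es(rec):
            remaining.append(rec)
    es_retry_queue.extend(remaining)
```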

In Rsyslog, queues are explicitly configured by the admin (for various reasons, including performance and reliability trade-offs). I really don't like the idea of omelasticsearch creating its own queue without these options. Kafka does this, and it's an ongoing source of problems.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.