On Thu, 17 May 2018, Rich Megginson wrote:
then you can mark the ones accepted as done and just retry the ones that
fail.
That's what I'm proposing.
But there's still no need for a separate ruleset and queue. In Rsyslog, if
an output cannot accept a message and there's reason to think that it will
in the future, then you suspend that output and try again later. If you
have reason to believe that the message is never going to be able to be
delivered, then you need to fail the message or you will be stuck forever.
This is what the error output was made for.
So how would that work on a per-record basis?
Would this be something different than using MsgConstruct -> set fields in
msg from original request -> ratelimitAddMsg for each record to resubmit?
Rainer, in a batch, is there any way to mark some of the messages as delivered
and others as failed as opposed to failing the entire batch?
If the entire batch is retried using the "index" (default) bulk type, the
records that were already accepted are indexed again as duplicates.
If using the "create" type (and you have assigned a unique _id to each
record), you will instead get back many 409 Duplicate errors.
This causes problems - we know because this is how the fluentd plugin used
to work, which is why we had to change it.
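To make the difference concrete, here is a minimal Python sketch (the index name and helper are illustrative, not from any plugin) of building a _bulk body with the "create" op type and an explicit _id, so that a re-sent record fails with a 409 rather than being silently indexed twice as "index" would do:

```python
import json

def bulk_create_payload(records):
    """Build an Elasticsearch _bulk body using the "create" op type.

    With "create" plus an explicit _id, re-sending a record that was
    already accepted yields a 409 Duplicate error instead of silently
    indexing a second copy (the behavior of the default "index" type).
    Each record is an action metadata line followed by a source line.
    """
    lines = []
    for rec in records:
        lines.append(json.dumps({"create": {"_index": "logs", "_id": rec["_id"]}}))
        lines.append(json.dumps(rec["doc"]))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

records = [
    {"_id": "a1", "doc": {"message": "first"}},
    {"_id": "a2", "doc": {"message": "second"}},
]
payload = bulk_create_payload(records)
```

Either way, retrying the whole batch is wrong: you must look at the per-item statuses in the bulk response.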
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section
"Bulk Rejections"
"It is much better to handle queuing in your application by gracefully
handling the back pressure from a full queue. When you receive bulk
rejections, you should take these steps:
1. Pause the import thread for 3–5 seconds.
2. Extract the rejected actions from the bulk response, since it is
probable that many of the actions were successful. The bulk response will
tell you which succeeded and which were rejected.
3. Send a new bulk request with just the rejected actions.
4. Repeat from step 1 if rejections are encountered again.
Using this procedure, your code naturally adapts to the load of your
cluster and naturally backs off.
"
Does it really accept some and reject some in a random manner? Or is it a
matter of accepting the first X and rejecting any after that point? The
first is easier to deal with.
It appears to be random. So you may get a failure from the first record in
the batch and the last record in the batch, and success for the others. Or
vice versa. There appear to be many, many factors in the tuning, hardware,
network, etc. that come into play.
There isn't an easy way to deal with this :P
Batch mode was created to process messages inserted into databases more
efficiently; we then found that the reduced queue congestion was a
significant advantage in itself.
But unless you have a queue just for the ES action,
That's what we had to do for the fluentd case - we have a separate "ES retry
queue". One of the tricky parts is that there may be multiple outputs - you
may want to send each log record to Elasticsearch _and_ a message bus _and_ a
remote rsyslog forwarder. But you only want to retry sending to Elasticsearch
to avoid duplication in the other outputs.
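A minimal Python sketch of that fan-out design (class and names hypothetical, not fluentd or rsyslog code): every record is offered to all outputs, but a failure only re-queues the record for the output that failed, so the message-bus and forwarder copies are not duplicated when Elasticsearch is retried:

```python
from collections import deque

class FanOut:
    """Fan a record out to several outputs with per-output retry queues.

    A record that one output (say, Elasticsearch) rejects is queued for
    retry against that output only; outputs that already accepted the
    record never see it again, avoiding duplicates on retry.
    """
    def __init__(self, outputs):
        self.outputs = outputs                        # name -> send callable
        self.retry = {name: deque() for name in outputs}

    def submit(self, record):
        for name, send in self.outputs.items():
            if not send(record):                      # send returns True on success
                self.retry[name].append(record)       # retry only this output

    def flush_retries(self):
        for name in self.outputs:
            pending, self.retry[name] = self.retry[name], deque()
            for record in pending:
                if not self.outputs[name](record):
                    self.retry[name].append(record)   # still failing; keep it
```

The cost, of course, is exactly what the next paragraph objects to: each output now carries its own implicit queue, with none of the tuning knobs an admin gets from an explicitly configured rsyslog queue.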
In Rsyslog, queues are explicitly configured by the admin (for various reasons,
including performance and reliability trade-offs). I really don't like the idea
of omelasticsearch creating its own queue without these options. Kafka does
this and it's an ongoing source of problems.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.