On Wed, 11 Apr 2012, Radu Gheorghe wrote:
2012/4/11 Vlad Grigorescu <[email protected]>:
On 4/11/12 2:33 AM, Radu Gheorghe wrote:
2012/4/10 Vlad Grigorescu <[email protected]>:
The thing to consider here is what happens when you have multiple rsyslog
servers logging to ElasticSearch. Does there need to be some kind of
coordination, so that each of them uses unique IDs for the messages? What
happens if two messages have the same ID?
If two messages have the same ID, the one that gets inserted last
overwrites the previous one and gets an incremented _version. This
basically means you lose data, because the old message isn't there
anymore.
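To make the overwrite concrete, here is a toy in-memory model of indexing
by ID. It is only a sketch of the behaviour described above, not
Elasticsearch client code; the class name ToyIndex and the ID "msg-1" are
made up for illustration.

```python
# Toy model of "index by ID": a second insert with the same ID replaces
# the stored document and bumps its version. Not real Elasticsearch code.
class ToyIndex:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        # An existing ID keeps its slot; only the version and source change.
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)
        return version


idx = ToyIndex()
v1 = idx.index("msg-1", {"message": "from rsyslog server A"})
v2 = idx.index("msg-1", {"message": "from rsyslog server B"})
# Only the message from server B is still stored under "msg-1".
```

After the second call the index holds a single document, so the message
from the first server is gone.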
Well, that's certainly not what you want when it comes to logs.
These are questions I'm unsure of, but for now, I'm happy to use
ElasticSearch's automatic ID generation features.
Well, if you rely on Elasticsearch to generate the IDs, I don't think
there's a way for rsyslog to know which documents were successfully
inserted and which were not, other than by their order in the reply.
That looks a bit risky in my book.
According to the ES documentation[1], the order of the responses is the same as
the order in which the documents were sent to be indexed. I'll try to confirm
with the author that this will remain the case down the line, but I suspect
that many people rely on it at this point.
--Vlad
[1] https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkResponse.java#L32
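If the response order can be relied on, matching each message to its
result is a positional zip. A minimal sketch, assuming the reply has
already been parsed into a list of one-key dicts such as
{"index": {...}}, and that a failed item carries an "error" field; the
function name and the sample data are made up for illustration.

```python
def pair_by_order(sent_docs, response_items):
    """Split sent documents into (ok, failed) using only their position."""
    if len(sent_docs) != len(response_items):
        raise ValueError("response does not cover every document sent")
    ok, failed = [], []
    for doc, item in zip(sent_docs, response_items):
        # Assumed item shape: {"index": {...}}; take the single inner dict.
        (result,) = item.values()
        if "error" in result:
            failed.append((doc, result["error"]))
        else:
            ok.append((doc, result.get("_id")))
    return ok, failed


sent = [{"message": "a"}, {"message": "b"}, {"message": "c"}]
items = [
    {"index": {"_id": "id1"}},
    {"index": {"error": "parse failure"}},
    {"index": {"_id": "id3"}},
]
ok, failed = pair_by_order(sent, items)
```

The length check is there because positional matching is only meaningful
when the reply covers every document in the bulk.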
Cool!
Then maybe there's a hackish way to actually avoid duplicates while
still relying on ES to generate the IDs. Something like:
1. when the bulk is first sent, don't put in any IDs
2. if the reply has errors, take the original bulk and complete it
with IDs where you have them
3. take half of the original bulk and re-insert
4. repeat steps 2 and 3
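Steps 1 and 2 above could look roughly like the sketch below, which is
one reading of the idea and not tested against a real cluster. It assumes
the reply items are one-key dicts in request order, that a successful
item carries the generated "_id", and that a failed item carries an
"error" field. The halving in step 3 is left out, and the function name
is made up.

```python
def complete_with_ids(sent_docs, response_items):
    """Attach the ES-generated ID to every document that was inserted.

    Returns (doc_id, doc) pairs; doc_id is None where the insert failed,
    so a re-send overwrites the inserted ones instead of duplicating them.
    """
    completed = []
    for doc, item in zip(sent_docs, response_items):
        (result,) = item.values()  # assumed shape: {"index": {...}}
        doc_id = None if "error" in result else result.get("_id")
        completed.append((doc_id, doc))
    return completed


bulk = [{"message": "a"}, {"message": "b"}]
reply = [{"index": {"_id": "generated-1"}}, {"index": {"error": "rejected"}}]
completed = complete_with_ids(bulk, reply)
```

Re-sending the completed bulk would bump the _version of the documents
that already have an ID and only create new documents for the rest.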
But I guess implementing this sort of logic is nearly as complicated as
identifying the failed messages and trying to reindex them, only slower
(and probably less reliable).
As for identifying which messages are malformed and which messages are
worth retrying to insert, I guess it depends on the exception type.
Actually, I think that the way the default batch handling works would want
logic more like:
1. when the bulk is sent, try to insert it all
2. if any inserts fail, remove any messages that were successfully inserted
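A rough sketch of that logic follows. The message does not say what
happens to the remaining messages, so the sketch assumes they are retried
a bounded number of times. send_bulk is a hypothetical callable standing
in for the actual HTTP call; it is assumed to return one result item per
message, in order, with an "error" field on failures.

```python
def insert_batch(batch, send_bulk, max_attempts=3):
    """Send the whole batch, then keep retrying only what failed.

    Returns the messages that still failed after max_attempts.
    """
    pending = list(batch)
    for _ in range(max_attempts):
        if not pending:
            break
        items = send_bulk(pending)
        # Drop every message whose item reports success; keep the failures.
        pending = [
            doc
            for doc, item in zip(pending, items)
            if "error" in next(iter(item.values()))
        ]
    return pending


# Stand-in for the real bulk call: message "b" fails on the first try only.
calls = []

def fake_send_bulk(docs):
    calls.append(list(docs))
    first_call = len(calls) == 1
    return [
        {"index": {"error": "busy"}} if first_call and doc == "b"
        else {"index": {"_id": "some-id"}}
        for doc in docs
    ]

leftover = insert_batch(["a", "b", "c"], fake_send_bulk)
```

With the stand-in above, the second call carries only the message that
failed, so the messages already inserted are not sent twice.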
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/