Re: [rsyslog] Who is interested in ElasticSearch?

david Wed, 11 Apr 2012 11:36:46 -0700

On Wed, 11 Apr 2012, Radu Gheorghe wrote:

2012/4/11 Vlad Grigorescu <[email protected]>:


On 4/11/12 2:33 AM, Radu Gheorghe wrote:

2012/4/10 Vlad Grigorescu <[email protected]>:

The thing to consider here is what happens when you have multiple rsyslog 
servers logging to ElasticSearch. Does there need to be some kind of 
coordination, so that each of them has unique IDs for the messages? What 
happens if two messages have the same ID?


If two messages have the same ID, the one that gets inserted last
overrides the previous one, and gets an incremented _version. Which
basically means you lose data, because the old message isn't there
anymore.
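
To make the overwrite behaviour described above concrete, here is a minimal sketch using a toy in-memory index. `ToyIndex` and the sample messages are hypothetical stand-ins, not the Elasticsearch client; the sketch only mimics the "last write wins, _version is incremented" behaviour.

```python
# Toy in-memory stand-in for an index (an assumption for illustration,
# not the Elasticsearch API).
class ToyIndex:
    def __init__(self):
        self.docs = {}  # maps _id -> (version, source)

    def index(self, doc_id, source):
        """Store source under doc_id, replacing any existing document."""
        old_version = self.docs[doc_id][0] if doc_id in self.docs else 0
        self.docs[doc_id] = (old_version + 1, source)
        return old_version + 1


idx = ToyIndex()
idx.index("msg-1", {"host": "a", "msg": "disk full"})
# A second message arriving with the same ID replaces the first one.
version = idx.index("msg-1", {"host": "b", "msg": "link down"})
print(version, idx.docs["msg-1"][1]["msg"])
```

Only one document remains under `msg-1` after the second call, so the first log line is no longer retrievable.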


Well, that's certainly not what you want when it comes to logs.

These are questions I'm unsure of, but for now, I'm happy to use 
ElasticSearch's automatic ID generation features.


Well, if you rely on Elasticsearch to generate the IDs, I don't think
there's a way for rsyslog to know which documents were successfully
inserted and which not:

The only way to know which document was inserted and which not is by
order. Which looks a bit risky in my book.


According to the ES source[1], the responses come back in the same order as 
the items you sent to be indexed. I'll try to confirm with the author that this 
will remain the case down the line, but I suspect that many people rely on that 
fact at this point.

 --Vlad

[1] - 
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkResponse.java#L32
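
If the response order really does match the request order, matching results back to messages could look like the sketch below. The response layout (an `items` list whose entries are keyed by action name and carry either an `_id` or an `error`) is an assumption modelled loosely on the bulk API, and `split_by_outcome` and the sample IDs are hypothetical.

```python
def split_by_outcome(sent_messages, bulk_response):
    """Pair each sent message with the response item at the same position."""
    items = bulk_response["items"]
    if len(items) != len(sent_messages):
        raise ValueError("response length does not match request length")
    inserted, failed = [], []
    for message, item in zip(sent_messages, items):
        # Assumed shape: each item has a single key, the action name.
        (result,) = item.values()
        if "error" in result:
            failed.append((message, result["error"]))
        else:
            inserted.append((message, result.get("_id")))
    return inserted, failed


sent = ["msg A", "msg B", "msg C"]
response = {"items": [
    {"index": {"_id": "id-1"}},
    {"index": {"error": "MapperParsingException"}},
    {"index": {"_id": "id-3"}},
]}
inserted, failed = split_by_outcome(sent, response)
```

The length check is there because position is the only link between a message and its result; if the counts differ, the pairing cannot be trusted.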


Cool!

Then maybe there's a hackish way to actually avoid duplicates while
still relying on ES to generate the IDs. Something like:
1. when the bulk is first sent, don't put in any IDs
2. if the reply has errors, take the original bulk and complete it
with IDs where you have them
3. take half of the original bulk and re-insert
4. repeat steps 2 and 3

But I guess implementing this sort of logic is nearly as complicated,
only slower (and probably less reliable) than identifying the failed
messages and trying to reindex them.

As for identifying which messages are malformed and which messages are
worth retrying to insert, I guess it depends on the exception type.

Actually, I think that the way the default batch handling works would want logic more like:


when the bulk is sent, try to insert it all

if any inserts fail, remove any messages that were successfully inserted
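
A rough sketch of the two steps above, assuming the remaining messages are then retried (the retry is an interpretation, not something stated in the message). `deliver_batch`, `send_bulk`, and `max_attempts` are hypothetical names; `send_bulk` stands in for whatever submits the bulk request and is assumed to return one success flag per message, in order. This is not rsyslog's actual batch API.

```python
def deliver_batch(batch, send_bulk, max_attempts=3):
    """Send the batch; after each attempt keep only the messages that failed."""
    pending = list(batch)
    for _ in range(max_attempts):
        if not pending:
            break
        outcomes = send_bulk(pending)  # one bool per message, same order
        # Drop everything that was inserted successfully.
        pending = [m for m, ok in zip(pending, outcomes) if not ok]
    return pending  # messages still undelivered after all attempts


attempts = []

def flaky_send(messages):
    """Hypothetical sender: "b" fails on the first attempt only."""
    attempts.append(list(messages))
    return [not (m == "b" and len(attempts) == 1) for m in messages]

leftover = deliver_batch(["a", "b", "c"], flaky_send)
```

Because successfully inserted messages are removed before any retry, nothing is sent twice, which avoids duplicates even when Elasticsearch generates the IDs.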

David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
