2012/4/11 Vlad Grigorescu <[email protected]>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 4/11/12 2:33 AM, Radu Gheorghe wrote:
>> 2012/4/10 Vlad Grigorescu <[email protected]>:
>>> The thing to consider here is what happens when you have multiple rsyslog 
>>> servers logging to ElasticSearch. Does there need to be some kind of 
>>> concurrency, so that each of them have unique IDs for the messages? What 
>>> happens if two messages have the same ID?
>>
>> If two messages have the same ID, the one that gets inserted last
>> overrides the previous one, and gets an incremented _version. Which
>> basically means you lose data, because the old message isn't there
>> anymore.
>
> Well, that's certainly not what you want when it comes to logs.
>
>>> These are questions I'm unsure of, but for now, I'm happy to use 
>>> ElasticSearch's automatic ID generation features.
>>
>> Well, if you rely on Elasticsearch to generate the IDs, I don't think
>> there's a way for rsyslog to know which documents were successfully
>> inserted and which not:
>>
>> The only way to know which document was inserted and which not is by
>> order. Which looks a bit risky in my book.
>
> According to the ES documentation[1] the order of the responses is the same 
> as the order you sent to be indexed. I'll try to confirm with the author that 
> that will remain the same down the line, but I suspect that many people rely 
> on that fact at this point.
>
>  --Vlad
>
> [1] - 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkResponse.java#L32

Cool!

Then maybe there's a hackish way to actually avoid duplicates while
still relying on ES to generate the IDs. Something like:
1. when the bulk is first sent, don't put in any IDs
2. if the reply has errors, take the original bulk and complete it
with IDs where you have them
3. take half of the original bulk and re-insert
4. repeat steps 2 and 3
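
To make step 2 a bit more concrete, here is a rough Python sketch of "completing" a bulk with the IDs that came back. Everything in it is an assumption for illustration: the function name fill_known_ids is made up, the bulk is modelled as a list of (meta, source) pairs, and the reply items are assumed to arrive in request order and to carry "_id" for inserted documents and "error" for failed ones. It is not rsyslog code.

```python
def fill_known_ids(actions, items):
    """Return a copy of `actions` with _id set wherever the reply reported one.

    actions: list of (meta, source) pairs, e.g. ({"index": {...}}, {...})
    items:   the "items" list of the bulk reply, assumed to be in request order
    """
    if len(actions) != len(items):
        raise ValueError("reply does not line up with the request")

    completed = []
    for (meta, source), item in zip(actions, items):
        # Each meta line has a single key naming the operation ("index", ...).
        op, params = next(iter(meta.items()))
        result = item.get(op, {})
        new_params = dict(params)  # copy, so the original bulk is untouched
        if "error" not in result and "_id" in result:
            # This document made it in: pin its ID so that re-sending it
            # overwrites the same document instead of creating a duplicate.
            new_params["_id"] = result["_id"]
        completed.append(({op: new_params}, source))
    return completed
```

Re-sending a document with a pinned ID bumps its _version instead of adding a second copy, which is what avoids the duplicates.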

But I guess implementing this sort of logic is nearly as complicated as
identifying the failed messages and trying to reindex them, only slower
(and probably less reliable).
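
For comparison, identifying the failed messages by position could look roughly like the Python sketch below. It assumes what Vlad describes above, namely that the reply items come back in the same order as the request, and that a failed item carries an "error" field. The function name failed_messages is hypothetical.

```python
def failed_messages(messages, items):
    """Pair each sent message with its reply item by position.

    Returns a list of (position, message, error) for the items that failed.
    Assumes `items` is in the same order as `messages`.
    """
    if len(messages) != len(items):
        raise ValueError("reply does not line up with the request")

    failed = []
    for position, (message, item) in enumerate(zip(messages, items)):
        # Each reply item has a single key naming the operation, e.g. "index".
        result = next(iter(item.values()))
        if "error" in result:
            failed.append((position, message, result["error"]))
    return failed
```

Only the messages returned here would need to go into the next bulk.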

As for identifying which messages are malformed and which messages are
worth retrying, I guess it depends on the exception type.
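
A minimal Python sketch of that idea follows. The two sets of exception names are only examples and would need checking against the Elasticsearch version in use. The sketch also assumes the error is either a string of the form "SomeException[details]" or a dict with a "type" key. The names classify and exception_name are made up for illustration.

```python
# Example exception names only; adjust for the ES version you run.
MALFORMED = {"mapperparsingexception"}        # retrying will not help
RETRYABLE = {"esrejectedexecutionexception"}  # may succeed later


def exception_name(error):
    """Extract a normalised exception name from a bulk item error."""
    if isinstance(error, dict):
        raw = str(error.get("type", ""))
    else:
        # e.g. "MapperParsingException[Failed to parse ...]"
        raw = str(error).split("[", 1)[0]
    return raw.replace("_", "").strip().lower()


def classify(error):
    """Return "drop", "retry", or "unknown" for a bulk item error."""
    name = exception_name(error)
    if name in MALFORMED:
        return "drop"
    if name in RETRYABLE:
        return "retry"
    return "unknown"
```

Anything that comes back as "unknown" is a policy decision: retry it a limited number of times, or write it somewhere else so it isn't lost.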
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
