2012/4/10 Vlad Grigorescu <[email protected]>:
> The thing to consider here is what happens when you have multiple rsyslog 
> servers logging to ElasticSearch. Does there need to be some kind of 
> concurrency, so that each of them has unique IDs for the messages? What
> happens if two messages have the same ID?

If two messages have the same ID, the one that gets inserted last
overwrites the previous one and gets an incremented _version. This
basically means you lose data, because the old message isn't there
anymore.
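To make the data-loss point concrete, here is a toy in-memory model of the overwrite-by-_id behaviour described above. It is only a sketch: `ToyIndex` and `StoredDoc` are hypothetical names, not an Elasticsearch client API, and the two log messages are made-up sample data. It assumes nothing beyond what the paragraph says: a second write to an existing _id replaces the stored document and bumps its _version.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    source: dict
    version: int


class ToyIndex:
    """Hypothetical in-memory model of 'index by _id' semantics: writing an
    existing _id replaces the stored source and bumps the version."""

    def __init__(self) -> None:
        self.docs: dict[str, StoredDoc] = {}

    def index(self, doc_id: str, source: dict) -> int:
        previous = self.docs.get(doc_id)
        version = previous.version + 1 if previous is not None else 1
        # The previous source is simply dropped; nothing keeps the old message.
        self.docs[doc_id] = StoredDoc(source=source, version=version)
        return version


# Two different (made-up) log messages that happen to share an ID:
idx = ToyIndex()
idx.index("msg-42", {"message": "disk full on host a"})
second_version = idx.index("msg-42", {"message": "login failed on host b"})
print(second_version, idx.docs["msg-42"].source)
```

Only the second message survives, with version 2; the first one is gone, which is the data loss described above.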

>
> These are questions I'm unsure of, but for now, I'm happy to use 
> ElasticSearch's automatic ID generation features.

Well, if you rely on Elasticsearch to generate the IDs, I don't think
there's a way for rsyslog to know which documents were successfully
inserted and which were not:

# curl -XPUT 'http://localhost:9200/test2/'
{"ok":true,"acknowledged":true}
# curl -XPUT 'http://localhost:9200/test2/type1/_mapping' -d '
{
    "type1" : {
        "properties" : {
            "field1" : {"type" : "long"}
        }
    }
}
'
{"ok":true,"acknowledged":true}
# cat requests
{ "index" : { "_index" : "test2", "_type" : "type1" } }
{ "field1" : 1 }
{ "index" : { "_index" : "test2", "_type" : "type1" } }
{ "field1" : "bla" }
{ "index" : { "_index" : "test2", "_type" : "type1" } }
{ "field1" : 3 }
# curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{"took":29,"items":[
{"create":{"_index":"test2","_type":"type1","_id":"F5a5Rxt1RCSLXQ0N7wV4_w","_version":1,"ok":true}},
{"create":{"_index":"test2","_type":"type1","_id":"vU07l91nQu-Nx9xLoextrA","error":"MapperParsingException[Failed to parse [field1]]; nested: NumberFormatException[For input string: \"bla\"]; "}},
{"create":{"_index":"test2","_type":"type1","_id":"q2uJUEleRTmVv0jGoPxZkQ","_version":1,"ok":true}}]}

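The only way to tell which documents were inserted and which were not is
by their order in the response, since the _id values were generated by
Elasticsearch and mean nothing to the sender. That looks a bit risky in
my book.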
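A minimal sketch of that order-based matching, in Python with only the standard library. `failed_requests` is a hypothetical helper, not part of rsyslog or any Elasticsearch client. It assumes, as the transcript suggests, that the `items` array comes back in the same order as the bulk actions, and it treats any item carrying an `error` field as a failure. The sample response and documents are taken from the run above.

```python
import json

# Response body from the _bulk call above, split into several string
# literals only for readability (the raw one keeps the JSON \" escapes).
RESPONSE = (
    '{"took":29,"items":['
    '{"create":{"_index":"test2","_type":"type1","_id":"F5a5Rxt1RCSLXQ0N7wV4_w","_version":1,"ok":true}},'
    '{"create":{"_index":"test2","_type":"type1","_id":"vU07l91nQu-Nx9xLoextrA",'
    r'"error":"MapperParsingException[Failed to parse [field1]]; nested: NumberFormatException[For input string: \"bla\"]; "}},'
    '{"create":{"_index":"test2","_type":"type1","_id":"q2uJUEleRTmVv0jGoPxZkQ","_version":1,"ok":true}}'
    ']}'
)

# The documents sent in the `requests` file, in the same order.
SENT = [{"field1": 1}, {"field1": "bla"}, {"field1": 3}]


def failed_requests(sent: list[dict], response_body: str) -> list[tuple[int, dict, str]]:
    """Pair each sent document with the response item at the same position
    and return (position, document, error) for every item that failed."""
    items = json.loads(response_body)["items"]
    if len(items) != len(sent):
        raise ValueError("bulk response has a different number of items than the request")
    failures = []
    for pos, (doc, item) in enumerate(zip(sent, items)):
        # Each item holds exactly one key, the operation type ("create" here).
        (result,) = item.values()
        if "error" in result:
            failures.append((pos, doc, str(result["error"])))
    return failures


failures = failed_requests(SENT, RESPONSE)
for pos, doc, error in failures:
    print(pos, doc, error)
```

The positional pairing is exactly the fragile part: if items were ever reordered or dropped, every later pairing would be wrong. If the sender assigned the _id values itself, the same loop could match on `_id` instead.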

>
>  --Vlad
>
> On 04/10/2012 09:49 AM, Radu Gheorghe wrote:
>> 2012/4/10  <[email protected]>:
>>> On Tue, 10 Apr 2012, Vlad Grigorescu wrote:
>>>
>>>>  a) Messages that didn't get successfully inserted should probably be
>>>> queued and reattempted once or twice before being discarded. Unfortunately,
>>>> the new transactional interface won't be sufficient here - if messages 1, 2,
>>>> 4, and 5 are successfully inserted, but message 3 fails, as far as I know,
>>>> there's no way in the transactional interface to communicate that only
>>>> message 3 failed, instead of messages 3-5.
>>>
>>>
>>> actually, what happens is that rsyslog sends a transaction and gets a single
>>> success or failure message.
>>>
>>> if success, all messages were inserted
>>>
>>> if failure, it tries again with half as many messages to see if that goes
>>> through. If it gets down to one message and that fails, then it considers it
>>> a failure (and either retries, or drops the failed message)
>>>
>>> so if elasticsearch doesn't have transactions (all or none succeed), then
>>> some messages will be inserted multiple times.
>>
>> Maybe a solution to this is to use IDs somehow to avoid entering
>> duplicates. Resending the same bulk (with the same IDs) will only
>> "update" the existing documents and increment their "_version" number.
>>
>> I'm not sure how this could actually be implemented, but it might be an 
>> option.
>>
>> BTW, I'm also interested in Elasticsearch :). But since I'm using it
>> for logs, I'm not so much affected by duplicates.
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>
> --
> Vlad Grigorescu | IT Security Engineer
> Office of Privacy and Information Assurance
> University of Illinois at Urbana-Champaign
> 0x632E5272 | 217.244.1922

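The halving retry described in the quoted message (send a batch, and on failure retry with half as many messages until a single message fails), combined with the suggestion in the thread of reusing IDs so that a resent batch only "updates" documents, can be sketched as below. This is a toy model in Python, not rsyslog or Elasticsearch code. `NonTransactionalStore`, `send_with_halving` and `message_id` are hypothetical names. The store deliberately has no all-or-nothing transactions: it keeps every good message of a batch even when another one fails. A message starting with "BAD" stands in for one that fails to parse. Deriving the _id from a hash of the message text is only one assumed way to make IDs repeat across retries.

```python
import hashlib
import uuid


def message_id(msg: str) -> str:
    # Hypothetical ID scheme: hash the message text so that a resent
    # message always maps to the same _id.
    return hashlib.sha1(msg.encode("utf-8")).hexdigest()


class NonTransactionalStore:
    """Toy backend without all-or-nothing batches: good messages are stored
    even when another message in the same batch fails, yet the caller only
    gets a single success/failure answer for the whole batch."""

    def __init__(self, use_deterministic_ids: bool) -> None:
        self.use_deterministic_ids = use_deterministic_ids
        self.docs: dict[str, str] = {}

    def submit(self, batch: list[str]) -> bool:
        all_ok = True
        for msg in batch:
            if msg.startswith("BAD"):  # stand-in for a message that fails to parse
                all_ok = False
                continue
            if self.use_deterministic_ids:
                doc_id = message_id(msg)
            else:
                doc_id = uuid.uuid4().hex  # like server-generated IDs: new on every send
            self.docs[doc_id] = msg  # same _id => overwrite, new _id => extra copy
        return all_ok


def send_with_halving(store: NonTransactionalStore, batch: list[str]) -> list[str]:
    """On failure, retry each half separately until single messages remain;
    return the messages that still fail on their own."""
    if store.submit(batch):
        return []
    if len(batch) == 1:
        return batch  # a real sender would now retry or drop this message
    mid = len(batch) // 2
    return send_with_halving(store, batch[:mid]) + send_with_halving(store, batch[mid:])


BATCH = ["m1", "m2", "BAD3", "m4", "m5"]

auto_store = NonTransactionalStore(use_deterministic_ids=False)
auto_failed = send_with_halving(auto_store, BATCH)

hashed_store = NonTransactionalStore(use_deterministic_ids=True)
hashed_failed = send_with_halving(hashed_store, BATCH)

print(len(auto_store.docs), auto_failed)      # 10 ['BAD3']
print(len(hashed_store.docs), hashed_failed)  # 4 ['BAD3']
```

With fresh IDs on every send, each retry of a partially failed batch writes the good messages again (10 stored documents for these 4 good messages). With hash-derived IDs the retries only overwrite the same 4 documents. The trade-off follows from the overwrite behaviour discussed at the top of the thread: two distinct log lines with identical text would collapse into a single document.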