Hi, I've got a question about the TTL setting in the Elasticsearch sink of Apache Flume.
I'm working on an Elasticsearch + Flume integration. I'm using Elasticsearch version 1.4.1 and Flume version 1.5.2, both running locally on my machine.

In Flume, my ElasticSearch sink is configured as follows:

agent.sinks.elasticSearchSink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticSearchSink.channel = fileChannel
agent.sinks.elasticSearchSink.hostNames = localhost:9300
agent.sinks.elasticSearchSink.indexName = platform
agent.sinks.elasticSearchSink.indexType = platformtype
agent.sinks.elasticSearchSink.ttl = 1m
agent.sinks.elasticSearchSink.batchSize = 1000
agent.sinks.elasticSearchSink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

Note that the ttl is 1m (one minute) for the sake of the test. Elasticsearch starts empty, with default configuration.

Now I feed log events from my system into Flume and see that they get stored in Elasticsearch. For example, after 3 events have been added to Flume, I see the following in the ES REST interface:

>> GET: http://localhost:9200/_search

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "platform-2015-03-15",
        "_type": "platformtype",
        "_id": "AUweJbNImsCLrYBu-7gJ",
        "_score": 1,
        "_source": {
          "@message": "",
          "@fields": { … }
        }
      },
      {
        "_index": "platform-2015-03-15",
        "_type": "platformtype",
        "_id": "AUweJbNImsCLrYBu-7gI",
        "_score": 1,
        "_source": {
          "@message": "",
          "@fields": { … }
        }
      },
      {
        "_index": "platform-2015-03-15",
        "_type": "platformtype",
        "_id": "AUweJbNJmsCLrYBu-7gK",
        "_score": 1,
        "_source": {
          "@message": "",
          "@fields": { … }
        }
      }
    ]
  }
}

Now I would expect the messages to be deleted after a minute, but unfortunately that is not the case. I can see that the ES index doesn't include any TTL definition at all:

>> GET: http://localhost:9200/_all/platformtype/_mapping

{
  "platform-2015-03-15": {
    "mappings": {
      "platformtype": {
        "properties": {
          "@fields": {
            "properties": { … }   // the event properties, all are present here
          }
        }
      }
    }
  }
}

So these messages get stuck in ES forever. I know that _ttl is disabled by default in ES, as stated here:
http://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-ttl-field.html

So I'm trying to enable the TTL "manually" to examine the behavior:

>> PUT: http://localhost:9200/_all/platformtype/_mapping with body:

{"platformtype" : {"_ttl" : {"enabled" : true, "default" : "2m"}}}

This returns the following (the TTL has been set):

{
  "acknowledged": true
}

Note that I've intentionally put 2m, unlike the 1-minute definition in the Flume sink configuration. Now I can see the following in the mapping:

>> GET: http://localhost:9200/_all/platformtype/_mapping

{
  "platform-2015-03-15": {
    "mappings": {
      "platformtype": {
        "_ttl": {
          "enabled": true,
          "default": 120000
        },
        "properties": { … }
      }
    }
  }
}

OK, now I add 3 more events to Flume, so there are 6 events in ES in total. I wait for about a minute and the messages get deleted (it takes less than 2 minutes), which means that the ES sink's TTL setting does take effect once _ttl is enabled on the mapping.

So I'm confused: I assumed the TTL would work out of the box, based solely on the Flume ElasticSearch sink definitions, but it looks like I'm wrong. Could you please explain whether this is a bug in the ES sink or intended behavior, and how it is supposed to work?

Thanks and have a nice day,
Mark Bramnik
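
P.S. One workaround I'm considering, since the sink creates a new daily index (platform-YYYY-MM-DD), is an index template so that every new index gets _ttl enabled automatically instead of requiring a manual mapping update each day. This is just a sketch; the template name platform_ttl is a placeholder of my own:

>> PUT: http://localhost:9200/_template/platform_ttl with body:

{
  "template": "platform-*",
  "mappings": {
    "platformtype": {
      "_ttl": { "enabled": true }
    }
  }
}

My understanding is that with _ttl enabled this way, the per-event ttl of 1m sent by the sink would then be honored on new indices as well, but I'd appreciate confirmation that this is the intended approach.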
