2015-08-24 7:42 GMT+02:00 Radu Gheorghe:

> On Sat, Aug 22, 2015 at 6:26 AM, David Lang wrote:
>
>> On Fri, 21 Aug 2015, Otis Gospodnetić wrote:
>>
>>> Hi,
>>>
>>> This sounds like something that should be om-specific. What Radu is
>>> suggesting would definitely help with ES, but may not be relevant
>>> for other output targets.
>>>
>>> What I think is overlooked here is the ES side - more specifically
>>> ES and the searches that ES has to handle. If we don't care about
>>> maxing out ES and just pushing data into it as fast as it arrives,
>>> then how rsyslog/omelasticsearch works today makes sense. But this
>>> approach is focused on ingestion and ignores how this can hurt ES's
>>> ability to handle queries in a timely manner. Exposing the controls
>>> Radu suggested would help people avoid this problem. I know David
>>> would like to see numbers :) I love numbers, too, but I'm not sure
>>> if we'll have the time to provide them :( That said, we work with
>>> ES 24/7 and have been doing that for years (many hundreds of ES
>>> deployments under our belt by now), so I am hoping somebody will
>>> trust us that this option would be great to have in
>>> omelasticsearch. :)
>>
>> I think that this really should be addressed on the ElasticSearch
>> side of things.
>>
>> This really shouldn't be a numerical limit thing.
>>
>> What is ideal is that if ES is lightly loaded, things get pushed
>> into ES with the minimum latency. But if ES is more heavily loaded,
>> batch things up.
>>
>> The right way to do this (as I said in another discussion) is for ES
>> to have a way to prioritize searches over inputting new data. That
>> way as the load climbs, the rate of processing new inserts will slow
>> and inserts will get batched more.
>
> While that would be an option (and I guess it can be done by tuning
> sizes and priorities of threadpools - I don't see another way), I
> don't agree that it's the right way to do this.
> In my experience, you'd want to avoid putting load on ES in the first
> place. ES does lots of things besides actually indexing and
> searching. Cluster management, for instance, where nodes are pinging
> each other and gathering statistics on what each node is doing and
> on each shard hosted on said node. Re-opening searchers to make newly
> indexed data available for searches, warming up caches, backing up
> data and so on. There's a semi-complete list of thread pools here:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
> and obviously a single threadpool doesn't only do one job. And
> ideally, you'd want all these tasks to be snappy; you don't want a
> node to drop out of the cluster because it didn't reply to requests
> in a timely manner.
>
> As a result, I wouldn't put load on the indexing end just because I
> can (i.e. I'm not generating "enough load" to justify batching).
> Plus, forwarding data "immediately" (as opposed to every second or
> every 5 seconds...) isn't necessarily helping the user, either.
> Elasticsearch is "near realtime" in the sense that, by default, it
> "refreshes" the view on the index periodically to expose newly
> indexed data. This trades off some "realtime-ness" for
> "cache-ability" (both internally and at the OS level). Normally,
> users would make this refresh interval as long as possible without
> impacting the user experience too much, in order to reduce the load
> and increase the indexing throughput (some numbers here:
> http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/).
> Because of this, in the logging case a refresh interval of 5-10
> seconds or even more is common, especially when you do lots of
> indexing. That's why I'm saying it doesn't really matter if rsyslog
> sends data immediately or waits a second or two for batches to be
> larger.
I am mostly with Radu on this topic. I think there are some use cases
where it really would be advantageous to submit a larger batch, even if
this means waiting. True, these use cases were very rare in the early
days of rsyslog and may still be, but I think it's something one might
validly want.

Just my 2cts...
Rainer
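
For illustration only, the "wait a second or two for a larger batch" idea
discussed in this thread could be sketched roughly as below. This is not
how rsyslog or omelasticsearch is implemented; the class name TimedBatcher
and the parameters max_size and max_wait are hypothetical, chosen just for
this example. A batch is flushed when it is full, or when its oldest
message has waited max_wait seconds, whichever comes first.

```python
import time
from typing import Callable, List, Optional


class TimedBatcher:
    """Hypothetical sketch: flush on batch size OR on maximum wait time."""

    def __init__(self, flush: Callable[[List[str]], None],
                 max_size: int = 500, max_wait: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_fn = flush        # called with each completed batch
        self.max_size = max_size     # assumed knob: flush at this many messages
        self.max_wait = max_wait     # assumed knob: seconds the oldest message may wait
        self.clock = clock           # injectable so the timing can be tested
        self.batch: List[str] = []
        self.first_at: Optional[float] = None

    def add(self, msg: str) -> None:
        """Queue one message; flush immediately if the batch is now full."""
        if not self.batch:
            self.first_at = self.clock()  # start the wait timer on first message
        self.batch.append(msg)
        if len(self.batch) >= self.max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once max_wait has elapsed."""
        if self.batch and self.clock() - self.first_at >= self.max_wait:
            self.flush()

    def flush(self) -> None:
        """Hand the current batch (if any) to the flush callback and reset."""
        if self.batch:
            out, self.batch = self.batch, []
            self.first_at = None
            self.flush_fn(out)
```

With something like this, a lightly loaded sender still delivers within
max_wait seconds, which is in the same range as the 5-10 second refresh
intervals mentioned above, while a busy sender fills batches by size
before the timer ever fires.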

