ES uses a worker pool (thread pool) for indexing, and there is a separate one for bulk indexing too. The prioritizing (renice) approach may not be easy, and is possibly a little dangerous too, but sizing that thread pool is definitely easy. Just size it to your needs and it will shape the batch size optimally under pressure, as David explained.
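
To make that self-tuning concrete, here is a rough Python sketch. It is a toy model of the dequeue loop David described, not rsyslog or Elasticsearch code, and the function name and the two cost parameters are made up for illustration. Each pass grabs whatever is pending (up to a maximum), and the time ES takes to answer decides how many messages pile up for the next pass.

```python
def simulate_batches(arrival_rate, request_overhead, per_msg_cost,
                     max_batch, passes=50):
    """Toy model of a dequeue loop; returns the batch size of each pass.

    arrival_rate      messages arriving per second
    request_overhead  fixed seconds per bulk request (grows when ES is busy)
    per_msg_cost      extra seconds per message in the bulk
    max_batch         upper bound, playing the role of queue.dequeuebatchsize
    """
    pending = 1.0  # messages waiting in the queue
    sizes = []
    for _ in range(passes):
        # Grab everything that is waiting, up to the configured maximum.
        batch = min(pending, max_batch)
        # Delivering the bulk takes time, and new messages pile up meanwhile.
        elapsed = request_overhead + per_msg_cost * batch
        pending = pending - batch + arrival_rate * elapsed
        sizes.append(batch)
    return sizes


if __name__ == "__main__":
    # 10k msg/s against a fast ES (1 ms per bulk) and a loaded one (100 ms).
    for overhead in (0.001, 0.1):
        sizes = simulate_batches(10_000, overhead, 0.0, 40_000)
        print(f"overhead={overhead}s -> batch settles near {sizes[-1]:.0f}")
```

Under these assumptions, 10k msg/s against a 1 ms bulk round trip settles at about 10 messages per bulk, which is in line with what tcpdump showed, while a 100 ms round trip settles at about 1000. A smaller or busier thread pool on the ES side acts like the larger overhead, so the batches grow without anything being forced on the rsyslog side.
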
On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]> wrote:

> well, there is something I can't understand: If rsyslog use 10msg per bulk
> because Elasticsearch keep up the sending speed, why the output queue has a
> size reached maxsize and discarded.nf/enqueued = 90%.
>
> here is my configuration:
>
> ```
> action (
>     type="omelasticsearch"
>     template="videotmpl"
>     server="10.13.244.214"
>     dynSearchIndex="on"
>     searchIndex="videoIndexName"
>     searchType="videoaccess"
>     bulkmode="on"
>     name="action_videoaccess-es1003"
>     queue.size="1000000"
>     queue.dequeuebatchsize="40000"
>     queue.discardmark="950000"
>     queue.highwatermark="600000"
>     queue.lowwatermark="400000"
>     queue.discardseverity="3"
>     queue.dequeueslowdown="10000"
>     queue.type="linkedlist"
>     queue.maxdiskspace="15G"
>     queue.maxfilesize="500M"
>     queue.filename="action_videoaccess-es1003"
>     queue.checkpointinterval="10000"
>     queue.saveonshutdown="on"
> )
> ```
>
> and pstats.log:
>
> ```
> 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
> 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
> ```
>
> btw: I had try slowdown setting from 10 to 10000, no change to 10 msg per
> bulk.
>
> 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
>
>> That might work, thanks for the feedback and the interesting article!
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
>>
>>> Probably a risk, something to keep an eye on (or watch the pstats from
>>> rsyslog and tweak the priority if the queue too large)
>>>
>>> I also believe that the vast majority of searches that are typically done
>>> are done wrong (see my dashboards/reports article at
>>> https://www.usenix.org/publications/login/feb14/logging-reports-dashboards
>>> )
>>>
>>> David Lang
>>>
>>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>>>
>>>> This sounds interesting, David. I guess it's possible to renice just some
>>>> threads from an app and make it "nicer", right? Googling a bit it seems it
>>>> is possible.
>>>>
>>>> The only problem I see with this approach is that searches (and other
>>>> kinds of requests from other threadpools
>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
>>>> would automatically have higher priority so, with heavy searches, indexing
>>>> might fall behind more than usual. Am I getting it right?
>>>>
>>>> --
>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>
>>>> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
>>>>
>>>>> Thinking about it, probably the best thing to do is to renice the ES
>>>>> threads that accept the messages from rsyslog. That way if nothing else
>>>>> needs the capacity, everything works at the fastest insert speed (even if
>>>>> less optimized than if there were larger batches) But if anything else on
>>>>> the system need the resources, the indexing threads work slower, which
>>>>> will result in larger batches.
>>>>>
>>>>> all self tuning.
>>>>>
>>>>> David Lang
>>>>>
>>>>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>>>>>
>>>>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
>>>>>> From: Radu Gheorghe <[email protected]>
>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>> To: rsyslog-users <[email protected]>
>>>>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
>>>>>>
>>>>>> Maybe this went overlooked, but David suggested earlier that you can
>>>>>> slowdown the queue to let more messages arrive before sending a bulk.
>>>>>> queue.dequeueslowdown
>>>>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
>>>>>> is the option and it's in microseconds.
>>>>>>
>>>>>> I think you have a valid point in that if batches are too small then
>>>>>> Elasticsearch will do more work than necessary (as indexing in very
>>>>>> small batches is more expensive). Plus, since the refresh rate (i.e. how
>>>>>> long it may take for an indexed doc to be visible to searches, because
>>>>>> Searchers reopen their view in the index at a certain interval) is
>>>>>> typically a few seconds
>>>>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
>>>>>> waiting a bit before submitting a batch will have no impact on the user
>>>>>> experience.
>>>>>>
>>>>>> On the other hand, in my experience you'll be sending small batches if
>>>>>> the indexing rate is low - which means the load on ES is low anyway. So
>>>>>> I'm not sure if optimizing this will actually give significant results.
>>>>>> You could introduce that slowdown, but then rsyslog may have trouble
>>>>>> keeping up when the load is high.
>>>>>> You can compensate by raising the limit of maximum worker threads for
>>>>>> the queue (queue.workerthreads) and play with
>>>>>> queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown
>>>>>> to make rsyslog spawn new threads when there are at least N messages in
>>>>>> the queue (that's what min messages does) and kill them when the queue
>>>>>> is smaller than that for a while (that's the timeout option). If the
>>>>>> load is low, you'd have just one thread that works with that slowdown.
>>>>>>
>>>>>> I hope this helps.
>>>>>>
>>>>>> Best regards,
>>>>>> Radu
>>>>>>
>>>>>> --
>>>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>>
>>>>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
>>>>>>
>>>>>>> So how can I define the output queue configuration?
>>>>>>>
>>>>>>> I found the omelasticsearch action process 60000/min, and the
>>>>>>> queue.discarded.nf was 600000.
>>>>>>> I run `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
>>>>>>> and saw the length is 1.6k. As my msgline size is 0.1k, the bulk size
>>>>>>> is only 10. Too small.
>>>>>>>
>>>>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB.
>>>>>>> Why~~
>>>>>>>
>>>>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
>>>>>>>
>>>>>>>> On Tue, 5 May 2015, chenlin rao wrote:
>>>>>>>>
>>>>>>>>> I'm using rsyslog-elasticsearch to writing nginx accesslog into
>>>>>>>>> Elasticsearch cluster. I found the document told that the plugin
>>>>>>>>> would use queue.dequeuesize as the bulk size.But my tcpdump show
>>>>>>>>> that every POST only has 8-9 events in the bulk body while my input
>>>>>>>>> flow is nearly 10k per second.
>>>>>>>>>
>>>>>>>>> How can I force a larger bulk size?
>>>>>>>>
>>>>>>>> Rsyslog adapts the size to the number of messages waiting to be
>>>>>>>> delivered, so if it's keeping up at that size, it won't increase it.
>>>>>>>>
>>>>>>>> are you running impstats? if so, please look at the queue size. If
>>>>>>>> it's staying low, then you just have a nice, fast ES instance that is
>>>>>>>> able to do 1k inserts/sec (which is not unreasonable), so each insert
>>>>>>>> would be <10 messages.
>>>>>>>>
>>>>>>>> Trying to force a larger bulk size would mean not inserting messages
>>>>>>>> as fast as we can, and instead pausing and waiting for enough messages
>>>>>>>> to accumulate to fill the bulk size. We never delay messages
>>>>>>>> intentionally, each pass through the loop we grab all pending
>>>>>>>> messages, up to the max dequeue size, and deliver them. If more
>>>>>>>> messages arrive than we deliver, the next pass through the queue is
>>>>>>>> larger, so we grab more messages (this quickly stabilizes to inserting
>>>>>>>> messages as fast as they are arriving)
>>>>>>>>
>>>>>>>> there is a dequeue delay that forces rsyslog to sit and do nothing
>>>>>>>> between one batch of messages and the next. It's use is discouraged,
>>>>>>>> but delaying like this would allow more messages to accumulate.
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>> _______________________________________________
>>>>>>>> rsyslog mailing list
>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>>>>>> of sites beyond our control.
>>>>>>>> PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

--
Regards,
Janmejay
http://codehunk.wordpress.com

