Re: [rsyslog] how to force a larger omelasticsearch bulk size?

singh.janmejay Wed, 17 Jun 2015 08:50:28 -0700

ES uses a worker pool (thread pool) for indexing, and there is a separate
one for bulk indexing. The prioritizing approach may not be easy, and is
possibly a little dangerous too, but sizing that thread pool is
definitely easy. Just size it to your needs and it will shape the
batch size optimally when under pressure (as David explained).
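
To illustrate the self-tuning behaviour David described, here is a toy model
in Python (it is not rsyslog code). It assumes each bulk request costs a fixed
overhead plus a per-message cost. The function name and all cost figures are
made up for illustration, picked so that 10k msg/s lands near the ~10 messages
per bulk reported in this thread.

```python
def steady_batch_size(rate, overhead, per_msg, max_batch, passes=200):
    """Toy model of a dequeue loop that never waits on purpose.

    rate:      arriving messages per second (assumed constant)
    overhead:  assumed fixed seconds per bulk request
    per_msg:   assumed extra seconds per message in a request
    max_batch: upper bound per pass (the role of queue.dequeuebatchsize)
    """
    pending = 1.0  # messages waiting in the queue
    batch = 0.0
    for _ in range(passes):
        # Grab everything that is pending, up to the configured maximum.
        batch = min(pending, max_batch)
        # While this batch is being delivered, new messages keep arriving.
        elapsed = overhead + batch * per_msg
        pending = pending - batch + rate * elapsed
    return batch


# Fast consumer: batches settle at about 10 messages.
print(round(steady_batch_size(10_000, 0.0009, 0.00001, 40_000)))
# Consumer ten times slower per request: batches settle at about 100.
print(round(steady_batch_size(10_000, 0.009, 0.00001, 40_000)))
# Consumer that cannot keep up: batches hit the maximum and the queue grows.
print(round(steady_batch_size(10_000, 0.0009, 0.0002, 40_000)))
```

In this model a slower consumer (for example a smaller or busier bulk thread
pool) shows up as larger batches without any forced delay on the rsyslog side.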


On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]> wrote:
> Well, there is something I can't understand: if rsyslog uses only 10
> messages per bulk because Elasticsearch keeps up with the sending speed, why
> has the output queue reached its maximum size, with discarded.nf/enqueued = 90%?
>
> here is my configuration:
>
> ```
>    action (
>         type="omelasticsearch"
>         template="videotmpl"
>         server="10.13.244.214"
>         dynSearchIndex="on"
>         searchIndex="videoIndexName"
>         searchType="videoaccess"
>         bulkmode="on"
>         name="action_videoaccess-es1003"
>         queue.size="1000000"
>         queue.dequeuebatchsize="40000"
>         queue.discardmark="950000"
>         queue.highwatermark="600000"
>         queue.lowwatermark="400000"
>         queue.discardseverity="3"
>         queue.dequeueslowdown="10000"
>         queue.type="linkedlist"
>         queue.maxdiskspace="15G"
>         queue.maxfilesize="500M"
>         queue.filename="action_videoaccess-es1003"
>         queue.checkpointinterval="10000"
>         queue.saveonshutdown="on"
>     )
> ```
>
> and pstats.log:
>
> ```
> 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
> 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
> ```
>
> By the way, I have tried queue.dequeueslowdown settings from 10 to 10000;
> the bulk size stayed at 10 messages.
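
As a cross-check on the discard ratio, here is a small Python sketch that
parses one pstats record and divides discarded.nf by enqueued. The record is
the main-queue one quoted above with the line wraps removed; the helper name
discard_ratio is made up for this example. As far as I understand the
counters, discarded.nf counts messages dropped while the queue was not yet
full, which would fit a queue sitting at queue.discardmark (950000) with
queue.discardseverity set.

```python
import json


def discard_ratio(pstats_json):
    """Return discarded.nf / enqueued for one impstats JSON record."""
    stats = json.loads(pstats_json)
    enqueued = stats["enqueued"]
    # Avoid dividing by zero for an interval with no traffic.
    return stats["discarded.nf"] / enqueued if enqueued else 0.0


record = (
    '{"name":"action_videoaccess-es1003 queue","origin":"core.queue",'
    '"size":950000,"enqueued":522298,"full":0,"discarded.full":0,'
    '"discarded.nf":442298,"maxqsize":950000}'
)

# Prints the share of enqueued messages that were discarded (about 85% here).
print(f"{discard_ratio(record):.1%}")
```

For this sample the ratio comes out a little under the 90% mentioned, but it
still means that most messages are dropped at the discard mark rather than
delivered.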
>
> 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
>
>> That might work, thanks for the feedback and the interesting article!
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
>>
>> > Probably a risk, something to keep an eye on (or watch the pstats from
>> > rsyslog and tweak the priority if the queue gets too large).
>> >
>> > I also believe that the vast majority of searches that are typically done
>> > are done wrong (see my dashboards/reports article at
>> > https://www.usenix.org/publications/login/feb14/logging-reports-dashboards).
>> >
>> > David Lang
>> >
>> > On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>> >
>> >> This sounds interesting, David. I guess it's possible to renice just some
>> >> threads from an app and make it "nicer", right? Googling a bit it seems it
>> >> is possible.
>> >>
>> >> The only problem I see with this approach is that searches (and other kinds
>> >> of requests from other threadpools
>> >> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
>> >> would automatically have higher priority so, with heavy searches, indexing
>> >> might fall behind more than usual. Am I getting it right?
>> >>
>> >> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
>> >>
>> >>> Thinking about it, probably the best thing to do is to renice the ES
>> >>> threads that accept the messages from rsyslog. That way, if nothing else
>> >>> needs the capacity, everything works at the fastest insert speed (even if
>> >>> less optimized than if there were larger batches). But if anything else on
>> >>> the system needs the resources, the indexing threads work slower, which
>> >>> will result in larger batches.
>> >>>
>> >>> all self tuning.
>> >>>
>> >>> David Lang
>> >>>
>> >>>
>> >>>
>> >>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>> >>>
>> >>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
>> >>>> From: Radu Gheorghe <[email protected]>
>> >>>> Reply-To: rsyslog-users <[email protected]>
>> >>>> To: rsyslog-users <[email protected]>
>> >>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
>> >>>>
>> >>>> Maybe this went overlooked, but David suggested earlier that you can
>> >>>> slow down the queue to let more messages arrive before sending a bulk.
>> >>>> queue.dequeueslowdown
>> >>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
>> >>>> is the option and it's in microseconds.
>> >>>>
>> >>>> I think you have a valid point in that if batches are too small then
>> >>>> Elasticsearch will do more work than necessary (as indexing in very small
>> >>>> batches is more expensive). Plus, since the refresh rate (i.e. how long it
>> >>>> may take for an indexed doc to be visible to searches, because Searchers
>> >>>> reopen their view of the index at a certain interval) is typically a few
>> >>>> seconds
>> >>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
>> >>>> waiting a bit before submitting a batch will have no impact on the user
>> >>>> experience.
>> >>>>
>> >>>> On the other hand, in my experience you'll be sending small batches if the
>> >>>> indexing rate is low - which means the load on ES is low anyway. So I'm not
>> >>>> sure if optimizing this will actually give significant results. You could
>> >>>> introduce that slowdown, but then rsyslog may have trouble keeping up when
>> >>>> the load is high. You can compensate by raising the limit of maximum worker
>> >>>> threads for the queue (queue.workerthreads) and play with
>> >>>> queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown to
>> >>>> make rsyslog spawn new threads when there are at least N messages in the
>> >>>> queue (that's what min messages does) and kill them when the queue is
>> >>>> smaller than that for a while (that's the timeout option). If the load is
>> >>>> low, you'd have just one thread that works with that slowdown.
>> >>>>
>> >>>> I hope this helps.
>> >>>>
>> >>>> Best regards,
>> >>>> Radu
>> >>>>
>> >>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
>> >>>>
>> >>>>> So how should I define the output queue configuration?
>> >>>>> I found that the omelasticsearch action processes 60000/min, and
>> >>>>> queue.discarded.nf was 600000.
>> >>>>> I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
>> >>>>> and saw that the length is 1.6k. As my message line size is 0.1k, the
>> >>>>> bulk size is only 10. Too small.
>> >>>>>
>> >>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB.
>> >>>>> Why?
>> >>>>>
>> >>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
>> >>>>>
>> >>>>>> On Tue, 5 May 2015, chenlin rao wrote:
>> >>>>>>
>> >>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
>> >>>>>>> Elasticsearch cluster. The documentation says that the plugin uses
>> >>>>>>> queue.dequeuebatchsize as the bulk size. But my tcpdump shows that every
>> >>>>>>> POST has only 8-9 events in the bulk body, while my input flow is nearly
>> >>>>>>> 10k per second.
>> >>>>>>>
>> >>>>>>> How can I force a larger bulk size?
>> >>>>>>
>> >>>>>> Rsyslog adapts the size to the number of messages waiting to be delivered,
>> >>>>>> so if it's keeping up at that size, it won't increase it.
>> >>>>>>
>> >>>>>> Are you running impstats? If so, please look at the queue size. If it's
>> >>>>>> staying low, then you just have a nice, fast ES instance that is able to do
>> >>>>>> 1k inserts/sec (which is not unreasonable), so each insert would be <10
>> >>>>>> messages.
>> >>>>>>
>> >>>>>> Trying to force a larger bulk size would mean not inserting messages as
>> >>>>>> fast as we can, and instead pausing and waiting for enough messages to
>> >>>>>> accumulate to fill the bulk size. We never delay messages intentionally;
>> >>>>>> on each pass through the loop we grab all pending messages, up to the max
>> >>>>>> dequeue size, and deliver them. If more messages arrive than we deliver,
>> >>>>>> the next pass through the queue is larger, so we grab more messages (this
>> >>>>>> quickly stabilizes to inserting messages as fast as they are arriving).
>> >>>>>>
>> >>>>>> There is a dequeue delay that forces rsyslog to sit and do nothing between
>> >>>>>> one batch of messages and the next. Its use is discouraged, but delaying
>> >>>>>> like this would allow more messages to accumulate.
>> >>>>>>
>> >>>>>> David Lang
>> >>>>>> _______________________________________________
>> >>>>>> rsyslog mailing list
>> >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> >>>>>> http://www.rsyslog.com/professional-services/
>> >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>> >>>>>> myriad
>> >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>> you
>> >>>>>> DON'T LIKE THAT.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
