Re: [rsyslog] how to force a larger omelasticsearch bulk size?

Rainer Gerhards Thu, 18 Jun 2015 02:19:36 -0700

2015-06-18 5:13 GMT+02:00 chenlin rao <[email protected]>:
> yes. you are right.
>
> FYI, this rsyslog server is sending short messages of 100 B each, and I use a
> special ES template with _source disabled for them.
>
> I checked another rsyslog server, which sends long 600 B messages and also has
> a nearly full queue; it does have a large bulk size (18 MB). So I don't know
> where the problem is.


Can you please disable the delay, run stats at 1-minute intervals, and
provide the pstats file for a couple of hours, so that we can see
how things evolve?
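
For reference, a minimal sketch of what that request could look like in rsyslog.conf, assuming impstats is the stats source. The log file path is a hypothetical placeholder, and treating queue.dequeueslowdown as "the delay" is an assumption based on the action configuration quoted further down:

```
# Sketch only: emit queue and action counters every 60 seconds, as JSON,
# into a dedicated file instead of the normal syslog stream.
# The file path below is an assumed placeholder, not taken from this thread.
module(
    load="impstats"
    interval="60"
    format="json"
    log.syslog="off"
    log.file="/var/log/rsyslog-pstats.log"
)

# In the omelasticsearch action, the dequeue delay is presumably this line;
# removing it (or setting it to "0") disables the pause between batches:
#     queue.dequeueslowdown="10000"
```

The "size", "enqueued" and "discarded.nf" counters for the action queue, collected over a couple of hours, are the values to compare from one interval to the next.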

Rainer
>
> I use rsyslog-8.10.0.ad1-2.el6.x86_64 on CentOS6.2,
> elasticsearch-1.5.1-1.noarch on CentOS6.5.
>
> 2015-06-18 4:21 GMT+08:00 Radu Gheorghe <[email protected]>:
>
>> But I think what Chenlin is describing is a bug. He basically ends up with
>> small batches, but the queue is getting full. So rsyslog could build bigger
>> batches (there are messages in the queue) but it doesn't. Am I right? If
>> yes, it's a weird thing, I didn't see this issue before :( Maybe a full
>> reproduction (complete config of rsyslog + ES + versions + OSes) would
>> help?
>>
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Wed, Jun 17, 2015 at 6:47 PM, singh.janmejay <[email protected]>
>> wrote:
>>
>> > ES uses a worker pool for indexing (there is a worker pool for
>> > bulk indexing too). A prioritizing approach may not be easy, and is
>> > possibly a little dangerous too, but sizing that thread pool is
>> > definitely easy. Just size it to your needs and it'll shape the
>> > batch size optimally when under pressure (as David explained).
>> >
>> > On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]>
>> > wrote:
>> > > Well, there is something I can't understand: if rsyslog uses 10 msgs per
>> > > bulk because Elasticsearch keeps up with the sending speed, why has the
>> > > output queue reached its max size, with discarded.nf/enqueued = 90%?
>> > >
>> > > here is my configuration:
>> > >
>> > > ```
>> > >    action (
>> > >         type="omelasticsearch"
>> > >         template="videotmpl"
>> > >         server="10.13.244.214"
>> > >         dynSearchIndex="on"
>> > >         searchIndex="videoIndexName"
>> > >         searchType="videoaccess"
>> > >         bulkmode="on"
>> > >         name="action_videoaccess-es1003"
>> > >         queue.size="1000000"
>> > >         queue.dequeuebatchsize="40000"
>> > >         queue.discardmark="950000"
>> > >         queue.highwatermark="600000"
>> > >         queue.lowwatermark="400000"
>> > >         queue.discardseverity="3"
>> > >         queue.dequeueslowdown="10000"
>> > >         queue.type="linkedlist"
>> > >         queue.maxdiskspace="15G"
>> > >         queue.maxfilesize="500M"
>> > >         queue.filename="action_videoaccess-es1003"
>> > >         queue.checkpointinterval="10000"
>> > >         queue.saveonshutdown="on"
>> > >     )
>> > > ```
>> > >
>> > > and pstats.log:
>> > >
>> > > ```
>> > > 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
>> > > 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
>> > > ```
>> > >
>> > > btw: I tried changing the slowdown setting from 10 to 10000; it made no
>> > > difference, still 10 msgs per bulk.
>> > >
>> > > 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
>> > >
>> > >> That might work, thanks for the feedback and the interesting article!
>> > >>
>> > >>
>> > >> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
>> > >>
>> > >> > Probably a risk, something to keep an eye on (or watch the pstats from
>> > >> > rsyslog and tweak the priority if the queue gets too large).
>> > >> >
>> > >> > I also believe that the vast majority of searches that are typically
>> > >> > done are done wrong (see my dashboards/reports article at
>> > >> > https://www.usenix.org/publications/login/feb14/logging-reports-dashboards
>> > >> > )
>> > >> >
>> > >> > David Lang
>> > >> >
>> > >> > On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>> > >> >
>> > >> >> This sounds interesting, David. I guess it's possible to renice just
>> > >> >> some threads from an app and make it "nicer", right? Googling a bit, it
>> > >> >> seems it is possible.
>> > >> >>
>> > >> >> The only problem I see with this approach is that searches (and other
>> > >> >> kinds of requests from other threadpools
>> > >> >> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
>> > >> >> would automatically have higher priority so, with heavy searches,
>> > >> >> indexing might fall behind more than usual. Am I getting it right?
>> > >> >>
>> > >> >>
>> > >> >> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
>> > >> >>
>> > >> >>> Thinking about it, probably the best thing to do is to renice the ES
>> > >> >>> threads that accept the messages from rsyslog. That way, if nothing
>> > >> >>> else needs the capacity, everything works at the fastest insert speed
>> > >> >>> (even if less optimized than if there were larger batches). But if
>> > >> >>> anything else on the system needs the resources, the indexing threads
>> > >> >>> work slower, which will result in larger batches.
>> > >> >>>
>> > >> >>> All self-tuning.
>> > >> >>>
>> > >> >>> David Lang
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>> > >> >>>
>> > >> >>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
>> > >> >>>> From: Radu Gheorghe <[email protected]>
>> > >> >>>> Reply-To: rsyslog-users <[email protected]>
>> > >> >>>> To: rsyslog-users <[email protected]>
>> > >> >>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
>> > >> >>>>
>> > >> >>>> Maybe this went overlooked, but David suggested earlier that you can
>> > >> >>>> slow down the queue to let more messages arrive before sending a bulk.
>> > >> >>>> queue.dequeueslowdown
>> > >> >>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
>> > >> >>>> is the option and it's in microseconds.
>> > >> >>>>
>> > >> >>>> I think you have a valid point in that if batches are too small then
>> > >> >>>> Elasticsearch will do more work than necessary (as indexing in very
>> > >> >>>> small batches is more expensive). Plus, since the refresh rate (i.e.
>> > >> >>>> how long it may take for an indexed doc to be visible to searches,
>> > >> >>>> because Searchers reopen their view in the index at a certain
>> > >> >>>> interval) is typically a few seconds
>> > >> >>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
>> > >> >>>> waiting a bit before submitting a batch will have no impact on the
>> > >> >>>> user experience.
>> > >> >>>>
>> > >> >>>> On the other hand, in my experience you'll be sending small batches if
>> > >> >>>> the indexing rate is low - which means the load on ES is low anyway. So
>> > >> >>>> I'm not sure if optimizing this will actually give significant results.
>> > >> >>>> You could introduce that slowdown, but then rsyslog may have trouble
>> > >> >>>> keeping up when the load is high. You can compensate by raising the
>> > >> >>>> limit of maximum worker threads for the queue (queue.workerthreads)
>> > >> >>>> and play with queue.workerthreadminimummessages and
>> > >> >>>> queue.timeoutworkerthreadshutdown to make rsyslog spawn new threads
>> > >> >>>> when there are at least N messages in the queue (that's what min
>> > >> >>>> messages does) and kill them when the queue is smaller than that for
>> > >> >>>> a while (that's the timeout option). If the load is low, you'd have
>> > >> >>>> just one thread that works with that slowdown.
>> > >> >>>>
>> > >> >>>> I hope this helps.
>> > >> >>>>
>> > >> >>>> Best regards,
>> > >> >>>> Radu
>> > >> >>>>
>> > >> >>>>
>> > >> >>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
>> > >> >>>>
>> > >> >>>>> So how can I define the output queue configuration?
>> > >> >>>>>
>> > >> >>>>> I found that the omelasticsearch action processes 60000/min, and the
>> > >> >>>>> queue.discarded.nf was 600000.
>> > >> >>>>> I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
>> > >> >>>>> and saw the length is 1.6k. As my msgline size is 0.1k, the bulk size
>> > >> >>>>> is only 10. Too small.
>> > >> >>>>>
>> > >> >>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB.
>> > >> >>>>> Why?
>> > >> >>>>>
>> > >> >>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
>> > >> >>>>>
>> > >> >>>>>> On Tue, 5 May 2015, chenlin rao wrote:
>> > >> >>>>>>
>> > >> >>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
>> > >> >>>>>>> Elasticsearch cluster. I found that the documentation says the plugin
>> > >> >>>>>>> would use queue.dequeuebatchsize as the bulk size. But my tcpdump
>> > >> >>>>>>> shows that every POST only has 8-9 events in the bulk body while my
>> > >> >>>>>>> input flow is nearly 10k per second.
>> > >> >>>>>>>
>> > >> >>>>>>> How can I force a larger bulk size?
>> > >> >>>>>>
>> > >> >>>>>> Rsyslog adapts the size to the number of messages waiting to be
>> > >> >>>>>> delivered, so if it's keeping up at that size, it won't increase it.
>> > >> >>>>>>
>> > >> >>>>>> Are you running impstats? If so, please look at the queue size. If
>> > >> >>>>>> it's staying low, then you just have a nice, fast ES instance that is
>> > >> >>>>>> able to do 1k inserts/sec (which is not unreasonable), so each insert
>> > >> >>>>>> would be <10 messages.
>> > >> >>>>>>
>> > >> >>>>>> Trying to force a larger bulk size would mean not inserting messages
>> > >> >>>>>> as fast as we can, and instead pausing and waiting for enough messages
>> > >> >>>>>> to accumulate to fill the bulk size. We never delay messages
>> > >> >>>>>> intentionally; on each pass through the loop we grab all pending
>> > >> >>>>>> messages, up to the max dequeue size, and deliver them. If more
>> > >> >>>>>> messages arrive than we deliver, the queue is larger on the next
>> > >> >>>>>> pass, so we grab more messages (this quickly stabilizes to inserting
>> > >> >>>>>> messages as fast as they are arriving).
>> > >> >>>>>>
>> > >> >>>>>> There is a dequeue delay that forces rsyslog to sit and do nothing
>> > >> >>>>>> between one batch of messages and the next. Its use is discouraged,
>> > >> >>>>>> but delaying like this would allow more messages to accumulate.
>> > >> >>>>>>
>> > >> >>>>>> David Lang
>> > >> >>>>>> _______________________________________________
>> > >> >>>>>> rsyslog mailing list
>> > >> >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> > >> >>>>>> http://www.rsyslog.com/professional-services/
>> > >> >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> > >> >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
>> by a
>> > >> >>>>>> myriad
>> > >> >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>> > if
>> > >> you
>> > >> >>>>>> DON'T LIKE THAT.
>> > --
>> > Regards,
>> > Janmejay
>> > http://codehunk.wordpress.com
