Re: [rsyslog] how to force a larger omelasticsearch bulk size?

chenlin rao Sat, 08 Aug 2015 08:02:15 -0700

Hello everyone.

I have a new finding, so I am reviving this thread.


Yesterday I used iptables to drop the incoming traffic on TCP port 514 and
watched how the omelasticsearch queue[DA] was consumed. At that time there
were 6000000 messages in the DA queue, but tcpdump showed Content-Length < 8000,
which means only 10+ messages per bulk.
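
As a rough cross-check of that figure, here is a minimal Python sketch. It assumes about 160 bytes per bulk entry on the wire, a value calibrated only from the earlier capture in this thread (a 1.6k Content-Length for roughly 10 messages). The function names are hypothetical, and the real per-entry overhead depends on the template and the bulk action line.

```python
import math

# Assumption: ~160 bytes per bulk entry, calibrated from the earlier capture
# in this thread (Content-Length of about 1.6k for about 10 messages).
BYTES_PER_ENTRY = 1600 // 10


def docs_per_bulk(content_length: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Upper-bound estimate of the documents carried by one _bulk POST."""
    if bytes_per_entry <= 0:
        raise ValueError("bytes_per_entry must be positive")
    return content_length // bytes_per_entry


def bulks_to_drain(queued_msgs: int, per_bulk: int) -> int:
    """Number of bulk requests needed to empty a queue at a fixed bulk size."""
    if per_bulk <= 0:
        raise ValueError("per_bulk must be positive")
    return math.ceil(queued_msgs / per_bulk)


if __name__ == "__main__":
    per_bulk = docs_per_bulk(8000)  # Content-Length stayed below 8000
    print(per_bulk)  # upper bound on messages per bulk
    print(bulks_to_drain(6_000_000, per_bulk))  # requests needed for the DA backlog
```

Even under this generous estimate, a bulk carries a few dozen messages at most, far below the configured queue.dequeuebatchsize="40000".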



2015-06-18 17:18 GMT+08:00 Rainer Gerhards <[email protected]>:

> 2015-06-18 5:13 GMT+02:00 chenlin rao <[email protected]>:
> > Yes, you are right.
> >
> > FYI, this rsyslog server is sending short messages of about 100 B each, and
> > I use a special ES template with _source disabled for it.
> >
> > I checked another rsyslog server, which is sending long 600 B messages and
> > also has a nearly full queue. It does have a large bulk size (18 MB), so I
> > don't know where the problem is.
>
> Can you pls disable the delay, run stats in 1-minute intervals and
> provide the pstats file for a couple of hours -- so that we can see
> how things evolve.
>
> Rainer
> >
> > I use rsyslog-8.10.0.ad1-2.el6.x86_64 on CentOS6.2,
> > elasticsearch-1.5.1-1.noarch on CentOS6.5.
> >
> > 2015-06-18 4:21 GMT+08:00 Radu Gheorghe <[email protected]>:
> >
> >> But I think what Chenlin is describing is a bug. He basically ends up with
> >> small batches, but the queue is getting full. So rsyslog could build bigger
> >> batches (there are messages in the queue) but it doesn't. Am I right? If
> >> yes, it's a weird thing, I didn't see this issue before :( Maybe a full
> >> reproduction (complete config of rsyslog + ES + versions + OSes) would
> >> help?
> >>
> >> --
> >> Performance Monitoring * Log Analytics * Search Analytics
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >> On Wed, Jun 17, 2015 at 6:47 PM, singh.janmejay <[email protected]> wrote:
> >>
> >> > ES uses a worker pool for indexing (there is a worker pool for
> >> > bulk indexing too). A prioritizing approach may not be easy, and is
> >> > possibly a little dangerous too, but sizing that thread pool is
> >> > definitely easy. Just size it to your needs and it'll shape the
> >> > batch size optimally when under pressure (like David explained).
> >> >
> >> > On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]>
> >> > wrote:
> >> > > Well, there is something I can't understand: if rsyslog uses 10 msg per
> >> > > bulk because Elasticsearch keeps up with the sending speed, why has the
> >> > > output queue reached its maximum size, with discarded.nf/enqueued = 90%?
> >> > >
> >> > > here is my configuration:
> >> > >
> >> > > ```
> >> > >    action (
> >> > >         type="omelasticsearch"
> >> > >         template="videotmpl"
> >> > >         server="10.13.244.214"
> >> > >         dynSearchIndex="on"
> >> > >         searchIndex="videoIndexName"
> >> > >         searchType="videoaccess"
> >> > >         bulkmode="on"
> >> > >         name="action_videoaccess-es1003"
> >> > >         queue.size="1000000"
> >> > >         queue.dequeuebatchsize="40000"
> >> > >         queue.discardmark="950000"
> >> > >         queue.highwatermark="600000"
> >> > >         queue.lowwatermark="400000"
> >> > >         queue.discardseverity="3"
> >> > >         queue.dequeueslowdown="10000"
> >> > >         queue.type="linkedlist"
> >> > >         queue.maxdiskspace="15G"
> >> > >         queue.maxfilesize="500M"
> >> > >         queue.filename="action_videoaccess-es1003"
> >> > >         queue.checkpointinterval="10000"
> >> > >         queue.saveonshutdown="on"
> >> > >     )
> >> > > ```
> >> > >
> >> > > and pstats.log:
> >> > >
> >> > > ```
> >> > > 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
> >> > > 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
> >> > > ```
> >> > >
> >> > > BTW: I tried slowdown settings from 10 to 10000; it made no difference,
> >> > > the bulk size stayed at 10 msg per bulk.
> >> > >
> >> > > 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
> >> > >
> >> > >> That might work, thanks for the feedback and the interesting article!
> >> > >>
> >> > >> --
> >> > >> Performance Monitoring * Log Analytics * Search Analytics
> >> > >> Solr & Elasticsearch Support * http://sematext.com/
> >> > >>
> >> > >> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
> >> > >>
> >> > >> > Probably a risk, something to keep an eye on (or watch the pstats from
> >> > >> > rsyslog and tweak the priority if the queue gets too large).
> >> > >> >
> >> > >> > I also believe that the vast majority of searches that are typically done
> >> > >> > are done wrong (see my dashboards/reports article at
> >> > >> > https://www.usenix.org/publications/login/feb14/logging-reports-dashboards
> >> > >> > )
> >> > >> >
> >> > >> > David Lang
> >> > >> >
> >> > >> > On Wed, 17 Jun 2015, Radu Gheorghe wrote:
> >> > >> >
> >> > >> >> This sounds interesting, David. I guess it's possible to renice just some
> >> > >> >> threads from an app and make it "nicer", right? Googling a bit, it seems it
> >> > >> >> is possible.
> >> > >> >>
> >> > >> >> The only problem I see with this approach is that searches (and other kinds
> >> > >> >> of requests from other threadpools
> >> > >> >> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
> >> > >> >> would automatically have higher priority so, with heavy searches, indexing
> >> > >> >> might fall behind more than usual. Am I getting it right?
> >> > >> >>
> >> > >> >> --
> >> > >> >> Performance Monitoring * Log Analytics * Search Analytics
> >> > >> >> Solr & Elasticsearch Support * http://sematext.com/
> >> > >> >>
> >> > >> >> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
> >> > >> >>
> >> > >> >>> Thinking about it, probably the best thing to do is to renice the ES
> >> > >> >>> threads that accept the messages from rsyslog. That way, if nothing else
> >> > >> >>> needs the capacity, everything works at the fastest insert speed (even if
> >> > >> >>> less optimized than if there were larger batches). But if anything else on
> >> > >> >>> the system needs the resources, the indexing threads work slower, which
> >> > >> >>> will result in larger batches.
> >> > >> >>>
> >> > >> >>> All self-tuning.
> >> > >> >>>
> >> > >> >>> David Lang
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
> >> > >> >>>
> >> > >> >>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
> >> > >> >>>> From: Radu Gheorghe <[email protected]>
> >> > >> >>>> Reply-To: rsyslog-users <[email protected]>
> >> > >> >>>> To: rsyslog-users <[email protected]>
> >> > >> >>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
> >> > >> >>>>
> >> > >> >>>> Maybe this was overlooked, but David suggested earlier that you can
> >> > >> >>>> slow down the queue to let more messages arrive before sending a bulk.
> >> > >> >>>> queue.dequeueslowdown
> >> > >> >>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
> >> > >> >>>> is the option and it's in microseconds.
> >> > >> >>>>
> >> > >> >>>> I think you have a valid point in that if batches are too small then
> >> > >> >>>> Elasticsearch will do more work than necessary (as indexing in very small
> >> > >> >>>> batches is more expensive). Plus, since the refresh rate (i.e. how long it
> >> > >> >>>> may take for an indexed doc to be visible to searches, because Searchers
> >> > >> >>>> reopen their view in the index at a certain interval) is typically a few
> >> > >> >>>> seconds
> >> > >> >>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
> >> > >> >>>> waiting a bit before submitting a batch will have no impact on the user
> >> > >> >>>> experience.
> >> > >> >>>>
> >> > >> >>>> On the other hand, in my experience you'll be sending small batches if the
> >> > >> >>>> indexing rate is low - which means the load on ES is low anyway. So I'm not
> >> > >> >>>> sure if optimizing this will actually give significant results. You could
> >> > >> >>>> introduce that slowdown, but then rsyslog may have trouble keeping up when
> >> > >> >>>> the load is high. You can compensate by raising the limit of maximum worker
> >> > >> >>>> threads for the queue (queue.workerthreads) and play with
> >> > >> >>>> queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown to
> >> > >> >>>> make rsyslog spawn new threads when there are at least N messages in the
> >> > >> >>>> queue (that's what min messages does) and kill them when the queue is
> >> > >> >>>> smaller than that for a while (that's the timeout option). If the load is
> >> > >> >>>> low, you'd have just one thread that works with that slowdown.
> >> > >> >>>>
> >> > >> >>>> I hope this helps.
> >> > >> >>>>
> >> > >> >>>> Best regards,
> >> > >> >>>> Radu
> >> > >> >>>>
> >> > >> >>>> --
> >> > >> >>>> Performance Monitoring * Log Analytics * Search Analytics
> >> > >> >>>> Solr & Elasticsearch Support * http://sematext.com/
> >> > >> >>>>
> >> > >> >>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
> >> > >> >>>>
> >> > >> >>>>> So how can I define the output queue configuration?
> >> > >> >>>>>
> >> > >> >>>>> I found that the omelasticsearch action processes 60000/min, and
> >> > >> >>>>> queue.discarded.nf was 600000.
> >> > >> >>>>> I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
> >> > >> >>>>> and saw that the length is 1.6k. As my msgline size is 0.1k, the bulk size
> >> > >> >>>>> is only 10. Too small.
> >> > >> >>>>>
> >> > >> >>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB.
> >> > >> >>>>> Why?
> >> > >> >>>>>
> >> > >> >>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
> >> > >> >>>>>
> >> > >> >>>>>> On Tue, 5 May 2015, chenlin rao wrote:
> >> > >> >>>>>>
> >> > >> >>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
> >> > >> >>>>>>> Elasticsearch cluster. The documentation says that the plugin uses
> >> > >> >>>>>>> queue.dequeuebatchsize as the bulk size. But my tcpdump shows that every
> >> > >> >>>>>>> POST only has 8-9 events in the bulk body, while my input flow is nearly
> >> > >> >>>>>>> 10k per second.
> >> > >> >>>>>>>
> >> > >> >>>>>>> How can I force a larger bulk size?
> >> > >> >>>>>>>
> >> > >> >>>>>>>
> >> > >> >>>>>> Rsyslog adapts the size to the number of messages waiting to be delivered,
> >> > >> >>>>>> so if it's keeping up at that size, it won't increase it.
> >> > >> >>>>>>
> >> > >> >>>>>> Are you running impstats? If so, please look at the queue size. If it's
> >> > >> >>>>>> staying low, then you just have a nice, fast ES instance that is able to do
> >> > >> >>>>>> 1k inserts/sec (which is not unreasonable), so each insert would be <10
> >> > >> >>>>>> messages.
> >> > >> >>>>>>
> >> > >> >>>>>> Trying to force a larger bulk size would mean not inserting messages as
> >> > >> >>>>>> fast as we can, and instead pausing and waiting for enough messages to
> >> > >> >>>>>> accumulate to fill the bulk size. We never delay messages intentionally:
> >> > >> >>>>>> each pass through the loop we grab all pending messages, up to the max
> >> > >> >>>>>> dequeue size, and deliver them. If more messages arrive than we deliver,
> >> > >> >>>>>> the next pass through the queue is larger, so we grab more messages (this
> >> > >> >>>>>> quickly stabilizes to inserting messages as fast as they are arriving).
> >> > >> >>>>>>
> >> > >> >>>>>> There is a dequeue delay that forces rsyslog to sit and do nothing between
> >> > >> >>>>>> one batch of messages and the next. Its use is discouraged, but delaying
> >> > >> >>>>>> like this would allow more messages to accumulate.
> >> > >> >>>>>>
> >> > >> >>>>>> David Lang
> >> > >> >>>>>> _______________________________________________
> >> > >> >>>>>> rsyslog mailing list
> >> > >> >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> > >> >>>>>> http://www.rsyslog.com/professional-services/
> >> > >> >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> > >> >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> >> > >> >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> >> > >> >>>>>> DON'T LIKE THAT.
> >> > >> >>>>>>
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Janmejay
> >> > http://codehunk.wordpress.com
