But I think what Chenlin is describing is a bug. He ends up with small
batches even though the queue is filling up, so rsyslog could build bigger
batches (there are messages waiting in the queue), but it doesn't. Am I right?
If so, it's a weird thing; I haven't seen this issue before :( Maybe a full
reproduction (complete rsyslog config, ES config, versions, and OSes) would
help?

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Jun 17, 2015 at 6:47 PM, singh.janmejay <[email protected]>
wrote:

> ES uses a worker pool for indexing (there is a separate worker pool for
> bulk indexing too). The prioritization approach may not be easy, and is
> possibly a little dangerous too, but sizing that thread pool is definitely
> easy. Just size it to your needs, and it will shape the batch size
> optimally when under pressure (as David explained).
>
> On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]>
> wrote:
> > Well, there is something I can't understand: if rsyslog uses 10 messages
> > per bulk because Elasticsearch keeps up with the sending speed, why does
> > the output queue size reach its maximum, with discarded.nf/enqueued = 90%?
> >
> > Here is my configuration:
> >
> > ```
> >    action (
> >         type="omelasticsearch"
> >         template="videotmpl"
> >         server="10.13.244.214"
> >         dynSearchIndex="on"
> >         searchIndex="videoIndexName"
> >         searchType="videoaccess"
> >         bulkmode="on"
> >         name="action_videoaccess-es1003"
> >         queue.size="1000000"
> >         queue.dequeuebatchsize="40000"
> >         queue.discardmark="950000"
> >         queue.highwatermark="600000"
> >         queue.lowwatermark="400000"
> >         queue.discardseverity="3"
> >         queue.dequeueslowdown="10000"
> >         queue.type="linkedlist"
> >         queue.maxdiskspace="15G"
> >         queue.maxfilesize="500M"
> >         queue.filename="action_videoaccess-es1003"
> >         queue.checkpointinterval="10000"
> >         queue.saveonshutdown="on"
> >     )
> > ```
> >
> > And pstats.log:
> >
> > ```
> > 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
> > 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
> > ```
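To read these counters more directly, here is a small Python sketch that parses the second (in-memory queue) pstats record from Chenlin's log and computes discarded.nf/enqueued. It assumes the record sits on a single line (the mail client wrapped it) and takes the field names straight from that output. Reading discarded.nf as "messages dropped once the queue reached queue.discardmark" is an assumption worth checking against the impstats documentation. For this sample the ratio comes out at about 85%, and the queue size sits exactly at the configured discard mark of 950000.

```python
import json

# One impstats record from the log above, joined back onto a single line.
LINE = (
    '2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: '
    '{"name":"action_videoaccess-es1003 queue","origin":"core.queue",'
    '"size":950000,"enqueued":522298,"full":0,"discarded.full":0,'
    '"discarded.nf":442298,"maxqsize":950000}'
)

DISCARD_MARK = 950000  # queue.discardmark from the action config above


def parse_pstats(line: str) -> dict:
    """Return the JSON payload that follows the 'rsyslogd-pstats:' tag."""
    _, _, payload = line.partition("rsyslogd-pstats:")
    return json.loads(payload)  # json.loads tolerates the leading space


def discard_ratio(stats: dict) -> float:
    """Share of enqueued messages counted in discarded.nf.

    Assumption: discarded.nf counts messages dropped after the queue
    reached its discard mark (not messages dropped because it was full).
    """
    enqueued = stats["enqueued"]
    return stats["discarded.nf"] / enqueued if enqueued else 0.0


stats = parse_pstats(LINE)
ratio = discard_ratio(stats)
at_mark = stats["size"] >= DISCARD_MARK
print(f"{stats['name']}: discarded.nf/enqueued = {ratio:.1%}, at discard mark: {at_mark}")
```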
> >
> > BTW: I tried dequeueslowdown values from 10 to 10000; there is no change,
> > it is still 10 messages per bulk.
> >
> > 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
> >
> >> That might work, thanks for the feedback and the interesting article!
> >>
> >> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
> >>
> >> > Probably a risk; something to keep an eye on (or watch rsyslog's pstats
> >> > and tweak the priority if the queue grows too large).
> >> >
> >> > I also believe that the vast majority of searches that are typically
> >> > done are done wrong (see my dashboards/reports article,
> >> > <https://www.usenix.org/publications/login/feb14/logging-reports-dashboards>).
> >> >
> >> > David Lang
> >> >
> >> > On Wed, 17 Jun 2015, Radu Gheorghe wrote:
> >> >
> >> >> This sounds interesting, David. I guess it's possible to renice just some
> >> >> threads of an application and make them "nicer", right? Googling a bit,
> >> >> it seems it is possible.
> >> >>
> >> >> The only problem I see with this approach is that searches (and other
> >> >> kinds of requests from other thread pools,
> >> >> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
> >> >> would automatically have higher priority, so with heavy searches,
> >> >> indexing might fall behind more than usual. Am I getting it right?
> >> >>
> >> >> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
> >> >>
> >> >>> Thinking about it, probably the best thing to do is to renice the ES
> >> >>> threads that accept the messages from rsyslog. That way, if nothing else
> >> >>> needs the capacity, everything works at the fastest insert speed (even if
> >> >>> less efficiently than with larger batches). But if anything else on the
> >> >>> system needs the resources, the indexing threads work more slowly, which
> >> >>> results in larger batches.
> >> >>>
> >> >>> It's all self-tuning.
> >> >>>
> >> >>> David Lang
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
> >> >>>
> >> >>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
> >> >>>> From: Radu Gheorghe <[email protected]>
> >> >>>> Reply-To: rsyslog-users <[email protected]>
> >> >>>> To: rsyslog-users <[email protected]>
> >> >>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
> >> >>>>
> >> >>>> Maybe this was overlooked, but David suggested earlier that you can slow
> >> >>>> down the queue to let more messages arrive before sending a bulk. The
> >> >>>> option is queue.dequeueslowdown
> >> >>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>,
> >> >>>> and its value is in microseconds.
> >> >>>>
> >> >>>> I think you have a valid point: if batches are too small, Elasticsearch
> >> >>>> does more work than necessary, because indexing in very small batches is
> >> >>>> more expensive per document. Plus, since the refresh interval (how long it
> >> >>>> may take for an indexed document to become visible to searches, because
> >> >>>> searchers reopen their view of the index at a fixed interval) is typically
> >> >>>> a few seconds
> >> >>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
> >> >>>> waiting a bit before submitting a batch will have no noticeable impact on
> >> >>>> the user experience.
> >> >>>>
> >> >>>> On the other hand, in my experience you only send small batches when the
> >> >>>> indexing rate is low, which means the load on ES is low anyway. So I'm
> >> >>>> not sure that optimizing this will give significant results. You could
> >> >>>> introduce that slowdown, but then rsyslog may have trouble keeping up
> >> >>>> when the load is high. You can compensate by raising the maximum number
> >> >>>> of worker threads for the queue (queue.workerthreads) and tuning
> >> >>>> queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown,
> >> >>>> so that rsyslog spawns new threads when there are at least N messages in
> >> >>>> the queue (that's what the minimum-messages option does) and shuts them
> >> >>>> down again when they have had too little work for a while (that's the
> >> >>>> timeout option). If the load is low, you'd have just one thread working
> >> >>>> with that slowdown.
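To make the worker-thread interplay Radu describes a bit more concrete, here is a rough Python model. It is not rsyslog's actual code: it assumes the number of active workers grows by one for every queue.workerthreadminimummessages messages waiting, capped at queue.workerthreads, and that a surplus worker is retired only after the backlog has stayed small for a number of ticks standing in for queue.timeoutworkerthreadshutdown. `WorkerPolicy`, `wanted_workers`, `simulate`, and all numbers are hypothetical, chosen for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class WorkerPolicy:
    # Hypothetical knobs loosely mirroring the rsyslog queue parameters.
    max_workers: int     # plays the role of queue.workerthreads
    min_messages: int    # plays the role of queue.workerthreadminimummessages
    shutdown_ticks: int  # stand-in for queue.timeoutworkerthreadshutdown


def wanted_workers(queue_size: int, policy: WorkerPolicy) -> int:
    """Assumed rule: one worker per min_messages queued, capped at max_workers."""
    if queue_size == 0:
        return 0
    return min(policy.max_workers, math.ceil(queue_size / policy.min_messages))


def simulate(queue_sizes: list[int], policy: WorkerPolicy) -> list[int]:
    """Active worker count at each tick for a series of observed queue sizes."""
    active, surplus_ticks, history = 0, 0, []
    for size in queue_sizes:
        wanted = wanted_workers(size, policy)
        if wanted >= active:
            # Backlog grew (or held): start workers right away.
            active, surplus_ticks = wanted, 0
        else:
            # Too many workers for the backlog: retire one only after the
            # surplus has lasted shutdown_ticks in a row.
            surplus_ticks += 1
            if surplus_ticks >= policy.shutdown_ticks:
                active, surplus_ticks = active - 1, 0
        history.append(active)
    return history


policy = WorkerPolicy(max_workers=4, min_messages=10_000, shutdown_ticks=3)
load = [500, 500, 25_000, 80_000, 60_000] + [2_000] * 7
print(simulate(load, policy))
```

Under low load the model stays at a single worker; a burst scales it up to the cap at once, and the extra workers drain away only gradually once the backlog shrinks.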
> >> >>>>
> >> >>>> I hope this helps.
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Radu
> >> >>>>
> >> >>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> So how can I define the output queue configuration?
> >> >>>>>
> >> >>>>> I found that the omelasticsearch action processes 60000 messages/min,
> >> >>>>> while the queue's discarded.nf counter was 600000.
> >> >>>>> I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
> >> >>>>> and saw that the length is about 1.6 KB. As each message line is about
> >> >>>>> 0.1 KB, the bulk size is only about 10 messages. Too small.
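The bulk-size estimate above can be checked in a few lines of Python. One detail matters: an Elasticsearch bulk request body puts an action/metadata line before every document, so dividing the Content-Length by the message size alone gives 16 rather than 10. The 60-byte action-line size used below is an assumption chosen for illustration (the real figure depends on the index and type names omelasticsearch writes); with it, the estimate lands at 10.

```python
def docs_per_bulk(content_length: int, doc_bytes: int,
                  action_line_bytes: int = 0) -> float:
    """Estimate how many documents one _bulk request body holds.

    In the bulk format every document line is preceded by an action/metadata
    line, so each document costs roughly doc_bytes + action_line_bytes.
    """
    return content_length / (doc_bytes + action_line_bytes)


# Figures from the tcpdump observation: ~1.6 KB bodies, ~0.1 KB per message.
naive = docs_per_bulk(1600, 100)            # counts the message lines only
with_action = docs_per_bulk(1600, 100, 60)  # assumes ~60-byte action lines
print(f"message lines only: {naive:.0f} docs; with action lines: {with_action:.0f} docs")
```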
> >> >>>>>
> >> >>>>> Sometimes, when I restart rsyslogd, the Content-Length grows to 8 MB.
> >> >>>>> Why?
> >> >>>>>
> >> >>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
> >> >>>>>
> >> >>>>>> On Tue, 5 May 2015, chenlin rao wrote:
> >> >>>>>>
> >> >>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
> >> >>>>>>> Elasticsearch cluster. I found that the documentation says the plugin
> >> >>>>>>> uses queue.dequeuebatchsize as the bulk size. But tcpdump shows that
> >> >>>>>>> every POST has only 8-9 events in the bulk body, while my input flow
> >> >>>>>>> is nearly 10k messages per second.
> >> >>>>>>>
> >> >>>>>>> How can I force a larger bulk size?
> >> >>>>>>
> >> >>>>>> Rsyslog adapts the batch size to the number of messages waiting to be
> >> >>>>>> delivered, so if it's keeping up at that size, it won't increase it.
> >> >>>>>>
> >> >>>>>> Are you running impstats? If so, please look at the queue size. If it
> >> >>>>>> stays low, then you just have a nice, fast ES instance that is able to
> >> >>>>>> do 1k inserts/sec (which is not unreasonable), so each insert carries
> >> >>>>>> at most about 10 messages (10k messages/sec / 1k inserts/sec).
> >> >>>>>>
> >> >>>>>> Trying to force a larger bulk size would mean not inserting messages as
> >> >>>>>> fast as we can, and instead pausing and waiting for enough messages to
> >> >>>>>> accumulate to fill the bulk. We never delay messages intentionally: on
> >> >>>>>> each pass through the loop we grab all pending messages, up to the
> >> >>>>>> maximum dequeue batch size, and deliver them. If more messages arrive
> >> >>>>>> than we deliver, the next pass through the queue is larger, so we grab
> >> >>>>>> more messages (this quickly stabilizes at inserting messages as fast as
> >> >>>>>> they arrive).
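A toy model may help show why this loop settles at small batches when Elasticsearch is fast, and why batches grow when it is not. The sketch below is not rsyslog's implementation: it assumes messages arrive at a constant rate, that each pass takes everything pending up to a cap (the role queue.dequeuebatchsize plays), and that a bulk request costs a fixed latency plus a small per-message cost. Under those assumptions the batch converges to rate * latency / (1 - rate * per-message cost); with no per-message cost that is simply rate * latency, i.e. 10 messages at 10k msg/sec and 1 ms per request. The function name and all figures are hypothetical.

```python
from __future__ import annotations


def simulate_batches(rate: float, fixed_s: float, per_msg_s: float,
                     max_batch: int, passes: int) -> list[float]:
    """Batch size on each pass of a 'take everything pending' delivery loop.

    Toy model, not rsyslog's implementation. Assumptions:
      rate:      messages arriving per second, constant
      fixed_s:   fixed latency of one bulk request, in seconds
      per_msg_s: extra request time per message in the batch, in seconds
      max_batch: cap on messages taken per pass (like queue.dequeuebatchsize)
    """
    pending = 1.0  # start with one message waiting
    sizes = []
    for _ in range(passes):
        batch = min(pending, max_batch)
        pending -= batch
        # Messages keep arriving while this batch is being delivered.
        pending += rate * (fixed_s + per_msg_s * batch)
        sizes.append(batch)
    return sizes


# Fast ES: ~1 ms per request, 20 us per message, 10k msg/s arriving.
fast = simulate_batches(10_000, 0.001, 0.00002, max_batch=40_000, passes=30)
predicted = 10_000 * 0.001 / (1 - 10_000 * 0.00002)
print(f"fast ES: batch settles at {fast[-1]:.1f} (model predicts {predicted:.1f})")

# Overloaded ES: 200 us per message, so delivery cannot keep up with arrivals.
slow = simulate_batches(10_000, 0.001, 0.0002, max_batch=40_000, passes=30)
print(f"overloaded ES: batch grows to {slow[-1]:.0f}, the max_batch cap")
```

In the overloaded case the backlog keeps growing and every pass hits the cap, which is the behavior David describes for a queue that is falling behind.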
> >> >>>>>>
> >> >>>>>> There is a dequeue delay (queue.dequeueslowdown) that forces rsyslog to
> >> >>>>>> sit and do nothing between one batch of messages and the next. Its use
> >> >>>>>> is discouraged, but delaying like this would allow more messages to
> >> >>>>>> accumulate.
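As a rough illustration of what such a delay buys, here is a back-of-the-envelope Python model, assuming everything that arrives during one request plus the pause is picked up by the next pass, and that request time stays constant. The slowdown is given in microseconds, as queue.dequeueslowdown is; `steady_batch` and the 10k msg/sec, 1 ms figures are hypothetical.

```python
def steady_batch(rate_per_s: float, request_s: float, slowdown_us: int = 0,
                 max_batch: int = 40_000) -> float:
    """Approximate steady-state batch size when the loop pauses between passes.

    Toy model: everything that arrives during one request plus the pause is
    picked up by the next pass. slowdown_us mirrors queue.dequeueslowdown,
    which is specified in microseconds; request time is assumed constant.
    """
    pause_s = slowdown_us / 1_000_000
    return min(max_batch, rate_per_s * (request_s + pause_s))


# 10k msg/s arriving, ~1 ms per bulk request (assumed figures).
for slowdown in (0, 10, 1_000, 10_000, 100_000):
    size = steady_batch(10_000, 0.001, slowdown)
    print(f"dequeueslowdown={slowdown:>7} us -> ~{size:.1f} messages per batch")
```

The same model shows the trade-off: a single worker can move at most max_batch messages per (request + pause) cycle, so a long pause lowers the throughput ceiling, which is why the thread pairs it with more worker threads.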
> >> >>>>>>
> >> >>>>>> David Lang
> >> >>>>>> _______________________________________________
> >> >>>>>> rsyslog mailing list
> >> >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> >>>>>> http://www.rsyslog.com/professional-services/
> >> >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
> >> >>>>>> myriad
> >> >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
> if
> >> you
> >> >>>>>> DON'T LIKE THAT.
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com