Re: [rsyslog] how to force a larger omelasticsearch bulk size?

Radu Gheorghe Wed, 17 Jun 2015 14:10:04 -0700

No, it's dequeuebatchsize. Or at least that's what I've known and seen for
years. It would be quite a thing if I'm wrong :D
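
For reference, here is a minimal sketch of where that setting sits, assuming
omelasticsearch in bulk mode. The server address, the action name and all the
numbers are placeholders made up for illustration, not values from this thread:

```
# Hypothetical sketch: values are illustrative, not tuned recommendations.
module(load="omelasticsearch")

action(
    type="omelasticsearch"
    name="es-bulk-sketch"            # made-up action name
    server="localhost"               # placeholder ES host
    bulkmode="on"                    # as far as I know, one bulk request per dequeued batch
    queue.type="linkedlist"
    queue.size="100000"
    queue.dequeuebatchsize="5000"    # upper bound on messages per batch
)
```

Keep in mind this is a ceiling, not a target: as David explained, rsyslog
dequeues whatever is pending up to that limit, so a small backlog still gives
small bulks.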


--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Jun 17, 2015 at 11:35 PM, David Lang <[email protected]> wrote:

> I seem to remember seeing that there is a different variable for
> omelasticsearch to set the max bulk size for the ES insert as opposed to
> the batch size used internally by rsyslog. I don't remember what it is.
>
>
> David Lang
>
> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>
>> But I think what Chenlin is describing is a bug. He basically ends up with
>> small batches, but the queue is getting full. So rsyslog could build bigger
>> batches (there are messages in the queue) but it doesn't. Am I right? If
>> yes, it's a weird thing, I didn't see this issue before :( Maybe a full
>> reproduction (complete config of rsyslog + ES + versions + OSes) would
>> help?
>>
>>
>> On Wed, Jun 17, 2015 at 6:47 PM, singh.janmejay <[email protected]> wrote:
>>
>>> ES uses a worker pool for indexing (there is a worker pool for
>>> bulk indexing too). The prioritizing approach may not be easy, and
>>> is possibly a little dangerous too, but sizing that thread pool is
>>> definitely easy. Just size it to your needs and it'll shape the
>>> batch size optimally when under pressure (like David explained).
>>>
>>> On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]>
>>> wrote:
>>>
>>>> Well, there is something I can't understand: if rsyslog uses 10 messages
>>>> per bulk because Elasticsearch keeps up with the sending speed, why has
>>>> the output queue reached its maximum size, with
>>>> discarded.nf/enqueued = 90%?
>>>>
>>>> here is my configuration:
>>>>
>>>> ```
>>>>    action (
>>>>         type="omelasticsearch"
>>>>         template="videotmpl"
>>>>         server="10.13.244.214"
>>>>         dynSearchIndex="on"
>>>>         searchIndex="videoIndexName"
>>>>         searchType="videoaccess"
>>>>         bulkmode="on"
>>>>         name="action_videoaccess-es1003"
>>>>         queue.size="1000000"
>>>>         queue.dequeuebatchsize="40000"
>>>>         queue.discardmark="950000"
>>>>         queue.highwatermark="600000"
>>>>         queue.lowwatermark="400000"
>>>>         queue.discardseverity="3"
>>>>         queue.dequeueslowdown="10000"
>>>>         queue.type="linkedlist"
>>>>         queue.maxdiskspace="15G"
>>>>         queue.maxfilesize="500M"
>>>>         queue.filename="action_videoaccess-es1003"
>>>>         queue.checkpointinterval="10000"
>>>>         queue.saveonshutdown="on"
>>>>     )
>>>> ```
>>>>
>>>> and pstats.log:
>>>>
>>>> ```
>>>> 2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
>>>> 2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
>>>> ```
>>>>
>>>> BTW: I tried slowdown settings from 10 to 10000, with no change: still
>>>> 10 messages per bulk.
>>>>
>>>> 2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:
>>>>
>>>>> That might work, thanks for the feedback and the interesting article!
>>>>>
>>>>>
>>>>> On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:
>>>>>
>>>>>> Probably a risk, something to keep an eye on (or watch the pstats from
>>>>>> rsyslog and tweak the priority if the queue gets too large)
>>>>>>
>>>>>> I also believe that the vast majority of searches that are typically
>>>>>> done are done wrong (see my dashboards/reports article at
>>>>>> https://www.usenix.org/publications/login/feb14/logging-reports-dashboards
>>>>>> )
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>>>>>>
>>>>>>> This sounds interesting, David. I guess it's possible to renice just
>>>>>>> some threads from an app and make it "nicer", right? Googling a bit it
>>>>>>> seems it is possible.
>>>>>>>
>>>>>>> The only problem I see with this approach is that searches (and other
>>>>>>> kinds of requests from other threadpools
>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
>>>>>>> would automatically have higher priority so, with heavy searches,
>>>>>>> indexing might fall behind more than usual. Am I getting it right?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thinking about it, probably the best thing to do is to renice the ES
>>>>>>>> threads that accept the messages from rsyslog. That way if nothing
>>>>>>>> else needs the capacity, everything works at the fastest insert speed
>>>>>>>> (even if less optimized than if there were larger batches). But if
>>>>>>>> anything else on the system needs the resources, the indexing threads
>>>>>>>> work slower, which will result in larger batches.
>>>>>>>>
>>>>>>>> All self-tuning.
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>>>>>>>>
>>>>>>>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
>>>>>>>>> From: Radu Gheorghe <[email protected]>
>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>> To: rsyslog-users <[email protected]>
>>>>>>>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
>>>>>>>>>
>>>>>>>>> Maybe this went overlooked, but David suggested earlier that you can
>>>>>>>>> slow down the queue to let more messages arrive before sending a bulk.
>>>>>>>>> queue.dequeueslowdown
>>>>>>>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
>>>>>>>>> is the option and it's in microseconds.
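
A rough sketch of that option, under stated assumptions: the server, the
action name and the numbers are placeholders made up for illustration, and the
arithmetic in the comment only holds if the input rate is steady.

```
# Hypothetical sketch, not a tuned configuration.
module(load="omelasticsearch")

action(
    type="omelasticsearch"
    name="es-slowdown-sketch"        # made-up action name
    server="localhost"               # placeholder ES host
    bulkmode="on"
    queue.type="linkedlist"
    queue.dequeuebatchsize="5000"    # ceiling per batch
    # Pause between batches, in microseconds: 10000 us = 10 ms.
    # At roughly 10k msgs/s, about 100 messages arrive during each pause.
    queue.dequeueslowdown="10000"
)
```

Chenlin reports elsewhere in this thread that changing the slowdown did not
change his bulk size, so treat this as something to test, not a guaranteed fix.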
>>>>>>>>>
>>>>>>>>> I think you have a valid point in that if batches are too small then
>>>>>>>>> Elasticsearch will do more work than necessary (as indexing in very
>>>>>>>>> small batches is more expensive). Plus, since the refresh rate (i.e.
>>>>>>>>> how long it may take for an indexed doc to be visible to searches,
>>>>>>>>> because Searchers reopen their view in the index at a certain
>>>>>>>>> interval) is typically a few seconds
>>>>>>>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
>>>>>>>>> waiting a bit before submitting a batch will have no impact on the
>>>>>>>>> user experience.
>>>>>>>>>
>>>>>>>>> On the other hand, in my experience you'll be sending small batches
>>>>>>>>> if the indexing rate is low - which means the load on ES is low
>>>>>>>>> anyway. So I'm not sure if optimizing this will actually give
>>>>>>>>> significant results. You could introduce that slowdown, but then
>>>>>>>>> rsyslog may have trouble keeping up when the load is high. You can
>>>>>>>>> compensate by raising the limit of maximum worker threads for the
>>>>>>>>> queue (queue.workerthreads) and play with
>>>>>>>>> queue.workerthreadminimummessages and
>>>>>>>>> queue.timeoutworkerthreadshutdown to make rsyslog spawn new threads
>>>>>>>>> when there are at least N messages in the queue (that's what min
>>>>>>>>> messages does) and kill them when the queue is smaller than that for
>>>>>>>>> a while (that's the timeout option). If the load is low, you'd have
>>>>>>>>> just one thread that works with that slowdown.
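
Put together, the slowdown plus the worker thread settings might look roughly
like this. It is only a sketch: the server, the action name and every number
are assumptions for illustration, and the timeout unit should be checked
against the queue parameters documentation.

```
# Hypothetical sketch; the numbers are illustrative only.
module(load="omelasticsearch")

action(
    type="omelasticsearch"
    name="es-workers-sketch"                     # made-up action name
    server="localhost"                           # placeholder ES host
    bulkmode="on"
    queue.type="linkedlist"
    queue.size="1000000"
    queue.dequeuebatchsize="5000"
    queue.dequeueslowdown="10000"                # 10 ms pause between batches
    queue.workerthreads="4"                      # upper limit on worker threads
    queue.workerthreadminimummessages="50000"    # backlog needed before another worker starts
    queue.timeoutworkerthreadshutdown="60000"    # idle timeout for extra workers (milliseconds, to my knowledge)
)
```

The idea is that a lightly loaded queue runs on a single slowed-down worker,
while a growing backlog brings in extra workers so the slowdown does not cap
throughput.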
>>>>>>>>>
>>>>>>>>> I hope this helps.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Radu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> So how can I define the output queue configuration?
>>>>>>>>>>
>>>>>>>>>> I found that the omelasticsearch action processes 60000/min, while
>>>>>>>>>> queue.discarded.nf was 600000. I ran
>>>>>>>>>> `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
>>>>>>>>>> and saw that the length is 1.6k. As my message line size is 0.1k,
>>>>>>>>>> the bulk size is only 10. Too small.
>>>>>>>>>>
>>>>>>>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB.
>>>>>>>>>> Why?
>>>>>>>>>>
>>>>>>>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> On Tue, 5 May 2015, chenlin rao wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
>>>>>>>>>>>> Elasticsearch cluster. The documentation says that the plugin uses
>>>>>>>>>>>> queue.dequeuebatchsize as the bulk size. But my tcpdump shows that
>>>>>>>>>>>> every POST only has 8-9 events in the bulk body while my input flow
>>>>>>>>>>>> is nearly 10k per second.
>>>>>>>>>>>>
>>>>>>>>>>>> How can I force a larger bulk size?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Rsyslog adapts the size to the number of messages waiting to be
>>>>>>>>>>> delivered, so if it's keeping up at that size, it won't increase it.
>>>>>>>>>>>
>>>>>>>>>>> Are you running impstats? If so, please look at the queue size. If
>>>>>>>>>>> it's staying low, then you just have a nice, fast ES instance that is
>>>>>>>>>>> able to do 1k inserts/sec (which is not unreasonable), so each insert
>>>>>>>>>>> would be <10 messages.
>>>>>>>>>>>
>>>>>>>>>>> Trying to force a larger bulk size would mean not inserting messages
>>>>>>>>>>> as fast as we can, and instead pausing and waiting for enough messages
>>>>>>>>>>> to accumulate to fill the bulk size. We never delay messages
>>>>>>>>>>> intentionally: each pass through the loop we grab all pending
>>>>>>>>>>> messages, up to the max dequeue size, and deliver them. If more
>>>>>>>>>>> messages arrive than we deliver, the queue is larger on the next pass,
>>>>>>>>>>> so we grab more messages (this quickly stabilizes to inserting
>>>>>>>>>>> messages as fast as they are arriving).
>>>>>>>>>>>
>>>>>>>>>>> There is a dequeue delay that forces rsyslog to sit and do nothing
>>>>>>>>>>> between one batch of messages and the next. Its use is discouraged,
>>>>>>>>>>> but delaying like this would allow more messages to accumulate.
>>>>>>>>>>>
>>>>>>>>>>> David Lang
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>>>>>>>>>> if you DON'T LIKE THAT.
>>>
>>>
>>> --
>>> Regards,
>>> Janmejay
>>> http://codehunk.wordpress.com
