Re: [rsyslog] how to force a larger omelasticsearch bulk size?

Radu Gheorghe Wed, 17 Jun 2015 04:54:48 -0700

That might work, thanks for the feedback and the interesting article!

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:

> Probably a risk, something to keep an eye on (or watch the pstats from
> rsyslog and tweak the priority if the queue grows too large).
>
> I also believe that the vast majority of searches that are typically done
> are done wrong (see my dashboards/reports article at
> https://www.usenix.org/publications/login/feb14/logging-reports-dashboards
> )
>
> David Lang
>
> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>
>> This sounds interesting, David. I guess it's possible to renice just some
>> threads from an app and make it "nicer", right? Googling a bit it seems it
>> is possible.
>>
>> The only problem I see with this approach is that searches (and other kinds
>> of requests from other threadpools
>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
>> would automatically have higher priority so, with heavy searches, indexing
>> might fall behind more than usual. Am I getting it right?
>>
>> On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:
>>
>>> Thinking about it, probably the best thing to do is to renice the ES
>>> threads that accept the messages from rsyslog. That way, if nothing else
>>> needs the capacity, everything works at the fastest insert speed (even if
>>> less optimized than if there were larger batches). But if anything else on
>>> the system needs the resources, the indexing threads work slower, which
>>> will result in larger batches.
>>>
>>> All self-tuning.
>>>
>>> David Lang
>>>
>>>
>>>
>>> On Wed, 17 Jun 2015, Radu Gheorghe wrote:
>>>
>>>> Date: Wed, 17 Jun 2015 10:20:46 +0300
>>>> From: Radu Gheorghe <[email protected]>
>>>> Reply-To: rsyslog-users <[email protected]>
>>>> To: rsyslog-users <[email protected]>
>>>> Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?
>>>>
>>>> Maybe this was overlooked, but David suggested earlier that you can
>>>> slow down the queue to let more messages arrive before sending a bulk.
>>>> queue.dequeueslowdown
>>>> <http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
>>>> is the option and it's in microseconds.
>>>>
>>>> I think you have a valid point in that if batches are too small then
>>>> Elasticsearch will do more work than necessary (as indexing in very small
>>>> batches is more expensive). Plus, since the refresh rate (i.e. how long it
>>>> may take for an indexed doc to be visible to searches, because Searchers
>>>> reopen their view of the index at a certain interval) is typically a few
>>>> seconds
>>>> <http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
>>>> waiting a bit before submitting a batch will have no impact on the user
>>>> experience.
>>>>
>>>> On the other hand, in my experience you'll be sending small batches if the
>>>> indexing rate is low - which means the load on ES is low anyway. So I'm not
>>>> sure if optimizing this will actually give significant results. You could
>>>> introduce that slowdown, but then rsyslog may have trouble keeping up when
>>>> the load is high. You can compensate by raising the limit of maximum worker
>>>> threads for the queue (queue.workerthreads) and play with
>>>> queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown to
>>>> make rsyslog spawn new threads when there are at least N messages in the
>>>> queue (that's what min messages does) and kill them when the queue is
>>>> smaller than that for a while (that's the timeout option). If the load is
>>>> low, you'd have just one thread that works with that slowdown.
>>>>
>>>> I hope this helps.
>>>>
>>>> Best regards,
>>>> Radu
>>>>
>>>> On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:
>>>>
>>>>> So how can I define the output queue configuration?
>>>>>
>>>>> I found the omelasticsearch action processes 60000/min, and the
>>>>> queue.discarded.nf was 600000.
>>>>> I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
>>>>> and saw the length is 1.6k. As my msgline size is 0.1k, the bulk size is
>>>>> only 10. Too small.
>>>>>
>>>>> Sometimes when I restart rsyslogd, the Content-Length grows to 8MB. Why?
>>>>>
>>>>> 2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:
>>>>>
>>>>>> On Tue, 5 May 2015, chenlin rao wrote:
>>>>>>
>>>>>>> I'm using rsyslog-elasticsearch to write nginx access logs into an
>>>>>>> Elasticsearch cluster. I found that the documentation says the plugin
>>>>>>> would use queue.dequeuesize as the bulk size. But my tcpdump shows that
>>>>>>> every POST only has 8-9 events in the bulk body while my input flow is
>>>>>>> nearly 10k per second.
>>>>>>>
>>>>>>> How can I force a larger bulk size?
>>>>>>
>>>>>> Rsyslog adapts the size to the number of messages waiting to be
>>>>>> delivered, so if it's keeping up at that size, it won't increase it.
>>>>>>
>>>>>> Are you running impstats? If so, please look at the queue size. If it's
>>>>>> staying low, then you just have a nice, fast ES instance that is able to
>>>>>> do 1k inserts/sec (which is not unreasonable), so each insert would be
>>>>>> <10 messages.
>>>>>>
>>>>>> Trying to force a larger bulk size would mean not inserting messages as
>>>>>> fast as we can, and instead pausing and waiting for enough messages to
>>>>>> accumulate to fill the bulk size. We never delay messages intentionally;
>>>>>> each pass through the loop we grab all pending messages, up to the max
>>>>>> dequeue size, and deliver them. If more messages arrive than we deliver,
>>>>>> the next pass through the queue is larger, so we grab more messages (this
>>>>>> quickly stabilizes to inserting messages as fast as they are arriving).
>>>>>>
>>>>>> There is a dequeue delay that forces rsyslog to sit and do nothing
>>>>>> between one batch of messages and the next. Its use is discouraged, but
>>>>>> delaying like this would allow more messages to accumulate.
>>>>>>
>>>>>> David Lang
>>>>>> _______________________________________________
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>> myriad
>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>>> DON'T LIKE THAT.
>>>>>>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
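
The self-adjusting batch size David Lang describes in the quoted messages (each pass grabs all pending messages up to the max dequeue size, and a dequeue delay lets more accumulate) can be illustrated with a toy model. This is only a sketch under stated assumptions, not rsyslog's actual code: the function name, the constant arrival rate, and the cost figures (a fixed overhead per bulk request plus a per-message cost) are all hypothetical and chosen for illustration.

```python
"""Toy model of a dequeue loop that grabs all pending messages on each pass."""

US_PER_SECOND = 1_000_000


def simulate_batches(arrival_rate, request_overhead_us, per_message_us,
                     dequeue_slowdown_us=0, max_batch=1024,
                     duration_us=2 * US_PER_SECOND):
    """Return the batch sizes sent during the simulated period.

    Messages arrive at a constant `arrival_rate` per second. All cost
    figures are hypothetical and expressed in integer microseconds.
    """
    now_us = 0
    delivered = 0
    batches = []
    while now_us < duration_us:
        arrived = now_us * arrival_rate // US_PER_SECOND
        pending = arrived - delivered
        if pending == 0:
            # Queue is empty: idle until the next message is due
            # (integer ceiling division).
            now_us = -(-(delivered + 1) * US_PER_SECOND // arrival_rate)
            continue
        # Grab everything that is waiting, up to the maximum dequeue size.
        batch = min(pending, max_batch)
        # One bulk request: fixed overhead plus a per-message cost,
        # followed by the optional pause between batches.
        now_us += (request_overhead_us + per_message_us * batch
                   + dequeue_slowdown_us)
        delivered += batch
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Assumed figures: 10k msgs/sec, 800 us per request, 10 us per message.
    no_pause = simulate_batches(10_000, 800, 10)
    with_pause = simulate_batches(10_000, 800, 10, dequeue_slowdown_us=10_000)
    print("typical batch without pause:", no_pause[-1])
    print("typical batch with 10 ms pause:", with_pause[-1])
```

With these assumed costs the model settles at 8 to 9 messages per batch when there is no pause and at around 120 when a 10 ms pause is added, which is the trade-off discussed in the thread: larger bulks in exchange for deliberately delayed delivery.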
