I seem to remember seeing that there is a different variable for omelasticsearch to set the max bulk size for the ES insert as opposed to the batch size used internally by rsyslog. I don't remember what it is.

David Lang

On Wed, 17 Jun 2015, Radu Gheorghe wrote:

But I think what Chenlin is describing is a bug. He basically ends up with
small batches, but the queue is getting full. So rsyslog could build bigger
batches (there are messages in the queue) but it doesn't. Am I right? If
yes, it's a weird thing, I didn't see this issue before :( Maybe a full
reproduction (complete config of rsyslog + ES + versions + OSes) would help?

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Jun 17, 2015 at 6:47 PM, singh.janmejay <[email protected]>
wrote:

ES uses a worker pool for indexing (there is a worker pool for bulk
indexing too). A prioritizing approach may not be easy, and is possibly
a little dangerous too, but sizing that thread pool is definitely easy.
Just size it to your needs and it will shape the batch size optimally
when under pressure (as David explained).

On Wed, Jun 17, 2015 at 6:14 PM, chenlin rao <[email protected]>
wrote:
Well, there is something I can't understand: if rsyslog uses only 10
messages per bulk because Elasticsearch keeps up with the sending speed,
why has the output queue reached its maximum size, with
discarded.nf/enqueued = 90%?

here is my configuration:

```
   action (
        type="omelasticsearch"
        template="videotmpl"
        server="10.13.244.214"
        dynSearchIndex="on"
        searchIndex="videoIndexName"
        searchType="videoaccess"
        bulkmode="on"
        name="action_videoaccess-es1003"
        queue.size="1000000"
        queue.dequeuebatchsize="40000"
        queue.discardmark="950000"
        queue.highwatermark="600000"
        queue.lowwatermark="400000"
        queue.discardseverity="3"
        queue.dequeueslowdown="10000"
        queue.type="linkedlist"
        queue.maxdiskspace="15G"
        queue.maxfilesize="500M"
        queue.filename="action_videoaccess-es1003"
        queue.checkpointinterval="10000"
        queue.saveonshutdown="on"
    )
```

and pstats.log:

```
2015-06-17T12:17:48.708364+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue[DA]","origin":"core.queue","size":27838434,"enqueued":9,"full":735,"discarded.full":9,"discarded.nf":0,"maxqsize":28153530}
2015-06-17T12:17:48.708370+08:00 localhost rsyslogd-pstats: {"name":"action_videoaccess-es1003 queue","origin":"core.queue","size":950000,"enqueued":522298,"full":0,"discarded.full":0,"discarded.nf":442298,"maxqsize":950000}
```
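
For reference, the ratio mentioned above can be recomputed from the pstats counters. This is only a small sketch: the helper name `discard_ratio` is made up for illustration, and the JSON string is the in-memory queue line from the log above. Note that the reported `size` (950000) equals the configured `queue.discardmark`.

```python
import json

# Counters copied from the "action_videoaccess-es1003 queue" pstats line above.
PSTATS_LINE = (
    '{"name":"action_videoaccess-es1003 queue","origin":"core.queue",'
    '"size":950000,"enqueued":522298,"full":0,"discarded.full":0,'
    '"discarded.nf":442298,"maxqsize":950000}'
)


def discard_ratio(pstats_json):
    """Fraction of enqueued messages counted as discarded.nf (hypothetical helper)."""
    stats = json.loads(pstats_json)
    enqueued = stats["enqueued"]
    # Avoid dividing by zero for an interval in which nothing was enqueued.
    return stats["discarded.nf"] / enqueued if enqueued else 0.0


ratio = discard_ratio(PSTATS_LINE)
print(f"discarded.nf / enqueued = {ratio:.1%}")
```

With these counters the ratio comes out somewhat below the rounded 90% figure, but the conclusion is the same: most messages enqueued in that interval were discarded.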

By the way, I have tried slowdown settings from 10 to 10000, and the bulk
size stayed at 10 messages.

2015-06-17 19:54 GMT+08:00 Radu Gheorghe <[email protected]>:

That might work, thanks for the feedback and the interesting article!


On Wed, Jun 17, 2015 at 12:58 PM, David Lang <[email protected]> wrote:

Probably a risk, something to keep an eye on (or watch the pstats from
rsyslog and tweak the priority if the queue gets too large).

I also believe that the vast majority of searches that are typically done
are done wrong (see my dashboards/reports article at
https://www.usenix.org/publications/login/feb14/logging-reports-dashboards).

David Lang

On Wed, 17 Jun 2015, Radu Gheorghe wrote:

This sounds interesting, David. I guess it's possible to renice just some
threads from an app and make it "nicer", right? Googling a bit, it seems
it is possible.

The only problem I see with this approach is that searches (and other
kinds of requests from other threadpools
<https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html>)
would automatically have higher priority so, with heavy searches, indexing
might fall behind more than usual. Am I getting it right?


On Wed, Jun 17, 2015 at 11:53 AM, David Lang <[email protected]> wrote:

Thinking about it, probably the best thing to do is to renice the ES
threads that accept the messages from rsyslog. That way, if nothing else
needs the capacity, everything works at the fastest insert speed (even if
less optimized than if there were larger batches). But if anything else on
the system needs the resources, the indexing threads work slower, which
will result in larger batches.

All self-tuning.

David Lang



On Wed, 17 Jun 2015, Radu Gheorghe wrote:

Date: Wed, 17 Jun 2015 10:20:46 +0300
From: Radu Gheorghe <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] how to force a larger omelasticsearch bulk size?

Maybe this was overlooked, but David suggested earlier that you can slow
down the queue to let more messages arrive before sending a bulk.
queue.dequeueslowdown
<http://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html>
is the option and it's in microseconds.

I think you have a valid point in that if batches are too small then
Elasticsearch will do more work than necessary (as indexing in very small
batches is more expensive). Plus, since the refresh interval (i.e. how
long it may take for an indexed doc to be visible to searches, because
Searchers reopen their view in the index at a certain interval) is
typically a few seconds
<http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/>,
waiting a bit before submitting a batch will have no impact on the user
experience.

On the other hand, in my experience you'll be sending small batches if the
indexing rate is low, which means the load on ES is low anyway. So I'm not
sure if optimizing this will actually give significant results. You could
introduce that slowdown, but then rsyslog may have trouble keeping up when
the load is high. You can compensate by raising the limit of maximum
worker threads for the queue (queue.workerthreads) and play with
queue.workerthreadminimummessages and queue.timeoutworkerthreadshutdown to
make rsyslog spawn new threads when there are at least N messages in the
queue (that's what min messages does) and kill them when the queue is
smaller than that for a while (that's the timeout option). If the load is
low, you'd have just one thread that works with that slowdown.

I hope this helps.

Best regards,
Radu


On Wed, Jun 17, 2015 at 6:23 AM, chenlin rao <[email protected]> wrote:

So how can I define the output queue configuration?

I found that the omelasticsearch action processes 60000 per minute, and
queue.discarded.nf was 600000.
I ran `tcpdump -i eth1 -s0 -A 'tcp dst port 9200' | grep Content-Length`
and saw that the length is 1.6k. As my message line size is 0.1k, the bulk
size is only 10. Too small.

Sometimes when I restart rsyslogd, the Content-Length grows to 8MB. Why?
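
The estimate of about 10 documents per request can be written out as a small calculation. This is a rough sketch only: the helper name `estimate_bulk_docs` is made up, and the 60 bytes per action line is an assumed value chosen for illustration, not something measured in this thread. The point it illustrates is that in the Elasticsearch bulk format each document line is preceded by an action/metadata line, so a request body holds fewer documents than Content-Length divided by the message size alone would suggest.

```python
def estimate_bulk_docs(content_length, doc_bytes, action_line_bytes):
    """Estimate how many documents one bulk request body holds (hypothetical helper)."""
    # Each document costs its own line plus the action/metadata line before it.
    return content_length // (doc_bytes + action_line_bytes)


# 1.6k body and 0.1k per message come from the tcpdump observation above;
# 60 bytes per action line is an assumption for illustration.
docs = estimate_bulk_docs(1600, 100, 60)
print(docs)
```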

2015-05-06 1:39 GMT+08:00 David Lang <[email protected]>:

 On Tue, 5 May 2015, chenlin rao wrote:


I'm using rsyslog-elasticsearch to write nginx access logs into an
Elasticsearch cluster. The documentation says that the plugin uses
queue.dequeuebatchsize as the bulk size, but my tcpdump shows that every
POST has only 8-9 events in the bulk body, while my input flow is nearly
10k per second.

How can I force a larger bulk size?


Rsyslog adapts the size to the number of messages waiting to be
delivered, so if it's keeping up at that size, it won't increase it.

Are you running impstats? If so, please look at the queue size. If it's
staying low, then you just have a nice, fast ES instance that is able to
do 1k inserts/sec (which is not unreasonable), so each insert would be <10
messages.

Trying to force a larger bulk size would mean not inserting messages as
fast as we can, and instead pausing and waiting for enough messages to
accumulate to fill the bulk size. We never delay messages intentionally:
each pass through the loop we grab all pending messages, up to the max
dequeue size, and deliver them. If more messages arrive than we deliver,
the next pass through the queue is larger, so we grab more messages (this
quickly stabilizes to inserting messages as fast as they are arriving).

There is a dequeue delay that forces rsyslog to sit and do nothing between
one batch of messages and the next. Its use is discouraged, but delaying
like this would allow more messages to accumulate.

David Lang
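
The loop described above can be illustrated with a toy model. This is a simplified sketch, not rsyslog's actual implementation: the function name `simulate_batches`, the cost model (a fixed cost per bulk request plus a cost per message), and all numbers are assumptions chosen for illustration. The `slowdown` argument stands in for the dequeue delay and is given in seconds here, whereas queue.dequeueslowdown is in microseconds.

```python
def simulate_batches(arrival_rate, request_overhead, per_message_cost,
                     max_batch, slowdown=0.0, passes=50):
    """Batch size taken on each pass of a simplified dequeue loop.

    All inputs are hypothetical: rates in messages/second, costs in seconds.
    """
    pending = 1.0  # messages waiting when the first pass starts
    sizes = []
    for _ in range(passes):
        # Grab everything that is pending, up to the maximum dequeue size.
        batch = min(pending, max_batch)
        sizes.append(batch)
        # Time until the next pass: optional pause, fixed request cost,
        # and a cost per delivered message.
        elapsed = slowdown + request_overhead + per_message_cost * batch
        # Messages that arrived in the meantime wait for the next pass.
        pending = pending - batch + arrival_rate * elapsed
    return sizes


# Assumed figures: 10k msg/s input, 0.8 ms per bulk request, 0.02 ms per message.
fast = simulate_batches(10_000, 0.0008, 0.00002, max_batch=40_000)
# Same destination, with a 10 ms pause between batches.
paused = simulate_batches(10_000, 0.0008, 0.00002, max_batch=40_000, slowdown=0.01)
# A destination that cannot keep up: 0.2 ms per message is only 5k msg/s.
slow = simulate_batches(10_000, 0.0008, 0.0002, max_batch=40_000)

print(round(fast[-1]), round(paused[-1]), slow[-1])
```

Under these assumed numbers the first case settles at about 10 messages per batch, the paused case at about 135, and the slow case grows until it hits the 40000 cap while the backlog keeps increasing. In this model, small batches go together with a nearly empty queue, and a full queue goes together with maximum-size batches.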
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
you
DON'T LIKE THAT.




--
Regards,
Janmejay
http://codehunk.wordpress.com