On Wed, 9 Sep 2015, Risto Vaarandi wrote:

I am currently tuning one of my rsyslog+elasticsearch installations and questions about optimal settings have emerged. On the web, there is a nice guide with several recommendations, http://blog.sematext.com/2014/01/20/rsyslog-8-1-elasticsearch-output-performance/, but it has one elasticsearch action, while my configuration has many. In a nutshell, my current setup looks like this:

<snip>
Altogether, I have about 20 omelasticsearch actions in the above block of statements. My question is -- should I use larger values for queue and batch size than just 10000 and 500? The guide http://blog.sematext.com/2014/01/20/rsyslog-8-1-elasticsearch-output-performance/ recommends much larger values, but these are used for only one action statement which handles all writes to Elasticsearch. In contrast, my setup has many actions, and although some actions are less busy, the most active 7-8 actions see roughly the same amount of traffic. This installation receives 4-5 thousand messages per second, but the workload will increase gradually. Also, what about the queue sizes for the entire ruleset, do the current settings look reasonable? (As I have understood, each ruleset uses its own queue, and changing the size of the main queue does not influence the ruleset.)

Are there any other settings I should consider, in order to increase performance?

Max queue size is how many messages you want to be able to buffer if ES is down. Once it is beyond ~2x the batch size, making it larger won't have any further effect on performance.
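As a rough illustration of the sizing rule above, here is a minimal Python sketch that relates queue size to how long an ES outage the queue can absorb. The function names are made up for this example, the 5000 msg/s figure is the upper end of the rate mentioned in the question, and the 10-minute outage is purely an assumed target. It ignores disk-assisted queues and per-message memory cost.

```python
import math


def outage_seconds_covered(queue_size: int, msgs_per_second: float) -> float:
    """Seconds of a complete ES outage that a full queue can absorb."""
    if msgs_per_second <= 0:
        raise ValueError("msgs_per_second must be positive")
    return queue_size / msgs_per_second


def queue_size_for_outage(outage_seconds: float, msgs_per_second: float) -> int:
    """Smallest queue size that buffers every message during the outage."""
    if outage_seconds < 0 or msgs_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(outage_seconds * msgs_per_second)


if __name__ == "__main__":
    # queue.size="10000" at an assumed 5000 msg/s through a single queue
    print(outage_seconds_covered(10000, 5000))
    # assumed target: ride out a 10-minute ES outage at the same rate
    print(queue_size_for_outage(600, 5000))
```

Under these assumptions, a 10000-message queue carrying all of the traffic covers only a couple of seconds of downtime, so the queue size is really a decision about outage tolerance and available RAM, not about throughput.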

Dequeue batch size is the max number of messages to pull from the queue and attempt to send to ES at once. If it's too small, the per-send and per-batch processing overhead of ES will waste resources. If it's too large, ES ends up needing too much RAM to process the messages, so the optimum batch size depends on the size of individual messages.

If you have 200B messages, you should send a lot more of them at once than if you have 2MB messages. Sending 1000 200B messages will be just over 200KB of data, but 1000 2MB messages will need 2GB of buffering to process them.
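The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the 5 MB per-batch budget are assumptions for the example, not ES or rsyslog recommendations, and the payload figure ignores the per-item bulk metadata that makes the real request slightly larger.

```python
def batch_payload_bytes(batch_size: int, avg_message_bytes: int) -> int:
    """Raw message payload of one batch, excluding bulk-request overhead."""
    return batch_size * avg_message_bytes


def batch_size_for_budget(budget_bytes: int, avg_message_bytes: int) -> int:
    """Largest batch whose raw payload fits the budget (never below 1)."""
    if avg_message_bytes <= 0:
        raise ValueError("avg_message_bytes must be positive")
    return max(1, budget_bytes // avg_message_bytes)


if __name__ == "__main__":
    print(batch_payload_bytes(1000, 200))        # 1000 messages of 200 B
    print(batch_payload_bytes(1000, 2_000_000))  # 1000 messages of 2 MB
    # assumed budget of 5 MB of payload per batch
    print(batch_size_for_budget(5_000_000, 200))
    print(batch_size_for_budget(5_000_000, 2_000_000))
```

With the same byte budget, small messages allow a batch in the tens of thousands while 2MB messages allow only a couple, which is why a single dequeue batch size cannot be recommended without knowing the message sizes.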

From other comments, it sounds as if ES is limited in the number of inbound
connections it can handle, so you may want to do something along the lines of:

$template manual,"%$.custommessage%\n"
ruleset(name="es" queue.type="linkedlist" queue.size="10000"
        queue.dequeuebatchsize="500") {
  action(type="omelasticsearch" template="manual" dynSearchIndex="on"
     searchIndex="SyslogIndex" server="localhost" bulkmode="on"
     action.resumeretrycount="-1")
}

then do

if $programname contains 'app1' then {
  set $.custommessage = exec_template("App1");
  call es
  stop
}

if $programname contains 'app2' then {
  set $.custommessage = exec_template("App2");
  call es
  stop
}

so that all your sending to ES is funneled through one queue and one connection to ES rather than a separate one per filter.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog