Re: [rsyslog] Help Optimizing Action Queue

Devin Christensen Thu, 18 Sep 2014 15:12:02 -0700

I'm getting closer on this. Adding queue.timeoutenqueue="0" to theaction gave me what I was after in terms of discarding. It filled up thedisk queue to the configured max and then started discarding allmessages after that. I'm using impstats to track things. Here are therelative output lines after filling up my disk-assisted queue:

action 1 queue[DA]: size=4101477 enqueued=5948040 full=1846563discarded.full=1846563 discarded.nf=0 maxqsize=4101477action 1 queue: size=51960 enqueued=6000000 full=0 discarded.full=0discarded.nf=0 maxqsize=61776main Q: size=17 enqueued=6000941 full=0 discarded.full=0 discarded.nf=0maxqsize=5309

As you can see, the messages are only getting discarded at "action 1queue[DA]" which is what I expected. So things are looking great there.Back pressure is not piling up on the main Q (maxqsize=5309) either.Very happy about the way that is working, but I'm now bumping into anissue during restarts. When I restart rsyslog, the impstats outputchanges from what's above to:

action 1 queue[DA]: size=0 enqueued=0 full=0 discarded.full=0discarded.nf=0 maxqsize=0rsyslogd-pstats: action 1 queue: size=0 enqueued=0 full=0discarded.full=0 discarded.nf=0 maxqsize=0main Q: size=17 enqueued=21 full=0 discarded.full=0 discarded.nf=0maxqsize=17

As you can see, all the logs in "action 1 queue[DA]" are now gone(size=0), even though the spool files still exist on disk. I expectedthe "action 1 queue[DA]" to remain at "size=4101477" after a restart. Iconfirmed the logs were lost by starting the omfwd receiver again andwaiting for 5 minutes. No logs were ever transmitted; howerver, it doesreceive new logs just fine. When I shutdown the receiver and send morelogs I get:

action 1 queue[DA]: size=101666 enqueued=101667 full=0 discarded.full=0discarded.nf=0 maxqsize=101666action 1 queue: size=77270 enqueued=500000 full=290102discarded.full=290102 discarded.nf=0 maxqsize=100000main Q: size=17 enqueued=501421 full=0 discarded.full=0 discarded.nf=0maxqsize=475

Now the logs are getting discared at "action 1 queue" rather than"action 1 queue[DA]". It also looks like rsyslog reuses the spool files.Even though "action 1 queue[DA]" filled to "size=101666", the actual ondisk size never increased beyond the size it was at when I restartedrsyslog. I'm going to do some more experimenting, but do let me know ifyou see antyhing wrong with my setup. I'm currently doing all testing onrsyslog 8.4.0. I'll try 7.6.3 if I can't get 8.4.0 working. Here's mytest configuration:


################################################################################
# Global Config
################################################################################
$FileGroup adm
$FileOwner syslog
$PrivDropToGroup syslog
$PrivDropToUser syslog
$RepeatedMsgReduction off
$Umask 0022
$WorkDirectory /var/spool/rsyslog

################################################################################
# Input
################################################################################
module(load="imuxsock") # provides support for local system logging
module(load="imklog")   # provides kernel logging support
module(load="imfile")
module(load="impstats" interval="5")

################################################################################
# Main Queue
################################################################################
main_queue(
  queue.type="LinkedList"
  queue.size="100000"
  queue.dequeuebatchsize="1000"
  queue.workerthreads="2"
  queue.dequeueslowdown="0"
)

################################################################################
# Templates
################################################################################
template(name="preformatted" type="list") {
  property(name="msg")
}

################################################################################
# Output
################################################################################

# Defaults for file output
module(
  load="builtin:omfile"
  template="RSYSLOG_TraditionalFileFormat"
  dirCreateMode="0755"
  fileCreateMode="0644"
)

# Test forward action
local1.* action(
  type="omfwd"
  Target="localhost"
  Port="4414"
  Protocol="tcp"
  template="preformatted"
  action.resumeRetryCount="-1"
  action.resumeInterval="15"
  queue.type="LinkedList"
  queue.size="100000"
  queue.highwatermark="60000"
  queue.lowwatermark="50000"
  queue.dequeuebatchsize="10"
  queue.timeoutenqueue="0"
  queue.workerthreads="2"
  queue.filename="fwd_test"
  queue.maxdiskspace="4g"
  queue.maxfilesize="64m"
  queue.saveonshutdown="on"
)

auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none;\
local0,local1,local2,local3,local4,local5,local6,local7.none /var/log/syslog
syslog.*        /var/log/rsyslog.log #rsyslog error messages
kern.*          /var/log/kern.log
mail.*          /var/log/mail.log
mail.err        /var/log/mail.err
news.crit       /var/log/news/news.crit
news.err        /var/log/news/news.err
news.notice     /var/log/news/news.notice
*.emerg         :omusrmsg:*

On 09/18/2014 02:50 AM, Rainer Gerhards wrote:

2014-09-18 9:03 GMT+02:00 Radu Gheorghe <[email protected]>:

Hi Devin,

My understanding is that a timeoutenqueue of 0 might have you lose
messages, as they're enqueued from the main queue to the action queue when
rsyslog is busy, because it waits for... no time... for the action queue to
enqueue that message. I was losing tons of messages during a performance
test with that setting. So I changed it to the default and had no more
problems.

It's used in cases where the queue in question is full or near-full, as
Devin quoted:

    We can not hold processing infinitely, not even when throtteling.
    For example, throtteling the local log socket too long would cause
    the system at whole come to a standstill. To prevent this, rsyslogd
    times out after a configured period ("$<object>QueueTimeoutEnqueue
",
    specified in milliseconds) if no space becomes available. As a last
    resort, it then discards the newly arrived message.

You probably lost messages because the queue was full and the output was
not able to free space. With defaults, the output could quickly enough free
space and so the whole process continued without loss (remember, if there
is no timeout, even a nanosecond is "too long" a waittime).

Rainer

I'm still not 100% on what timeoutenqueue does, but I'd leave it to the
default for now on the action queue. Maybe Rainer or someone else who
understand this stuff better can clarify.

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Sep 18, 2014 at 9:52 AM, Devin Christensen <
[email protected]> wrote:

Thanks for the clarification. In that case, am I right in thinking that
the following action will discard all messages it receives when its disk
queue reaches 1 gigabyte, and avoid any throttling?

local1.* action(
    type="omfwd"
    Target="remote.example.com"
    Port="4414"
    Protocol="tcp"
    template="preformatted"
    action.resumeRetryCount="-1"
    action.resumeInterval="15"
    queue.type="LinkedList"
    queue.size="100000"
    queue.highwatermark="60000"
    queue.lowwatermark="50000"
    queue.dequeuebatchsize="1000"
    queue.timeoutenqueue="0"
    queue.workerthreads="2"
    queue.filename="fwd_to_remote"
    queue.maxdisksize="1g"
    queue.maxfilesize="16m"
    queue.saveonshutdown="on"
)

On 09/18/2014 12:43 AM, Rainer Gerhards wrote:

2014-09-18 8:28 GMT+02:00 Devin Christensen <
[email protected]>:

  Actually, it looks like there may be some conflicting documentation

around
"queue.timeoutenqueue". From "Understanding Rsyslog Queues"

     We can not hold processing infinitely, not even when throtteling.
     For example, throtteling the local log socket too long would cause
     the system at whole come to a standstill. To prevent this, rsyslogd
     times out after a configured period

("$<object>QueueTimeoutEnqueue",

     specified in milliseconds) if no space becomes available. As a last
     resort, it then discards the newly arrived message.

     *If you do not like throtteling, set the timeout to 0 - the message
     will then immediately be discarded*. If you use a high timeout, be
     sure you know what you do. If a high main message queue enqueue
     timeout is set, it can lead to something like a complete hang of

the

     system. The same problem does not apply to action queues.

  From "General Queue Parameters"

     *queue.timeoutenqueue* number number is timeout in ms (1000ms is
     1sec!), default 2000, *0 means indefinite*


  This is wrong, just fixed the doc. 0 is "discard immediately".

Thx,
Rainer

  Guess I won't tinker with that without a bit of clarification.


On 09/18/2014 12:15 AM, Devin Christensen wrote:

  Thanks for the quick response. The other setting that I thought might

help is "queue.timeoutenqueue" which I was considering setting to 0 on
the
action queue. The documentation leads me to believe this will discard
any
new messages arriving to the action when the disk queue reaches its

max

size. Does that sound right?

If I can isolate the discarded messages to those going to the omfwd
action that would be ideal. None of the other logs should cause back
pressure becuase they're not dependent on a remote host being up. Of
course, I think I should also add queue.discardmark and
queue.discardseverity to the main queue for additional reassurance.

On 09/17/2014 11:54 PM, Radu Gheorghe wrote:

  Hi Devin,

I'm not 100% sure about this, but it sounds like what you should do

is

to
apply queue.discardmark and queue.discardseverity on the main queue.
This
should allow the action queue to fill up (to that 1GB), and put
pressure
on
the main queue. When main queue has more than $DISCARDMARK messages,

it

should begin discarding messages with a severity number higher than
$DISCARDSEVERITY.

You could go all-or-nothing with this, and discard everything
(severity=1
or maybe even 0 works?) when you hit 999999 messages, or you can

show a

bit
of mercy and, say, let only errors pass after you have 800K messages

in

the
queue. In the latter case you'd risk putting pressure back on the
socket,
though.

It sounds like you already know about all the queue parameters, but
just
in
case you missed the docs:
http://www.rsyslog.com/doc/master/rainerscript/queue_parameters.html

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Sep 18, 2014 at 8:41 AM, Devin Christensen <
[email protected]> wrote:

   I'm trying to configure an action queue so that it will discard all

messages immediately if it fills up it's allocated disk space. The

log

messages are coming in on the local socket. I just recovered from a
scenario where rsyslog was bringing systems to a halt, presumably
because
back pressure is ending up on the local log socket, filling it up,

and

letting nothing else write.

Here is my current configuration for my main queue and the action.

main_queue(
     queue.type="LinkedList"
     queue.size="1000000"
     queue.dequeuebatchsize="1000"
     queue.workerthreads="5"
     queue.dequeueslowdown="0"
)

local1.* action(
     type="omfwd"
     Target="remote.example.com"
     Port="4414"
     Protocol="tcp"
     template="preformatted"
     action.resumeRetryCount="-1"
     action.resumeInterval="15"
     queue.type="LinkedList"
     queue.size="100000"
     queue.highwatermark="60000"
     queue.lowwatermark="50000"
     queue.dequeuebatchsize="1000"
     queue.workerthreads="2"
     queue.filename="fwd_preformatted_to_logflume"
     queue.maxdisksize="1g"
     queue.maxfilesize="16m"
     queue.saveonshutdown="on"
)

In the event that the target (remote.example.com) is unavailable, I
would
like logs to spool to disk upto 1 gigabyte, and discard everything
immediately after that. I want to avoid any back pressure ending up

on

the
local log socket. It's much more valuable for our systems to

continue

running than to get all the log data.

My question is, what am I missing or completely messed up?
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if

you

DON'T LIKE THAT.

   _______________________________________________

rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if

you

DON'T LIKE THAT.

  _______________________________________________

rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a

myriad

of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

  _______________________________________________

rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a

myriad

of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

  _______________________________________________

rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Re: [rsyslog] Help Optimizing Action Queue

Reply via email to