On Tue, 3 Nov 2009, Aaron Wiebe wrote:
This is still taking place this hour - though I obviously can't
restart onto a newer version without losing this case. Is there
anything I can do in the current configuration to try and debug this
situation?
if you can strace each thread for a few seconds it may give you an idea
what's happening.
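
A minimal sketch of how the per-thread strace could be set up, assuming a Linux box with /proc and with strace and the coreutils timeout command installed. The function names, the 5 second window and the /tmp output prefix are illustrative choices, not anything that ships with rsyslog:

```python
import os
import sys


def thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs of a process by listing <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def strace_commands(tids, seconds=5, out_prefix="/tmp/rsyslogd-strace"):
    """Build one time-limited strace command line per thread ID.

    -tt adds wall-clock timestamps, -p attaches to a single thread,
    -o writes the trace to its own file so the threads can be compared.
    """
    return [
        ["timeout", str(seconds), "strace", "-tt", "-p", str(tid),
         "-o", f"{out_prefix}.{tid}"]
        for tid in tids
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python strace_threads.py <pid of rsyslogd>
    for cmd in strace_commands(thread_ids(int(sys.argv[1]))):
        print(" ".join(cmd))
```

This only prints the command lines so they can be reviewed before being run as root. A thread that sits in a futex call for the whole window would point at lock contention, while one that is busy in write calls would point at the output side.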
in the meantime you need to stop the system from getting further behind.
can you redirect or reconfigure the senders so that they are not sending
new logs to this box, so that it can dig itself out? stopping the input may
be enough by itself: rsyslog has historically done a LOT of locking around
the main queue, and if you have a full queue with more data trying to be
delivered it can really slow things down. I wouldn't expect it to be this
much, but it could be part of it.
David Lang
(We're up to 18:46:51 now!)
-Aaron
On Tue, Nov 3, 2009 at 1:46 AM, Rainer Gerhards
<[email protected]> wrote:
mhhh... This is obviously not intended behavior. There are some rate-limiting
features that can deliberately slow down a queue, but I guess you have not
configured these. So there either is a bug that activates them at some point
during processing (e.g. an invalid memory access could do that) or there is
some other bug that causes the dequeue to happen very slowly. In any case, it
is hard to guess.
Given the volume of the messages, a debug log probably does not help. We
could gain a little insight into at least the queue sizes that rsyslog sees via
imdiag plus the (very rudimentary) v5 debug front-end (it doesn't require the
engine to be v5!). I would need to spend at least a little work on the
front-end to enable remote access, but that's not really a problem.
One other thing is that I am still holding a v4-beta with additional fixes as
I didn't want to put these in the middle of some other debugging. However,
these fixes address potential memory problems, so the most appropriate course
of action would be to give that version at least a try. It needs to be
released in the next few days in any case.
I have uploaded that (pre-4.5.6) version so that you can give it a try if you
like:
http://www.rsyslog.com/download/rsyslog/pre/rsyslog-4.5.6.tar.gz
I think it would be good if you could try it at least once, because I think any
other troubleshooting will boil down to looking at the fixes this version
contains.
Rainer
-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of Aaron Wiebe
Sent: Monday, November 02, 2009 11:52 PM
To: rsyslog-users
Subject: [rsyslog] Queuing bug (4.5.5)
Greetings all,
I appear to have run into an issue with 4.5.5 where queues are not
being flushed in a timely manner. In this specific case, I have data
from last Wednesday that has been trickling out to disk in small chunks
ever since, and is still being written today. Unfortunately I cannot be
too specific in details, but here is what I can see:
Using omfile+gzip, there appears to have been a decent burst in
traffic sometime last wednesday. The rsyslog instance has grown to
1.8GB of memory and stayed there for a while. I have both main
message and action queues defined using fixedarray, and I see no
on-disk queues (appears to be entirely in memory).
I've got templates writing out to filenames using an hourly timestamp
(filenames like: [token]-[host]-YYYYMMDD-HH.txt.gz). In some of those
files, all of them less than 5k in size, there are between 5 and 15
lines of content, all of them from last wednesday, and within a few
seconds of each other. It's almost like there was a significant queue
built up, the hour rolled over, and only the first block of lines were
pulled from the queue. Then the hour rolled over again, and another
block of lines were pulled from the queue. Then the next hour, then
another 5-15 lines.
Is it possible that one of the queues still has a good chunk of data
built up, and is flushing it out very slowly? It hasn't been
consistently at the top of the hour either, and not every hour. But
the log lines themselves are sequentially written out, and usually the
lines are within a few seconds of each other.
For example:
syslog-myhost-20091102-18.txt.gz: 3 lines, 2 with TS Oct 21 18:46:34
and one 18:46:35
syslog-myhost-20091102-19.txt.gz: 17 lines, 3 Oct 21 18:46:35, 14
lines Oct 21 18:46:36
syslog-myhost-20091102-20.txt.gz: 12 lines, 8 Oct 21 18:46:36, 4
lines Oct 21 18:46:37
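
To put rough numbers on the example above, here is a small sketch that works out how fast the backlog is draining. The line counts and timestamps are copied from the three files listed. The year 2009 on the message timestamps is an assumption (syslog timestamps carry no year), and the function and field names are made up for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (hour in the output filename, lines in the file,
#  earliest message timestamp, latest message timestamp)
SAMPLES = [
    ("2009-11-02 18", 3, "2009-10-21 18:46:34", "2009-10-21 18:46:35"),
    ("2009-11-02 19", 17, "2009-10-21 18:46:35", "2009-10-21 18:46:36"),
    ("2009-11-02 20", 12, "2009-10-21 18:46:36", "2009-10-21 18:46:37"),
]


def summarize(samples):
    """Summarize how much backlog was written and how far behind the writer is."""
    rows = [
        (datetime.strptime(hour, "%Y-%m-%d %H"), count,
         datetime.strptime(first, FMT), datetime.strptime(last, FMT))
        for hour, count, first, last in samples
    ]
    total_lines = sum(count for _, count, _, _ in rows)
    # Span of original message time covered by all the files together.
    backlog_seconds = (rows[-1][3] - rows[0][2]).total_seconds()
    # Gap between the newest file's hour and the newest message in it.
    lag_days = (rows[-1][0] - rows[-1][3]).total_seconds() / 86400
    return {
        "total_lines": total_lines,
        "lines_per_file": total_lines / len(rows),
        "backlog_seconds": backlog_seconds,
        "lag_days": lag_days,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLES).items():
        print(f"{key}: {value:.2f}")
```

For these three files that is 32 lines covering about three seconds of original traffic, so at this rate the queue is falling further behind rather than catching up.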
Thoughts?
-Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com