On Tue, 6 Dec 2016, Arik Mitschang wrote:
> Hi David,
>
>> the problem is probably not on the system that stops forwarding messages, but
>> rather on the system they are forwarding the messages to.
>>
>> When the queues fill up, unless you have configured rsyslog to throw away
>> messages, it will stop accepting any new messages as it can't put them in the
>> queue. This is "working as designed" (one of these days I've got to sit down and
>> finish writing my "how to make your logs unreliable" article :-)
>
> There are several reasons I do not think this is the case:
>
> We have multiple relays downstream connected to upstream relays, and see
> messages come through these other paths when this situation occurs.
>
> Also, a frequent solution to the problem is to restart the stuck process
> (and only that one), where we see the messages flush through upstream
> relays when shutting down, implying they are not holding back messages.
>
> Finally, we do have impstats enabled. It is going through the main queue,
> but this actually allows us to probe the status of the queue. We have a
> nagios alert that fires when no stats messages arrive within a fixed time
> window. Before getting stuck, messages in the queue are at maximum
> (actually we see 700k in the main queue which is set at 1M), then we see
> no more stats from only the stuck relays; the others keep pushing stats and
> reflect the reduction in message throughput in their main queue sizes.

Do you have the stats messages configured to go through the main queue (like any
other message), or do you have them set to use a separate queue so that they
will get through even if the main queue is blocked?

Can you configure one to write to either a separate queue (i.e. a ruleset with
its own queue) or to a file so that we can see what the stats look like when
things break? On my system I created a 'high priority' ruleset with its own
queue for the stats to go through that bypassed my intermediate relays and
delivered directly to my central servers, so that if anything happened to the
main queue, I would still get the stats data. I also had this write to the local
disk and send stats to my monitoring system.
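
A minimal sketch of that kind of setup, for illustration only. It assumes the
RainerScript config syntax and the impstats parameters "ruleset" and "log.file";
the ruleset name, file path, queue size, interval, and target host are all
made-up placeholders, not values from my actual systems:

```
# Sketch only: every name, path, size, and host below is a hypothetical example.
# impstats should be loaded near the top of the config, before other modules.
module(load="impstats"
       interval="60"                           # emit counters every 60 seconds
       severity="7"
       log.syslog="on"                         # still inject stats as syslog messages...
       ruleset="stats"                         # ...but hand them to a dedicated ruleset
       log.file="/var/log/rsyslog-stats.log")  # and also write them straight to disk

# Dedicated ruleset with its own queue, so a full main queue cannot block it.
ruleset(name="stats"
        queue.type="LinkedList"
        queue.size="10000") {
    # Send stats directly to the central server, bypassing intermediate relays.
    action(type="omfwd"
           target="central.example.com"
           port="514"
           protocol="tcp")
}
```

With something like this in place, the local file should keep growing even when
forwarding is stuck, so the queue sizes and action counters from the problem
window are available afterwards.
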
If the stats messages are queued and sent after the restart, what do they show
during the time when you have trouble? Do they show any of the actions being
suspended?

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.