On Tue, 6 Dec 2016, Arik Mitschang wrote:

Hi David,

the problem is probably not on the system that stops forwarding messages, but
rather on the system they are forwarding the messages to.

When the queues fill up, unless you have configured rsyslog to throw away
messages, it will stop accepting any new messages as it can't put them in the
queue. This is "working as designed" (one of these days I've got to sit down and
finish writing my "how to make your logs unreliable" article :-)

There are several reasons I do not think this is the case:

We have multiple relays downstream connected to upstream relays, and see
messages come through these other paths when this situation occurs.

Also, a frequent solution to the problem is to restart the stuck process
(and only that one), where we see the messages flush through upstream
relays when shutting down, implying they are not holding back messages.

Finally, we do have impstats enabled. It goes through the main queue,
but this actually allows us to probe the status of the queue. We have a
Nagios alert for when no stats messages come in within a fixed time
window. Before getting stuck, messages in the queue are at maximum
(actually we see 700k in the main queue, which is set at 1M), then we see
no more stats from only the stuck relays; the others keep pushing stats and
reflect the reduction in message throughput in their main queue sizes.

do you have the stats messages configured to go through the main queue (like any other message)? or do you have them set to use a separate queue so that they will get through even if the main queue is blocked?

can you configure one to write to either a separate queue (i.e. a ruleset with its own queue) or to a file so that we can see what the stats look like when things break? On my system I created a 'high priority' ruleset with its own queue for the stats to go through that bypassed my intermediate relays and delivered directly to my central servers, so that if anything happened to the main queue, I would still get the stats data. I also had this write to the local disk and send stats to my monitoring system.

If the stats messages are queued and sent after the restart, what do they show during the time when you have trouble? do they show any of the actions being suspended?
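
As a rough illustration of what to look for in those stats, here is a minimal Python sketch. It assumes the legacy impstats text format, where each line looks like "name: origin=... key=value ...", and that the "suspended" (actions) and "full" (queues) counters are present. The function names parse_stats_line and find_problems and the sample lines are hypothetical, so adjust them to whatever your impstats output actually contains.

```python
import re

# Assumption: counters appear as key=<integer> pairs, e.g. "suspended=4".
# Non-numeric fields such as "origin=core.action" are skipped on purpose.
PAIR = re.compile(r"([\w.]+)=(\d+)\b")


def parse_stats_line(line):
    """Return (name, counters) for one impstats-style line."""
    counters = {key: int(value) for key, value in PAIR.findall(line)}
    # Assumption: the object name is everything before " origin=".
    if " origin=" in line:
        head = line.split(" origin=", 1)[0]
    else:
        head = line.split("=", 1)[0]
    return head.rstrip(": ").strip(), counters


def find_problems(lines):
    """List actions that were suspended and queues that hit their limit."""
    problems = []
    for line in lines:
        name, counters = parse_stats_line(line)
        if counters.get("suspended", 0) > 0:
            problems.append(f"{name}: suspended {counters['suspended']} time(s)")
        if counters.get("full", 0) > 0:
            problems.append(f"{name}: queue was full {counters['full']} time(s)")
    return problems


# Hypothetical sample lines, not taken from a real system.
SAMPLE = [
    "main Q: origin=core.queue size=700000 enqueued=1200000 full=3 "
    "discarded.full=0 discarded.nf=0 maxqsize=700000",
    "action 1: origin=core.action processed=900 failed=0 suspended=0 "
    "suspended.duration=0 resumed=0",
    "action 2: origin=core.action processed=500 failed=0 suspended=4 "
    "suspended.duration=120 resumed=0",
]

if __name__ == "__main__":
    for problem in find_problems(SAMPLE):
        print(problem)
```

Running something like this over the stats written to the local file would show whether any action was suspended or any queue filled up in the period before a relay got stuck.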

David Lang