On Wed, 28 Dec 2016, mostolog--- via rsyslog wrote:

While testing our current infrastructure we suffered a /log explosion/, i.e. errors while processing logs produced error logs on the same machine, which in turn caused more errors when processed... and finally the disk filled up and everything died.

I'm wondering whether worrying about this is useful, and how it could be managed/prevented (as automatically as possible).

For example:

* Rate-limiting for specific log events (e.g. rate-limiting events with
  syslogtag="foo", or events matching a filter)
* Having counters and ignoring events of a given /type/ if more than N
  occur in the last X minutes
* Being able to reduce rsyslog verbosity, logging "fail and recover"
  messages instead of logging an error on each failure.
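For the first bullet, rsyslog's local-socket input does expose rate-limit knobs; a hedged sketch (the window and burst values here are made-up examples, not recommendations):

```
# rsyslog.conf fragment: rate-limit messages arriving via the local socket
module(load="imuxsock"
       SysSock.RateLimit.Interval="5"     # window, in seconds
       SysSock.RateLimit.Burst="500")     # messages allowed per window

# Crude alternative for one known-noisy tag: discard it outright.
# Note this drops everything with that tag rather than rate-limiting it.
if $syslogtag startswith "foo" then stop
```

True per-filter rate limiting (bullet two) isn't covered by this fragment; check what your rsyslog version supports before relying on it.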

How do you handle these situations? Should we stop worrying about things that haven't happened and probably never will?

monitor disk space and alert if it starts filling up.
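A minimal sketch of that first check, assuming a standalone Python watcher (the path and threshold are illustrative, not prescribed):

```python
# Hypothetical disk-space check: alert when the log partition passes a
# usage threshold. Path and threshold are assumptions for illustration.
import shutil


def disk_usage_pct(path="/var/log"):
    """Return percentage of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path="/var/log", threshold_pct=85.0):
    """Print an alert line if usage is at or above the threshold."""
    pct = disk_usage_pct(path)
    if pct >= threshold_pct:
        print(f"ALERT: {path} is {pct:.1f}% full")
    return pct
```

In practice you would run this from cron or a monitoring agent rather than as a one-off script.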

monitor logs/sec and alert if they jump much higher than normal
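A rough sketch of a logs/sec watcher, assuming plain files and a fixed baseline (all parameter values are assumptions):

```python
# Hypothetical logs/sec monitor: count new lines appended to a log file
# per interval and alert when the rate exceeds a multiple of a baseline.
import os
import time


def count_new_lines(path, offset):
    """Return (lines appended since `offset`, new offset)."""
    size = os.path.getsize(path)
    if size < offset:  # file was rotated or truncated; start over
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.count(b"\n"), size


def monitor(path, interval=10, baseline=100.0, factor=5.0, cycles=1):
    """Poll `cycles` times; alert if the rate jumps past factor*baseline."""
    offset = os.path.getsize(path)
    for _ in range(cycles):
        time.sleep(interval)
        n, offset = count_new_lines(path, offset)
        rate = n / interval
        if rate > factor * baseline:
            print(f"ALERT: {rate:.0f} logs/sec (baseline {baseline:.0f})")
```

A real deployment would get the baseline from history rather than hard-coding it.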

ideally set up anomaly detection and alert when the rate of disk usage/logs per sec is unusually high OR unusually low (see https://www.usenix.org/legacy/publications/library/proceedings/lisa2000/full_papers/brutlag/brutlag_html/index.html for more info)
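A much-simplified sketch of the idea behind the linked Brutlag paper (which uses Holt-Winters forecasting): track a smoothed mean and a smoothed deviation, and flag samples outside a confidence band. The parameters here are assumptions for illustration, not values from the paper:

```python
# Toy band-based anomaly detector: flags samples that are unusually
# high OR unusually low relative to an exponentially smoothed history.
def make_detector(alpha=0.1, width=3.0, min_dev=1.0):
    """Return an observe(x) callable that yields True on anomalies."""
    state = {"mean": None, "dev": min_dev}

    def observe(x):
        if state["mean"] is None:  # first sample just seeds the mean
            state["mean"] = x
            return False
        # Outside mean +/- width*dev counts as anomalous (both directions).
        anomalous = abs(x - state["mean"]) > width * max(state["dev"], min_dev)
        # Update the smoothed deviation and mean.
        state["dev"] = alpha * abs(x - state["mean"]) + (1 - alpha) * state["dev"]
        state["mean"] = alpha * x + (1 - alpha) * state["mean"]
        return anomalous

    return observe
```

Feeding this a steady rate keeps it quiet; a sudden spike (or drop to zero) trips the band, which matches the "unusually high OR unusually low" point above.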

overall, this isn't likely to happen once you get the system set up and running, so many places don't do anything special for this at all.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
