On Wed, 22 Apr 2015, Joe Blow wrote:
Hey all,

I've got a log server (running the latest and greatest rsyslog as of yesterday) which I've been seeing die at random. I load balance and have scripts that check whether rsyslog is running and restart it if it isn't, but I'm having a really tough time tracking down what could be making rsyslog crash. Sometimes it seems like the rsyslog daemon dies, and because of that I can't log in via SSH. As a result I've been forced to keep a screen session open on the rsyslog boxes in case I need to restart rsyslog (which allows me to log in again).

I'm taking logs in from a number of different sources (ASA, Snare, etc., all of them with their own disk-assisted output queues, all outputting to Elasticsearch). When I monitor the queues, I don't see any of them backing up or anything else that would lead me to believe rsyslog is ballooning in memory and about to die.

I have a number of these logs in my catchall bucket:

Framing Error in received TCP message: delimiter is not SP but has ASCII value 46. [v8.9.0]

I see a few of these within the error logs too:

"UnavailableShardsException[[cisco-20150420][4] [3] shardIt, [1] active : Timeout waiting for [3m], request: org.elasticsearch.action.bulk.BulkShardRequest@5d48afd4]"

Could either of these cause rsyslog to hard die? How would you recommend finding these seemingly random failures? Here is what most of my ES output queues look like:

<snip>
if $rawmsg contains "%ASA-" or $rawmsg contains "%PIX-" then {
    action(type="mmnormalize"
           userawmsg="on"
           rulebase="/etc/rsyslog.d/asa.rule")
    action(type="omelasticsearch"
           name="rsys_asa"
           server="10.10.10.10"
           serverport="9200"
           template="ciscoasa"
           asyncrepl="on"
           searchType="asa"
           searchIndex="ciscoasaindex"
           timeout="3m"
           dynSearchIndex="on"
           bulkmode="on"
           errorfile="asa_err.log"
           queue.type="linkedlist"
           queue.filename="cisco.rsysq"
           queue.size="15000000"
           queue.saveonshutdown="on"
           queue.maxdiskspace="100g"
           queue.dequeuebatchsize="5000"
           action.resumeretrycount="30")
    stop
}
</snip>

Anything glaring here? Could my retries be killing rsyslog? Any ideas on how I should go about troubleshooting?
Start by configuring impstats and have it log to a local file so you get its info even if other logging can't happen.
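
A minimal sketch of what that could look like, assuming rsyslog v8 RainerScript syntax. The 60-second interval and the file path are illustrative choices, not values from the setup described above:

```
# Load impstats early in rsyslog.conf so that later modules and actions
# are covered by the statistics.
module(load="impstats"
       interval="60"        # assumed: emit counters every 60 seconds
       severity="7"         # debug severity for the stats messages
       log.syslog="off"     # keep stats out of the normal syslog stream
       log.file="/var/log/rsyslog-stats.log")   # assumed path; written directly by impstats
```

Because log.file is written directly by the module rather than through the regular rules and queues, the counters should keep arriving even while an output is stalled. Named actions such as rsys_asa and their queues are reported under their own names, which is what makes a lagging output visible.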
The odds are good that rsyslog isn't crashing, but rather that it's filling its queues and then not accepting new input. If rsyslog weren't running, you would still be able to log in; but if it's running and its queues are full, the ssh login can't be logged, and you get the symptoms that you are describing.
The impstats data will let you track down which action is not keeping up.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog

