I've solved it. I'm fully blaming systemd here. **grins and ducks** I noticed the problem got significantly worse once we tuned the dynaFileCacheSize. Messages were missing from more log sources than before, notably the ones that don't send a lot of data. That led me down the road of open files.
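For anyone who wants to check the same thing on their own box, here is a rough Python sketch (not something from the thread) that compares a process's open file descriptors against its soft "Max open files" limit. It assumes a Linux /proc layout. The function names and the 80% warning threshold are illustrative assumptions, not anything rsyslog or systemd provides.

```python
import os


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Each value is an int, or None when the kernel reports 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_nofile_limits(f.read())
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft


def near_limit(open_fds, soft, threshold=0.8):
    """True when open_fds is at or above threshold * soft; never true for unlimited."""
    return soft is not None and open_fds >= threshold * soft
```

Calling fd_usage() with the rsyslog pid (as root, since /proc/<pid>/fd is not readable by other users) and passing the result to near_limit() gives a quick yes/no on whether the daemon is running out of descriptors.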
By default systemd allows 1024 open files per process. Once we increased dynaFileCacheSize on a bunch of actions, a bunch of the busy talkers opened file descriptors as soon as rsyslog started and held on to them, so the quiet talkers never got a chance to write out to file.

The current set limit can be validated by doing

    cat /proc/<pid>/limits

and the currently open number of files can be obtained by doing

    ls /proc/<pid>/fd | wc -l

Also make sure the system-wide max open files is appropriate using

    sysctl fs.file-max

systemd completely ignores /etc/security/limits* for the services it starts.

To change limits for a service in systemd, edit /usr/lib/systemd/system/rsyslog.service and under [Service] add:

    LimitNOFILE=<new value>

I chose 20480, to give it something we should never hit. Then run:

    systemctl daemon-reload
    systemctl restart rsyslog

I'm hovering around 5300 open files on my setup, and the sources that were missing logs before are now being written. I'll keep an eye on impstats and tune dynaFileCacheSize more as needed, now that it can open all the files it needs.

Thank you David and Rainer for the help. Hopefully someone else finds this useful.

Dan Woodruff

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Woodruff, Dan
Sent: Wednesday, October 7, 2015 3:18 PM
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

> >>> So rsyslog processes the messages and says it is writing it out,
> >>> but they aren't there.
> >>
> >> the only time I've seen this happen is if asyncwrites are enabled (in which
> >> case they will show up after rsyslog gets a HUP), can you show what action
> >> 20's full output line is?
> >
> > Something more than this?
>
> no, I was asking what the action() line is (you may have sent it earlier in the
> thread, but I am aggressive in deleting messages that I've replied to)

Ah, gotcha.
Here are the template and action lines:

template(name="EduroamACSFile" type="string" string="/var/log/collection/eduroam_acs.log")
template(name="EduroamACSFormat" type="string" string="%timestamp% %hostname% %syslogtag%%msg:::escape-cc%\n")
/tmp/debug.log;RSYSLOG_DebugFormat
action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile")

> > I have all my actions named now, main queue size increased to 500k, and I've
> > attempted using TCP from this particular problem log source, but the sending
> > device is not sending messages when set to TCP, so I'm going to change it
> > back to UDP. Other than that, I'm going to let this ride until tomorrow to
> > gather more impstats output before I report back.
>
> sounds good.
>
> If you can't send via TCP, double check that there is a route back to
> the source from the syslog server. If there isn't, UDP won't get
> through to the app either
> (which doesn't explain some messages getting through and not others,
> but there's been so much changing that it's possible that we're now
> testing from somewhere that has never worked)

Right, I've been changing a lot. I do have a route back to the source - confirmed with a ping.

Thanks,
Dan

