Hi, JB: nice retro monitoring! :)
Would seeing the output of dstat help? I wonder if there is a pile of
context switching. We'd also see what CPU wait looks like and what disk
read/write utilization is like.

To me this sounds like some bug where not enough threads (or something
along those lines) are dedicated to reading from disk, or where reading
from disk gets such low priority that not enough data is picked up from
disk relative to how much it prefers sending logs from memory.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Wed, Nov 4, 2015 at 3:51 PM, David Lang <[email protected]> wrote:

> On Wed, 4 Nov 2015, Rainer Gerhards wrote:
>
>> 2015-11-04 17:24 GMT+01:00 Joe Blow <[email protected]>:
>>
>>> Thanks for the input Rainer! It definitely helps, and I love hearing
>>> some of this from the horse's mouth. Let me start this post by saying
>>> I'm extremely grateful for all the help that rsyslog has provided me
>>> throughout my career. The support on this mailing list is arguably
>>> better than any of the paid vendors I've used for logging/SIEM.
>>>
>>> As much as I'd love to just give up on this, I'm far too confident in
>>> the rsyslog tool to admit defeat. Rsyslog is a beast, but a beast with
>>> many knobs :). I'm interested in potentially using the failover
>>> option, but the DA queue configuration ease might keep me using that
>>> for the time being.
>>>
>>> What about getting creative and moving the files to another rsyslog
>>> instance (on the same box) that doesn't have any input modules? Here's
>>> my thoughts:
>>>
>>> Stop rsyslog.
>>> Move rsyslog DA files and .qi file to another directory which a
>>> secondary instance of rsyslog knows about (but has no input modules
>>> running).
>>> Start rsyslog with input modules to get the realtime data flowing back
>>> in, with an empty DA queue.
>>> Turn on the second rsyslog instance which only knows about the backlog
>>> files, and has no input modules.
>>>
>>> My thought is that this would give at least 1 dedicated worker (per
>>> queue) 1 full core of resources to chug through the backlog, and only
>>> the backlog. Is my logic sound?
>>
>> not sure. Let's find the bottleneck. Is it i/o or CPU? What hard facts
>> tell you which one it is (you already commented partly on i/o, this
>> the more solid questions).
>>
>> IMO, the disk queue should primarily be i/o intense, and not put a lot
>> of stress on the cpu. If so, the logic wouldn't work.
>
> I expect that it's going to be a lot of cpu spent in system calls rather
> than I/O. Unless syncing/checkpointing is turned on, most of the I/O
> (especially in delivering messages from the queues) is not going to
> actually hit the disk.
>
> David Lang
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
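
P.S. If dstat isn't installed on the box, the "is it i/o or CPU?" question from the thread can be roughly approximated with a small Python sketch that samples /proc/stat twice. This is only an illustration, assuming a Linux host with the usual /proc/stat layout; the function names (parse_proc_stat, summarize, main) are made up for this example, and it is not a replacement for dstat. High system_pct with low iowait_pct would point toward time spent in system calls, while high iowait_pct would point toward the disk.

```python
#!/usr/bin/env python3
"""Sample /proc/stat twice; report context switches/s and CPU time shares.

Assumption: Linux, where the aggregate "cpu" line lists ticks in the order
given by CPU_FIELDS and the "ctxt" line holds total context switches.
"""
import time

CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return (cpu_ticks_by_field, total_context_switches) from /proc/stat text."""
    cpu, ctxt = {}, 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":  # aggregate line; per-core lines are "cpu0", "cpu1", ...
            cpu = dict(zip(CPU_FIELDS, (int(v) for v in parts[1:])))
        elif parts[0] == "ctxt":
            ctxt = int(parts[1])
    return cpu, ctxt


def summarize(before, after, seconds):
    """Turn two parse_proc_stat() samples into a rate and CPU percentages."""
    cpu0, ctxt0 = before
    cpu1, ctxt1 = after
    delta = {k: cpu1[k] - cpu0.get(k, 0) for k in cpu1}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical samples
    return {
        "ctxt_per_sec": (ctxt1 - ctxt0) / seconds,
        "user_pct": 100.0 * delta.get("user", 0) / total,
        "system_pct": 100.0 * delta.get("system", 0) / total,
        "iowait_pct": 100.0 * delta.get("iowait", 0) / total,
    }


def read_sample(path="/proc/stat"):
    """Read and parse one snapshot of /proc/stat."""
    with open(path) as f:
        return parse_proc_stat(f.read())


def main(interval=5.0):
    """Take two samples `interval` seconds apart and print the summary."""
    first = read_sample()
    time.sleep(interval)
    second = read_sample()
    for name, value in summarize(first, second, interval).items():
        print(f"{name}: {value:.1f}")
```

Calling main() while the backlog is being drained, and again when the box is idle, gives two sets of numbers to compare.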

