So far it's looking like neither, really... the box is running a bit hot on CPU, but the disks are pretty quiet.
Cheers,

JB

On Wed, Nov 4, 2015 at 11:27 AM, Rainer Gerhards <[email protected]> wrote:

> 2015-11-04 17:24 GMT+01:00 Joe Blow <[email protected]>:
> > Thanks for the input Rainer! It definitely helps, and I love hearing some
> > of this from the horse's mouth. Let me start this post by saying I'm
> > extremely grateful for all the help that rsyslog has provided me
> > throughout my career. The support on this mailing list is arguably better
> > than any of the paid vendors I've used for logging/SIEM.
> >
> > As much as I'd love to just give up on this, I'm far too confident in the
> > rsyslog tool to admit defeat. Rsyslog is a beast, but a beast with many
> > knobs :). I'm interested in potentially using the failover option, but
> > the ease of the DA queue configuration might keep me using that for the
> > time being.
> >
> > What about getting creative and moving the files to another rsyslog
> > instance (on the same box) that doesn't have any input modules? Here are
> > my thoughts:
> >
> > 1. Stop rsyslog.
> > 2. Move the rsyslog DA files and .qi file to another directory which a
> >    secondary instance of rsyslog knows about (but which has no input
> >    modules running).
> > 3. Start rsyslog with input modules to get the realtime data flowing back
> >    in, with an empty DA queue.
> > 4. Turn on the second rsyslog instance, which only knows about the
> >    backlog files and has no input modules.
> >
> > My thought is that this would give at least one dedicated worker (per
> > queue) a full core of resources to chug through the backlog, and only
> > the backlog. Is my logic sound?
>
> Not sure. Let's find the bottleneck. Is it i/o or CPU? What hard facts
> tell you which one it is? (You already commented partly on i/o; this is
> the more solid question.)
>
> IMO, the disk queue should primarily be i/o-intense and not put a lot of
> stress on the CPU. If so, the logic wouldn't work.
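JB's two-instance drain idea above might look roughly like this for the secondary instance. Everything here is illustrative, not from the thread: the work directory, the config path, and, crucially, the assumption that a disk queue will adopt pre-existing queue files moved into its work directory, which is exactly the behavior the thread leaves unverified.

```
# sketch: secondary "drain" instance - no input modules, it only owns the
# moved-over DA queue files and replays them to Elasticsearch
module(load="omelasticsearch")

# hypothetical directory the .qi and queue files were moved into
global(workDirectory="/var/spool/rsyslog-drain")

# pure disk queue pointed at the moved files
main_queue(queue.type="disk"
           queue.filename="HugeQ.rsysq")   # must match the moved files' prefix

action(type="omelasticsearch"
       server="10.10.10.10"
       serverport="9200"
       bulkmode="on"
       action.resumeretrycount="-1")
```

The second instance would also need its own pidfile so it does not collide with the primary, e.g. `rsyslogd -f /etc/rsyslog-drain.conf -i /var/run/rsyslogd-drain.pid`.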
> Rainer
>
> > I've run multiple rsyslog instances on the same box for some other
> > 'creative' logging projects I've done previously, without too much
> > issue.
> >
> > Thoughts?
> >
> > Cheers,
> >
> > JB
> >
> > On Wed, Nov 4, 2015 at 11:14 AM, Rainer Gerhards
> > <[email protected]> wrote:
> >
> >> 2015-11-04 17:12 GMT+01:00 Rainer Gerhards <[email protected]>:
> >> > 2015-11-04 17:08 GMT+01:00 Joe Blow <[email protected]>:
> >> >> I think I've spoken too soon. The in-memory queues are clearing
> >> >> extremely well with these settings, but the DA stuff is still pretty
> >> >> sluggish (slowed down to 50-100 EPS again). I've looked at the box
> >> >> and the IO is around 10% (12-disk array, which performs quite
> >> >> snappily), so I'm sincerely doubting this is an IO issue.
> >> >>
> >> >> The huge feed in question uses around 4 workers before it has enough
> >> >> workers to clear the queue as fast as it comes in (400k avg in the
> >> >> queue). From my understanding that means I've got 4 workers at a 50k
> >> >> bulk size each, and at 200k EPS out (4 workers x 50k EPS) the
> >> >> in-memory queue gets no bigger. Now this is where my knowledge ends.
> >> >> I've set the low watermark to 750k and the high watermark to 1
> >> >> million, with the thought that the low watermark is below having all
> >> >> 8 workers at full bore (8x100k) and the high watermark is 250k
> >> >> higher than that (slightly above all workers going full bore). If
> >> >> I'm staying below the low watermark and still have "free" workers,
> >> >> would those workers not try to empty the DA queue? What would help
> >> >> allocate more resources to clearing the DA queue?
> >> >
> >> > The DA queue always runs on one worker, because you can't use more
> >> > than one worker with purely sequential files.
> >> >
> >> > TBH I think your needs simply go above what the current system can
> >> > provide.
> >> > As David said, the queue subsystem could well deserve an overhaul,
> >> > but this is too big a task right now given what else is going on, and
> >> > there has also been no sponsor for any of that disk-queue work in the
> >> > past years, so it doesn't seem to have too high a priority either.
> >>
> >> mhhh, I should mention a potential work-around: forget the DA queue.
> >> Use failover actions. If the action fails, write log lines in native
> >> format to a file. Then use imfile to monitor that file. Together with
> >> a smart design of the rulesets, you can probably get all you need out
> >> of such a system. Unfortunately, I am even more swamped than usual, so
> >> I cannot provide detailed advice beyond pointing you to the idea here.
> >>
> >> HTH
> >> Rainer
> >>
> >> > Rainer
> >> >>
> >> >> Thanks for the prompt responses.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> JB
> >> >>
> >> >> On Wed, Nov 4, 2015 at 10:53 AM, Rainer Gerhards
> >> >> <[email protected]> wrote:
> >> >>
> >> >>> 2015-11-04 16:44 GMT+01:00 Joe Blow <[email protected]>:
> >> >>> > OK, I've played with some numbers... this is what one of the
> >> >>> > massive queues looks like now, and it *IS* dequeuing much faster
> >> >>> > (500 EPS from DA, 25k EPS from the in-memory queue).
> >> >>>
> >> >>> This may sound a bit strange, and I never tried it, but... I
> >> >>> wouldn't be surprised if it is actually faster if you put the queue
> >> >>> files on a compressed directory. The idea behind that is that while
> >> >>> this obviously eats CPU, it will probably save you a lot of real
> >> >>> i/o, because the data written to the disk queue can be greatly
> >> >>> compressed.
> >> >>>
> >> >>> If you give it a try, please let us know the outcome.
> >> >>>
> >> >>> Rainer
> >> >>>
> >> >>> > Hopefully this helps some other people who have very massive,
> >> >>> > disk-backed queues... Please feel free to comment on these
> >> >>> > values.
> >> >>> >
> >> >>> > action(type="omelasticsearch"
> >> >>> >        name="rsys_HugeQ"
> >> >>> >        server="10.10.10.10"
> >> >>> >        serverport="9200"
> >> >>> >        template="HugeQTemplate"
> >> >>> >        asyncrepl="on"
> >> >>> >        searchType="HugeType"
> >> >>> >        searchIndex="HugeQindex"
> >> >>> >        timeout="3m"
> >> >>> >        dynSearchIndex="on"
> >> >>> >        bulkmode="on"
> >> >>> >        errorfile="HugeQ_err.log"
> >> >>> >        queue.type="linkedlist"
> >> >>> >        queue.filename="HugeQ.rsysq"
> >> >>> >        queue.maxfilesize="2048m"
> >> >>> >        queue.highwatermark="1000000"
> >> >>> >        queue.lowwatermark="750000"
> >> >>> >        queue.discardmark="499999999"
> >> >>> >        queue.dequeueslowdown="100"
> >> >>> >        queue.size="500000000"
> >> >>> >        queue.saveonshutdown="on"
> >> >>> >        queue.maxdiskspace="1000g"
> >> >>> >        queue.dequeuebatchsize="50000"
> >> >>> >        queue.workerthreads="8"
> >> >>> >        queue.workerthreadminimummessages="100000"
> >> >>> >        action.resumeretrycount="-1")
> >> >>> > stop
> >> >>> > }
> >> >>> >
> >> >>> > I'd love some feedback, but these numbers are working pretty well
> >> >>> > for these massive feeds.
> >> >>> >
> >> >>> > Cheers,
> >> >>> >
> >> >>> > JB
> >> >>> >
> >> >>> > On Wed, Nov 4, 2015 at 10:26 AM, Radu Gheorghe
> >> >>> > <[email protected]> wrote:
> >> >>> >
> >> >>> >> On Wed, Nov 4, 2015 at 5:19 PM, Joe Blow
> >> >>> >> <[email protected]> wrote:
> >> >>> >> > Radu - My checkpoint interval is set at 100k. Are you
> >> >>> >> > suggesting this be lowered? Raised?
> >> >>> >>
> >> >>> >> It sounds like the higher the better, but if your problem is how
> >> >>> >> fast it can read... I think there's not much you can do - that
> >> >>> >> seems to be a setting for writes. Also note David's comment on
> >> >>> >> how it might only apply if syncing is enabled.
> >> >>> >>
> >> >>> >> On the read side I don't know what optimization you can do in
> >> >>> >> the conf. Maybe you can test with various file sizes?
> >> >>> >> (queue.maxfilesize - the default is 1MB, so that might be too
> >> >>> >> small.) Though I wouldn't have high hopes; it sounds like
> >> >>> >> recovery is much too slow even for reading 1MB files.
> >> >>> >>
> >> >>> >> Best regards,
> >> >>> >> Radu
> >> >>> >> --
> >> >>> >> Performance Monitoring * Log Analytics * Search Analytics
> >> >>> >> Solr & Elasticsearch Support * http://sematext.com/
> >> >>> >> _______________________________________________
> >> >>> >> rsyslog mailing list
> >> >>> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> >>> >> http://www.rsyslog.com/professional-services/
> >> >>> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> >>> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by
> >> >>> >> a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO
> >> >>> >> NOT POST if you DON'T LIKE THAT.
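Pulling out the queue parameters that do the heavy lifting in JB's posted action, with annotations; the comments reflect my reading of the rsyslog queue documentation, not claims made in the thread:

```
queue.filename="HugeQ.rsysq"     # a filename is what enables DA (disk-assisted) mode
queue.dequeuebatchsize="50000"   # messages handed to a worker per dequeue
queue.workerthreads="8"          # upper bound on in-memory queue workers
queue.workerthreadminimummessages="100000"  # roughly one additional worker per 100k queued
queue.highwatermark="1000000"    # above this, messages start spilling to disk
queue.lowwatermark="750000"      # disk writing stops once the queue drains below this
queue.saveonshutdown="on"        # persist in-memory messages to disk at shutdown
# (queue.checkpointinterval is absent from the snippet; JB mentions 100k
#  elsewhere in the thread - it governs how often the .qi file is updated)
```

Note Rainer's point upthread: whatever the worker settings, the disk-side (DA) part is drained by a single worker, because the queue files are strictly sequential.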
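Rainer's failover work-around from upthread ("forget the DA queue... use failover actions... then use imfile to monitor that file") might be wired up like this. The template, file paths, ruleset name, and retry count are my guesses, and the sketch deliberately leaves open the loop hazard that his "smart design of the rulesets" remark points at: if Elasticsearch stays down, replayed messages fail again and get re-appended.

```
module(load="omelasticsearch")
module(load="imfile")

# write failed messages in a replayable native format
template(name="RawFmt" type="string" string="%rawmsg%\n")

ruleset(name="shipToES") {
    action(type="omelasticsearch"
           server="10.10.10.10"
           serverport="9200"
           bulkmode="on"
           action.resumeretrycount="5")       # bounded retries, then suspend
    # spill-to-file failover: runs only while the action above is suspended
    action(type="omfile"
           file="/var/spool/rsyslog/es-backlog.log"
           template="RawFmt"
           action.execOnlyWhenPreviousIsSuspended="on")
}

# replay path: tail the backlog file and feed it back through the ruleset
input(type="imfile"
      file="/var/spool/rsyslog/es-backlog.log"
      tag="es-backlog"
      ruleset="shipToES")
```

Compared with the DA queue, this keeps the fast path purely in memory and turns the backlog into an ordinary file that imfile can chew through independently.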

