So far it's looking like neither, really... the box is running a bit hot on CPU, but the disks are pretty quiet.
Cheers,

JB

On Wed, Nov 4, 2015 at 11:27 AM, Rainer Gerhards <[email protected]> wrote:

> 2015-11-04 17:24 GMT+01:00 Joe Blow <[email protected]>:
> > Thanks for the input Rainer! It definitely helps, and I love hearing some
> > of this from the horse's mouth. Let me start this post by saying I'm
> > extremely grateful for all the help that rsyslog has provided me
> > throughout my career. The support on this mailing list is arguably better
> > than any of the paid vendors I've used for logging/SIEM.
> >
> > As much as I'd love to just give up on this, I'm far too confident in the
> > rsyslog tool to admit defeat. Rsyslog is a beast, but a beast with many
> > knobs :). I'm interested in potentially using the failover option, but
> > the ease of the DA queue configuration might keep me using that for the
> > time being.
> >
> > What about getting creative and moving the files to another rsyslog
> > instance (on the same box) that doesn't have any input modules? Here are
> > my thoughts:
> >
> > 1. Stop rsyslog.
> > 2. Move the rsyslog DA files and .qi file to another directory which a
> >    secondary instance of rsyslog knows about (but which has no input
> >    modules running).
> > 3. Start rsyslog with input modules to get the realtime data flowing back
> >    in, with an empty DA queue.
> > 4. Turn on the second rsyslog instance, which only knows about the
> >    backlog files and has no input modules.
> >
> > My thought is that this would give at least one dedicated worker (per
> > queue) a full core of resources to chug through the backlog, and only
> > the backlog. Is my logic sound?
>
> Not sure. Let's find the bottleneck. Is it i/o or CPU? What hard facts
> tell you which one it is? (You already commented partly on i/o; this is
> the more solid question.)
>
> IMO, the disk queue should primarily be i/o-intense and not put a lot of
> stress on the CPU. If so, the logic wouldn't work.
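JB's two-instance drain idea above might look roughly like this for the secondary instance. Everything here is illustrative, not from the thread: the work directory, the config path, and, crucially, the assumption that a disk queue will adopt pre-existing queue files moved into its work directory, which is exactly the behavior the thread leaves unverified.

```
# sketch: secondary "drain" instance - no input modules, it only owns the
# moved-over DA queue files and replays them to Elasticsearch
module(load="omelasticsearch")

# hypothetical directory the .qi and queue files were moved into
global(workDirectory="/var/spool/rsyslog-drain")

# pure disk queue pointed at the moved files
main_queue(queue.type="disk"
           queue.filename="HugeQ.rsysq")   # must match the moved files' prefix

action(type="omelasticsearch"
       server="10.10.10.10"
       serverport="9200"
       bulkmode="on"
       action.resumeretrycount="-1")
```

The second instance would also need its own pidfile so it does not collide with the primary, e.g. `rsyslogd -f /etc/rsyslog-drain.conf -i /var/run/rsyslogd-drain.pid`.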
> Rainer
>
> > I've run multiple rsyslog instances on the same box for some other
> > 'creative' logging projects I've done previously, without too much
> > issue.
> >
> > Thoughts?
> >
> > Cheers,
> >
> > JB
> >
> > On Wed, Nov 4, 2015 at 11:14 AM, Rainer Gerhards
> > <[email protected]> wrote:
> >
> >> 2015-11-04 17:12 GMT+01:00 Rainer Gerhards <[email protected]>:
> >> > 2015-11-04 17:08 GMT+01:00 Joe Blow <[email protected]>:
> >> >> I think I've spoken too soon. The in-memory queues are clearing
> >> >> extremely well with these settings, but the DA stuff is still pretty
> >> >> sluggish (slowed down to 50-100 EPS again). I've looked at the box
> >> >> and the IO is around 10% (12-disk array, which performs quite
> >> >> snappily), so I'm sincerely doubting this is an IO issue.
> >> >>
> >> >> The huge feed in question uses around 4 workers before it has enough
> >> >> workers to clear the queue as fast as it comes in (400k avg in the
> >> >> queue). From my understanding that means I've got 4 workers at a 50k
> >> >> bulk size each, and at 200k EPS out (4 workers x 50k EPS) the
> >> >> in-memory queue gets no bigger. Now this is where my knowledge ends.
> >> >> I've set the low watermark to 750k and the high watermark to 1
> >> >> million, with the thought that the low watermark is below having all
> >> >> 8 workers at full bore (8x100k) and the high watermark is 250k
> >> >> higher than that (slightly above all workers going full bore). If
> >> >> I'm staying below the low watermark and still have "free" workers,
> >> >> would those workers not try to empty the DA queue? What would help
> >> >> allocate more resources to clearing the DA queue?
> >> >
> >> > The DA queue always runs on one worker, because you can't use more
> >> > than one worker with purely sequential files.
> >> >
> >> > TBH I think your needs simply go above what the current system can
> >> > provide.
> >> > As David said, the queue subsystem could well deserve an overhaul,
> >> > but this is too big a task right now given what else is going on, and
> >> > there has also been no sponsor for any of that disk-queue work in the
> >> > past years, so it doesn't seem to have too high a priority either.
> >>
> >> mhhh, I should mention a potential work-around: forget the DA queue.
> >> Use failover actions. If the action fails, write log lines in native
> >> format to a file. Then use imfile to monitor that file. Together with
> >> a smart design of the rulesets, you can probably get all you need out
> >> of such a system. Unfortunately, I am even more swamped than usual, so
> >> I cannot provide detailed advice beyond pointing you to the idea here.
> >>
> >> HTH
> >> Rainer
> >>
> >> > Rainer
> >> >>
> >> >> Thanks for the prompt responses.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> JB
> >> >>
> >> >> On Wed, Nov 4, 2015 at 10:53 AM, Rainer Gerhards
> >> >> <[email protected]> wrote:
> >> >>
> >> >>> 2015-11-04 16:44 GMT+01:00 Joe Blow <[email protected]>:
> >> >>> > OK, I've played with some numbers... this is what one of the
> >> >>> > massive queues looks like now, and it *IS* dequeuing much faster
> >> >>> > (500 EPS from DA, 25k EPS from the in-memory queue).
> >> >>>
> >> >>> This may sound a bit strange, and I never tried it, but... I
> >> >>> wouldn't be surprised if it is actually faster if you put the queue
> >> >>> files on a compressed directory. The idea behind that is that while
> >> >>> this obviously eats CPU, it will probably save you a lot of real
> >> >>> i/o, because the data written to the disk queue can be greatly
> >> >>> compressed.
> >> >>>
> >> >>> If you give it a try, please let us know the outcome.
> >> >>>
> >> >>> Rainer
> >> >>>
> >> >>> > Hopefully this helps some other people who have very massive,
> >> >>> > disk-backed queues... Please feel free to comment on these
> >> >>> > values.
> >> >>> >
> >> >>> > action(type="omelasticsearch"
> >> >>> >        name="rsys_HugeQ"
> >> >>> >        server="10.10.10.10"
> >> >>> >        serverport="9200"
> >> >>> >        template="HugeQTemplate"
> >> >>> >        asyncrepl="on"
> >> >>> >        searchType="HugeType"
> >> >>> >        searchIndex="HugeQindex"
> >> >>> >        timeout="3m"
> >> >>> >        dynSearchIndex="on"
> >> >>> >        bulkmode="on"
> >> >>> >        errorfile="HugeQ_err.log"
> >> >>> >        queue.type="linkedlist"
> >> >>> >        queue.filename="HugeQ.rsysq"
> >> >>> >        queue.maxfilesize="2048m"
> >> >>> >        queue.highwatermark="1000000"
> >> >>> >        queue.lowwatermark="750000"
> >> >>> >        queue.discardmark="499999999"
> >> >>> >        queue.dequeueslowdown="100"
> >> >>> >        queue.size="500000000"
> >> >>> >        queue.saveonshutdown="on"
> >> >>> >        queue.maxdiskspace="1000g"
> >> >>> >        queue.dequeuebatchsize="50000"
> >> >>> >        queue.workerthreads="8"
> >> >>> >        queue.workerthreadminimummessages="100000"
> >> >>> >        action.resumeretrycount="-1")
> >> >>> > stop
> >> >>> > }
> >> >>> >
> >> >>> > I'd love some feedback, but these numbers are working pretty well
> >> >>> > for these massive feeds.
> >> >>> >
> >> >>> > Cheers,
> >> >>> >
> >> >>> > JB
> >> >>> >
> >> >>> > On Wed, Nov 4, 2015 at 10:26 AM, Radu Gheorghe
> >> >>> > <[email protected]> wrote:
> >> >>> >
> >> >>> >> On Wed, Nov 4, 2015 at 5:19 PM, Joe Blow
> >> >>> >> <[email protected]> wrote:
> >> >>> >> > Radu - My checkpoint interval is set at 100k. Are you
> >> >>> >> > suggesting this be lowered? Raised?
> >> >>> >>
> >> >>> >> It sounds like the higher the better, but if your problem is how
> >> >>> >> fast it can read... I think there's not much you can do - that
> >> >>> >> seems to be a setting for writes. Also note David's comment on
> >> >>> >> how it might only apply if syncing is enabled.
> >> >>> >>
> >> >>> >> On the read side I don't know what optimization you can do in
> >> >>> >> the conf. Maybe you can test with various file sizes?
> >> >>> >> (queue.maxfilesize - the default is 1MB, so that might be too
> >> >>> >> small.) Though I wouldn't have high hopes; it sounds like
> >> >>> >> recovery is much too slow even for reading 1MB files.
> >> >>> >>
> >> >>> >> Best regards,
> >> >>> >> Radu
> >> >>> >> --
> >> >>> >> Performance Monitoring * Log Analytics * Search Analytics
> >> >>> >> Solr & Elasticsearch Support * http://sematext.com/
> >> >>> >> _______________________________________________
> >> >>> >> rsyslog mailing list
> >> >>> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> >>> >> http://www.rsyslog.com/professional-services/
> >> >>> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> >>> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by
> >> >>> >> a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO
> >> >>> >> NOT POST if you DON'T LIKE THAT.
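Pulling out the queue parameters that do the heavy lifting in JB's posted action, with annotations; the comments reflect my reading of the rsyslog queue documentation, not claims made in the thread:

```
queue.filename="HugeQ.rsysq"     # a filename is what enables DA (disk-assisted) mode
queue.dequeuebatchsize="50000"   # messages handed to a worker per dequeue
queue.workerthreads="8"          # upper bound on in-memory queue workers
queue.workerthreadminimummessages="100000"  # roughly one additional worker per 100k queued
queue.highwatermark="1000000"    # above this, messages start spilling to disk
queue.lowwatermark="750000"      # disk writing stops once the queue drains below this
queue.saveonshutdown="on"        # persist in-memory messages to disk at shutdown
# (queue.checkpointinterval is absent from the snippet; JB mentions 100k
#  elsewhere in the thread - it governs how often the .qi file is updated)
```

Note Rainer's point upthread: whatever the worker settings, the disk-side (DA) part is drained by a single worker, because the queue files are strictly sequential.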
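Rainer's failover work-around from upthread ("forget the DA queue... use failover actions... then use imfile to monitor that file") might be wired up like this. The template, file paths, ruleset name, and retry count are my guesses, and the sketch deliberately leaves open the loop hazard that his "smart design of the rulesets" remark points at: if Elasticsearch stays down, replayed messages fail again and get re-appended.

```
module(load="omelasticsearch")
module(load="imfile")

# write failed messages in a replayable native format
template(name="RawFmt" type="string" string="%rawmsg%\n")

ruleset(name="shipToES") {
    action(type="omelasticsearch"
           server="10.10.10.10"
           serverport="9200"
           bulkmode="on"
           action.resumeretrycount="5")       # bounded retries, then suspend
    # spill-to-file failover: runs only while the action above is suspended
    action(type="omfile"
           file="/var/spool/rsyslog/es-backlog.log"
           template="RawFmt"
           action.execOnlyWhenPreviousIsSuspended="on")
}

# replay path: tail the backlog file and feed it back through the ruleset
input(type="imfile"
      file="/var/spool/rsyslog/es-backlog.log"
      tag="es-backlog"
      ruleset="shipToES")
```

Compared with the DA queue, this keeps the fast path purely in memory and turns the backlog into an ordinary file that imfile can chew through independently.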

