2015-11-04 17:24 GMT+01:00 Joe Blow <[email protected]>:
> Thanks for the input, Rainer! It definitely helps, and I love hearing
> some of this from the horse's mouth. Let me start this post by saying I'm
> extremely grateful for all the help that rsyslog has provided me
> throughout my career. The support on this mailing list is arguably better
> than any of the paid vendors I've used for logging/SIEM.
>
> As much as I'd love to just give up on this, I'm far too confident in the
> rsyslog tool to admit defeat. Rsyslog is a beast, but a beast with many
> knobs :). I'm interested in potentially using the failover option, but
> the ease of DA queue configuration might keep me using that for the time
> being.
>
> What about getting creative and moving the files to another rsyslog
> instance (on the same box) that doesn't have any input modules? Here are
> my thoughts:
>
> Stop rsyslog.
> Move the rsyslog DA files and .qi file to another directory which a
> secondary instance of rsyslog knows about (but which has no input modules
> running).
> Start rsyslog with input modules to get the realtime data flowing back
> in, with an empty DA queue.
> Turn on the second rsyslog instance, which only knows about the backlog
> files and has no input modules.
>
> My thought is that this would give at least one dedicated worker (per
> queue) a full core of resources to chug through the backlog, and only the
> backlog. Is my logic sound?
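A rough sketch of the handover step in the plan above. Every path and name here (the spool directory, the drain directory, the queue file names, the drain config) is hypothetical; the sketch creates stand-in files under /tmp so it can be run safely as-is, with the real service commands left as comments:

```shell
#!/bin/sh
# Sketch of handing DA queue files to a second "drain" instance.
# All paths/names are hypothetical; stand-ins are created so this runs as-is.
SPOOL=/tmp/rsyslog-demo/spool   # main instance's work directory
DRAIN=/tmp/rsyslog-demo/drain   # drain instance's work directory
mkdir -p "$SPOOL" "$DRAIN"
: > "$SPOOL/HugeQ.rsysq.00000001"   # stand-in for a real queue segment
: > "$SPOOL/HugeQ.rsysq.qi"         # stand-in for the queue state (.qi) file

# Step 1: stop rsyslog before touching its queue files, e.g.:
#   systemctl stop rsyslog

# Step 2: move queue segments and the .qi file to the drain instance's dir.
mv "$SPOOL"/HugeQ.rsysq* "$DRAIN"/

# Steps 3-4: restart the main instance (now with an empty DA queue), then
# start the input-less drain instance with its own config and pid file, e.g.:
#   systemctl start rsyslog
#   rsyslogd -f /etc/rsyslog-drain.conf -i /run/rsyslogd-drain.pid
ls "$DRAIN"
```

The key invariant is that no rsyslog instance may be running while the queue files are moved, and that the drain instance's queue.filename and work directory must match the moved files.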
Not sure. Let's find the bottleneck. Is it i/o or CPU? What hard facts tell
you which one it is? (You already commented partly on i/o; this is the more
solid question.) IMO, the disk queue should primarily be i/o-intense and
not put a lot of stress on the CPU. If so, the logic wouldn't work.

Rainer

> I've run multiple rsyslog instances on the same box for some other
> 'creative' logging projects I've done previously, without too much issue.
>
> Thoughts?
>
> Cheers,
>
> JB
>
> On Wed, Nov 4, 2015 at 11:14 AM, Rainer Gerhards
> <[email protected]> wrote:
>
>> 2015-11-04 17:12 GMT+01:00 Rainer Gerhards <[email protected]>:
>> > 2015-11-04 17:08 GMT+01:00 Joe Blow <[email protected]>:
>> >> I think I've spoken too soon. The in-memory queues are clearing
>> >> extremely well with these settings, but the DA stuff is still pretty
>> >> sluggish (slowed down to 50-100 EPS again). I've looked at the box,
>> >> and the IO is around 10% (12-disk array, which performs quite
>> >> snappily), so I'm sincerely doubting this is an IO issue.
>> >>
>> >> The huge feed in question uses around 4 workers before it has enough
>> >> workers to clear the queue as fast as it comes in (400k avg in the
>> >> queue). From my understanding that means I've got 4 workers at 50k
>> >> bulk size each, and at 200k EPS out (4 workers x 50k EPS) the
>> >> in-memory queue gets no bigger. Now this is where my knowledge ends.
>> >> I've set the low watermark to 750k and the high watermark to 1
>> >> million, with the thought that the low watermark is below having all
>> >> 8 workers at full bore (8 x 100k) and the high watermark is 250k
>> >> higher than that (slightly above all workers going full bore). If I'm
>> >> staying below the low watermark, and still have "free" workers, would
>> >> those workers not try to empty the DA queue? What would help allocate
>> >> more resources to clearing the DA queue?
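One quick way to put a number on Rainer's i/o-vs-CPU question (Linux-only sketch; the process name and sample window are assumptions) is to sample /proc/stat twice and compare busy CPU time against iowait. High iowait points at the disk; a pegged core with little iowait points at the single DA worker being CPU-bound:

```shell
#!/bin/sh
# Sample aggregate CPU counters twice and compare busy time vs. iowait.
# Fields on the "cpu" line of /proc/stat: user nice system idle iowait ...
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 2
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
wait=$(( w2 - w1 ))
echo "busy=${busy} iowait=${wait}"   # jiffies over the 2-second window

# Also worth watching (not run here): 'iostat -x 1' for %util on the
# queue's disks, and per-thread state of rsyslogd, e.g.
# 'ps -L -o tid,stat,pcpu -C rsyslogd' -- a worker stuck in D state is
# waiting on i/o.
```

If iowait dominates, the compressed-directory or file-size experiments discussed later in the thread are the more promising direction; if one core is saturated, extra disk tuning won't help the single DA worker.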
>> >
>> > The DA queue always runs on one worker, because you can't use more
>> > than one worker with purely sequential files.
>> >
>> > TBH I think your needs simply go beyond what the current system can
>> > provide. As David said, the queue subsystem could well deserve an
>> > overhaul, but this is too big a task right now given what else is
>> > going on, and there has also been no sponsor for any of that disk
>> > queue work in the past years, so it doesn't seem to have a very high
>> > priority either.
>>
>> mhhh, I should mention a potential work-around: forget the DA queue.
>> Use failover actions. If the action fails, write log lines in native
>> format to a file. Then use imfile to monitor that file. Together with
>> a smart design of the rulesets, you can probably get all you need out
>> of such a system. Unfortunately, I am even more swamped than usual, so
>> I cannot provide detailed advice here beyond pointing you to the idea.
>>
>> HTH
>> Rainer
>> >
>> > Rainer
>> >>
>> >> Thanks for the prompt responses.
>> >>
>> >> Cheers,
>> >>
>> >> JB
>> >>
>> >> On Wed, Nov 4, 2015 at 10:53 AM, Rainer Gerhards
>> >> <[email protected]> wrote:
>> >>
>> >>> 2015-11-04 16:44 GMT+01:00 Joe Blow <[email protected]>:
>> >>> > OK, I've played with some numbers... this is what one of the
>> >>> > massive queues looks like now, and it *IS* dequeuing much faster
>> >>> > (500 EPS from DA, 25k EPS from the in-memory queue).
>> >>> >
>> >>> This may sound a bit strange, and I never tried it, but... I
>> >>> wouldn't be surprised if it is actually faster if you put the queue
>> >>> files on a compressed directory. The idea behind that is that while
>> >>> this obviously eats CPU, it will probably save you a lot of real i/o
>> >>> because the data written to the disk queue can be greatly
>> >>> compressed.
>> >>>
>> >>> If you give it a try, please let us know the outcome.
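A minimal sketch of Rainer's failover-plus-imfile idea. This is untested, and every name in it (the ruleset, the template, the backlog path) is invented for illustration; it is not a recommended configuration, just the shape of the approach:

```
# Hypothetical sketch of the failover work-around; names/paths invented.
module(load="imfile")
template(name="RawLine" type="string" string="%rawmsg%\n")

ruleset(name="toES") {
    # primary: ship to Elasticsearch
    action(type="omelasticsearch" server="10.10.10.10" serverport="9200"
           bulkmode="on")
    # failover: runs only while the previous action is suspended; writes
    # the raw line to a backlog file instead of using a DA queue
    action(type="omfile" file="/var/spool/rsyslog/es-backlog.log"
           template="RawLine"
           action.execOnlyWhenPreviousIsSuspended="on")
}

# feed the backlog back through the same ruleset once ES recovers
input(type="imfile" file="/var/spool/rsyslog/es-backlog.log"
      tag="es-backlog" ruleset="toES")
```

The "smart ruleset design" Rainer alludes to matters here: routing the backlog through the same ruleset means a still-failing action simply re-appends lines to the backlog file, so nothing is lost while ES is down.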
>> >>>
>> >>> Rainer
>> >>>
>> >>> > Hopefully this helps some other people who have very massive,
>> >>> > disk-backed queues... Please feel free to comment on these values.
>> >>> >
>> >>> > action(type="omelasticsearch"
>> >>> >        name="rsys_HugeQ"
>> >>> >        server="10.10.10.10"
>> >>> >        serverport="9200"
>> >>> >        template="HugeQTemplate"
>> >>> >        asyncrepl="on"
>> >>> >        searchType="HugeType"
>> >>> >        searchIndex="HugeQindex"
>> >>> >        timeout="3m"
>> >>> >        dynSearchIndex="on"
>> >>> >        bulkmode="on"
>> >>> >        errorfile="HugeQ_err.log"
>> >>> >        queue.type="linkedlist"
>> >>> >        queue.filename="HugeQ.rsysq"
>> >>> >        queue.maxfilesize="2048m"
>> >>> >        queue.highwatermark="1000000"
>> >>> >        queue.lowwatermark="750000"
>> >>> >        queue.discardmark="499999999"
>> >>> >        queue.dequeueslowdown="100"
>> >>> >        queue.size="500000000"
>> >>> >        queue.saveonshutdown="on"
>> >>> >        queue.maxdiskspace="1000g"
>> >>> >        queue.dequeuebatchsize="50000"
>> >>> >        queue.workerthreads="8"
>> >>> >        queue.workerthreadminimummessages="100000"
>> >>> >        action.resumeretrycount="-1")
>> >>> > stop}
>> >>> >
>> >>> > I'd love some feedback, but these numbers are working pretty well
>> >>> > for these massive feeds.
>> >>> >
>> >>> > Cheers,
>> >>> >
>> >>> > JB
>> >>> >
>> >>> > On Wed, Nov 4, 2015 at 10:26 AM, Radu Gheorghe
>> >>> > <[email protected]> wrote:
>> >>> >
>> >>> >> On Wed, Nov 4, 2015 at 5:19 PM, Joe Blow <[email protected]>
>> >>> >> wrote:
>> >>> >> > Radu - My checkpoint interval is set at 100k. Are you
>> >>> >> > suggesting this be lowered? Raised?
>> >>> >>
>> >>> >> It sounds like the higher the better, but if your problem is how
>> >>> >> fast it can read... I think there's not much you can do - that
>> >>> >> seems to be a setting for writes. Also note David's comment on
>> >>> >> how it might only apply if syncing is enabled.
>> >>> >>
>> >>> >> On the read side I don't know what optimization you can do in
>> >>> >> the conf.
>> >>> >> Maybe you can test with various file sizes? (queue.maxfilesize -
>> >>> >> the default is 1MB, so that might be too small.) Though I
>> >>> >> wouldn't have high hopes; it sounds like recovery is much too
>> >>> >> slow even for reading 1MB files.
>> >>> >>
>> >>> >> Best regards,
>> >>> >> Radu
>> >>> >> --
>> >>> >> Performance Monitoring * Log Analytics * Search Analytics
>> >>> >> Solr & Elasticsearch Support * http://sematext.com/
>> >>> >> _______________________________________________
>> >>> >> rsyslog mailing list
>> >>> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> >>> >> http://www.rsyslog.com/professional-services/
>> >>> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> >>> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>> >>> >> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>> >>> >> POST if you DON'T LIKE THAT.
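For anyone experimenting along the lines Radu suggests, the knobs discussed in this thread would sit in the action's queue configuration roughly as follows. The values are purely illustrative placeholders, not recommendations; measure before and after changing each one:

```
# Illustrative values only -- tune and measure, per Radu's suggestion.
queue.maxfilesize="128m"            # default is 1m, likely too small here
queue.checkpointinterval="100000"   # JB's 100k; a write-side setting, and
                                    # per David mainly relevant with syncing
queue.syncqueuefiles="off"          # avoid an fsync at every checkpoint
```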

