Thanks for the input, Rainer!  It definitely helps, and I love hearing some
of this from the horse's mouth.  Let me start this post by saying I'm
extremely grateful for all the help that rsyslog has provided me throughout
my career.  The support on this mailing list is arguably better than that of
any of the paid logging/SIEM vendors I've used.

As much as I'd love to just give up on this, I'm far too confident in
rsyslog to admit defeat.  Rsyslog is a beast, but a beast with many knobs
:).  I'm interested in potentially using the failover option, but the ease
of DA queue configuration might keep me on that approach for the time being.

What about getting creative and moving the files to another rsyslog
instance (on the same box) that doesn't have any input modules?  Here are my
thoughts:

1. Stop rsyslog.
2. Move the rsyslog DA queue files and the .qi file to another directory
   that a secondary rsyslog instance knows about (but which has no input
   modules running).
3. Start rsyslog with its input modules to get the real-time data flowing
   back in, with an empty DA queue.
4. Start the second rsyslog instance, which only knows about the backlog
   files and has no input modules.

My thought is that this would give at least one dedicated worker (per queue)
a full core of resources to chug through the backlog, and only the backlog.
Is my logic sound?
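
Concretely, I'm imagining something like this for the second, drain-only
instance (a rough, untested sketch: paths and names are illustrative, and it
assumes the moved HugeQ.* queue files and the .qi file end up in that
instance's work directory):

    # rsyslog-drain.conf -- hypothetical config for the backlog-only
    # instance.  No input modules are loaded, so nothing new is ever
    # enqueued; on startup the action queue should find the moved
    # .qi/queue files and drain them.
    global(workDirectory="/var/spool/rsyslog-drain")  # where HugeQ.* moved

    module(load="omelasticsearch")

    # the template must be defined here as well, since it is applied at
    # action time (definition omitted; same HugeQTemplate as the main
    # instance)
    action(type="omelasticsearch"
           server="10.10.10.10"
           serverport="9200"
           template="HugeQTemplate"
           bulkmode="on"
           queue.type="linkedlist"
           queue.filename="HugeQ.rsysq"   # must match the moved queue files
           queue.saveonshutdown="on"
           action.resumeretrycount="-1")

It would run alongside the main daemon with its own config and pid file,
something like: rsyslogd -f /etc/rsyslog-drain.conf -i
/var/run/rsyslogd-drain.pid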

I've run multiple rsyslog instances on the same box for some other
'creative' logging projects I've done previously, without too much issue.

Thoughts?

Cheers,

JB


On Wed, Nov 4, 2015 at 11:14 AM, Rainer Gerhards <[email protected]>
wrote:

> 2015-11-04 17:12 GMT+01:00 Rainer Gerhards <[email protected]>:
> > 2015-11-04 17:08 GMT+01:00 Joe Blow <[email protected]>:
> >> I think I've spoken too soon.  The in-memory queues are clearing
> >> extremely well with these settings, but the DA stuff is still pretty
> >> sluggish (slowed down to 50-100 EPS again).  I've looked at the box and
> >> the I/O is around 10% (12-disk array, which performs quite snappily),
> >> so I'm sincerely doubting this is an I/O issue.
> >>
> >> The huge feed in question needs around 4 workers before it has enough
> >> workers to clear the queue as fast as data comes in (400k avg in the
> >> queue).  From my understanding that means I've got 4 workers @ 50k bulk
> >> size each, and at 200k EPS out (4 workers x 50k EPS) the in-memory
> >> queue gets no bigger.  Now this is where my knowledge ends.  I've set
> >> the low watermark to 750k and the high watermark to 1 million, with the
> >> thought that the low watermark is below all 8 workers going full bore
> >> (8 x 100k) and the high watermark is 250k higher than that (slightly
> >> above all workers going full bore).  If I'm staying below the low
> >> watermark and still have "free" workers, would those workers not try to
> >> empty the DA queue?  What would help allocate more resources to
> >> clearing the DA queue?
> >
> > The DA queue always runs on one worker, because you can't use more than
> > one worker with purely sequential files.
> >
> > TBH I think your needs simply go beyond what the current system can
> > provide.  As David said, the queue subsystem could well deserve an
> > overhaul, but that is too big a task right now given what else is going
> > on, and there has also been no sponsor for any of that disk queue work
> > in the past years, so it doesn't seem to have too high a priority
> > either.
>
> mhhh, I should mention a potential work-around: forget the DA queue.
> Use failover actions.  If the action fails, write log lines in native
> format to a file.  Then use imfile to monitor that file.  Together with
> a smart design of the rulesets, you can probably get all you need out
> of such a system.  Unfortunately, I am even more swamped than usual, so
> I cannot provide detailed advice beyond pointing you to the idea here.
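>
> In rough outline, something like this (untested; file paths and parameter
> values are illustrative only):
>
>     module(load="imfile")
>
>     # main action: try Elasticsearch first
>     action(type="omelasticsearch" server="10.10.10.10" serverport="9200"
>            bulkmode="on")
>     # failover: runs only while the previous action is suspended
>     action(type="omfile" file="/var/spool/rsyslog/es-backlog.log"
>            template="RSYSLOG_FileFormat"
>            action.execOnlyWhenPreviousIsSuspended="on")
>
>     # feed the backlog file back in once Elasticsearch recovers; this is
>     # where the "smart ruleset design" comes in, so that replayed
>     # messages do not loop back into the same failover file
>     input(type="imfile" file="/var/spool/rsyslog/es-backlog.log"
>           tag="es-backlog:" stateFile="es-backlog-state")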
>
> HTH
> Rainer
> >
> > Rainer
> >>
> >> Thanks for the prompt responses.
> >>
> >> Cheers,
> >>
> >> JB
> >>
> >> On Wed, Nov 4, 2015 at 10:53 AM, Rainer Gerhards
> >> <[email protected]> wrote:
> >>
> >>> 2015-11-04 16:44 GMT+01:00 Joe Blow <[email protected]>:
> >>> > OK, I've played with some numbers... this is what one of the massive
> >>> > queues looks like now, and it *IS* dequeuing much faster (500 EPS
> >>> > from DA, 25k EPS from the in-memory queue).
> >>> >
> >>> This may sound a bit strange, and I have never tried it, but... I
> >>> wouldn't be surprised if it were actually faster to put the queue
> >>> files on a compressed directory.  The idea behind this is that while
> >>> it obviously eats CPU, it will probably save you a lot of real I/O,
> >>> because the data written to the disk queue compresses very well.
> >>>
> >>> If you give it a try, please let us know the outcome.
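> >>>
> >>> Purely as a sketch (the compression itself happens at the filesystem
> >>> level, e.g. a directory mounted with a compression option; rsyslog
> >>> just needs its work directory pointed there -- path is illustrative):
> >>>
> >>>     # hypothetical: keep the disk queue files on a compressed mount
> >>>     global(workDirectory="/var/spool/rsyslog-compressed")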
> >>>
> >>> Rainer
> >>>
> >>> > Hopefully this helps some other people who have very massive,
> >>> > disk-backed queues...  Please feel free to comment on these values.
> >>> >
> >>> > action(type="omelasticsearch"
> >>> >         name="rsys_HugeQ"
> >>> >         server="10.10.10.10"
> >>> >         serverport="9200"
> >>> >         template="HugeQTemplate"
> >>> >         asyncrepl="on"
> >>> >         searchType="HugeType"
> >>> >         searchIndex="HugeQindex"
> >>> >         timeout="3m"
> >>> >         dynSearchIndex="on"
> >>> >         bulkmode="on"
> >>> >         errorfile="HugeQ_err.log"
> >>> >         queue.type="linkedlist"
> >>> >         queue.filename="HugeQ.rsysq"
> >>> >         queue.maxfilesize="2048m"
> >>> >         queue.highwatermark="1000000"
> >>> >         queue.lowwatermark="750000"
> >>> >         queue.discardmark="499999999"
> >>> >         queue.dequeueslowdown="100"
> >>> >         queue.size="500000000"
> >>> >         queue.saveonshutdown="on"
> >>> >         queue.maxdiskspace="1000g"
> >>> >         queue.dequeuebatchsize="50000"
> >>> >         queue.workerthreads="8"
> >>> >         queue.workerthreadminimummessages="100000"
> >>> >         action.resumeretrycount="-1")
> >>> >     stop
> >>> > }
> >>> >
> >>> > I'd love some feedback, but these numbers are working pretty well
> >>> > for these massive feeds.
> >>> >
> >>> > Cheers,
> >>> >
> >>> > JB
> >>> >
> >>> > On Wed, Nov 4, 2015 at 10:26 AM, Radu Gheorghe
> >>> > <[email protected]> wrote:
> >>> >
> >>> >> On Wed, Nov 4, 2015 at 5:19 PM, Joe Blow <[email protected]>
> >>> >> wrote:
> >>> >> > Radu - My checkpoint interval is set at 100k.  Are you suggesting
> >>> >> > this be lowered?  Raised?
> >>> >>
> >>> >> It sounds like the higher the better, but if your problem is how
> >>> >> fast it can read... I think there's not much you can do - that
> >>> >> seems to be a setting for writes.  Also note David's comment that
> >>> >> it might only apply if syncing is enabled.
> >>> >>
> >>> >> On the read side I don't know what optimizations you can do in the
> >>> >> conf.  Maybe you can test with various file sizes
> >>> >> (queue.maxfilesize - the default is 1MB, so that might be too
> >>> >> small)?  Though I wouldn't have high hopes; it sounds like recovery
> >>> >> is much too slow even for reading 1MB files.
> >>> >>
> >>> >> Best regards,
> >>> >> Radu
> >>> >> --
> >>> >> Performance Monitoring * Log Analytics * Search Analytics
> >>> >> Solr & Elasticsearch Support * http://sematext.com/