Also see http://blog.gerhards.net/2013/07/rsyslog-why-disk-assisted-queues-keep.html?m=1
Sent from phone, thus brief.

On 30.06.2014 21:11, "David Lang" <[email protected]> wrote:

> On Mon, 30 Jun 2014, Doug McClure wrote:
>
>> I *assumed* the file would deplete to zero. I spot-checked the first
>> message in the cache file and did find it in my end search tool. It
>> must not deplete that last file to a near-zero file size.
>
> The problem is that you can truncate the end of the file, but you can't
> remove anything from the beginning of a file without rewriting the
> entire file, so it's just not worth the hassle.
>
>> As far as tweaking the output action queue size, does just increasing it
>> to handle steady state and "average bursts" sound reasonable?
>>
>>     queue.size="10000000"    # how many messages (messages, not bytes!) to hold in queue
>
> I think there is a limit to how large you can go, but this is the
> approach to take (watch the startup output to see if you get any
> complaints about the size).
>
> David Lang
>
>> tks!
>>
>> Doug
>>
>> On Mon, Jun 30, 2014 at 2:25 PM, David Lang <[email protected]> wrote:
>>
>>> On Mon, 30 Jun 2014, Doug McClure wrote:
>>>
>>>> I most definitely swamped all four logstash instances during the
>>>> processing of the over 100 endpoints we just activated, so designing
>>>> for bursts/spikes/backlogs is one part I need to do. The other is
>>>> designing for steady state.
>>>>
>>>> What I'm seeing this AM is the number of disk cache files growing and
>>>> then being promptly processed down to the last one. When there is
>>>> only one cache file left, its contents are not purged down to a file
>>>> size of zero or so. I've read that one file may always exist, but
>>>> what I'm seeing is that the one file holds onto messages and they are
>>>> not flushed out.
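The queue-size approach discussed above, sketched as a config fragment. All values here are illustrative (the Target/Port are the ones used elsewhere in the thread), not tested recommendations:

```
# Hypothetical in-memory action queue sized for steady state plus bursts.
# queue.size counts messages, not bytes, so the real memory footprint
# depends on message size; watch rsyslog's startup output for any
# complaint that the configured size is too large.
action(type="omfwd"
    Target="10.x.x.x"                # haproxy front end (from the thread)
    Port="10515"
    Protocol="tcp"
    queue.type="LinkedList"          # asynchronous action queue
    queue.size="10000000"            # messages held in memory before spilling
    queue.filename="logstashqueue"   # enables the disk-assisted (DA) part
    name="logstashforwarder"
)
```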
>>>> Mon Jun 30 13:08:03 2014: imuxsock: submitted=413 ratelimit.discarded=0 ratelimit.numratelimiters=275
>>>> Mon Jun 30 13:08:03 2014: dynafile cache LocalFileOutput: requests=180936 level0=137528 missed=126 evicted=0 maxused=126
>>>> Mon Jun 30 13:08:03 2014: LocalFileOutput: processed=180936 failed=0
>>>> Mon Jun 30 13:08:03 2014: action 2: processed=3 failed=0
>>>> Mon Jun 30 13:08:03 2014: logstashforwarder: processed=180936 failed=0
>>>> Mon Jun 30 13:08:03 2014: imptcp(*/10514/IPv4): submitted=180520
>>>> Mon Jun 30 13:08:03 2014: imptcp(*/10514/IPv6): submitted=0
>>>>
>>>> *Mon Jun 30 13:08:03 2014: logstashforwarder[DA]: size=0 enqueued=14919 full=0 discarded.full=0 discarded.nf=0 maxqsize=14631
>>>>
>>>> Mon Jun 30 13:08:03 2014: logstashforwarder: size=0 enqueued=180936 full=0 discarded.full=0 discarded.nf=0 maxqsize=2699*
>>>>
>>>> Mon Jun 30 13:08:03 2014: main Q: size=0 enqueued=180937 full=0 discarded.full=0 discarded.nf=0 maxqsize=620
>>>>
>>>> cache file:
>>>> -rw------- 1 syslog syslog *11M Jun 30 12:21 logstashqueue.00000001*
>>>>
>>>> What settings should I increase to try and prevent triggering the DA
>>>> queues? Can that DA file be triggered to flush?
>>>
>>> The only way to prevent triggering the DA queues is to have enough
>>> space in the memory queues.
>>>
>>> Are you sure the logs in that file have not been sent? rsyslog doesn't
>>> rewrite the file after each log is sent; it just keeps track of what
>>> has been sent and what hasn't. Once the file is no longer needed, it's
>>> deleted.
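The counter lines above come from rsyslog's impstats module; keeping them flowing is the easiest way to check whether size=0 for logstashforwarder[DA] really means the remaining cache file is already drained. A minimal sketch (the interval and severity shown are example values; if these module() parameters are not available in your build, the legacy $PStatInterval directive serves the same purpose):

```
# Minimal sketch: load the periodic-statistics module that produces the
# "size=" / "enqueued=" / "maxqsize=" counter lines quoted above.
module(load="impstats"
       interval="60"    # emit one set of counters every 60 seconds
       severity="7")    # emit them at debug severity
```

Watching logstashforwarder[DA] over a few intervals then distinguishes a file that is merely kept around (size stays at 0) from one that is genuinely stuck (size holds steady above 0).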
>>> As I understand it, the last file is not deleted in case it's needed
>>> again (it's far faster to work with an existing file than to create a
>>> new one, so in cases where you are right on the border of using the DA
>>> portion of the queue, hitting it and then emptying it in rapid
>>> succession, deleting and recreating the file would be _very_
>>> expensive).
>>>
>>> David Lang
>>>
>>>> On Mon, Jun 30, 2014 at 10:53 AM, Aleksandar Lazic <[email protected]> wrote:
>>>>
>>>>> Dear Doug,
>>>>>
>>>>> On 30-06-2014 15:05, Doug McClure wrote:
>>>>>
>>>>>> Thanks, I'll play with it this week and see what changes. The main
>>>>>> flow is rsyslog -> haproxy -> logstash farm (4) -> log index/search
>>>>>> product.
>>>>>
>>>>> What's in the haproxy log about the timing and error codes?
>>>>> https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.2
>>>>>
>>>>> How does your haproxy config look?
>>>>> Are there any rsyslog errors in the syslog log?
>>>>> What's in the logstash logs?
>>>>>
>>>>>> I'm very suspect of the end product, as I don't yet have much
>>>>>> visibility into its scale/performance at the input side. I'm
>>>>>> setting up instrumentation all along the way to help 'see' what's
>>>>>> happening and where.
>>>>>>
>>>>>> What I think I'm seeing on the rsyslog side is that the DA files
>>>>>> are getting created, but they appear not to be processed even
>>>>>> though everything downstream (haproxy/logstash) is up and passing
>>>>>> messages from rsyslog. I saw it a number of times this weekend
>>>>>> where the file is created, holds queued-up messages from the burst
>>>>>> period, and never flushes them out. Normal message traffic
>>>>>> continues to flow through with no problem, so something isn't set
>>>>>> up right to flush that DA file's contents.
>>>>>>
>>>>>> Any pointers on this? Searching Google returns lots of similar
>>>>>> symptoms.
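When exactly an action queue spills to its DA part, and when it drains back to memory-only operation, is controlled by the queue watermarks. A hedged sketch; the watermark values below are illustrative assumptions, not settings from this thread:

```
# Hypothetical watermark tuning for a disk-assisted queue: messages are
# written to disk only after the in-memory queue rises past the high
# watermark, and the queue returns to memory-only operation once it
# drains below the low watermark. All numbers are examples.
action(type="omfwd"
    Target="10.x.x.x"
    Port="10515"
    Protocol="tcp"
    queue.type="LinkedList"
    queue.size="1000000"
    queue.highwatermark="900000"    # start spilling to disk above this
    queue.lowwatermark="200000"     # back to memory-only below this
    queue.filename="logstashqueue"  # required to enable the DA part
    name="logstashforwarder"
)
```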
>>>>>> Current config:
>>>>>>
>>>>>> $MaxMessageSize 64k
>>>>>> $MainMsgQueueSize 10000000         # how many messages (messages, not bytes!) to hold in memory
>>>>>> $MainMsgQueueDequeueBatchSize 1000
>>>>>> $MainMsgQueueWorkerThreads 3
>>>>>>
>>>>>> action(type="omfwd"
>>>>>>     Target="10.x.x.x"                # target end point where haproxy listens
>>>>>>     Port="10515"                     # port for the haproxy listener
>>>>>>     Protocol="tcp"                   # use tcp
>>>>>>     Template="LogFormatDSV"          # use the output format above to create a CSV message format
>>>>>>     queue.filename="logstashqueue"   # set file name, also enables disk mode
>>>>>>     queue.size="10000000"            # how many messages (messages, not bytes!) to hold in queue
>>>>>>     queue.dequeuebatchsize="1000"    # number of messages in each batch removed from queues
>>>>>>     queue.maxdiskspace="25g"         # maximum disk space used for the disk-assist queue
>>>>>>     queue.maxfilesize="100m"         # maximum file size of disk-assist files
>>>>>>     queue.saveonshutdown="on"        # save the queue contents when stopping rsyslog
>>>>>>     queue.type="LinkedList"          # use asynchronous processing
>>>>>>     queue.workerthreads="5"          # threads to process the output action queues
>>>>>>     action.resumeRetryCount="-1"     # retry indefinitely if haproxy is down (builds queue); when haproxy is back, it drains the queue
>>>>>>     RebindInterval="5"               # re-establish the connection with haproxy after N messages (?)
>>>>>>     name="logstashforwarder"         # name of the action
>>>>>> )
>>>>>>
>>>>>> Tks!
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>> On Sat, Jun 28, 2014 at 8:37 PM, David Lang <[email protected]> wrote:
>>>>>>
>>>>>>> On Sat, 28 Jun 2014, Doug McClure wrote:
>>>>>>>
>>>>>>>> I just found the magical setting, it appears - rebindinterval!
>>>>>>>> I set this to 10 and have balanced traffic across the logstash
>>>>>>>> instances behind haproxy.
>>>>>>>>
>>>>>>>> Any guidance on setting rebindinterval?
>>>>>>>
>>>>>>> It depends on the system that's receiving the data.
>>>>>>>
>>>>>>> Rebinding is an expensive operation (especially if you end up
>>>>>>> using encryption), so you want to do it as little as possible, but
>>>>>>> you want to do it frequently enough to keep all the receivers
>>>>>>> busy.
>>>>>>>
>>>>>>> I like to set it to a level that has you rebinding a few times a
>>>>>>> second, on the theory that the receiving system should be able to
>>>>>>> accept a second or so worth of traffic ahead of what it can
>>>>>>> output, so rebinding that infrequently will keep both of the
>>>>>>> receivers busy all the time.
>>>>>>>
>>>>>>> If you have a lot of receivers, or they have very little queueing,
>>>>>>> then you will need to set it to a smaller value.
>>>>>>>
>>>>>>> There will always be _some_ queueing available, even if the
>>>>>>> application doesn't intend for there to be, because the TCP stack
>>>>>>> on the receiving machine will ack the data from the network even
>>>>>>> if the software isn't ready for it. You just have to make sure
>>>>>>> that when the connection is closed, the software processes all the
>>>>>>> data the machine has received, or you will lose some data.
>>>>>>>
>>>>>>> I don't know how much queueing logstash will do; you will have to
>>>>>>> experiment.
>>>>>>>
>>>>>>> What is logstash doing with the data? Depending on where the
>>>>>>> bottleneck is there, that may give us hints as to how much it can
>>>>>>> do.
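The "a few times a second" guidance above can be turned into a number, because RebindInterval counts messages, not seconds. A worked example under an assumed throughput (check your own impstats output for the real rate):

```
# Hypothetical sizing: at an assumed ~1,000 msg/s through this action,
# RebindInterval="500" reconnects roughly twice per second. A value of
# 10 at the same rate would reconnect ~100 times per second - far more
# often than needed, and costly if encryption is ever added.
action(type="omfwd"
    Target="10.x.x.x"
    Port="10515"
    Protocol="tcp"
    RebindInterval="500"   # messages between reconnects (example value)
    name="logstashforwarder"
)
```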
>>>>>>> David Lang
>>>>>>>
>>>>>>>> On Sat, Jun 28, 2014 at 11:00 AM, Doug McClure <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the reply, David - I've enjoyed reading your
>>>>>>>>> notes/presentations on rsyslog recently!
>>>>>>>>>
>>>>>>>>> I feel it's somewhere downstream as well. I've made these
>>>>>>>>> changes to try and improve throughput.
>>>>>>>>>
>>>>>>>>> I added haproxy between rsyslog and a logstash farm. I have two
>>>>>>>>> logstash instances accepting connections from the haproxy
>>>>>>>>> frontend. I'm still trying to figure out how to get balanced
>>>>>>>>> traffic across each, but I see traffic hitting both at some
>>>>>>>>> point.
>>>>>>>>>
>>>>>>>>> A spike came in again this AM and there were a few thousand DA
>>>>>>>>> files created, so I'm not sure what else to try here. Is there a
>>>>>>>>> way to increase the rsyslog output processing? Can multiple tcp
>>>>>>>>> output connections be established to the haproxy input to
>>>>>>>>> increase output? Can the processing of DA files be
>>>>>>>>> increased/accelerated somehow?
>>>>>>>>>
>>>>>>>>> Thanks for any pointers!
>>>>>>>>>
>>>>>>>>> Doug
>>>>>>>>>
>>>>>>>>> On Fri, Jun 27, 2014 at 3:07 PM, David Lang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, 27 Jun 2014, Doug McClure wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm in the tuning & optimization phase for a large
>>>>>>>>>>> rsyslog-logstash deployment and working through very close
>>>>>>>>>>> monitoring of system performance, logging load and imstats.
>>>>>>>>>>>
>>>>>>>>>>> This morning, a very large incoming log spike occurred on the
>>>>>>>>>>> rsyslog imptcp input and was processed. That spike was routed
>>>>>>>>>>> to my omfwd action to be shipped to a logstash tcp input.
>>>>>>>>>>> What appears to be happening is that the very large input
>>>>>>>>>>> spike went to disk-assisted queues and has created over 37K
>>>>>>>>>>> files on disk. They are being processed and shipped, and I
>>>>>>>>>>> have a steady 60Kb/s output headed towards the logstash tcp
>>>>>>>>>>> input.
>>>>>>>>>>>
>>>>>>>>>>> My question - where should I begin to look to optimize the
>>>>>>>>>>> processing of that output action queue, or to change the
>>>>>>>>>>> configuration to avoid queuing up so much in the first place?
>>>>>>>>>>> How would I determine if this is due to the rsyslog setup or
>>>>>>>>>>> to the logstash input side?
>>>>>>>>>>
>>>>>>>>>> This is probably logstash not being able to keep up with the
>>>>>>>>>> flood of traffic from rsyslog. Then once rsyslog spills to
>>>>>>>>>> disk, it gets significantly slower (one of the things that
>>>>>>>>>> could use a revision/upgrade is the disk queue code).
>>>>>>>>>>
>>>>>>>>>> You can specify a larger queue to hold more in memory (also,
>>>>>>>>>> FixedArray seems to be a little more efficient than LinkedList,
>>>>>>>>>> but that's not your problem now).
>>>>>>>>>>
>>>>>>>>>> If logstash can handle more inputs effectively, you can try
>>>>>>>>>> adding more workers, but that can actually slow things down
>>>>>>>>>> (the locking between threads can sometimes cost more than the
>>>>>>>>>> added cores gain).
>>>>>>>>>>
>>>>>>>>>> But start by looking on the logstash side.
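The FixedArray suggestion above, sketched as a config fragment. The queue.size value is an example; FixedArray preallocates a pointer slot for every configured entry, so very large sizes cost memory up front:

```
# Sketch only: FixedArray uses a preallocated array of pointers instead
# of a linked list, avoiding per-message allocation overhead at the cost
# of reserving space for queue.size entries at startup.
action(type="omfwd"
    Target="10.x.x.x"
    Port="10515"
    Protocol="tcp"
    queue.type="FixedArray"   # instead of LinkedList
    queue.size="500000"       # example value; slots are preallocated
    name="logstashforwarder"
)
```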
>>>>>>>>>> David Lang
>>>>>>>>>>
>>>>>>>>>>> Looking at imstats in the analyzer for the logstash output
>>>>>>>>>>> (omfwd), I see significant processed/failed numbers, and then
>>>>>>>>>>> in the logstash output -DA- it looks like things are as
>>>>>>>>>>> described above - large queues, with no failures, being
>>>>>>>>>>> processed gradually.
>>>>>>>>>>>
>>>>>>>>>>> rsyslog 7.4.4 output action:
>>>>>>>>>>>
>>>>>>>>>>> action(type="omfwd"
>>>>>>>>>>>     Target="10.x.x.x"
>>>>>>>>>>>     Port="10515"
>>>>>>>>>>>     Protocol="tcp"
>>>>>>>>>>>     Template="LogFormatDSV"
>>>>>>>>>>>     queue.filename="logstashqueue"   # set file name, also enables disk mode
>>>>>>>>>>>     queue.size="1000000"
>>>>>>>>>>>     queue.type="LinkedList"          # use asynchronous processing
>>>>>>>>>>>     queue.workerthreads="5"
>>>>>>>>>>>     name="logstashforwarder"
>>>>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any pointers!
>>>>>>>>>>>
>>>>>>>>>>> Doug
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
>>>>>>>>>>> by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE
>>>>>>>>>>> and DO NOT POST if you DON'T LIKE THAT.

