I had *assumed* the file would deplete to zero. I spot-checked the first
message in the cache file and found that it was in my end search tool, so it
must simply not deplete that last file to near-zero size.

As far as tweaking the output action queue size goes, does just increasing it
to handle steady state and "average bursts" sound reasonable?

queue.size="10000000"             # how many messages (messages, not bytes!) to hold in queue
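
The watermark settings next to it might matter too. What I'm imagining, with
the watermark values being my guesses (untested):

queue.highwatermark="8000000"     # only start writing to disk above this point
queue.lowwatermark="2000000"      # stop using disk once the queue drains below this
queue.discardmark="9500000"       # last-resort discard threshold
queue.discardseverity="8"         # 8 = discard nothing (the default)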

tks!

Doug


On Mon, Jun 30, 2014 at 2:25 PM, David Lang <[email protected]> wrote:

> On Mon, 30 Jun 2014, Doug McClure wrote:
>
>  I most definitely swamped all four logstash instances during the
>> processing
>> of over 100 end points we just activated so designing for
>> burst/spikes/backlogs is one part I need to do. The other is designing for
>> steady state.
>>
>> What I'm seeing this AM is the number of disk cache files growing and then
>> they are promptly processed down to the last one. When there is only one
>> cache file left the contents within are not purged down to a file size of
>> zero or so.  I've read that one file may always exist, but what I'm seeing
>> is that one file holds onto messages and they are not flushed out.
>>
>> Mon Jun 30 13:08:03 2014: imuxsock: submitted=413 ratelimit.discarded=0
>> ratelimit.numratelimiters=275
>> Mon Jun 30 13:08:03 2014: dynafile cache LocalFileOutput: requests=180936
>> level0=137528 missed=126 evicted=0 maxused=126
>> Mon Jun 30 13:08:03 2014: LocalFileOutput: processed=180936 failed=0
>> Mon Jun 30 13:08:03 2014: action 2: processed=3 failed=0
>> Mon Jun 30 13:08:03 2014: logstashforwarder: processed=180936 failed=0
>> Mon Jun 30 13:08:03 2014: imptcp(*/10514/IPv4): submitted=180520
>> Mon Jun 30 13:08:03 2014: imptcp(*/10514/IPv6): submitted=0
>>
>> Mon Jun 30 13:08:03 2014: logstashforwarder[DA]: size=0 enqueued=14919
>> full=0 discarded.full=0 discarded.nf=0 maxqsize=14631
>>
>> Mon Jun 30 13:08:03 2014: logstashforwarder: size=0 enqueued=180936 full=0
>> discarded.full=0 discarded.nf=0 maxqsize=2699
>>
>> Mon Jun 30 13:08:03 2014: main Q: size=0 enqueued=180937 full=0
>> discarded.full=0 discarded.nf=0 maxqsize=620
>>
>> cache file:
>> -rw------- 1 syslog syslog  11M Jun 30 12:21 logstashqueue.00000001
>>
>>
>>
>> What settings should I increase to try and prevent triggering the DA
>> queues?  Can that DA file be triggered to flush?
>>
>
> the only way to prevent triggering the DA queues is to have enough space
> in the memory queues
>
> are you sure the logs in that file have not been sent? rsyslog doesn't
> re-write the file after each log is sent, it just keeps track of what has
> been sent and what hasn't. once the file is no longer needed it's deleted.
> As I understand it, the last file is not deleted in case it's needed again
> (it's far faster to work with an existing file than to create a new file,
> so in cases where you are right on the border of using the DA portion of
> the queue, hitting it then emptying it in rapid succession, deleting and
> recreating the file would be _very_ expensive)
>
> David Lang
>
>
>
>> On Mon, Jun 30, 2014 at 10:53 AM, Aleksandar Lazic <[email protected]>
>> wrote:
>>
>>  Dear Doug.
>>>
>>> Am 30-06-2014 15:05, schrieb Doug McClure:
>>>
>>>  Thanks, I'll play with it this week and see what changes. The main flow
>>> is
>>>
>>>>
>>>> rsyslog -> haproxy -> logstash farm (4) -> log index/search product.
>>>>
>>>>
>>> What's in the haproxy log about the timing and error codes?
>>>
>>> https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.2
>>>
>>> What does your haproxy config look like?
>>> Are there any rsyslog errors in the syslog log?
>>> What's in the logstash logs?
>>>
>>>
>>>  I'm very suspicious of the end product as I don't yet have much visibility
>>>
>>>> into its scale/performance at the input side.  I'm setting up
>>>> instrumentation all along the way to help 'see' what's happening and
>>>> where.
>>>>
>>>> What I think I'm seeing on the rsyslog side is that the DA files are
>>>> getting created but they appear to not be processed even though
>>>> everything
>>>> downstream (haproxy/logstash) is up and passing messages from rsyslog.
>>>>  I
>>>> saw it a number of times this weekend where the file is created and
>>>> holds
>>>> queue'd up messages from the burst period and never flushes them out.
>>>> Normal message traffic continues to flow through with no problem, so
>>>> something isn't set up right to flush the contents of those DA files.
>>>>
>>>> Any pointers on this?  Searching Google returns lots of similar
>>>> symptoms.
>>>>
>>>> Current config:
>>>>
>>>> $MaxMessageSize 64k
>>>> $MainMsgQueueSize 10000000         # how many messages (messages, not bytes!) to hold in memory
>>>> $MainMsgQueueDequeueBatchSize 1000
>>>> $MainMsgQueueWorkerThreads 3
>>>>
>>>> action(type="omfwd"
>>>>        Target="10.x.x.x"                # target end point where haproxy listens
>>>>        Port="10515"                     # port for haproxy listener
>>>>        Protocol="tcp"                   # use tcp
>>>>        Template="LogFormatDSV"          # use the output format above to create a CSV message format
>>>>        queue.filename="logstashqueue"   # set file name, also enables disk mode
>>>>        queue.size="10000000"            # how many messages (messages, not bytes!) to hold in queue
>>>>        queue.dequeuebatchsize="1000"    # number of messages in each batch removed from queues
>>>>        queue.maxdiskspace="25g"         # maximum disk space used for the disk-assist queue
>>>>        queue.maxfilesize="100m"         # maximum file size of disk-assist files
>>>>        queue.saveonshutdown="on"        # save the queue contents when stopping rsyslog
>>>>        queue.type="LinkedList"          # use asynchronous processing
>>>>        queue.workerthreads="5"          # threads to process the output action queues
>>>>        action.resumeRetryCount="-1"     # retry indefinitely if haproxy is down (builds queue); when haproxy is back, it will start draining the queue
>>>>        RebindInterval="5"               # re-establish the connection with haproxy after N messages (?)
>>>>        name="logstashforwarder"         # name of the action
>>>>        )
>>>>
>>>> Tks!
>>>>
>>>> Doug
>>>>
>>>>
>>>> On Sat, Jun 28, 2014 at 8:37 PM, David Lang <[email protected]> wrote:
>>>>
>>>>  On Sat, 28 Jun 2014, Doug McClure wrote:
>>>>
>>>>>
>>>>>  I just found the magical setting it appears - rebindinterval!  I set
>>>>> this
>>>>>
>>>>>  to 10 and have balanced traffic across each logstash instance behind
>>>>>> haproxy.
>>>>>>
>>>>>> Any guidance on setting rebindinterval?
>>>>>>
>>>>>>
>>>>>>  It depends on the system that's receiving the data.
>>>>>
>>>>> rebinding is an expensive operation (especially if you end up using
>>>>> encryption), so you want to do it as little as possible, but you want
>>>>> to
>>>>> do
>>>>> it frequently enough to keep all the receivers busy.
>>>>>
>>>>> I like to set it to a level that has you rebinding a few times a second
>>>>> on
>>>>> the theory that the receiving system should be able to accept a second
>>>>> or
>>>>> so worth of traffic ahead of what it can output, so rebinding
>>>>> infrequently
>>>>> will keep both of the receivers busy all the time.
>>>>>
>>>>> If you have a lot of receivers, or they have very little queueing, then
>>>>> you will need to set it to a smaller value.
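>>>>>
>>>>> As a rough worked example (my rates here are hypothetical, not taken
>>>>> from your stats): if you forward ~6,000 msgs/sec, then
>>>>>
>>>>> RebindInterval="2000"    # ~3 rebinds per second at 6,000 msgs/sec
>>>>>
>>>>> rebinds a few times a second, while RebindInterval="60000" would rebind
>>>>> only once every ~10 seconds and could leave one receiver idle.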
>>>>>
>>>>> there will always be _some_ queuing available, even if the application
>>>>> doesn't intend for there to be, because the TCP stack on the receiving
>>>>> machine will ack the data from the network even if the software isn't
>>>>> ready
>>>>> for it. you just have to make sure that when the connection is closed,
>>>>> the
>>>>> software processes all the data the machine has received or you will
>>>>> lose
>>>>> some data
>>>>>
>>>>> I don't know how much queueing logstash will do, you will have to
>>>>> experiment.
>>>>>
>>>>> what is logstash doing with the data? Depending on where the bottleneck
>>>>> is
>>>>> there, that may give us hints as to how much it can do.
>>>>>
>>>>> David Lang
>>>>>
>>>>>
>>>>>
>>>>>  On Sat, Jun 28, 2014 at 11:00 AM, Doug McClure <[email protected]>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>  Thanks for reply David - I've enjoyed reading your notes/prezos on
>>>>>>
>>>>>>  rsyslog
>>>>>>> recently!
>>>>>>>
>>>>>>> I feel it's somewhere downstream as well.  I've made these changes to
>>>>>>> try
>>>>>>> and improve throughput.
>>>>>>>
>>>>>>> I added haproxy between rsyslog and a logstash farm. I have two
>>>>>>> logstash
>>>>>>> instances accepting connections from the haproxy frontend.  I'm still
>>>>>>> trying to figure out how to get balanced traffic across each, but I
>>>>>>> see
>>>>>>> traffic hitting both at some point.
>>>>>>>
>>>>>>> A spike came in again this AM and there were a few thousand DA files
>>>>>>> created so I'm not sure what else to try here. Is there a way to
>>>>>>> increase
>>>>>>> the rsyslog output processing? Can multiple tcp output connections be
>>>>>>> established to the haproxy input to increase output? Can the
>>>>>>> processing
>>>>>>> of
>>>>>>> DA files be increased/accelerated somehow?
>>>>>>>
>>>>>>>
>>>>>>> Thanks for any pointers!
>>>>>>>
>>>>>>> Doug
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 27, 2014 at 3:07 PM, David Lang <[email protected]> wrote:
>>>>>>>
>>>>>>>  On Fri, 27 Jun 2014, Doug McClure wrote:
>>>>>>>
>>>>>>>
>>>>>>>>  I'm in the tuning & optimization phase for a large rsyslog -
>>>>>>>> logstash
>>>>>>>>
>>>>>>>>  deployment and working through very close monitoring of system
>>>>>>>>
>>>>>>>>> performance,
>>>>>>>>> logging load and imstats.
>>>>>>>>>
>>>>>>>>> This morning, a very large incoming log spike occurred on the
>>>>>>>>> rsyslog
>>>>>>>>> imptcp input and was processed. That spike was routed to my omfwd
>>>>>>>>> action
>>>>>>>>> to
>>>>>>>>> be shipped to a logstash tcp input.
>>>>>>>>>
>>>>>>>>> What it appears is happening is that very large input spike went to
>>>>>>>>> disk
>>>>>>>>> assisted queues and has created over 37K files on disk.  They are
>>>>>>>>> being
>>>>>>>>> processed and shipped and I have a steady 60Kb/s output headed
>>>>>>>>> towards
>>>>>>>>> logstash tcp input.
>>>>>>>>>
>>>>>>>>> My question - where should I begin to look to optimize the
>>>>>>>>> processing
>>>>>>>>> of
>>>>>>>>> that output action queue or changing configuration to avoid queuing
>>>>>>>>> up
>>>>>>>>> so
>>>>>>>>> much in the first place?  How would I determine if this is due to
>>>>>>>>> rsyslog
>>>>>>>>> setup or on the logstash input side?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  This is probably logstash not being able to keep up with the flood
>>>>>>>>>
>>>>>>>> of
>>>>>>>> traffic from rsyslog. Then once rsyslog spills to disk, it gets
>>>>>>>> significantly slower (one of the things that can use a
>>>>>>>> revision/upgrade
>>>>>>>> is
>>>>>>>> the disk queue code)
>>>>>>>>
>>>>>>>> you can specify a larger queue to hold more in memory (also, FixedArray
>>>>>>>> seems to be a little more efficient than LinkedList, but that's not
>>>>>>>> your problem now)
>>>>>>>>
>>>>>>>> If logstash can handle more inputs effectively, you can try adding
>>>>>>>> more
>>>>>>>> workers, but that can actually slow things down (the locking between
>>>>>>>> threads can sometimes slow things down more than the added cores
>>>>>>>> working
>>>>>>>> speed it up)
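>>>>>>>>
>>>>>>>> if you do experiment with workers, an untested sketch of the two
>>>>>>>> queue knobs involved (values illustrative):
>>>>>>>>
>>>>>>>> queue.workerthreads="2"                    # upper bound on workers
>>>>>>>> queue.workerthreadminimummessages="10000"  # add a worker only per ~10k queued msgs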
>>>>>>>>
>>>>>>>> But start by looking on the logstash side.
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>>
>>>>>>>>  Looking at imstats in the analyzer for logstash output (omfwd) I
>>>>>>>> see
>>>>>>>>
>>>>>>>>  significant processed/failed numbers and then in the logstash output
>>>>>>>>
>>>>>>>>> -DA-
>>>>>>>>> it
>>>>>>>>> looks like things are as described above - large queues, with no
>>>>>>>>> failures
>>>>>>>>> being processed gradually.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rsyslog 7.4.4 output action:
>>>>>>>>>
>>>>>>>>> action(type="omfwd"
>>>>>>>>>       Target="10.x.x.x"
>>>>>>>>>       Port="10515"
>>>>>>>>>       Protocol="tcp"
>>>>>>>>>       Template="LogFormatDSV"
>>>>>>>>>       queue.filename="logstashqueue"     # set file name, also
>>>>>>>>> enables
>>>>>>>>> disk mode
>>>>>>>>>       queue.size="1000000"
>>>>>>>>>       queue.type="LinkedList"            # use asynchronous
>>>>>>>>> processing
>>>>>>>>>       queue.workerthreads="5"
>>>>>>>>>       name="logstashforwarder"
>>>>>>>>>       )
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for any pointers!
>>>>>>>>>
>>>>>>>>> Doug
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>> myriad
>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>>>>>>>>> you
>>>>>>>>> DON'T LIKE THAT.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>