OK, now we might be onto something.  I can’t determine exactly which
remote machine the client is hitting because it’s going through the F5,
so I looked at the stats on the F5 and picked the busiest remote server.
There is an rsyslog thread on there that is hovering very close to 100%
CPU.
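
For anyone wanting to repeat the per-thread check without an interactive
top session, here is a rough sketch that samples /proc directly.  It
assumes a Linux /proc filesystem; the function names are made up for
illustration and are not part of rsyslog or any tool mentioned in this
thread.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/.../stat line."""
    # The thread name sits in parentheses and may contain spaces, so split
    # on the last ')' instead of splitting the whole line on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def thread_ticks(pid):
    """Map thread id -> CPU ticks consumed so far, for every thread of pid."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except (IOError, OSError):
            continue  # the thread exited between listdir() and open()
    return ticks


def cpu_percent(before, after, interval, hz):
    """Convert two tick samples taken `interval` seconds apart into percent."""
    return {tid: 100.0 * (after[tid] - before[tid]) / (hz * interval)
            for tid in after if tid in before}


def busiest_threads(pid, interval=1.0):
    """Return (tid, percent) pairs for pid, busiest thread first."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    usage = cpu_percent(before, after, interval, hz)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
```

Calling busiest_threads() with the rsyslogd PID on the destination
server should list the same thread that top shows pegged; a value near
100 means that one thread is using a full core.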

Here’s the config from our destination servers.  They all share an
identical config.  http://pastebin.com/35K9gw97



DAN FINN
Linux System Administrator
 
Office: 801-746-7580 ext. 5381
Mobile: 801-609-4705
[email protected]
 
Backcountry.com <http://www.backcountry.com/>
Competitive Cyclist <http://www.competitivecyclist.com/>
RealCyclist.com <http://www.realcyclist.com/>
Dogfunk.com <http://www.dogfunk.com/>
SteepandCheap.com <http://www.steepandcheap.com/>
Chainlove.com <http://www.chainlove.com/>
WhiskeyMilitia.com <http://www.whiskeymilitia.com/>





On 12/4/13, 10:41 AM, "David Lang" <[email protected]> wrote:

>OK, then the question is how fast the receiving machine is accepting
>messages.  Unless you have an unusually complex template, you should be
>able to send messages very fast.
>
>But if the receiving machine is not processing messages fast enough,
>there will be a buildup.  If all it's doing is writing to local files
>(and you aren't doing a lot of dynamic filename stuff), it's unlikely
>to be that slow.
>
>You could look at what the different threads are doing using top
>(remember to hit 'H' to see the threads), and if one or more threads is
>maxing out the CPU, you can then look at the batching settings.
>
>But I really don't think the sending machine is the bottleneck; if it
>were, it wouldn't be able to write the queue files either.
>
>David Lang
>
>On Wed, 4 Dec 2013, Dan Finn wrote:
>
>> I’m thinking it’s most likely something around #3.  :)
>>
>> I don’t think it’s a network or F5 related problem as far as I can
>> tell.  For example, right now we have a server that is writing logs to
>> the local spool.  I ran tcpdump and I can see rsyslog talking to the
>> destination servers just fine, but the spool is slowly growing.
>> According to netstat, rsyslog is only making 1 TCP connection to the
>> VIP on the F5 and it seems to be able to pass traffic through that
>> connection.
>>
>>
>> On 12/4/13, 9:31 AM, "Dave Caplinger" <[email protected]>
>> wrote:
>>
>>> Without impstats output it's hard to say for sure, but since your
>>> config is so succinct, you are getting a lot of default buffer sizes
>>> and watermark parameters.  I see you have $ActionResumeRetryCount set
>>> to -1 for infinite retries (which is good).  Note though that default
>>> high and low water marks are 8,000 and 2,000 messages, respectively.
>>> So once you get into disk-assisted mode, you won't leave it until the
>>> action queue gets all the way down to 2,000 messages.  The default
>>> action queue size will be 10,000 messages, and that's really not very
>>> much, especially in an environment that has significant spikes in
>>> volume.
>>>
>>> The other possibilities that come to mind are:
>>>
>>> 1) that the F5 is correctly sending to an rsyslog server that isn't
>>> listening any more for some reason
>>>
>>> If the receiving side's TCP session gets stuck, or something else goes
>>> wrong but the F5 doesn't know it, the hashing algorithm will continue
>>> to send traffic to the same (dead) destination.  TCP default timeouts
>>> are 2 minutes; this can seem like an eternity when digging through
>>> packet captures.  So on the sending side, perhaps it sends a SYN
>>> trying to open the session, and then nothing happens for 2 minutes
>>> before it tries all over again?
>>>
>>> 2) perhaps there's something else in the network breaking the TCP
>>> session, such as a firewall doing NAT
>>>
>>> I've seen cases before where the NAT-ing firewall would time-out
>>> translated IP addresses after a certain period, breaking long-running
>>> sessions.  The Cisco PIX/ASA, for example, has both idle
>>> address-translation timeouts, as well as total duration timeouts.  So
>>> even a currently in-use session can still be affected by something like
>>> this.
>>>
>>> 3) maybe there is some odd behavior in v4 of rsyslog pertaining to this
>>> situation that has long since been fixed :-)
>>>
>>> Not pointing fingers; I just don't have a lot of experience with
>>> rsyslog that old so I'm just speculating.
>>>
>>> --
>>> Dave Caplinger, Director of Architecture  |  402.361.3063
>>> Solutionary  |  Relevant  .  Intelligent  .  Security
>>>
>>> On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:
>>>
>>>> I’ve done that and I’ve seen 2 things happen during these periods
>>>> where files are being written locally.
>>>>
>>>> 1) Nothing at all was sent to the remote destination.  Using telnet
>>>> I could make a connection just fine, but rsyslog wasn’t even
>>>> attempting to send or talk to the destination server over TCP 514.
>>>> The message queue was growing extremely fast.  I can’t explain it,
>>>> but on the 2nd or 3rd restart it started talking to the remote again
>>>> and began flushing out the queue.
>>>> 2) Lots of traffic is going to the remote over TCP 514.  The queue is
>>>> slowly growing, but at a consistent rate.  This is the most common
>>>> situation; I’ve only seen situation #1 once.  I don’t see any errors
>>>> or retries or anything like that.
>>>>
>>>> On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
>>>>
>>>>> On Tue, 3 Dec 2013, Erik Steffl wrote:
>>>>>
>>>>>> We have a somewhat similar problem.  In our case it's the Amazon
>>>>>> Elastic Load Balancer (ELB) that somehow causes the connection to
>>>>>> go "bad" if there is no traffic for 5 min (not sure what the exact
>>>>>> time is; 1 minute is ok, 5 minutes is not).
>>>>>>
>>>>>> Not sure what going "bad" actually means (still investigating), but
>>>>>> the data is not going through: rsyslog sends data but there is no
>>>>>> response...  It recovers eventually, but I'm not sure what exactly
>>>>>> triggers the recovery (sending more messages is what triggers it,
>>>>>> but how exactly is not clear).
>>>>>>
>>>>>> It's not the same case, but maybe you can look into the VIP and
>>>>>> connections and see what happens there, maybe use strace to see
>>>>>> what the responses are when rsyslogd sends data to the
>>>>>> destination...
>>>>>
>>>>> or use tcpdump to watch the traffic over the network.
>>>>>
>>>>> David Lang
>>>>>
>>>>>>  erik
>>>>>>
>>>>>> On 12/03/2013 01:12 PM, Dan Finn wrote:
>>>>>>> I had kind of wondered about that as well, but I have a few
>>>>>>> reasons that make it seem like that is not the case.
>>>>>>>
>>>>>>> The “central server” is actually a VIP on our F5 load balancer
>>>>>>> with 4 rsyslog destination servers behind it.  We have about 200
>>>>>>> servers in our environment and during these busy times the only
>>>>>>> servers that ever seem to log locally are the postgres servers.
>>>>>>> The volume of logs being written on these servers is certainly
>>>>>>> much higher than anywhere else.  My theory is that the rsyslog
>>>>>>> “client” is not keeping up with the sheer volume on these servers
>>>>>>> during the busy times, but until I can find some concrete info
>>>>>>> that is just a theory.
>>>>>>>
>>>>>>> We are looking at upgrading to v7 but unfortunately that’s not
>>>>>>> going to be a quick fix.  I was hoping maybe there was an issue in
>>>>>>> my config or something that could be tweaked, but it sounds like
>>>>>>> maybe that is not the case?
>>>>>>>
>>>>>>> I did capture some debug output while this was happening.
>>>>>>> Unfortunately it was pretty large so I don’t know if I can share
>>>>>>> the whole thing, but is there anything in particular I should be
>>>>>>> looking for in there?  I see that it says it’s writing the files
>>>>>>> locally but I didn’t see where it says why.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>>
>>>>>>> On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
>>>>>>>
>>>>>>>> You are sending the logs via TCP, which means that if the system
>>>>>>>> you are sending logs to gets backed up, logs will queue on the
>>>>>>>> sending system, spilling to disk as needed.
>>>>>>>>
>>>>>>>> The bottleneck is probably on the central server, but we have no
>>>>>>>> info about what it's doing.
>>>>>>>>
>>>>>>>> The go-to tool for diagnosing this sort of problem is the
>>>>>>>> impstats module, but I don't think that existed back in the 4.x
>>>>>>>> days, and tracking down the bottleneck without it is
>>>>>>>> significantly harder.  Is there any way you can upgrade to a
>>>>>>>> current version?
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>>
>>>>>>>>  On Mon, 2 Dec 2013, Dan Finn wrote:
>>>>>>>>
>>>>>>>>> Date: Mon, 2 Dec 2013 20:53:54 +0000
>>>>>>>>> From: Dan Finn <[email protected]>
>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>>>> Subject: [rsyslog] rsyslog frequently queuing to disk when it
>>>>>>>>> should
>>>>>>>>> be
>>>>>>>>>     sending over the network
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I’m trying to get some insight into an issue that we have been
>>>>>>>>> seeing quite a bit.  We have some postgres servers that are
>>>>>>>>> quite verbose.  When the servers get busy, they queue their logs
>>>>>>>>> locally instead of sending them over the network; however, I
>>>>>>>>> can’t find any reason why that would be, at least not from an OS
>>>>>>>>> resource standpoint.  We are running rsyslog4-4.8.0-1.ius.el5.
>>>>>>>>> This is my config from the client that was having issues:
>>>>>>>>> http://pastebin.com/n3XpRdMm.
>>>>>>>>>
>>>>>>>>> I watched it queue about 10k files under /var/spool/rsyslog
>>>>>>>>> before I finally had to manually delete them because the disk
>>>>>>>>> was filling up.
>>>>>>>>>
>>>>>>>>> What’s the best way to get some insight into why this might be
>>>>>>>>> happening?  Is there a way I can enable some debug logging for
>>>>>>>>> the rsyslog process itself?  Any settings in our config that
>>>>>>>>> could be tweaked?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>>>> POST
>>>>>>>>> if you DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
