OK, now we might be onto something. I can’t determine exactly which remote machine the client is hitting because it’s going through the F5, so I looked at the stats on the F5 and picked the busiest remote server. One rsyslog thread on that server is hovering very close to 100% CPU.
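For anyone who wants the same per-thread numbers without an interactive `top -H` session, here is a rough sketch that reads them from `/proc`. It assumes the standard Linux `/proc/<pid>/task/<tid>/stat` layout (utime and stime are fields 14 and 15); the function names are made up for illustration and are not part of rsyslog.

```python
import os
import time


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces,
    # so locate it by the first '(' and the last ')' instead of splitting.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    name = line[open_paren + 1:close_paren]
    fields = line[close_paren + 1:].split()
    # fields[0] is field 3 (state), so utime (14) and stime (15)
    # are at indexes 11 and 12.
    return name, int(fields[11]) + int(fields[12])


def read_thread_ticks(pid):
    """Map thread id -> (name, cpu_ticks) for every thread of a process."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as handle:
                ticks[int(tid)] = parse_stat_line(handle.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
    return ticks


def thread_cpu_percent(before, after, interval, ticks_per_second):
    """Percent of one CPU that each thread used between two snapshots."""
    usage = {}
    for tid, (name, ticks_after) in after.items():
        if tid in before:
            delta = ticks_after - before[tid][1]
            usage[tid] = (name, 100.0 * delta / (ticks_per_second * interval))
    return usage


def report(pid, interval=1.0):
    """Print per-thread CPU use of process `pid`, busiest thread first."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read_thread_ticks(pid)
    time.sleep(interval)
    after = read_thread_ticks(pid)
    usage = thread_cpu_percent(before, after, interval, ticks_per_second)
    for tid, (name, percent) in sorted(usage.items(), key=lambda kv: -kv[1][1]):
        print("%6d  %-16s %5.1f%%" % (tid, name, percent))
```

Calling `report(<pid of rsyslogd>)` on the receiving server should show whether a single thread stays pinned near 100% across several samples, which is the pattern described above.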
Here’s the config from our destination servers. They all share an identical config.

http://pastebin.com/35K9gw97

DAN FINN
Linux System Administrator

Office: 801-746-7580 ext. 5381
Mobile: 801-609-4705
[email protected]

Backcountry.com <http://www.backcountry.com/>
Competitive Cyclist <http://www.competitivecyclist.com/>
RealCyclist.com <http://www.realcyclist.com/>
Dogfunk.com <http://www.dogfunk.com/>
SteepandCheap.com <http://www.steepandcheap.com/>
Chainlove.com <http://www.chainlove.com/>
WhiskeyMilitia.com <http://www.whiskeymilitia.com/>

On 12/4/13, 10:41 AM, "David Lang" <[email protected]> wrote:

>Ok, then the question is how fast is the receiving machine accepting
>messages. unless you have an unusually complex template, you should be
>able to send messages very fast.
>
>But if the receiving machine is not processing messages fast enough
>there will be a buildup. but if all it's doing is writing to local files
>(and you aren't doing a lot of dynamic filename stuff) it's unlikely
>that it should be that slow.
>
>you could look at what the different threads are doing using top
>(remember to hit 'H' to see the threads) and if one or more threads is
>maxing out the CPU, you can then look at the batching settings.
>
>But I really don't think the sending machine is the bottleneck, if it
>was it wouldn't be able to write the queue files either.
>
>David Lang
>
>On Wed, 4 Dec 2013, Dan Finn wrote:
>
>> I’m thinking it’s most likely something around #3. :)
>>
>> I don’t think it’s a network or F5 related problem as far as I can
>> tell. For example, right now we have a server that is writing logs to
>> the local spool. I ran tcpdump and I can see rsyslog talking to the
>> destination servers just fine but the spool is slowly growing.
>> According to netstat rsyslog is only making 1 TCP connection to the
>> VIP on the F5 and it seems to be able to pass traffic through that
>> connection.
>>
>> On 12/4/13, 9:31 AM, "Dave Caplinger" <[email protected]> wrote:
>>
>>> Without impstats output it's hard to say for sure, but since your
>>> config is so succinct, you are getting a lot of default buffer sizes
>>> and watermark parameters. I see you have $ActionResumeRetryCount set
>>> to -1 for infinite retries (which is good). Note though that default
>>> high and low water marks are 8,000 and 2,000 messages, respectively.
>>> So once you get into disk-assisted mode, you won't leave it until the
>>> action queue gets all the way down to 2000 messages. The default
>>> action queue size will be 10,000 messages, and that's really not very
>>> much, especially in an environment that has significant spikes in
>>> volume.
>>>
>>> The other possibilities that come to mind are:
>>>
>>> 1) that the F5 is correctly sending to an rsyslog server that isn't
>>> listening any more for some reason
>>>
>>> If the receiving side's TCP session gets stuck, or something else goes
>>> wrong but the F5 doesn't know it, the hashing algorithm will continue
>>> to send traffic to the same (dead) destination. TCP default timeouts
>>> are 2 minutes; this can seem like an eternity when digging through
>>> packet captures. So on the sending side, perhaps it sends a SYN trying
>>> to open the session, and then nothing happens for 2 minutes before it
>>> tries all over again?
>>>
>>> 2) perhaps there's something else in the network breaking the TCP
>>> session, such as a firewall doing NAT
>>>
>>> I've seen cases before where the NAT-ing firewall would time-out
>>> translated IP addresses after a certain period, breaking long-running
>>> sessions. The Cisco PIX/ASA, for example, has both idle
>>> address-translation timeouts, as well as total duration timeouts. So
>>> even a currently in-use session can still be affected by something
>>> like this.
>>>
>>> 3) maybe there is some odd behavior in v4 of rsyslog pertaining to
>>> this situation that has long since been fixed :-)
>>>
>>> Not pointing fingers; I just don't have a lot of experience with
>>> rsyslog that old so I'm just speculating.
>>>
>>> --
>>> Dave Caplinger, Director of Architecture | 402.361.3063
>>> Solutionary | Relevant . Intelligent . Security
>>>
>>> On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:
>>>
>>>> I’ve done that and I’ve seen 2 things happen during these periods
>>>> where files are being written locally.
>>>>
>>>> 1) Nothing at all was attempted to be sent to the remote destination.
>>>> Using telnet I could make a connection just fine but rsyslog wasn’t
>>>> even attempting to send or talk to the destination server over TCP
>>>> 514. Message queue was growing extremely fast. I can’t explain it but
>>>> on the 2nd or 3rd restart it started talking to the remote again and
>>>> began flushing out the queue.
>>>> 2) lots of traffic is going to the remote over TCP 514. The queue is
>>>> slowly growing but growing at a consistent rate. This is the most
>>>> common situation, I’ve only seen situation #1 once. I don’t see any
>>>> errors or retries or anything like that.
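
The high/low watermark behaviour Dave describes above is easy to misread, so here is a toy model of it. This is only a sketch of the hysteresis as stated in the thread (start spilling at 8,000 queued messages, stop only once the queue is back down to 2,000); the class and function names are invented for illustration, the exact comparison at each mark is an assumption, and none of this is rsyslog's actual code.

```python
class DiskAssistedQueueModel:
    """Toy model of a queue that spills to disk between two watermarks."""

    def __init__(self, high_watermark=8000, low_watermark=2000):
        # Defaults are the figures quoted in the thread, not values
        # checked against any particular rsyslog release.
        self.high_watermark = high_watermark
        self.low_watermark = low_watermark
        self.depth = 0
        self.spilling_to_disk = False

    def update(self, enqueued, dequeued):
        """Apply one tick of traffic and return whether we are spilling."""
        self.depth = max(0, self.depth + enqueued - dequeued)
        if not self.spilling_to_disk and self.depth >= self.high_watermark:
            # Crossing the high mark switches disk-assisted mode on.
            self.spilling_to_disk = True
        elif self.spilling_to_disk and self.depth <= self.low_watermark:
            # Only reaching the low mark switches it off again.
            self.spilling_to_disk = False
        return self.spilling_to_disk


def simulate(arrivals, drain_per_tick, queue):
    """Feed per-tick arrival counts in; return (depth, spilling) per tick."""
    history = []
    for arrived in arrivals:
        spilling = queue.update(enqueued=arrived, dequeued=drain_per_tick)
        history.append((queue.depth, spilling))
    return history


if __name__ == "__main__":
    # One burst of 9,000 messages, then a receiver draining 1,000 per tick.
    burst = [9000, 0, 0, 0, 0, 0, 0]
    for depth, spilling in simulate(burst, 1000, DiskAssistedQueueModel()):
        print(depth, "disk" if spilling else "memory")
```

In this model a single burst keeps the sender in disk-assisted mode long after the burst ends, and if the receiver drains more slowly than messages arrive the queue never gets back to the low mark at all, which would look like a spool that grows at a steady rate.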
>>>>
>>>> On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
>>>>
>>>>> On Tue, 3 Dec 2013, Erik Steffl wrote:
>>>>>
>>>>>> we have sort of similar problem, in our case it's Amazon Elastic
>>>>>> Load Balancer (ELB) that somehow causes the connection go "bad" if
>>>>>> there is no traffic for 5 min (not sure what the exact time is, 1
>>>>>> minute is ok, 5 minutes is not).
>>>>>>
>>>>>> not sure what going "bad" actually means (still investigating) but
>>>>>> the data is not going through, rsyslog sends data but there is no
>>>>>> response... it recovers eventually but not sure what exactly
>>>>>> triggers the recovery (sending more messages is what triggers it
>>>>>> but how exactly is not clear).
>>>>>>
>>>>>> It's not the same case but maybe you can look into VIP and
>>>>>> connections and see what happens there, maybe use strace to see
>>>>>> what are the responses when rsyslogd sends data to destination...
>>>>>
>>>>> or use tcpdump to watch the traffic over the network.
>>>>>
>>>>> David Lang
>>>>>
>>>>>> erik
>>>>>>
>>>>>> On 12/03/2013 01:12 PM, Dan Finn wrote:
>>>>>>> I had kind of wondered about that as well but I have a few
>>>>>>> reasons that make it seem like that is not the case.
>>>>>>>
>>>>>>> The “central server” is actually a VIP on our F5 load balancer
>>>>>>> with 4 rsyslog destination servers behind it. We have about 200
>>>>>>> servers in our environment and during these busy times the only
>>>>>>> servers that ever seem to log locally are the postgres servers.
>>>>>>> The volume of logs being written on these servers is certainly
>>>>>>> much higher than anywhere else. My theory is that the rsyslog
>>>>>>> “client” is not keeping up with the sheer volume on these servers
>>>>>>> during the busy times but until I can find some concrete info
>>>>>>> that is just a theory.
>>>>>>>
>>>>>>> We are looking at upgrading to v7 but unfortunately that’s not
>>>>>>> going to be a quick fix. I was hoping maybe there was an issue in
>>>>>>> my config or something that could be tweaked but it sounds like
>>>>>>> maybe that is not the case?
>>>>>>>
>>>>>>> I did capture some debug output while this was happening.
>>>>>>> Unfortunately it was pretty large so I don’t know if I can share
>>>>>>> the whole thing but is there anything in particular I would be
>>>>>>> looking for in there? I see that it says it’s writing the files
>>>>>>> locally but I didn’t see where it says why.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>>
>>>>>>> On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
>>>>>>>
>>>>>>>> you are sending the logs via TCP, which means that if the system
>>>>>>>> you are sending logs to gets backed up, logs will queue on the
>>>>>>>> sending system, spilling to disk as needed.
>>>>>>>>
>>>>>>>> the bottleneck is probably on the central server, but we have no
>>>>>>>> info about what it's doing.
>>>>>>>>
>>>>>>>> The go-to tool for diagnosing this sort of problem is the
>>>>>>>> impstats module, but I don't think that existed back in the 4.x
>>>>>>>> days, and tracking down the bottleneck without it is
>>>>>>>> significantly harder. Is there any way you can upgrade to a
>>>>>>>> current version?
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>>
>>>>>>>> On Mon, 2 Dec 2013, Dan Finn wrote:
>>>>>>>>
>>>>>>>>> Date: Mon, 2 Dec 2013 20:53:54 +0000
>>>>>>>>> From: Dan Finn <[email protected]>
>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>>>> Subject: [rsyslog] rsyslog frequently queuing to disk when it
>>>>>>>>> should be sending over the network
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I’m trying to get some insight into an issue that we have been
>>>>>>>>> seeing quite a bit. We have some postgres servers that are
>>>>>>>>> quite verbose. When the servers get busy we have an issue where
>>>>>>>>> they queue their logs locally instead of sending over the
>>>>>>>>> network, however I can’t find any reason why that would be, at
>>>>>>>>> least not from an OS resource standpoint. We are running
>>>>>>>>> rsyslog4-4.8.0-1.ius.el5. This is my config from the client
>>>>>>>>> that was having issues: http://pastebin.com/n3XpRdMm.
>>>>>>>>>
>>>>>>>>> I watched it queue about 10k files under /var/spool/rsyslog
>>>>>>>>> before I finally had to manually delete them out because disk
>>>>>>>>> was filling up.
>>>>>>>>>
>>>>>>>>> What’s the best way to get some insight into why this might be
>>>>>>>>> happening? Is there a way I can enable some debug logging for
>>>>>>>>> the rsyslog process itself? Any settings in our config that
>>>>>>>>> could be tweaked?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by
>>>>>>>>> a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO
>>>>>>>>> NOT POST if you DON'T LIKE THAT.

