I’m thinking it’s most likely something along the lines of #3. :) As far as I can tell, it isn’t a network- or F5-related problem. For example, right now we have a server that is writing logs to the local spool. I ran tcpdump, and I can see rsyslog talking to the destination servers just fine, but the spool is slowly growing. According to netstat, rsyslog is making only 1 TCP connection to the VIP on the F5, and it seems to be able to pass traffic through that connection.
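One way to put a number on "the spool is slowly growing" is to sample the spool directory twice and compute the net growth rate. That rate is how much faster messages arrive than the single connection to the VIP drains them. From it you can also estimate how long the remaining disk space would last. The following is a minimal Python sketch. It assumes the spool is the flat directory /var/spool/rsyslog mentioned later in the thread. The function names are hypothetical helpers, not part of rsyslog.

```python
import os
import shutil
import time


def dir_bytes(path: str) -> int:
    """Sum the sizes of the regular files directly inside `path` (not recursive)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def growth_rate(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Net spool growth in bytes/second: what arrived minus what was forwarded."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return (bytes_after - bytes_before) / seconds


def seconds_until_full(free_bytes: int, rate: float) -> float:
    """How long the remaining free space lasts at `rate`; inf if not growing."""
    if rate <= 0:
        return float("inf")
    return free_bytes / rate


def sample_spool(path: str, interval: float = 60.0) -> tuple[float, float]:
    """Sample `path` twice, `interval` seconds apart; return (rate, seconds_left)."""
    before = dir_bytes(path)
    time.sleep(interval)
    after = dir_bytes(path)
    rate = growth_rate(before, after, interval)
    free = shutil.disk_usage(path).free
    return rate, seconds_until_full(free, rate)
```

On the affected host, `sample_spool("/var/spool/rsyslog", 60)` returns the net growth in bytes per second and the estimated seconds until the filesystem fills. A steady positive rate while tcpdump shows traffic flowing would be consistent with situation #2: the sender is forwarding, just more slowly than the logs arrive. It would not point to a dead connection.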
DAN FINN
Linux System Administrator
Office: 801-746-7580 ext. 5381
Mobile: 801-609-4705
[email protected]

Backcountry.com <http://www.backcountry.com/>
Competitive Cyclist <http://www.competitivecyclist.com/>
RealCyclist.com <http://www.realcyclist.com/>
Dogfunk.com <http://www.dogfunk.com/>
SteepandCheap.com <http://www.steepandcheap.com/>
Chainlove.com <http://www.chainlove.com/>
WhiskeyMilitia.com <http://www.whiskeymilitia.com/>

On 12/4/13, 9:31 AM, "Dave Caplinger" <[email protected]> wrote:

>Without impstats output it's hard to say for sure, but since your config is so succinct, you are getting a lot of default buffer sizes and watermark parameters. I see you have $ActionResumeRetryCount set to -1 for infinite retries (which is good). Note, though, that the default high and low water marks are 8,000 and 2,000 messages, respectively. So once you get into disk-assisted mode, you won't leave it until the action queue gets all the way down to 2,000 messages. The default action queue size will be 10,000 messages, and that's really not very much, especially in an environment that has significant spikes in volume.
>
>The other possibilities that come to mind are:
>
>1) that the F5 is correctly sending to an rsyslog server that isn't listening any more for some reason
>
>If the receiving side's TCP session gets stuck, or something else goes wrong but the F5 doesn't know it, the hashing algorithm will continue to send traffic to the same (dead) destination. TCP default timeouts are 2 minutes; this can seem like an eternity when digging through packet captures. So on the sending side, perhaps it sends a SYN trying to open the session, and then nothing happens for 2 minutes before it tries all over again?
>
>2) perhaps there's something else in the network breaking the TCP session, such as a firewall doing NAT
>
>I've seen cases before where the NAT-ing firewall would time out translated IP addresses after a certain period, breaking long-running sessions.
>The Cisco PIX/ASA, for example, has both idle address-translation timeouts and total-duration timeouts, so even a session that is currently in use can still be affected by something like this.
>
>3) maybe there is some odd behavior in v4 of rsyslog pertaining to this situation that has long since been fixed :-)
>
>Not pointing fingers; I just don't have a lot of experience with rsyslog that old, so I'm just speculating.
>
>--
>Dave Caplinger, Director of Architecture | 402.361.3063
>Solutionary | Relevant . Intelligent . Security
>
>On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:
>
>> I’ve done that, and I’ve seen 2 things happen during these periods when files are being written locally.
>>
>> 1) rsyslog made no attempt at all to send anything to the remote destination. Using telnet I could make a connection just fine, but rsyslog wasn’t even trying to talk to the destination server over TCP 514. The message queue was growing extremely fast. I can’t explain it, but on the 2nd or 3rd restart it started talking to the remote again and began flushing out the queue.
>> 2) Lots of traffic is going to the remote over TCP 514. The queue is growing slowly, but at a consistent rate. This is the most common situation; I’ve only seen situation #1 once. I don’t see any errors or retries or anything like that.
>>
>> On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
>>
>>> On Tue, 3 Dec 2013, Erik Steffl wrote:
>>>
>>>> we have a sort of similar problem; in our case it's the Amazon Elastic Load Balancer (ELB) that somehow causes the connection to go "bad" if there is no traffic for 5 min (not sure what the exact time is; 1 minute is OK, 5 minutes is not).
>>>>
>>>> not sure what going "bad" actually means (still investigating), but the data is not going through: rsyslog sends data but there is no response...
>>>> it recovers eventually, but not sure what exactly triggers the recovery (sending more messages is what triggers it, but how exactly is not clear).
>>>>
>>>> It's not the same case, but maybe you can look into the VIP and its connections and see what happens there; maybe use strace to see what the responses are when rsyslogd sends data to the destination...
>>>
>>> or use tcpdump to watch the traffic over the network.
>>>
>>> David Lang
>>>
>>>> erik
>>>>
>>>> On 12/03/2013 01:12 PM, Dan Finn wrote:
>>>>> I had kind of wondered about that as well, but I have a few reasons that make it seem like that is not the case.
>>>>>
>>>>> The “central server” is actually a VIP on our F5 load balancer with 4 rsyslog destination servers behind it. We have about 200 servers in our environment, and during these busy times the only servers that ever seem to log locally are the postgres servers. The volume of logs being written on these servers is certainly much higher than anywhere else. My theory is that the rsyslog “client” is not keeping up with the sheer volume on these servers during the busy times, but until I can find some concrete info that is just a theory.
>>>>>
>>>>> We are looking at upgrading to v7, but unfortunately that’s not going to be a quick fix. I was hoping maybe there was an issue in my config or something that could be tweaked, but it sounds like maybe that is not the case?
>>>>>
>>>>> I did capture some debug output while this was happening. Unfortunately it was pretty large, so I don’t know if I can share the whole thing, but is there anything in particular I should be looking for in there? I see that it says it’s writing the files locally, but I didn’t see where it says why.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
>>>>>
>>>>>> you are sending the logs via TCP, which means that if the system you are sending logs to gets backed up, logs will queue on the sending system, spilling to disk as needed.
>>>>>>
>>>>>> the bottleneck is probably on the central server, but we have no info about what it's doing.
>>>>>>
>>>>>> The go-to tool for diagnosing this sort of problem is the impstats module, but I don't think that existed back in the 4.x days, and tracking down the bottleneck without it is significantly harder. Is there any way you can upgrade to a current version?
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>> On Mon, 2 Dec 2013, Dan Finn wrote:
>>>>>>
>>>>>>> Date: Mon, 2 Dec 2013 20:53:54 +0000
>>>>>>> From: Dan Finn <[email protected]>
>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>> Subject: [rsyslog] rsyslog frequently queuing to disk when it should be sending over the network
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I’m trying to get some insight into an issue that we have been seeing quite a bit. We have some postgres servers that are quite verbose. When the servers get busy, we have an issue where they queue their logs locally instead of sending them over the network; however, I can’t find any reason why that would be, at least not from an OS resource standpoint. We are running rsyslog4-4.8.0-1.ius.el5. This is my config from the client that was having issues: http://pastebin.com/n3XpRdMm.
>>>>>>>
>>>>>>> I watched it queue about 10k files under /var/spool/rsyslog before I finally had to manually delete them because the disk was filling up.
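Dave’s point about the default water marks may help explain a spool that keeps growing until the disk fills. Once the action queue reaches the high water mark, it runs disk-assisted. It stays in that mode until it has drained all the way down to the low water mark, not merely back under the high one. The toy model below illustrates that hysteresis with the defaults Dave quotes, 8,000 and 2,000 messages. The class name is hypothetical. The model lumps memory and disk into one depth counter and ignores the queue size limit, so it is not how rsyslog is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class DiskAssistedQueueModel:
    """Toy model of the high/low water mark hysteresis described above.

    Not rsyslog's implementation: memory and disk are lumped into one depth
    counter, and the maximum queue size is ignored.
    """
    high_water: int = 8000   # start spilling to disk at this depth
    low_water: int = 2000    # stop spilling only once drained to this depth
    depth: int = 0           # messages currently waiting to be forwarded
    disk_mode: bool = False  # True while the queue is disk-assisted

    def tick(self, arrived: int, forwarded: int) -> None:
        """One time step: add new messages, remove what the action forwarded."""
        self.depth = max(0, self.depth + arrived - forwarded)
        if not self.disk_mode and self.depth >= self.high_water:
            self.disk_mode = True
        elif self.disk_mode and self.depth <= self.low_water:
            self.disk_mode = False
```

Consider a burst of 1,500 messages per tick against 700 forwarded. It reaches the 8,000 mark after 10 ticks. Suppose the sender then forwards only 100 messages per tick more than arrive. The model stays in disk mode until 60 ticks after the burst ends, by which point the depth is back down to 2,000.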
>>>>>>>
>>>>>>> What’s the best way to get some insight into why this might be happening? Is there a way I can enable some debug logging for the rsyslog process itself? Are there any settings in our config that could be tweaked?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>> _______________________________________________
>>>>>>> rsyslog mailing list
>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
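Dave’s second possibility and Erik’s ELB report are both idle-timeout problems. Dave described a NAT device timing out long-running sessions, and Erik saw the ELB connection go "bad" after about 5 idle minutes. A common general mitigation for idle timeouts is TCP keepalive, which makes the kernel send probe packets on an otherwise quiet connection. The sketch below only illustrates that socket-level mechanism in Python. It is not an rsyslog setting, and this thread does not establish whether rsyslog 4.8 can enable keepalive on its forwarding connection. The function name and the default timings are assumptions, chosen so that the first probe goes out well before a 5-minute idle timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 15, probes: int = 4) -> None:
    """Turn on TCP keepalive so a quiet connection still carries periodic probes.

    idle:     seconds of silence before the first probe (keep it well below
              the middlebox's idle timeout)
    interval: seconds between unanswered probes
    probes:   unanswered probes before the kernel gives up on the peer
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options are platform-specific (all three exist on
    # Linux), so set only the ones this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

For example, `sock = socket.create_connection(("loghost.example", 514))` followed by `enable_keepalive(sock)` would send a probe after a minute of silence on the connection. Here `loghost.example` is a placeholder. On Linux, the system-wide defaults are the `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls. These only apply to sockets that have SO_KEEPALIVE enabled.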

