If I upgrade to v7 on the central servers, can I reuse those configs?

DAN FINN
Linux System Administrator

Office: 801-746-7580 ext. 5381
Mobile: 801-609-4705
[email protected]

Backcountry.com <http://www.backcountry.com/>
Competitive Cyclist <http://www.competitivecyclist.com/>
RealCyclist.com <http://www.realcyclist.com/>
Dogfunk.com <http://www.dogfunk.com/>
SteepandCheap.com <http://www.steepandcheap.com/>
Chainlove.com <http://www.chainlove.com/>
WhiskeyMilitia.com <http://www.whiskeymilitia.com/>

On 12/4/13, 10:56 AM, "David Lang" <[email protected]> wrote:

>with a quick glance at things
>
>you are doing a lot of dynamic filename templates, since you do not change the
>default dynafile cache size (and I don't know if you can on that ancient a
>version), rsyslog is spending a LOT of time syncing, closing, and opening
>files.
>
>Also, you are extensively using the if..then style filters, those are much
>slower than other filters on versions prior to 7.x
>
>So it's probably the case that if you upgrade your central servers to a current
>version, and set a large enough DynaFileCacheSize your performance problems will
>disappear.
>
>David Lang
>
>On Wed, 4 Dec 2013, Dan Finn wrote:
>
>> OK, now we might be onto something. I can’t determine exactly which
>> remote machine the client is hitting because it’s going through the F5 so
>> what I did is took a look at the stats on the F5 and picked the busiest
>> remote server. There is an rsyslog thread on there that is hovering at
>> very close to 100%.
>>
>> Here’s the config from our destination servers. They all share an
>> identical config. http://pastebin.com/35K9gw97
>>
>> On 12/4/13, 10:41 AM, "David Lang" <[email protected]> wrote:
>>
>>> Ok, then the question is how fast is the receiving machine accepting
>>> messages. unless you have an unusually complex template, you should be able
>>> to send messages very fast.
>>>
>>> But if the receiving machine is not processing messages fast enough there
>>> will be a buildup. but if all it's doing is writing to local files (and you
>>> aren't doing a lot of dynamic filename stuff) it's unlikely that it should
>>> be that slow.
>>>
>>> you could look at what the different threads are doing using top (remember
>>> to hit 'H' to see the threads) and if one or more threads is maxing out the
>>> CPU, you can then look at the batching settings.
>>>
>>> But I really don't think the sending machine is the bottleneck, if it was
>>> it wouldn't be able to write the queue files either.
>>>
>>> David Lang
>>>
>>> On Wed, 4 Dec 2013, Dan Finn wrote:
>>>
>>>> I’m thinking it’s most likely something around #3. :)
>>>>
>>>> I don’t think it’s a network or F5 related problem as far as I can tell.
>>>> For example, right now we have a server that is writing logs to the local
>>>> spool. I ran tcpdump and I can see rsyslog talking to the destination
>>>> servers just fine but the spool is slowly growing. According to netstat
>>>> rsyslog is only making 1 TCP connection to the VIP on the F5 and it seems
>>>> to be able to pass traffic through that connection.
>>>>
>>>> On 12/4/13, 9:31 AM, "Dave Caplinger" <[email protected]> wrote:
>>>>
>>>>> Without impstats output it's hard to say for sure, but since your config
>>>>> is so succinct, you are getting a lot of default buffer sizes and
>>>>> watermark parameters. I see you have $ActionResumeRetryCount set to -1
>>>>> for infinite retries (which is good). Note though that default high and
>>>>> low water marks are 8,000 and 2,000 messages, respectively. So once you
>>>>> get into disk-assisted mode, you won't leave it until the action queue
>>>>> gets all the way down to 2000 messages. The default action queue size
>>>>> will be 10,000 messages, and that's really not very much, especially in
>>>>> an environment that has significant spikes in volume.
>>>>>
>>>>> The other possibilities that come to mind are:
>>>>>
>>>>> 1) that the F5 is correctly sending to an rsyslog server that isn't
>>>>> listening any more for some reason
>>>>>
>>>>> If the receiving side's TCP session gets stuck, or something else goes
>>>>> wrong but the F5 doesn't know it, the hashing algorithm will continue to
>>>>> send traffic to the same (dead) destination. TCP default timeouts are 2
>>>>> minutes; this can seem like an eternity when digging through packet
>>>>> captures.
>>>>> So on the sending side, perhaps it sends a SYN trying to open
>>>>> the session, and then nothing happens for 2 minutes before it tries all
>>>>> over again?
>>>>>
>>>>> 2) perhaps there's something else in the network breaking the TCP
>>>>> session, such as a firewall doing NAT
>>>>>
>>>>> I've seen cases before where the NAT-ing firewall would time-out
>>>>> translated IP addresses after a certain period, breaking long-running
>>>>> sessions. The Cisco PIX/ASA, for example, has both idle
>>>>> address-translation timeouts, as well as total duration timeouts. So
>>>>> even a currently in-use session can still be affected by something like
>>>>> this.
>>>>>
>>>>> 3) maybe there is some odd behavior in v4 of rsyslog pertaining to this
>>>>> situation that has long since been fixed :-)
>>>>>
>>>>> Not pointing fingers; I just don't have a lot of experience with rsyslog
>>>>> that old so I'm just speculating.
>>>>>
>>>>> --
>>>>> Dave Caplinger, Director of Architecture | 402.361.3063
>>>>> Solutionary | Relevant . Intelligent . Security
>>>>>
>>>>> On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:
>>>>>
>>>>>> I’ve done that and I’ve seen 2 things happen during these periods where
>>>>>> files are being written locally.
>>>>>>
>>>>>> 1) Nothing at all was attempted to be sent to the remote destination.
>>>>>> Using telnet I could make a connection just fine but rsyslog wasn’t even
>>>>>> attempting to send or talk to the destination server over TCP 514.
>>>>>> Message queue was growing extremely fast. I can’t explain it but on the
>>>>>> 2nd or 3rd restart it started talking to the remote again and began
>>>>>> flushing out the queue.
>>>>>> 2) lots of traffic is going to the remote over TCP 514. The queue is
>>>>>> slowly growing but growing at a consistent rate. This is the most common
>>>>>> situation, I’ve only seen situation #1 once.
>>>>>> I don’t see any errors or retries or anything like that.
>>>>>>
>>>>>> On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
>>>>>>
>>>>>>> On Tue, 3 Dec 2013, Erik Steffl wrote:
>>>>>>>
>>>>>>>> we have a sort of similar problem, in our case it's Amazon Elastic Load
>>>>>>>> Balancer (ELB) that somehow causes the connection to go "bad" if there
>>>>>>>> is no traffic for 5 min (not sure what the exact time is, 1 minute is
>>>>>>>> ok, 5 minutes is not).
>>>>>>>>
>>>>>>>> not sure what going "bad" actually means (still investigating) but the
>>>>>>>> data is not going through, rsyslog sends data but there is no
>>>>>>>> response... it recovers eventually but not sure what exactly triggers
>>>>>>>> the recovery (sending more messages is what triggers it but how exactly
>>>>>>>> is not clear).
>>>>>>>>
>>>>>>>> It's not the same case but maybe you can look into VIP and connections
>>>>>>>> and see what happens there, maybe use strace to see what the responses
>>>>>>>> are when rsyslogd sends data to destination...
>>>>>>>
>>>>>>> or use tcpdump to watch the traffic over the network.
>>>>>>>
>>>>>>> David Lang
>>>>>>>
>>>>>>>> erik
>>>>>>>>
>>>>>>>> On 12/03/2013 01:12 PM, Dan Finn wrote:
>>>>>>>>> I had kind of wondered about that as well but I have a few reasons
>>>>>>>>> that make it seem like that is not the case.
>>>>>>>>>
>>>>>>>>> The “central server” is actually a VIP on our F5 load balancer with 4
>>>>>>>>> rsyslog destination servers behind it. We have about 200 servers in
>>>>>>>>> our environment and during these busy times the only servers that ever
>>>>>>>>> seem to log locally are the postgres servers. The volume of logs being
>>>>>>>>> written on these servers is certainly much higher than anywhere else.
>>>>>>>>> My theory is that the rsyslog “client” is not keeping up with the sheer
>>>>>>>>> volume on these servers during the busy times but until I can find
>>>>>>>>> some concrete info that is just a theory.
>>>>>>>>>
>>>>>>>>> We are looking at upgrading to v7 but unfortunately that’s not going
>>>>>>>>> to be a quick fix. I was hoping maybe there was an issue in my config
>>>>>>>>> or something that could be tweaked but it sounds like maybe that is
>>>>>>>>> not the case?
>>>>>>>>>
>>>>>>>>> I did capture some debug output while this was happening.
>>>>>>>>> Unfortunately it was pretty large so I don’t know if I can share the
>>>>>>>>> whole thing but is there anything in particular I would be looking for
>>>>>>>>> in there? I see that it says it’s writing the files locally but I
>>>>>>>>> didn’t see where it says why.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> you are sending the logs via TCP, which means that if the system you
>>>>>>>>>> are sending logs to gets backed up, logs will queue on the sending
>>>>>>>>>> system, spilling to disk as needed.
>>>>>>>>>>
>>>>>>>>>> the bottleneck is probably on the central server, but we have no info
>>>>>>>>>> about what it's doing.
>>>>>>>>>>
>>>>>>>>>> The go-to tool for diagnosing this sort of problem is the impstats
>>>>>>>>>> module, but I don't think that existed back in the 4.x days, and
>>>>>>>>>> tracking down the bottleneck without it is significantly harder. Is
>>>>>>>>>> there any way you can upgrade to a current version?
>>>>>>>>>>
>>>>>>>>>> David Lang
>>>>>>>>>>
>>>>>>>>>> On Mon, 2 Dec 2013, Dan Finn wrote:
>>>>>>>>>>
>>>>>>>>>>> Date: Mon, 2 Dec 2013 20:53:54 +0000
>>>>>>>>>>> From: Dan Finn <[email protected]>
>>>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>>>>>> Subject: [rsyslog] rsyslog frequently queuing to disk when it should
>>>>>>>>>>>     be sending over the network
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I’m trying to get some insight into an issue that we have been seeing
>>>>>>>>>>> quite a bit. We have some postgres servers that are quite verbose.
>>>>>>>>>>> When the servers get busy we have an issue where they queue their
>>>>>>>>>>> logs locally instead of sending over the network; however, I can’t
>>>>>>>>>>> find any reason why that would be, at least not from an OS resource
>>>>>>>>>>> standpoint. We are running rsyslog4-4.8.0-1.ius.el5. This is my
>>>>>>>>>>> config from the client that was having issues:
>>>>>>>>>>> http://pastebin.com/n3XpRdMm
>>>>>>>>>>>
>>>>>>>>>>> I watched it queue about 10k files under /var/spool/rsyslog before I
>>>>>>>>>>> finally had to manually delete them because the disk was filling up.
>>>>>>>>>>>
>>>>>>>>>>> What’s the best way to get some insight into why this might be
>>>>>>>>>>> happening? Is there a way I can enable some debug logging for the
>>>>>>>>>>> rsyslog process itself? Any settings in our config that could be
>>>>>>>>>>> tweaked?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>>>>>> POST if you DON'T LIKE THAT.
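
As a rough illustration of the dynafile cache point made earlier in the thread, here is a small Python sketch (not rsyslog code) that counts how often an output file would have to be opened when more distinct filenames are in play than the cache can hold. It assumes least-recently-used eviction and a round-robin stream of messages from 200 hosts; the path pattern, the count_opens helper, and the cache sizes 10 and 1000 are made up for the example.

```python
from collections import OrderedDict


def count_opens(filenames, cache_size):
    """Count file opens for a stream of writes, given a cache of open handles.

    Toy model: the cache evicts the least recently used entry when full.
    """
    open_files = OrderedDict()  # filename -> None, least recently used first
    opens = 0
    for name in filenames:
        if name in open_files:
            open_files.move_to_end(name)    # hit: mark as most recently used
            continue
        opens += 1                          # miss: the file has to be opened
        if len(open_files) >= cache_size:
            open_files.popitem(last=False)  # close the least recently used file
        open_files[name] = None
    return opens


# Hypothetical workload: 200 hosts, 50 messages each, arriving round-robin.
stream = [f"/var/log/remote/host{i:03d}.log" for _ in range(50) for i in range(200)]

small = count_opens(stream, cache_size=10)
large = count_opens(stream, cache_size=1000)
print(small, large)
```

In this model, with 200 names cycling through a 10-slot cache every single write is a miss, so the number of opens equals the number of messages. Once the cache is at least as large as the set of active filenames, each file is opened only once.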

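
The high/low watermark behaviour Dave Caplinger describes in the thread can be pictured as simple hysteresis. The Python sketch below is a toy model only, using the 8,000 and 2,000 figures quoted in the thread; the disk_mode_trace function and the sample queue depths are hypothetical, and the real rsyslog queue logic has more moving parts than this.

```python
def disk_mode_trace(depths, high=8000, low=2000):
    """For each sampled queue depth, report whether the queue is disk-assisted.

    Toy model: switch to disk at the high watermark and stay there until the
    depth has drained down to the low watermark.
    """
    on_disk = False
    states = []
    for depth in depths:
        if not on_disk and depth >= high:
            on_disk = True       # crossed the high watermark: start spooling
        elif on_disk and depth <= low:
            on_disk = False      # drained to the low watermark: back to memory
        states.append(on_disk)
    return states


# Hypothetical queue depths sampled over time.
samples = [1000, 7999, 8000, 6000, 3000, 2001, 2000, 5000]
states = disk_mode_trace(samples)
for depth, on_disk in zip(samples, states):
    print(depth, "disk-assisted" if on_disk else "in-memory")
```

The point of the sketch is the gap between the two thresholds: a short spike past the high mark keeps the queue on disk long after the depth has dropped back below it.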
