Re: [rsyslog] rsyslog frequently queuing to disk when it should be sending over the network

Dave Caplinger Wed, 04 Dec 2013 08:32:31 -0800

Without impstats output it's hard to say for sure, but since your config is so 
succinct, you are getting a lot of default buffer sizes and watermark 
parameters.  I see you have $ActionResumeRetryCount set to -1 for infinite 
retries (which is good).  Note, though, that the default high and low watermarks 
are 8,000 and 2,000 messages, respectively.  So once you get into disk-assisted 
mode, you won't leave it until the action queue drains all the way down to 
2,000 messages.  The default action queue size will be 10,000 messages, and 
that's really not very much, especially in an environment that has significant 
spikes in volume.
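
To make the watermark behavior concrete, here is a simplified Python model of 
the hysteresis described above. It is not rsyslog's queue code: it assumes 
disk-assisted mode starts when the in-memory depth reaches the high watermark 
and ends only once the depth drains to the low watermark, using the 
8,000/2,000 defaults quoted above. The function name is made up for 
illustration.

```python
# Simplified model of high/low watermark hysteresis (illustrative only,
# not rsyslog's implementation). Defaults are the values quoted in the thread.
HIGH_WATERMARK = 8000
LOW_WATERMARK = 2000


def simulate_da_mode(depths, high=HIGH_WATERMARK, low=LOW_WATERMARK):
    """For each observed queue depth, return whether the model is spilling to disk."""
    spilling = False
    states = []
    for depth in depths:
        if not spilling and depth >= high:
            spilling = True   # reached the high watermark: enter disk-assisted mode
        elif spilling and depth <= low:
            spilling = False  # drained to the low watermark: leave disk-assisted mode
        states.append(spilling)
    return states
```

For example, simulate_da_mode([1000, 8000, 5000, 2500, 2000, 1500]) stays in 
disk-assisted mode at depths 5000 and 2500 even though both are well below the 
high watermark; only at 2000 does it return to memory-only operation.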


The other possibilities that come to mind are:

1) that the F5 is correctly sending to an rsyslog server that isn't listening 
any more for some reason

If the receiving side's TCP session gets stuck, or something else goes wrong 
but the F5 doesn't know it, the hashing algorithm will continue to send traffic 
to the same (dead) destination.  TCP default timeouts are 2 minutes; this can 
seem like an eternity when digging through packet captures.  So on the sending 
side, perhaps it sends a SYN trying to open the session, and then nothing 
happens for 2 minutes before it tries all over again?
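
One way to tell a silently dropped SYN apart from a refused connection is to 
repeat the telnet-style check with an explicit timeout, so a dead path shows up 
as a failure after a few seconds instead of waiting out the OS retry period. 
A minimal Python sketch (the helper name is hypothetical). Note that a 
successful handshake only shows that something accepted the connection on that 
address and port (for example the VIP), not that a backend rsyslog is actually 
reading from it.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP handshake to host:port completes within timeout seconds."""
    try:
        # create_connection raises OSError (including socket.timeout and
        # ConnectionRefusedError) if the handshake does not complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would be something like can_connect("loghost.example.com", 514, 
timeout=10.0), where the hostname is a placeholder for the VIP or an 
individual backend.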

2) perhaps there's something else in the network breaking the TCP session, such 
as a firewall doing NAT

I've seen cases before where the NAT-ing firewall would time out translated IP 
addresses after a certain period, breaking long-running sessions.  The Cisco 
PIX/ASA, for example, has both idle address-translation timeouts and 
total-duration timeouts, so even a session that is currently in use can still 
be affected by something like this.
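
For the idle-timeout case, the generic socket-level mitigation is TCP 
keepalive, which sends periodic probes on an otherwise quiet connection so the 
translation entry stays active. The Python sketch below only shows that 
mechanism on a raw socket and says nothing about whether rsyslog 4.x exposes 
such a setting. The 60/10/5 values are illustrative assumptions (the idle value 
would need to be below the device's idle timeout), and the per-socket tuning 
options are Linux-specific, hence the hasattr checks. Keepalives do not help 
against a total-duration timeout.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for sock; values are illustrative, not recommendations."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes (Linux).
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is declared dead (Linux).
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```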

3) maybe there is some odd behavior in v4 of rsyslog pertaining to this 
situation that has long since been fixed :-)

Not pointing fingers; I just don't have a lot of experience with rsyslog that 
old so I'm just speculating.

--
Dave Caplinger, Director of Architecture  |  402.361.3063
Solutionary  |  Relevant  .  Intelligent  .  Security

On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:

> I’ve done that and I’ve seen 2 things happen during these periods when
> files are being written locally.
> 
> 1) rsyslog didn’t attempt to send anything at all to the remote
> destination.  Using telnet I could make a connection just fine, but
> rsyslog wasn’t even attempting to send to or talk to the destination
> server over TCP 514.  The message queue was growing extremely fast.  I
> can’t explain it, but on the 2nd or 3rd restart it started talking to the
> remote again and began flushing out the queue.
> 2) Lots of traffic is going to the remote over TCP 514.  The queue is
> growing slowly but at a consistent rate.  This is the most common
> situation; I’ve only seen situation #1 once.  I don’t see any errors or
> retries or anything like that.
> 
> On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
> 
>> On Tue, 3 Dec 2013, Erik Steffl wrote:
>> 
>>> We have a somewhat similar problem; in our case it's the Amazon Elastic
>>> Load Balancer (ELB) that somehow causes the connection to go "bad" if
>>> there is no traffic for 5 minutes (not sure what the exact threshold is;
>>> 1 minute is OK, 5 minutes is not).
>>> 
>>> Not sure what going "bad" actually means (still investigating), but the
>>> data is not going through: rsyslog sends data but there is no
>>> response... It recovers eventually, but I'm not sure what exactly
>>> triggers the recovery (sending more messages is what triggers it, but
>>> how exactly is not clear).
>>> 
>>> It's not the same case, but maybe you can look into the VIP and its
>>> connections and see what happens there. Maybe use strace to see what
>>> responses rsyslogd gets when it sends data to the destination...
>> 
>> or use tcpdump to watch the traffic over the network.
>> 
>> David Lang
>> 
>>>     erik
>>> 
>>> On 12/03/2013 01:12 PM, Dan Finn wrote:
>>>> I had kind of wondered about that as well but I have a few reasons that
>>>> make it seem like that is not the case.
>>>> 
>>>> The “central server” is actually a VIP on our F5 load balancer with 4
>>>> rsyslog destination servers behind it.  We have about 200 servers in our
>>>> environment, and during these busy times the only servers that ever seem
>>>> to log locally are the postgres servers.  The volume of logs being written
>>>> on these servers is certainly much higher than anywhere else.  My theory
>>>> is that the rsyslog “client” is not keeping up with the sheer volume on
>>>> these servers during the busy times, but until I can find some concrete
>>>> info that is just a theory.
>>>> 
>>>> We are looking at upgrading to v7, but unfortunately that’s not going to
>>>> be a quick fix.  I was hoping maybe there was an issue in my config or
>>>> something that could be tweaked, but it sounds like maybe that is not
>>>> the case?
>>>> 
>>>> I did capture some debug output while this was happening.  Unfortunately
>>>> it was pretty large, so I don’t know if I can share the whole thing, but
>>>> is there anything in particular I should be looking for in there?  I see
>>>> that it says it’s writing the files locally, but I didn’t see where it
>>>> says why.
>>>> 
>>>> Thanks,
>>>> Dan
>>>> 
>>>> On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
>>>> 
>>>>> You are sending the logs via TCP, which means that if the system you are
>>>>> sending logs to gets backed up, logs will queue on the sending system,
>>>>> spilling to disk as needed.
>>>>> 
>>>>> The bottleneck is probably on the central server, but we have no info
>>>>> about what it's doing.
>>>>> 
>>>>> The go-to tool for diagnosing this sort of problem is the impstats
>>>>> module, but I don't think that existed back in the 4.x days, and
>>>>> tracking down the bottleneck without it is significantly harder. Is
>>>>> there any way you can upgrade to a current version?
>>>>> 
>>>>> David Lang
>>>>> 
>>>>>  On Mon, 2 Dec 2013, Dan Finn wrote:
>>>>> 
>>>>>> Date: Mon, 2 Dec 2013 20:53:54 +0000
>>>>>> From: Dan Finn <[email protected]>
>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>> To: "[email protected]" <[email protected]>
>>>>>> Subject: [rsyslog] rsyslog frequently queuing to disk when it should be sending over the network
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I’m trying to get some insight into an issue that we have been seeing
>>>>>> quite a bit.  We have some postgres servers that are quite verbose.
>>>>>> When the servers get busy, we have an issue where they queue their logs
>>>>>> locally instead of sending them over the network; however, I can’t find
>>>>>> any reason why that would be, at least not from an OS resource
>>>>>> standpoint.  We are running rsyslog4-4.8.0-1.ius.el5.  This is my config
>>>>>> from the client that was having issues: http://pastebin.com/n3XpRdMm.
>>>>>> 
>>>>>> I watched it queue about 10k files under /var/spool/rsyslog before I
>>>>>> finally had to delete them manually because the disk was filling up.
>>>>>> 
>>>>>> What’s the best way to get some insight into why this might be
>>>>>> happening?  Is there a way I can enable some debug logging for the
>>>>>> rsyslog process itself?  Are there any settings in our config that
>>>>>> could be tweaked?
>>>>>> 
>>>>>> Thanks,
>>>>>> Dan
>>>>>> 
>>>>>> DAN FINN
>>>>>> Linux System Administrator
>>>>>> [email protected]<mailto:[email protected]>
>>>>>> 
>>>>>> Backcountry.com<http://www.backcountry.com/>
>>>>>> Competitive Cyclist<http://www.competitivecyclist.com/>
>>>>>> RealCyclist.com<http://www.realcyclist.com/>
>>>>>> Dogfunk.com<http://www.dogfunk.com/>
>>>>>> SteepandCheap.com<http://www.steepandcheap.com/>
>>>>>> Chainlove.com<http://www.chainlove.com/>
>>>>>> WhiskeyMilitia.com<http://www.whiskeymilitia.com/>
>>>>>> _______________________________________________
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>> POST
>>>>>> if you DON'T LIKE THAT.
>>>> 
>>>> 
>>> 
> 

