As I was typing out the email, it occurred to me that the issue is OS related:
Looking at a sending server, A, I saw these messages in dmesg:

TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window 861404336:861405796 (repaired)

The local TCP port, 47081, is the same one that is part of the stuck connection. Now I know what the problem is :) However, I cannot seem to find a fix :(

On Wed, Dec 10, 2014 at 8:46 PM, Tim Smith <[email protected]> wrote:
> Hi,
>
> I have a pair of Linux/RHEL servers (RHEL 6.x), A and B, that forward logs
> to multiple destinations:
> - one copy to Splunk syslog listener
> - one copy to local flume process over TCP
> - one copy to a remote RSyslog receiver, X and Y (RHEL 6.x)
>
> Forwarding copies to Splunk and Flume works fine. However, forwarding to
> the remote Syslog receivers gets stuck in a strange way. The forwarding is
> setup as:
> RSyslog-Server-A -> RSyslog-Server-X
> RSyslog-Server-B -> RSyslog-Server-Y
>
> All four - A,B, X and Y are running exactly the same version of RSyslog -
> 8.6.2-2, from the adiscon repo:
> rsyslog-8.6.0-2.el6.x86_64
>
> What happens is A/B stop sending logs to X/Y. Looking at the send/receive
> TCP queues at both ends, the receive queue on X/Y is clear but the sendQ on
> A/B gets stuck. As an example, this connection lingers forever (extracted
> with netstat -an | grep EST):
> tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED
>
> Observations:
> ==========
> - The connection remains established with the same number of bytes in the
> sendQ
> - No data is transferred over the "stuck" connection, looking at tcpdump
> - Re-starting the receive end, X/Y, does not help
> - I don't see an action suspended error in the rsyslog logs
> - Running the send side in debug doesn't help - I easily ended up with
> 100+ Gigs of debug logs without the issue manifesting itself. The A/B pair
> handle lots of traffic and running rsyslogd in debug mode reduces their
> throughput - perhaps the issue does not manifest at lower EPS.
> - Only re-starting the send side, A/B, resolves the issue.
>
> I tweaked omfwd action to change TCP_Framing from default to octet-based.
> Here is the send side omfwd config on A/B:
> --------------------
> action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
> Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
> queue.maxdiskspace="10G" queue.Size="8640000"
> queue.dequeuebatchsize="4096" queue.type="LinkedList"
> queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
> queue.workerThreads="4" RebindInterval="10000000" template="fwdformat" )
> --------------------
>
>
> The receive side, X/Y, config:
> --------------------
> module(load="imptcp" threads="16") # needs to be done just once
>
> global (
>     workdirectory="/data/rsyslog/queues"
>     maxmessagesize="64K"
>     debug.logfile="/data/rsyslog/debug/debug.log"
>     net.enabledns="off"
> )
>
> $DebugLevel 0
>
> main_queue (
>     queue.FileName="globalqueue"
>     queue.Type="LinkedList"
>     queue.MaxDiskSpace="250g"
>     queue.maxfilesize="5g"
>     queue.Size="864000000"
>     queue.dequeuebatchsize="1000"
>     queue.TimeoutEnqueue="0"
>     queue.workerThreads="4"
>     queue.SaveOnShutdown="on"
> )
>
> ruleset(name="aggregate") {
>     action (name="to_flume"
>         type="omfwd"
>         Target="localhost"
>         Port="5614"
>         Protocol="tcp"
>         queue.filename="to_flume"
>         queue.size="360000000"
>         queue.maxdiskspace="360G"
>         queue.highwatermark="216000000" # 60% of queue.size
>         queue.discardmark="288000000" # 80% of queue.size
>         queue.type="LinkedList"
>         queue.dequeuebatchsize="4096"
>         queue.timeoutenqueue="0"
>         queue.maxfilesize="4G"
>         queue.saveonshutdown="on"
>         queue.workerThreads="4"
>         RebindInterval="10000000"
>         template="rawfwd"
>     ) stop
> }
>
> input(type="imptcp" port="514" ruleset="aggregate")
> --------------------
>
> Any pointers to troubleshoot and smoke out the bug will be highly
> appreciated :)
>
> Thanks
>
>
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
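
For anyone who wants to check whether they are hitting the same thing, the matching of the dmesg warning against the stuck netstat entry can be scripted. What follows is only a rough Python sketch, not anything shipped with rsyslog: the function names (parse_shrunk_window, parse_netstat, stuck_candidates) are made up for illustration, and it assumes the six-column "netstat -an" TCP layout and the "Peer ip:port/localport" warning format quoted in this thread. A non-zero Send-Q on a matching connection is a hint, not proof, that the connection is stuck.

```python
import re

# Kernel warning as quoted in this thread:
#   TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window ... (repaired)
# The three captured fields are peer IP, peer port and local port.
SHRUNK_RE = re.compile(
    r"TCP: Peer (\d+\.\d+\.\d+\.\d+):(\d+)/(\d+) unexpectedly shrunk window"
)


def parse_shrunk_window(line):
    """Return (peer_ip, peer_port, local_port) from a dmesg line, or None."""
    m = SHRUNK_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def parse_netstat(line):
    """Parse one ESTABLISHED IPv4 TCP line of 'netstat -an', or return None.

    Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    fields = line.split()
    if len(fields) != 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
        return None
    local_ip, local_port = fields[3].rsplit(":", 1)
    peer_ip, peer_port = fields[4].rsplit(":", 1)
    return {
        "send_q": int(fields[2]),
        "local": (local_ip, int(local_port)),
        "peer": (peer_ip, int(peer_port)),
    }


def stuck_candidates(dmesg_lines, netstat_lines):
    """List connections with a non-empty Send-Q that also appear in a
    'shrunk window' warning (matched on peer IP, peer port, local port)."""
    warned = set()
    for line in dmesg_lines:
        hit = parse_shrunk_window(line)
        if hit is not None:
            warned.add(hit)

    result = []
    for line in netstat_lines:
        conn = parse_netstat(line)
        if conn is None or conn["send_q"] == 0:
            continue
        key = (conn["peer"][0], conn["peer"][1], conn["local"][1])
        if key in warned:
            result.append(conn)
    return result


if __name__ == "__main__":
    # Sample input copied from the messages in this thread.
    dmesg = ["TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window "
             "861404336:861405796 (repaired)"]
    netstat = ["tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED"]
    for conn in stuck_candidates(dmesg, netstat):
        print(conn)
```

In practice the two lists would come from the saved output of dmesg and "netstat -an" on the sending side (A/B). Sampling netstat twice a few minutes apart and comparing Send-Q would be a stronger check, since the symptom described above is a Send-Q that never changes.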

