I tweaked a few OS/kernel parameters, such as Ethernet driver options, but in the end this seems to have done the trick:

    sysctl -w net.ipv4.tcp_window_scaling=0
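For context, here is a minimal sketch of what this setting changes, assuming the standard TCP window scale option from RFC 7323. The window field in the TCP header is 16 bits. When both ends negotiate window scaling in the SYN/SYN-ACK, each side left-shifts the advertised value by a shift count of at most 14. With net.ipv4.tcp_window_scaling=0 the kernel neither offers nor accepts the option, so connections set up after the change fall back to a 65,535-byte ceiling. Connections that were already established keep whatever they negotiated at their handshake. That ceiling in turn bounds a single connection's throughput at roughly window / RTT. The helper names below are hypothetical, and the 10 ms RTT is an illustrative value, not a measurement from this setup.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the TCP header's window field is 16 bits wide
MAX_SHIFT = 14             # RFC 7323 limits the window scale shift count to 14


def effective_window(window_field: int, shift: int = 0) -> int:
    """Receive window in bytes implied by a segment's window field.

    shift=0 models a connection where window scaling was not negotiated,
    e.g. one opened after `sysctl -w net.ipv4.tcp_window_scaling=0`.
    """
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be in the range 0..14")
    return window_field << shift


def max_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput: at most one window per round trip."""
    return window_bytes / rtt_seconds


def read_window_scaling(path: str = "/proc/sys/net/ipv4/tcp_window_scaling") -> bool:
    """Return True if the running Linux kernel currently enables window scaling."""
    with open(path) as f:
        return f.read().strip() == "1"


if __name__ == "__main__":
    print(effective_window(0xFFFF))     # 65535: the ceiling with scaling off
    print(effective_window(0xFFFF, 7))  # 8388480: same field with a shift of 7
    # Assumed 10 ms RTT: about 6.5 MB/s per connection with an unscaled window.
    print(round(max_throughput(effective_window(0xFFFF), 0.010)))
```

To keep the setting across reboots, the usual place on RHEL 6 is /etc/sysctl.conf (a line reading net.ipv4.tcp_window_scaling = 0), loaded with sysctl -p.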
On Wed, Dec 10, 2014 at 9:13 PM, Tim Smith <[email protected]> wrote:
> As I was typing out the email, it occurred to me that the issue is OS
> related.
>
> Looking at a sending server, A, I saw these messages in dmesg:
>
>   TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window 861404336:861405796 (repaired)
>
> The local TCP port, 47081, is the same one that is part of the stuck
> connection.
>
> Now I know what the problem is :) However, I cannot seem to find a fix :(
>
> On Wed, Dec 10, 2014 at 8:46 PM, Tim Smith <[email protected]> wrote:
>
>> Hi,
>>
>> I have a pair of Linux servers (RHEL 6.x), A and B, that forward logs to
>> multiple destinations:
>> - one copy to a Splunk syslog listener
>> - one copy to a local Flume process over TCP
>> - one copy to a remote RSyslog receiver, X and Y (RHEL 6.x)
>>
>> Forwarding copies to Splunk and Flume works fine. However, forwarding to
>> the remote syslog receivers gets stuck in a strange way. The forwarding is
>> set up as:
>> RSyslog-Server-A -> RSyslog-Server-X
>> RSyslog-Server-B -> RSyslog-Server-Y
>>
>> All four (A, B, X and Y) are running exactly the same version of RSyslog,
>> 8.6.2-2, from the Adiscon repo:
>> rsyslog-8.6.0-2.el6.x86_64
>>
>> What happens is that A/B stop sending logs to X/Y. Looking at the
>> send/receive TCP queues at both ends, the receive queue on X/Y is clear,
>> but the Send-Q on A/B gets stuck. As an example, this connection lingers
>> forever (extracted with netstat -an | grep EST):
>>
>>   tcp        0 103660 10.24.62.9:47081    10.2.1.2:514    ESTABLISHED
>>
>> Observations:
>> ==========
>> - The connection remains established with the same number of bytes in the
>>   Send-Q
>> - No data is transferred over the "stuck" connection, judging by tcpdump
>> - Restarting the receive end, X/Y, does not help
>> - I don't see an "action suspended" error in the rsyslog logs
>> - Running the send side in debug mode doesn't help: I easily ended up with
>>   100+ GB of debug logs without the issue manifesting itself.
>>   The A/B pair handle lots of traffic, and running rsyslogd in debug mode
>>   reduces their throughput; perhaps the issue does not manifest at lower
>>   EPS (events per second).
>> - Only restarting the send side, A/B, resolves the issue.
>>
>> I tweaked the omfwd action to change TCP_Framing from the default to
>> octet-counted. Here is the send-side omfwd config on A/B:
>> --------------------
>> action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
>>         Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
>>         queue.maxdiskspace="10G" queue.Size="8640000"
>>         queue.dequeuebatchsize="4096" queue.type="LinkedList"
>>         queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
>>         queue.workerThreads="4" RebindInterval="10000000" template="fwdformat" )
>> --------------------
>>
>> The receive-side config on X/Y:
>> --------------------
>> module(load="imptcp" threads="16") # needs to be done just once
>>
>> global (
>>     workdirectory="/data/rsyslog/queues"
>>     maxmessagesize="64K"
>>     debug.logfile="/data/rsyslog/debug/debug.log"
>>     net.enabledns="off"
>> )
>>
>> $DebugLevel 0
>>
>> main_queue (
>>     queue.FileName="globalqueue"
>>     queue.Type="LinkedList"
>>     queue.MaxDiskSpace="250g"
>>     queue.maxfilesize="5g"
>>     queue.Size="864000000"
>>     queue.dequeuebatchsize="1000"
>>     queue.TimeoutEnqueue="0"
>>     queue.workerThreads="4"
>>     queue.SaveOnShutdown="on"
>> )
>>
>> ruleset(name="aggregate") {
>>     action (name="to_flume"
>>         type="omfwd"
>>         Target="localhost"
>>         Port="5614"
>>         Protocol="tcp"
>>         queue.filename="to_flume"
>>         queue.size="360000000"
>>         queue.maxdiskspace="360G"
>>         queue.highwatermark="216000000" # 60% of queue.size
>>         queue.discardmark="288000000" # 80% of queue.size
>>         queue.type="LinkedList"
>>         queue.dequeuebatchsize="4096"
>>         queue.timeoutenqueue="0"
>>         queue.maxfilesize="4G"
>>         queue.saveonshutdown="on"
>>         queue.workerThreads="4"
>>         RebindInterval="10000000"
>>         template="rawfwd"
>>     ) stop
>> }
>>
>> input(type="imptcp" port="514" ruleset="aggregate")
>> --------------------
>>
>> Any pointers to troubleshoot and smoke out the bug will be highly
>> appreciated :)
>>
>> Thanks

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
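For anyone chasing the same symptom, here is a rough sketch (not part of the original exchange) of the correlation done by hand in the thread. It takes two netstat -an snapshots some time apart, flags established connections whose non-empty Send-Q did not move, and keeps only those whose local port and peer also show up in an "unexpectedly shrunk window" line from dmesg. It assumes the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and IPv4 addresses. All function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches the kernel message quoted in this thread, e.g.
# "TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window 861404336:861405796 (repaired)"
# Groups: peer address, peer port, local port.
SHRUNK_WINDOW_RE = re.compile(
    r"Peer (?P<peer>[\d.]+):(?P<peer_port>\d+)/(?P<local_port>\d+) "
    r"unexpectedly shrunk window"
)

Conn = Tuple[str, str]  # (local "addr:port", foreign "addr:port")


def parse_netstat(output: str) -> Dict[Conn, Tuple[int, int]]:
    """Map each established IPv4 TCP connection to its (Recv-Q, Send-Q) byte counts."""
    conns: Dict[Conn, Tuple[int, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumed net-tools layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) == 6 and fields[0] == "tcp" and fields[5] == "ESTABLISHED":
            conns[(fields[3], fields[4])] = (int(fields[1]), int(fields[2]))
    return conns


def stuck_connections(before: str, after: str) -> List[Conn]:
    """Connections whose Send-Q is non-empty and identical in both snapshots."""
    old, new = parse_netstat(before), parse_netstat(after)
    stuck = []
    for conn, (_recv_q, send_q) in new.items():
        previous = old.get(conn)
        if send_q > 0 and previous is not None and previous[1] == send_q:
            stuck.append(conn)
    return sorted(stuck)


def shrunk_window_peers(dmesg: str) -> Dict[str, str]:
    """Map local port -> "peer:port" for every 'unexpectedly shrunk window' line."""
    return {
        m.group("local_port"): f"{m.group('peer')}:{m.group('peer_port')}"
        for m in SHRUNK_WINDOW_RE.finditer(dmesg)
    }


def correlate(stuck: List[Conn], dmesg: str) -> List[Conn]:
    """Keep stuck connections whose local port and peer both appear in dmesg."""
    peers = shrunk_window_peers(dmesg)
    return [conn for conn in stuck if peers.get(conn[0].rsplit(":", 1)[1]) == conn[1]]
```

With the netstat line and dmesg message quoted in this thread, correlate() flags 10.24.62.9:47081 -> 10.2.1.2:514. An unchanged Send-Q across two snapshots is only a hint on a busy link, which is why the dmesg match is used to narrow it down.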

