This makes me think that you have a firewall between the two hosts that doesn't
understand TCP window scaling and is stripping the option out of the packets,
which breaks things whenever scaling is in use.
ISPs don't normally do this, but if you have an old firewall somewhere in the
path, check it out. It probably needs to be updated anyway, both to patch
security holes and to get it onto a supported version (this is an old problem).
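To illustrate the arithmetic behind this (a sketch only, not a diagnosis of this particular path): the window field in a TCP header is 16 bits, and with window scaling each side multiplies the peer's advertised value by 2^shift, where the shift is negotiated in the SYN packets. If a middlebox strips or ignores the option, the two ends (or the middlebox) disagree about how large the window really is. The function name and the sample values below are made up for illustration.

```python
def effective_window(raw_window: int, shift: int) -> int:
    """Window in bytes for a 16-bit header value and a negotiated scale shift."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("the window scale shift is limited to 14")
    return raw_window << shift


if __name__ == "__main__":
    raw = 1024   # hypothetical value seen in the TCP header
    shift = 7    # hypothetical shift negotiated in the SYN
    # What an endpoint that negotiated scaling believes it may send:
    print("with scaling:   ", effective_window(raw, shift), "bytes")
    # What a device unaware of the option would assume:
    print("without scaling:", effective_window(raw, 0), "bytes")
```

With these sample numbers the two views differ by a factor of 128, which is the kind of mismatch that can make a window appear to shrink unexpectedly.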
David Lang
On Fri, 12 Dec 2014, Tim Smith wrote:
I tweaked a few OS/kernel parameters, such as Ethernet driver options, but in
the end this seems to have done the trick:
sysctl -w net.ipv4.tcp_window_scaling=0
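For anyone who wants to verify the setting on several hosts, here is a minimal sketch that reads the same value from /proc. The helper names are hypothetical. Note that `sysctl -w` only changes the running kernel; the setting is lost on reboot unless it is also added to /etc/sysctl.conf, and with scaling disabled the TCP window cannot exceed 65535 bytes.

```python
from pathlib import Path

# Standard Linux location backing the net.ipv4.tcp_window_scaling sysctl.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_window_scaling")


def parse_flag(text: str) -> bool:
    """Interpret the file contents: "0" means disabled, anything else enabled."""
    return text.strip() != "0"


def window_scaling_enabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True if the kernel currently has TCP window scaling enabled."""
    return parse_flag(path.read_text())


if __name__ == "__main__":
    if SYSCTL_PATH.exists():
        state = "enabled" if window_scaling_enabled() else "disabled"
        print(f"TCP window scaling is {state}")
    else:
        print(f"{SYSCTL_PATH} not found (not a Linux host?)")
```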
On Wed, Dec 10, 2014 at 9:13 PM, Tim Smith wrote:
As I was typing out the email, it occurred to me that the issue is
OS-related:
Looking at a sending server, A, I saw these messages in dmesg:
TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window 861404336:861405796 (repaired)
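A small sketch for pulling the fields out of such a message so it can be matched against netstat output. The regular expression is written against the one line quoted above; the field names are my own, and reading the two trailing numbers as a pair of sequence numbers is an assumption on my part.

```python
import re

# Pattern modelled on the single dmesg line quoted above.
SHRUNK_RE = re.compile(
    r"TCP: Peer (?P<peer_ip>\d+\.\d+\.\d+\.\d+):(?P<peer_port>\d+)/(?P<local_port>\d+) "
    r"unexpectedly shrunk window (?P<seq_start>\d+):(?P<seq_end>\d+) \(repaired\)"
)


def parse_shrunk_window(line: str):
    """Return the message fields as a dict, or None if the line does not match."""
    match = SHRUNK_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "peer_ip": fields["peer_ip"],
        "peer_port": int(fields["peer_port"]),
        "local_port": int(fields["local_port"]),
        "seq_start": int(fields["seq_start"]),
        "seq_end": int(fields["seq_end"]),
    }


if __name__ == "__main__":
    sample = ("TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window "
              "861404336:861405796 (repaired)")
    info = parse_shrunk_window(sample)
    print(info)
    # Gap between the two numbers, in bytes if they are sequence numbers.
    print("gap:", info["seq_end"] - info["seq_start"])
```

For the quoted line the gap is 1460, which would be one full-sized segment on a standard 1500-byte MTU link.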
The local TCP port, 47081, is the same one that belongs to the stuck
connection.
So now I know what the problem is :) However, I cannot seem to find a fix :(
On Wed, Dec 10, 2014 at 8:46 PM, Tim Smith wrote:
Hi,
I have a pair of Linux/RHEL servers (RHEL 6.x), A and B, that forward
logs to multiple destinations:
- one copy to Splunk syslog listener
- one copy to local flume process over TCP
- one copy to a remote rsyslog receiver (X for A, Y for B; both RHEL 6.x)
Forwarding copies to Splunk and Flume works fine. However, forwarding to
the remote rsyslog receivers gets stuck in a strange way. The forwarding is
set up as:
RSyslog-Server-A -> RSyslog-Server-X
RSyslog-Server-B -> RSyslog-Server-Y
All four (A, B, X and Y) are running exactly the same version of rsyslog,
8.6.2-2, from the Adiscon repo:
rsyslog-8.6.0-2.el6.x86_64
What happens is that A/B stop sending logs to X/Y. Looking at the TCP
send/receive queues at both ends, the receive queue on X/Y is empty but the
send queue (Send-Q) on A/B is stuck. As an example, this connection lingers
forever (extracted with netstat -an | grep EST):
tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED
Observations:
==========
- The connection remains established with the same number of bytes in the
Send-Q
- No data is transferred over the "stuck" connection, according to tcpdump
- Restarting the receiving end, X/Y, does not help
- I don't see an "action suspended" error in the rsyslog logs
- Running the sending side in debug mode doesn't help: I easily ended up with
100+ GB of debug logs without the issue manifesting itself. The A/B pair
handle lots of traffic and running rsyslogd in debug mode reduces their
throughput, so perhaps the issue does not manifest at lower EPS.
- Only restarting the sending side, A/B, resolves the issue.
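The first observation can be checked mechanically. Below is a rough sketch that compares two `netstat -an` snapshots taken some time apart and reports ESTABLISHED TCP connections whose non-zero Send-Q did not change. The function names are hypothetical, and the second connection in the sample data is invented for contrast. An unchanged Send-Q between two samples is only a hint, since a busy but healthy connection could show the same value by coincidence.

```python
def parse_netstat(text: str) -> dict:
    """Map (local, remote) -> Send-Q for ESTABLISHED TCP lines of `netstat -an`."""
    conns = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected columns: proto, Recv-Q, Send-Q, local, foreign, state
        if len(parts) != 6 or not parts[0].startswith("tcp"):
            continue
        _proto, recv_q, send_q, local, remote, state = parts
        if state != "ESTABLISHED" or not (recv_q.isdigit() and send_q.isdigit()):
            continue
        conns[(local, remote)] = int(send_q)
    return conns


def stuck_connections(before: str, after: str) -> list:
    """Connections whose Send-Q is non-zero and identical in both snapshots."""
    first, second = parse_netstat(before), parse_netstat(after)
    return sorted(key for key, q in second.items() if q > 0 and first.get(key) == q)


if __name__ == "__main__":
    snapshot_1 = (
        "tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED\n"
        "tcp 0 5120 10.24.62.9:40000 10.9.9.9:9997 ESTABLISHED\n"  # invented
    )
    snapshot_2 = (
        "tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED\n"
        "tcp 0 0 10.24.62.9:40000 10.9.9.9:9997 ESTABLISHED\n"     # invented
    )
    for local, remote in stuck_connections(snapshot_1, snapshot_2):
        print(f"possibly stuck: {local} -> {remote}")
```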
I changed TCP_Framing in the omfwd action from the default (traditional) to
octet-counted. Here is the sending-side omfwd config on A/B:
--------------------
action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
queue.maxdiskspace="10G" queue.Size="8640000"
queue.dequeuebatchsize="4096" queue.type="LinkedList"
queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
queue.workerThreads="4" RebindInterval="10000000" template="fwdformat" )
--------------------
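For readers unfamiliar with the two framing modes, here is a minimal sketch of how a message ends up on the wire in each case, following my understanding of RFC 6587. With octet-counted framing each message is prefixed with its length in bytes and a space; with traditional framing each message is terminated by a line feed. This is not rsyslog's code, and the function names and sample message are made up.

```python
def frame_octet_counted(message: str) -> bytes:
    """Prefix the message with its byte length and a space (octet counting)."""
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def frame_traditional(message: str) -> bytes:
    """Terminate the message with LF (traditional framing)."""
    return message.encode("utf-8") + b"\n"


if __name__ == "__main__":
    msg = "<13>hello"  # hypothetical, minimal syslog message
    print(frame_octet_counted(msg))
    print(frame_traditional(msg))
    # An embedded newline is unambiguous only with octet counting:
    print(frame_octet_counted("line one\nline two"))
```

The practical difference is that octet counting lets a message contain line feeds without being split by the receiver, so it changes message boundaries rather than anything at the TCP level.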
The receive side, X/Y, config:
--------------------
module(load="imptcp" threads="16") # needs to be done just once
global (
workdirectory="/data/rsyslog/queues"
maxmessagesize="64K"
debug.logfile="/data/rsyslog/debug/debug.log"
net.enabledns="off"
)
$DebugLevel 0
main_queue (
queue.FileName="globalqueue"
queue.Type="LinkedList"
queue.MaxDiskSpace="250g"
queue.maxfilesize="5g"
queue.Size="864000000"
queue.dequeuebatchsize="1000"
queue.TimeoutEnqueue="0"
queue.workerThreads="4"
queue.SaveOnShutdown="on"
)
ruleset(name="aggregate") {
action (name="to_flume"
type="omfwd"
Target="localhost"
Port="5614"
Protocol="tcp"
queue.filename="to_flume"
queue.size="360000000"
queue.maxdiskspace="360G"
queue.highwatermark="216000000" # 60% of queue.size
queue.discardmark="288000000" # 80% of queue.size
queue.type="LinkedList"
queue.dequeuebatchsize="4096"
queue.timeoutenqueue="0"
queue.maxfilesize="4G"
queue.saveonshutdown="on"
queue.workerThreads="4"
RebindInterval="10000000"
template="rawfwd"
) stop
}
input(type="imptcp" port="514" ruleset="aggregate")
--------------------
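As a quick sanity check of the watermark comments in the to_flume queue above, this sketch recomputes both marks from queue.size using integer arithmetic. The helper name is hypothetical; the numbers are the ones from the config.

```python
def watermark(queue_size: int, percent: int) -> int:
    """Number of messages corresponding to `percent` of the queue size."""
    return queue_size * percent // 100


if __name__ == "__main__":
    queue_size = 360_000_000  # queue.size of the to_flume action
    print("highwatermark (60%):", watermark(queue_size, 60))
    print("discardmark   (80%):", watermark(queue_size, 80))
```

Both results match the configured values of 216000000 and 288000000.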
Any pointers on how to troubleshoot this and smoke out the bug would be highly
appreciated :)
Thanks
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.