As I was typing out the email, it occurred to me that the issue is
OS-related:

Looking at a sending server, A, I saw these messages in dmesg:
TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window 861404336:861405796 (repaired)

The local TCP port, 47081, is the same one that is part of the stuck
connection.
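
To repeat this cross-check on other senders, here is a minimal Python sketch.
It assumes the kernel message lists the peer as remote-address:remote-port/local-port
(inferred only from the single dmesg line above) and that `netstat -an` prints
the usual Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State columns.
The function names are hypothetical, and it only handles IPv4 "tcp" rows.

```python
import re

# Assumption: field order is remote address, remote port, local port,
# as inferred from the one dmesg line quoted in this thread.
SHRUNK_RE = re.compile(
    r"Peer (?P<addr>\d+\.\d+\.\d+\.\d+):(?P<rport>\d+)/(?P<lport>\d+) "
    r"unexpectedly shrunk window"
)


def shrunk_window_peers(dmesg_text):
    """Return a set of (remote_addr, remote_port, local_port) tuples."""
    return {
        (m["addr"], int(m["rport"]), int(m["lport"]))
        for m in SHRUNK_RE.finditer(dmesg_text)
    }


def stuck_connections(netstat_text, peers):
    """Return (local, remote, send_q) for ESTABLISHED IPv4 sockets that
    have a non-empty Send-Q and match a shrunk-window peer."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, non-tcp rows, and sockets in other states.
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        send_q = int(fields[2])
        local, remote = fields[3], fields[4]
        lport = int(local.rsplit(":", 1)[1])
        raddr, rport = remote.rsplit(":", 1)
        if send_q > 0 and (raddr, int(rport), lport) in peers:
            hits.append((local, remote, send_q))
    return hits


if __name__ == "__main__":
    # Sample input copied from this thread.
    dmesg = ("TCP: Peer 10.2.1.2:514/47081 unexpectedly shrunk window "
             "861404336:861405796 (repaired)\n")
    netstat = ("tcp        0 103660 10.24.62.9:47081         "
               "10.2.1.2:514 ESTABLISHED\n")
    print(stuck_connections(netstat, shrunk_window_peers(dmesg)))
```

In practice the two strings would come from the output of `dmesg` and
`netstat -an` on the sending server.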

Now I know what the problem is :) However, I cannot seem to find a fix :(




On Wed, Dec 10, 2014 at 8:46 PM, Tim Smith <[email protected]> wrote:

> Hi,
>
> I have a pair of Linux/RHEL servers (RHEL 6.x), A and B, that forward logs
> to multiple destinations:
> - one copy to Splunk syslog listener
> - one copy to local flume process over TCP
> - one copy to a remote RSyslog receiver, X or Y (RHEL 6.x)
>
> Forwarding copies to Splunk and Flume works fine. However, forwarding to
> the remote RSyslog receivers gets stuck in a strange way. The forwarding is
> set up as:
> RSyslog-Server-A -> RSyslog-Server-X
> RSyslog-Server-B -> RSyslog-Server-Y
>
> All four (A, B, X and Y) are running exactly the same version of RSyslog,
> 8.6.0-2, from the adiscon repo:
> rsyslog-8.6.0-2.el6.x86_64
>
> What happens is A/B stop sending logs to X/Y. Looking at the send/receive
> TCP queues at both ends, the receive queue on X/Y is clear but the sendQ on
> A/B gets stuck. As an example, this connection lingers forever (extracted
> with netstat -an | grep EST):
> tcp        0 103660 10.24.62.9:47081         10.2.1.2:514 ESTABLISHED
>
> Observations:
> ==========
> - The connection remains established with the same number of bytes in the
> sendQ
> - No data is transferred over the "stuck" connection, looking at tcpdump
> - Restarting the receive end, X/Y, does not help
> - I don't see an action suspended error in the rsyslog logs
> - Running the send side in debug mode doesn't help: I easily ended up with
> 100+ GB of debug logs without the issue manifesting itself. The A/B pair
> handles lots of traffic, and running rsyslogd in debug mode reduces its
> throughput. Perhaps the issue does not manifest at lower EPS.
> - Only restarting the send side, A/B, resolves the issue.
>
> I tweaked the omfwd action to change TCP_Framing from the default to octet-counted.
> Here is the send side omfwd config on A/B:
> --------------------
> action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
> Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
>  queue.maxdiskspace="10G" queue.Size="8640000"
> queue.dequeuebatchsize="4096" queue.type="LinkedList"
> queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
> queue.workerThreads="4"  RebindInterval="10000000" template="fwdformat" )
> --------------------
>
>
> The receive side, X/Y, config:
> --------------------
> module(load="imptcp" threads="16") # needs to be done just once
>
> global (
>     workdirectory="/data/rsyslog/queues"
>     maxmessagesize="64K"
>     debug.logfile="/data/rsyslog/debug/debug.log"
>     net.enabledns="off"
> )
>
> $DebugLevel 0
>
> main_queue (
>     queue.FileName="globalqueue"
>     queue.Type="LinkedList"
>     queue.MaxDiskSpace="250g"
>     queue.maxfilesize="5g"
>     queue.Size="864000000"
>     queue.dequeuebatchsize="1000"
>     queue.TimeoutEnqueue="0"
>     queue.workerThreads="4"
>     queue.SaveOnShutdown="on"
> )
>
> ruleset(name="aggregate") {
> action (name="to_flume"
>         type="omfwd"
>         Target="localhost"
>         Port="5614"
>         Protocol="tcp"
>         queue.filename="to_flume"
>         queue.size="360000000"
>         queue.maxdiskspace="360G"
>         queue.highwatermark="216000000"   # 60% of queue.size
>         queue.discardmark="288000000"     # 80% of queue.size
>         queue.type="LinkedList"
>         queue.dequeuebatchsize="4096"
>         queue.timeoutenqueue="0"
>         queue.maxfilesize="4G"
>         queue.saveonshutdown="on"
>         queue.workerThreads="4"
>         RebindInterval="10000000"
>         template="rawfwd"
>       ) stop
> }
>
> input(type="imptcp" port="514" ruleset="aggregate")
> --------------------
>
> Any pointers to troubleshoot and smoke out the bug will be highly
> appreciated :)
>
> Thanks
>
>
>
>
