Hi,

I have a pair of RHEL 6.x servers, A and B, that forward logs to multiple
destinations:
- one copy to a Splunk syslog listener
- one copy to a local Flume process over TCP
- one copy to a remote rsyslog receiver, X or Y (also RHEL 6.x)

Forwarding copies to Splunk and Flume works fine. However, forwarding to
the remote rsyslog receivers gets stuck in a strange way. The forwarding is
set up as:
RSyslog-Server-A -> RSyslog-Server-X
RSyslog-Server-B -> RSyslog-Server-Y

All four (A, B, X and Y) are running exactly the same version of rsyslog,
8.6.0-2, from the Adiscon repo:
rsyslog-8.6.0-2.el6.x86_64

What happens is A/B stop sending logs to X/Y. Looking at the send/receive
TCP queues at both ends, the receive queue on X/Y is clear but the sendQ on
A/B gets stuck. As an example, this connection lingers forever (extracted
with netstat -an | grep EST):
tcp        0 103660 10.24.62.9:47081         10.2.1.2:514         ESTABLISHED

Observations:
==========
- The connection remains established with the same number of bytes in the
sendQ
- No data is transferred over the "stuck" connection, looking at tcpdump
- Re-starting the receive end, X/Y, does not help
- I don't see an action suspended error in the rsyslog logs
- Running the send side in debug mode doesn't help - I easily ended up with
100+ GB of debug logs without the issue manifesting itself. The A/B pair
handles lots of traffic and running rsyslogd in debug mode reduces its
throughput - perhaps the issue does not manifest at lower EPS.
- Only re-starting the send side, A/B, resolves the issue.
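
For anyone who wants to watch for the same symptom, here is a minimal Python 3
sketch of the check described above: take two "netstat -an" captures some time
apart and flag ESTABLISHED connections to port 514 whose Send-Q is non-zero and
unchanged. The function names and the regular expression are my own
illustration, not part of rsyslog or netstat, and the column layout is assumed
to match the netstat line quoted above.

```python
import re

# One TCP row of `netstat -an`: proto, Recv-Q, Send-Q, local, foreign, state.
# Assumption: the column layout matches the sample line quoted in this mail.
NETSTAT_ROW = re.compile(
    r"^tcp\S*\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def send_queues(netstat_output, remote_port=514):
    """Map (local, remote) -> Send-Q bytes for ESTABLISHED connections
    whose remote port equals remote_port."""
    queues = {}
    for line in netstat_output.splitlines():
        match = NETSTAT_ROW.match(line.strip())
        if not match:
            continue  # header rows, UDP rows, UNIX sockets, etc.
        _recv_q, send_q, local, remote, state = match.groups()
        port = remote.rsplit(":", 1)[-1]
        if state == "ESTABLISHED" and port == str(remote_port):
            queues[(local, remote)] = int(send_q)
    return queues


def stuck_connections(previous, current):
    """Connections seen in both samples with the same non-zero Send-Q."""
    return sorted(
        conn
        for conn, size in current.items()
        if size > 0 and previous.get(conn) == size
    )
```

Feed send_queues() the text of two captures taken, say, a minute apart and pass
both results to stuck_connections(). A busy but healthy connection can show the
same Send-Q twice by coincidence, so a hit is only a hint to look closer.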

I tweaked the omfwd action to change TCP_Framing from the default to
octet-counted. Here is the send side omfwd config on A/B:
--------------------
action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
 queue.maxdiskspace="10G" queue.Size="8640000"
queue.dequeuebatchsize="4096" queue.type="LinkedList"
queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
queue.workerThreads="4"  RebindInterval="10000000" template="fwdformat" )
--------------------
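
To make the framing change concrete, here is a small Python 3 sketch of the two
TCP framing styles from RFC 6587. Octet counting prefixes each message with its
length in bytes and a space; the other style terminates each message with LF,
which as far as I understand is what omfwd uses by default. The function names
are hypothetical and this is not rsyslog's own code.

```python
def frame_octet_counted(message):
    """RFC 6587 octet counting: decimal byte length, one space, the message."""
    return str(len(message)).encode("ascii") + b" " + message


def frame_lf_terminated(message):
    """RFC 6587 non-transparent framing with a trailing LF as delimiter."""
    return message + b"\n"


def split_octet_counted(stream):
    """Split a byte string made of complete octet-counted frames."""
    messages = []
    pos = 0
    while pos < len(stream):
        space = stream.index(b" ", pos)   # end of the length prefix
        length = int(stream[pos:space])   # declared message length in bytes
        start = space + 1
        messages.append(stream[start:start + length])
        pos = start + length
    return messages
```

With octet counting a message may contain embedded line feeds and still be
split correctly on the receiver, which is the main practical difference
between the two styles.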


The receive side, X/Y, config:
--------------------
module(load="imptcp" threads="16") # needs to be done just once

global (
    workdirectory="/data/rsyslog/queues"
    maxmessagesize="64K"
    debug.logfile="/data/rsyslog/debug/debug.log"
    net.enabledns="off"
)

$DebugLevel 0

main_queue (
    queue.FileName="globalqueue"
    queue.Type="LinkedList"
    queue.MaxDiskSpace="250g"
    queue.maxfilesize="5g"
    queue.Size="864000000"
    queue.dequeuebatchsize="1000"
    queue.TimeoutEnqueue="0"
    queue.workerThreads="4"
    queue.SaveOnShutdown="on"
)

ruleset(name="aggregate") {
action (name="to_flume"
        type="omfwd"
        Target="localhost"
        Port="5614"
        Protocol="tcp"
        queue.filename="to_flume"
        queue.size="360000000"
        queue.maxdiskspace="360G"
        queue.highwatermark="216000000"   # 60% of queue.size
        queue.discardmark="288000000"     # 80% of queue.size
        queue.type="LinkedList"
        queue.dequeuebatchsize="4096"
        queue.timeoutenqueue="0"
        queue.maxfilesize="4G"
        queue.saveonshutdown="on"
        queue.workerThreads="4"
        RebindInterval="10000000"
        template="rawfwd"
      ) stop
}

input(type="imptcp" port="514" ruleset="aggregate")
--------------------

Any pointers to troubleshoot and smoke out the bug will be highly
appreciated :)

Thanks
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/