On Fri, 26 Jul 2013, Erik Steffl wrote:
While testing rsyslogd sending logs to a remote server I encountered this
scenario:
- remote server (which is behind amazon elastic load balancer) closes
connection (the host is up but nobody listens on the port)
- rsyslogd seems to be sending data to the closed connection
- connection is in CLOSE_WAIT state
strace of rsyslogd reveals (replaced IP and content of message by XXX):
[pid 14188] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 2
[pid 14188] connect(2, {sa_family=AF_INET, sin_port=htons(5140), sin_addr=inet_addr("XXX")}, 16) = 0
[pid 14188] recvfrom(2, 0x7f44d42e765f, 1, 66, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 14188] sendto(2, "<135>2013-07-26T20:43:08+00:00 XXX", 679, 0, NULL, 0) = 679
And everybody thinks the connection is in CLOSE_WAIT state:
$ sudo lsof | grep rsyslogd | grep TCP
rsyslogd 14184 syslog 2u IPv4 188760 0t0 TCP ip-10-2-35-151.ec2.internal:48228->ec2-54-225-181-82.compute-1.amazonaws.com:5140 (CLOSE_WAIT)
$ netstat -a | grep 5140
tcp 1 0 XXX:48229 ELB:5140 CLOSE_WAIT
ELB above stands for the name/IP of the Amazon Elastic Load Balancer. It seems
to behave slightly suspiciously (why does connect succeed? why does sendto
succeed?)
Any ideas why rsyslogd does not close the connection that is in CLOSE_WAIT
state? The connection remains in CLOSE_WAIT state even after the remote
server starts listening on port 5140. This does not happen every time there is
no listener on the remote host, but when it happens it doesn't seem to be fixed
until I restart rsyslogd.
rsyslog is closing the connection, but then it's opening the connection again
for the next message.
My guess is that the ELB is accepting the connection (which is why the connect
succeeds), and then discovering that there's no way for it to send the
connection to one of the servers that would handle the traffic, so it turns
around and closes the connection.
While it is closing the connection, rsyslog is sending data out through that
connection, because it thinks it's open. This results in some data being lost,
but since rsyslog thinks some data got through (and it successfully opened the
connection), it doesn't mark that destination as being down, and keeps trying to
send data there.
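
The behaviour guessed at above can be reproduced locally. What follows is only
a minimal Python sketch, assuming a Unix-like system; it is not rsyslog's code.
A loopback listener stands in for the ELB, and the names peer_has_closed and
demo are made up for illustration. The peek mirrors the recvfrom() call in the
strace, whose flags value 66 is MSG_PEEK|MSG_DONTWAIT on Linux.

```python
import socket
import time


def peer_has_closed(sock):
    """Non-blocking one-byte peek: True once the peer's FIN has been seen."""
    try:
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        # EAGAIN, as in the strace: nothing to read and no FIN seen yet.
        return False
    except ConnectionError:
        return True
    # An empty read means the peer closed its side (we are in CLOSE_WAIT).
    return data == b""


def demo():
    # Hypothetical stand-in for the ELB: accept, then close without reading.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    accepted, _ = listener.accept()
    accepted.close()

    # Wait briefly for the FIN so the result does not depend on timing.
    deadline = time.monotonic() + 2.0
    while not peer_has_closed(client) and time.monotonic() < deadline:
        time.sleep(0.01)
    closed_seen = peer_has_closed(client)

    # The first send on a CLOSE_WAIT socket is still accepted by the kernel.
    sent = client.send(b"<135>test message\n")

    client.close()
    listener.close()
    return closed_seen, sent
```

In this sketch send() reports every byte as written even though the peer has
already closed, which is why a successful sendto() alone says nothing about
delivery. In the strace the peek returned EAGAIN, so the close had presumably
not arrived yet when the check ran.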
If you were to use RELP, it would properly detect that the messages are not
being delivered and no messages would be lost. This is exactly the type of thing
that triggered the creation of RELP.
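
To illustrate the idea of application-level acknowledgement, here is a toy
Python sketch. It is NOT the RELP wire format: the "ACK\n" reply, the function
names send_with_ack and demo, and the one-second timeout are all assumptions
made up for this example. The point is only that the sender treats a message
as delivered when the receiver confirms it, not when send() returns.

```python
import socket
import threading


def send_with_ack(sock, payload, timeout=1.0):
    """Return True only if the peer answers with b"ACK\n" (toy protocol)."""
    sock.settimeout(timeout)
    try:
        sock.sendall(payload)
        reply = sock.recv(4)
    except OSError:
        # Covers timeouts, resets and broken pipes: treat as not delivered.
        return False
    return reply == b"ACK\n"


def demo():
    # Case 1: a receiver that reads the message and acknowledges it.
    a, b = socket.socketpair()

    def receiver():
        if b.recv(1024):
            b.sendall(b"ACK\n")

    t = threading.Thread(target=receiver)
    t.start()
    acked = send_with_ack(a, b"<135>hello\n")
    t.join()
    a.close()
    b.close()

    # Case 2: a peer that accepts and then closes without reading,
    # a hypothetical stand-in for the load balancer behaviour guessed above.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.close()
    lost = send_with_ack(client, b"<135>hello\n")
    client.close()
    listener.close()

    return acked, lost
```

A sender built this way can keep the unacknowledged message queued and retry
it later instead of assuming it arrived.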
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog