On 08/12/2013 01:21 PM, David Lang wrote:
Eric, with your particular load balancer, if you telnet to the ip on
port 514 does telnet fail with an error, or does it establish a
connection that then gets close almost immediatly?

it opens connection that is closed immediately (to Amazon elastic load balancer with no hosts behind it):

erik@yummly-ubuntu-erik-gazelle:~$ telnet $elbHost 5140
Trying 54.243.148.203...
Connected to ...elb host...
Escape character is '^]'.
Connection closed by foreign host.

strace (part from opening to closing connection, the connection opens successfully)):

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_IP, IP_TOS, [16], 4)  = 0
connect(3, {sa_family=AF_INET, sin_port=htons(5140), sin_addr=inet_addr("54.243.148.203")}, 16) = 0 open("/etc/telnetrc", O_RDONLY) = -1 ENOENT (No such file or directory) open("/home/erik/.telnetrc", O_RDONLY) = -1 ENOENT (No such file or directory) write(1, "Connected to erik-tcp-test-2132493349.us-east-1.elb.amazonaws.com.\n", 67) = 67
write(1, "Escape character is '^]'.\n", 26) = 26
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGINT, {0x407970, [INT], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGQUIT, {0x407920, [QUIT], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGWINCH, {0x407900, [WINCH], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTSTP, {0x40b6c0, [TSTP], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, {0x40b6c0, [TSTP], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, 8) = 0 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, FIONBIO, [1])                  = 0
ioctl(1, FIONBIO, [1])                  = 0
ioctl(3, FIONBIO, [1])                  = 0
setsockopt(3, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
select(4, [0 3], [], [3], {0, 0})       = 0 (Timeout)
select(4, [0 3], [], [3], NULL)         = 1 (in [3])
recvfrom(3, "", 8191, 0, NULL, NULL)    = 0
rt_sigaction(SIGTSTP, {SIG_DFL, [TSTP], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, {0x40b6c0, [TSTP], SA_RESTORER|SA_RESTART, 0x7f0b9e3fc0b0}, 8) = 0 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, FIONBIO, [0])                  = 0
ioctl(1, FIONBIO, [0])                  = 0
close(3)                                = 0

        erik

David Lang

On Fri, 9 Aug 2013, Erik Steffl wrote:

Date: Fri, 09 Aug 2013 19:10:29 -0700
From: Erik Steffl <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] rsyslogd keeps sending data to closed connection

On 07/26/2013 04:33 PM, David Lang wrote:
On Fri, 26 Jul 2013, Erik Steffl wrote:

 While testing rsyslogd sending logs to a remote server I encountered
this scenario:

 - remote server (which is behind amazon elastic load balancer) closes
connection (the host is up but nobody listens on the port)

 - rsyslogd seems to be sending data to the closed connection

 - connection is in CLOSED_WAIT state

 strace of rsyslogd reveals (replaced IP and content of message by
XXX):

[pid 14188] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 2
[pid 14188] connect(2, {sa_family=AF_INET, sin_port=htons(5140),
sin_addr=inet_addr("XXX")}, 16) = 0
[pid 14188] recvfrom(2, 0x7f44d42e765f, 1, 66, 0, 0) = -1 EAGAIN
(Resource temporarily unavailable)
[pid 14188] sendto(2, "<135>2013-07-26T20:43:08+00:00 XXX", 679, 0,
NULL, 0) = 679

 And everybody thinks the connection is in CLOSE_WAIT state:

$ sudo lsof | grep rsyslogd | grep TCP
rsyslogd  14184           syslog    2u     IPv4             188760 0t0
TCP
ip-10-2-35-151.ec2.internal:48228->ec2-54-225-181-82.compute-1.amazonaws.com:5140

(CLOSE_WAIT)

$ netstat -a | grep 5140
tcp        1      0 XXX:48229 ELB:5140 CLOSE_WAIT

 ELB in above is name/IP of Amazon elastic load balancer, it seems
that it behaves slightly suspiciously (why does connect succeed?, why
does sendto succeed?)

 Any ideas why rsyslogd does not close the connection that is in
CLOSE_WAIT state? The connection remains in CLOSE_WAIT state even
after the remote server starts listening on 5140 port. This does not
happen everytime there is no listener on remote host but when it
happens it doesn't seem to be fixed until I restart rsyslogd.

rsyslog is closing the connection, but then it's opening the connection
again for the next message.

My guess is that the ELB is accepting the connection (which is why the
connect succeeds), and then discovering that there's no way for it to
send the connection to one of the servers that would handle the traffic,
so it turns around and closes the connection.

While it is closing the connection, rsyslog is sending data out through
that connection, because it thinks it's open. This results in some data
being lost. but since rsyslog thinks some data got through (and it
successfully opened the connection), it doesn't mark that destination as
being down, and keeps trying to send data there.

If you were to use RELP, it would properly detect that the messages are
not being delivered and no messages would be lost. This is exactly the
type of thing that triggered the creation of RELP.

 it seems that using RELP should work, will try that,

 investigating this I figured out that netcat does not suffer the same
problem for some reason. Not saying there is a problem with rsyslog
(seem load balancer problem), just wondering if anybody has any insights.

 I created a simple setup of client machine -> ELB (load balancer) ->
server machine that works like this:

client runs:

perl -e '$| = 1 ; for($i=0;$i<100000;$i++) {print localtime() . "
[$i]\n"; sleep(1);}' | nc $elb_hostname 5140

server runs:

nc -k -l $ip_address 5140

In this setup client nc recognies all errors, strace example (when the
server is down):

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET, sin_port=htons(5140),
sin_addr=inet_addr("10.151.83.76")}, 16) = -1 EINPROGRESS (Operation
now in progress)
select(4, NULL, [3], NULL, NULL)        = 1 (out [3])
getsockopt(3, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
fcntl(3, F_SETFL, O_RDWR)               = 0
close(3)                                = 0

On the other hand rsyslog is doing this (same scenario, sending data
to load balancer that has nothing on the other side):

15218 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
15218 connect(6, {sa_family=AF_INET, sin_port=htons(5140),
sin_addr=inet_addr("54.225.181.82")}, 16) = 0
15218 recvfrom(6, 0x7f009427065f, 1, 66, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
15218 sendto(6, "<135>2013-08-10T00:11:10+00:00 ...data removed...\n",
679, 0, NULL, 0) = 679
15218 gettimeofday({1376093470, 69447}, NULL) = 0
15218 gettimeofday({1376093470, 69515}, NULL) = 0
15218 futex(0xd80b94, FUTEX_WAIT_PRIVATE, 497273, NULL <unfinished ...>
15217 <... epoll_wait resumed> {{EPOLLIN, {u32=14173920,
u64=14173920}}}, 10, 4294967295) = 1
15217 recvfrom(4, "<135>Aug 10 00:11:11  ...data removed... \n", 8096,
0, {sa_family=AF_INET, sin_port=htons(57437),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 399
15217 gettimeofday({1376093471, 560833}, NULL) = 0
15217 recvfrom(4, 0xd9e370, 8096, 0, 0x7f0094a714b0, 0x7f0094a6f364) =
-1 EAGAIN (Resource temporarily unavailable)
15217 futex(0xd80b94, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xd80b90,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
15218 <... futex resumed> )             = 0
15217 <... futex resumed> )             = 1
15218 futex(0xd80dc0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15217 epoll_wait(2,  <unfinished ...>
15218 <... futex resumed> )             = 0
15218 gettimeofday({1376093471, 561248}, NULL) = 0
15218 gettimeofday({1376093471, 561367}, NULL) = 0
15218 recvfrom(6, "", 1, MSG_PEEK|MSG_DONTWAIT, NULL, NULL) = 0
15218 close(6)                          = 0

 This repeats so efectively rsyslog keeps sending messages (sendto
returns success), closing connection, opening connection etc. As far
as I can tell load balancer should not successfully connect and even
receive data...

 Any comments/insights?

    erik
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if you DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to