We've been experiencing something similar here for the last couple of years. We run mostly windows boxes on backend, so we don't have iptables running on backends. For example, for one of our services we had 400K request today, with 350 of such "Connection Time Out". In our network configuration, pound server sits on DMZ, and web servers are on internal network, with Cisco firewall. I believe something between pound-FW-backend is losing the packets, but we've never been able to figure out why or what is causing it.

One thing I'm pretty certain of, its related to the number of requests being made or load. For example, we have a very similar service (to the one listed above) going to a different backend (same setup/config), which received 90K requests, with only 27 time outs. In our case, its also not just linux->windows issue. We have one web server running apache on a linux box, 15K requests, with 8 times outs


David Clark wrote:
Joe,

That's true, the firewall settings do allow only new connections.
Someone else had suggested that the SYN state on the dropped packets
might be INVALID, but we hadn't been able to figure out why that might
be.  Your explanation of the timeouts does make sense.

I have the same values on both the pound and web-servers:

net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300

I don't see anything there that jumps out at me as needing to be
changed, do you?  I've never been able to pin down the exact request
that was getting blocked, so I can't compute what the delay is between
the original request and when the first packet gets dropped on the web
server.

-David Clark
 CTO, AvantLink.com
 [email protected]

Joe Gooch wrote:
It looks to me like you're only allowing *new* connections to port 80.  Which 
means if the state in conntrack times out, the conntrack entry will be removed, 
and a once existing connection now will be denied.

Check your timeouts in /proc/sys/net/ipv4/netfilter, in the 
ip_conntrack_tcp_timeout_* files.

Joe Gooch
K12 Systems, Inc.
-----Original Message-----
From: David Clark [mailto:[email protected]] Sent: Friday, November 13, 2009 1:23 PM
To: [email protected]
Subject: [Pound Mailing List] Intermittent iptables filter errors from pound requests

Hi,

Recently I've noticed some issues in the pound logs where a connection
to the back-end server timed out, e.g.:

Nov  4 03:13:11 balance1 pound: backend 166.70.134.194:80 connect:
Connection timed out
Nov  4 03:13:16 balance1 pound: BackEnd 166.70.134.194:80 resurrect

After looking into it a bit, it seems that once in a while (say ~200
times per day out of 2,000,000+ requests) iptables on the back-end
server blocks a packet from the pound server. I say this because I can
see entries like the following in the logs on the back-end web servers:

Nov  4 03:10:11 wwwut3 [1280662.309643] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe011 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

Nov  4 03:10:14 wwwut3 [1280665.307411] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe012 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

Nov  4 03:10:20 wwwut3 [1280671.307415] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe013 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

Nov  4 03:10:32 wwwut3 [1280683.307406] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe014 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

Nov  4 03:10:56 wwwut3 [1280707.307406] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe015 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

Nov  4 03:11:44 wwwut3 [1280755.307410] RULE 5 -- DENY IN=h0 OUT> 
SRC6.70.134.196 DST6.70.134.194 LEN` TOS=0x00 PREC=0x00 TTLd
IDe016 DF PROTO=TCP SPT3942 DPT€ WINDOWX40 RES=0x00 SYN URGP=0

which, to my reading, is blocking traffic that would normally be allowed
by the firewall:

iptables --list --numeric
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state
RELATED,ESTABLISHED
...
Cid43519296.0 tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
state NEW
...
RULE_5     all  --  0.0.0.0/0            0.0.0.0/0
...
Chain Cid43519296.0 (1 references)
target     prot opt source               destination
ACCEPT     all  --  166.70.134.196       0.0.0.0/0
ACCEPT     all  --  166.70.134.197       0.0.0.0/0
...
Chain RULE_5 (3 references)
target     prot opt source               destination
LOG all -- 0.0.0.0/0 0.0.0.0/0 LOG flags 0
level 6 prefix `RULE 5 -- DENY '
DROP       all  --  0.0.0.0/0            0.0.0.0/0

There's always a sequence like that, too - an initial packet gets
blocked, followed by another 3 seconds later, then 6 seconds, etc.  I
believe the sequence ends where it does because I have a 180 second
timeout in pound to the back-end servers.

In an effort to troubleshoot this situation, I switched to a different
machine running pound, with no effect (both machines use a 2.6.24
kernel).  I was previously using pound 2.3.2 but earlier this week
upgraded to 2.4.5, again with no effect.  The back-end web servers are
running kernel versions 2.6.29 to 2.6.30 and all seem affected more or
less equally.  None of the servers involved are anywhere near capacity
(CPU usage below 10%, memory usage below 33%).

I would <love> to know what I could do to identify the reason behind
these errors, and even better, eliminate them.  Any help or
investigative steps to take would be <much> appreciated.

-David Clark

--
To unsubscribe send an email with subject unsubscribe to [email protected].
Please contact [email protected] for questions.

--
To unsubscribe send an email with subject unsubscribe to [email protected].
Please contact [email protected] for questions.

--
To unsubscribe send an email with subject unsubscribe to [email protected].
Please contact [email protected] for questions.




--
To unsubscribe send an email with subject unsubscribe to [email protected].
Please contact [email protected] for questions.

Reply via email to