Re: Dropped connections with tcp_tw_recycle=1

2009-09-22 Thread Nils Goroll
Sven,

 tcp_tw_recycle is incompatible with NAT on the server side

 ... because it will enforce the verification of TCP time stamps.
 Unless all clients behind a NAT (actually PAD/masquerading) device
 use identical timestamps (within a certain range), most of them will
 send invalid TCP timestamps so SYNs will get dropped.
 
 I've been digging a bit more. [...]

Thank you very much for your writeup regarding tcp_tw_recycle and timestamp 
verification. This is the part which I think I had already understood ...

  tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen
  when setting up outbound connections. I haven't looked at those yet.

... but this is the part which I don't have a good understanding of yet.

 The outer conditional verifies that the incoming SYN has a timestamp,
 that tcp_tw_recycle is enabled, and that the origin exists in our
 peer cache. Note that it only checks the IP of the origin. Doesn't it
 make sense to also match on port?

My understanding is that the fact that the connection is in TIME_WAIT implies 
that the source port should not be reused at this time.

Nils
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Dropped connections with tcp_tw_recycle=1

2009-09-22 Thread Sven Ulland
Nils Goroll wrote:
 The outer conditional verifies that the incoming SYN has
 a timestamp, that tcp_tw_recycle is enabled, and that the origin
 exists in our peer cache. Note that it only checks the IP of the
 origin. Doesn't it make sense to also match on port?
 
 My understanding is that the fact that the connection is in
 TIME_WAIT implies that the source port should not be reused at this
 time.

Right, you're saying that the srcaddr+srcport pair of a connection in
TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
dropped), and I agree. Then I don't understand why a new connection
originating from a *different* source port (although from the same
source IP) is also considered a dupe and dropped. SYN retries don't
change/increase the source port after all. Is this a mistake in the
TCP code, or maybe in my understanding of the issue?

Sven


Re: Dropped connections with tcp_tw_recycle=1

2009-09-22 Thread Nils Goroll
Sven,

 Right, you're saying that the srcaddr+srcport pair of a connection in
 TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
 dropped), and I agree. Then I don't understand why a new connection
 originating from a *different* source port (although from the same
 source IP) is also considered a dupe and dropped.

Are you referring to this code?

 if (tmp_opt.saw_tstamp &&
     tcp_death_row.sysctl_tw_recycle &&
     (dst = inet_csk_route_req(sk, req)) != NULL &&
     (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
     peer->v4daddr == saddr) {
         if (xtime.tv_sec < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
             (s32)(peer->tcp_ts - req->ts_recent) >
             TCP_PAWS_WINDOW) {
                 NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED);
                 dst_release(dst);
                 goto drop_and_free;
         }
 }

Again, I cannot tell you what the intention of the implementors might have
been, but my interpretation is that they wanted to implement timestamp
checking as a side effect of tw_recycle (a positive one from the security
standpoint).

I haven't thought about how (or if) the tw_recycle code could be improved,
because I believe the benefits of TCP state reuse are overrated and the
disadvantages outweigh the advantages. Also, my work focuses on OSes which
don't have this issue ;-)

Thanks, Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-21 Thread Nils Goroll
Hi Michael and all,

 tcp_tw_recycle is incompatible with NAT on the server side

 ... because it will enforce the verification of TCP time stamps.
 Unless all clients behind a NAT (actually PAD/masquerading) device
 use identical timestamps (within a certain range), most of them will
 send invalid TCP timestamps so SYNs will get dropped.
 
 Since you seem pretty knowledgeable on the subject, can you please 
 explain the difference between tcp_tw_reuse and tcp_tw_recycle?

I think I have understood the reason why tcp_tw_recycle does not work with NAT
connections, but I must say I haven't digested the Linux TCP implementation
well enough to explain the design decisions behind these two options.

The very basic idea is to reuse TCP connections in TIME_WAIT state, saving the
overhead of destroying and recreating TCP state. I remember that at one point I
thought I had understood the difference, but I can't recall it at the moment.

In short: I can tell you that you *must not* use tcp_tw_recycle for any machine 
talking to machines behind masquerading firewalls (iow, only use it inside 
isolated networks). But I cannot tell you what exactly it is supposed to do and 
what the difference is to tcp_tw_reuse. If anyone finds out, please let me know 
as well!

Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-21 Thread Sven Ulland
Nils Goroll wrote:
 tcp_tw_recycle is incompatible with NAT on the server side
 
 ... because it will enforce the verification of TCP time stamps.
 Unless all clients behind a NAT (actually PAD/masquerading) device
 use identical timestamps (within a certain range), most of them will
 send invalid TCP timestamps so SYNs will get dropped.

I've been digging a bit more. The drops happen because PAWS thinks
they are old duplicate segments from earlier incarnations of the
connection.

A new incoming connection request will eventually call
tcp_ipv4.c:tcp_v4_conn_request(), where we find the following code
that ends up dropping some SYNs if recycling is enabled:

if (tmp_opt.saw_tstamp &&
    tcp_death_row.sysctl_tw_recycle &&
    (dst = inet_csk_route_req(sk, req)) != NULL &&
    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
    peer->v4daddr == saddr) {
        if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
            (s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW) {
                NET_INC_STATS_BH(sock_net(sk),
                                 LINUX_MIB_PAWSPASSIVEREJECTED);
                goto drop_and_release;
        }
}

The outer conditional verifies that the incoming SYN has a timestamp,
that tcp_tw_recycle is enabled, and that the origin exists in our
peer cache. Note that it only checks the IP of the origin. Doesn't it
make sense to also match on port?

The inner conditional tests two things: First, that the peer's last
seen timestamp has not expired (it expires after TCP_PAWS_MSL, i.e. 60
seconds). Next, that the new incoming timestamp [req->ts_recent] is
more than one tick [TCP_PAWS_WINDOW] *before* the last seen timestamp
from the peer [peer->tcp_ts] (i.e. that it's an old duplicate).
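
The two tests of the inner conditional can be tried out in user space.
What follows is only a sketch under stated assumptions, not kernel code:
the function name paws_would_reject and its parameters are made up for
illustration, and the constants are assumed to be 60 and 1 as in
include/net/tcp.h of the 2.6 kernels discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, matching include/net/tcp.h of the 2.6 kernels. */
#define TCP_PAWS_MSL    60  /* seconds a cached per-host timestamp stays valid */
#define TCP_PAWS_WINDOW 1   /* ticks by which the SYN timestamp may lag */

/*
 * Hypothetical user-space helper mirroring the inner conditional.
 *   now_sec        current time in seconds (get_seconds() in the kernel)
 *   peer_stamp_sec time the cached value was recorded (peer->tcp_ts_stamp)
 *   peer_ts        last timestamp value seen from that IP (peer->tcp_ts)
 *   req_ts         timestamp value carried by the new SYN (req->ts_recent)
 */
bool paws_would_reject(uint32_t now_sec, uint32_t peer_stamp_sec,
                       uint32_t peer_ts, uint32_t req_ts)
{
    /* Test 1: the cached entry is younger than TCP_PAWS_MSL seconds. */
    bool cache_fresh = now_sec < peer_stamp_sec + TCP_PAWS_MSL;

    /*
     * Test 2: the SYN timestamp lags the cached one by more than
     * TCP_PAWS_WINDOW ticks. The signed 32-bit difference keeps the
     * comparison meaningful across timestamp wraparound.
     */
    bool ts_is_older = (int32_t)(peer_ts - req_ts) > TCP_PAWS_WINDOW;

    return cache_fresh && ts_is_older;
}
```

For example, paws_would_reject(100, 90, 1000, 500) is true (fresh entry,
SYN timestamp far behind), while the same call with now_sec = 200 is
false because the cached entry is older than 60 seconds.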

(Also, you can verify if you get drops by checking the PAWSPassive
value in /proc/net/netstat.)
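
Reading that counter can be automated. The sketch below assumes the usual
layout of /proc/net/netstat, where each group such as TcpExt is printed as
a line of names followed by a line of values in the same order. The
function names find_counter and read_paws_passive are invented for this
example, and the 4096-byte line buffers are an assumption about line
length.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a names/values line pair.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
int find_counter(const char *names, const char *values,
                 const char *wanted, unsigned long long *out)
{
    size_t wanted_len = strlen(wanted);

    for (;;) {
        /* Skip separators, then measure the next token on each line. */
        names += strspn(names, " \n");
        values += strspn(values, " \n");
        size_t nlen = strcspn(names, " \n");
        size_t vlen = strcspn(values, " \n");

        if (nlen == 0 || vlen == 0)
            return -1;                  /* ran out of tokens */
        if (nlen == wanted_len && strncmp(names, wanted, nlen) == 0) {
            *out = strtoull(values, NULL, 10);
            return 0;
        }
        names += nlen;
        values += vlen;
    }
}

/* Scan a netstat-style file for PAWSPassive in the TcpExt group. */
int read_paws_passive(const char *path, unsigned long long *out)
{
    char names[4096], values[4096];
    FILE *fp = fopen(path, "r");
    int rc = -1;

    if (fp == NULL)
        return -1;
    /* Lines come in pairs: names first, values second. */
    while (fgets(names, sizeof names, fp) != NULL &&
           fgets(values, sizeof values, fp) != NULL) {
        if (strncmp(names, "TcpExt:", 7) == 0 &&
            find_counter(names, values, "PAWSPassive", out) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(fp);
    return rc;
}
```

Calling read_paws_passive("/proc/net/netstat", &n) twice, some seconds
apart, and comparing the two values shows whether SYNs are currently being
rejected. On a kernel that does not export the counter the function
returns -1.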

Here's the origin of the code, appx B.2 (b) in VJ et al's RFC 1323:

An additional mechanism could be added to the TCP, a per-host cache of
the last timestamp received from any connection [peer->tcp_ts]. This
value [peer->tcp_ts] could then be used in the PAWS mechanism to
reject old duplicate segments [req] from earlier incarnations of the
connection, if the timestamp clock can be guaranteed to have ticked at
least once [TCP_PAWS_WINDOW] since the old connection was open.
 -- http://tools.ietf.org/html/rfc1323#page-29

I'm wondering why the source port is not taken into consideration
here. A previous incarnation of the connection would surely have the
same source port? So if a new incoming connection has a different
source port, it should not be a candidate for rejection.
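
To make the question concrete, here is a toy model that can look up the
cached timestamp either by source IP alone or by source IP plus port. It
is a sketch only: struct peer_entry, syn_rejected and the single-slot
cache are invented for illustration, and the slot is refreshed on every
accepted SYN, which is a simplification and not how the kernel maintains
its peer cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCP_PAWS_MSL    60  /* assumed, as in the kernel headers */
#define TCP_PAWS_WINDOW 1

/* One cache slot; a real peer cache holds many entries. */
struct peer_entry {
    bool     valid;
    uint32_t addr;       /* source IPv4 address */
    uint16_t port;       /* only consulted when match_port is set */
    uint32_t ts;         /* last timestamp value seen */
    uint32_t stamp_sec;  /* when ts was recorded */
};

/* Returns true if the SYN would be dropped; otherwise records it. */
bool syn_rejected(struct peer_entry *e, bool match_port, uint32_t now_sec,
                  uint32_t addr, uint16_t port, uint32_t syn_ts)
{
    bool same_peer = e->valid && e->addr == addr &&
                     (!match_port || e->port == port);

    if (same_peer &&
        now_sec < e->stamp_sec + TCP_PAWS_MSL &&
        (int32_t)(e->ts - syn_ts) > TCP_PAWS_WINDOW)
        return true;

    /* Simplification: remember the timestamp of every accepted SYN. */
    e->valid = true;
    e->addr = addr;
    e->port = port;
    e->ts = syn_ts;
    e->stamp_sec = now_sec;
    return false;
}
```

Take two clients behind one NAT address with unrelated timestamp clocks,
say 283647429 from port 4292 and then 1000 from port 5000 ten seconds
later. With match_port false the second SYN is rejected; with match_port
true it is accepted. This only illustrates the effect of the lookup key
and says nothing about whether matching on the port would be correct for
the kernel.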


tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen
when setting up outbound connections. I haven't looked at those yet.

Sven


Re: Dropped connections with tcp_tw_recycle=1

2009-09-20 Thread Nick Loman
Hi Sven,

I don't know the precise basis for it, but I can vouch for the fact that 
tcp_tw_recycle is incompatible with NAT on the server side. I would 
guess it is because the NAT gateway keeps a connection tracking list and 
is unhappy that the webserver is trying to reuse the same ip:port hash 
whilst it is registered in TIME_WAIT mode.

There was a discussion of this previously:
http://projects.linpro.no/pipermail/varnish-misc/2009-April/002764.html

As you say tw_reuse works OK with NAT.

Cheers,

Nick.


Sven Ulland wrote:
 I was recently debugging an issue where several clients experienced
 sporadic problems connecting to a website cached by varnish. Every now
 and then (say, something like every 20-50th TCP connection) would time
 out, or sometimes take a few SYNs before being accepted.

 Here's a typical example. It's observed at the spot marked 'X' in this
 network structure from the client network's perspective:

[clients] - [NAT gateway] - [bridge firewall]X - [Internet]

    0.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283647429 TSER=0 WS=6
    2.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283648179 TSER=0 WS=6
    8.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283649679 TSER=0 WS=6
   20.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283652679 TSER=0 WS=6
   44.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283658679 TSER=0 WS=6
   93.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283670679 TSER=0 WS=6
   93.00 varni-extip natgw-extip TCP http > 4292 [SYN, ACK] TSV=2342207123 TSER=283670679

 Note: The NAT gateway didn't do port translation here. Also, the
 timestamp values were not touched by the NAT gateway. The varnish node
 is behind LVS-TUN, but the LVS was not the culprit.

 After troubleshooting with the website owner, tcpdumping at various
 points on both sides, it was clear that the packets were reaching the
 varnish node, but except the last SYN, they were all dropped. This
 turned out to be because the varnish node had the tcp_tw_recycle sysctl
 enabled. Switching it off fixed the problem.

 The performance page on the varnish wiki has recommended Linux
 sysctl settings, including enabling tcp_tw_recycle, since April 2008.
 The recycle setting was removed from that page recently, but I would
 think there are a lot of installations around the world that have it
 enabled.

 I tried to figure out exactly how the recycling mechanism works, but the
 code is too complex to figure out without time or kernel network
 experience. Recycling was introduced by David Miller in 2.3.15, ref
 URL:http://lxr.linux.no/#linux-old+v2.3.15/net/ipv4/tcp_ipv4.c#L324
 and e.g. URL:http://lxr.linux.no/#linux+v2.6.31/net/ipv4/tcp_ipv4.c#L1255.
 Does anyone have a good grasp on how it works, its connection to the RFC
 1323 PAWS mechanism, and its claimed incompatibility with NAT (ref
 URL:http://lkml.org/lkml/2008/11/15/83)?

 When observing the same issue previously (dropped SYNs), I ditched
 tw_recycle in favour of tcp_tw_reuse, which doesn't seem to cause any
 problems (this was on a normal Apache system). It too is severely
 underdocumented, so I was hoping to shed some light on them both, and
 the exact circumstances where they are suitable for use.

 Sven


Re: Dropped connections with tcp_tw_recycle=1

2009-09-20 Thread Michael S. Fischer
On Sep 20, 2009, at 6:20 AM, Nils Goroll wrote:

 tcp_tw_recycle is incompatible with NAT on the server side

 ... because it will enforce the verification of TCP time stamps.
 Unless all clients behind a NAT (actually PAD/masquerading) device
 use identical timestamps (within a certain range), most of them will
 send invalid TCP timestamps so SYNs will get dropped.

Since you seem pretty knowledgeable on the subject, can you please  
explain the difference between tcp_tw_reuse and tcp_tw_recycle?

Thanks,

--Michael


Dropped connections with tcp_tw_recycle=1

2009-09-19 Thread Sven Ulland
I was recently debugging an issue where several clients experienced
sporadic problems connecting to a website cached by varnish. Every now
and then (say, something like every 20-50th TCP connection) would time
out, or sometimes take a few SYNs before being accepted.

Here's a typical example. It's observed at the spot marked 'X' in this
network structure from the client network's perspective:

   [clients] - [NAT gateway] - [bridge firewall]X - [Internet]

   0.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283647429 TSER=0 WS=6
   2.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283648179 TSER=0 WS=6
   8.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283649679 TSER=0 WS=6
  20.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283652679 TSER=0 WS=6
  44.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283658679 TSER=0 WS=6
  93.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283670679 TSER=0 WS=6
  93.00 varni-extip natgw-extip TCP http > 4292 [SYN, ACK] TSV=2342207123 TSER=283670679

Note: The NAT gateway didn't do port translation here. Also, the
timestamp values were not touched by the NAT gateway. The varnish node
is behind LVS-TUN, but the LVS was not the culprit.
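
A quick sanity check on the numbers in the trace: the TSV values of the
six SYNs advance at a steady rate, and the gaps between the SYNs roughly
double each time, which is what retransmissions of one SYN from a single
host would look like. The sketch below just does that arithmetic; the
struct and function names are made up for this example and the sample
values are copied from the trace.

```c
#include <stddef.h>
#include <stdint.h>

struct syn_sample {
    double   t_sec;  /* capture time, seconds since the first SYN */
    uint32_t tsval;  /* TSV field of that SYN */
};

/* The six SYNs from the trace above. */
const struct syn_sample trace[] = {
    {  0.00, 283647429u }, {  2.99, 283648179u }, {  8.99, 283649679u },
    { 20.99, 283652679u }, { 44.99, 283658679u }, { 93.00, 283670679u },
};
const size_t trace_len = sizeof trace / sizeof trace[0];

/* Average timestamp ticks per second between first and last sample. */
double tick_rate_hz(const struct syn_sample *s, size_t n)
{
    if (n < 2 || s[n - 1].t_sec <= s[0].t_sec)
        return 0.0;
    /* Unsigned subtraction stays correct if the timestamp wraps once. */
    return (double)(uint32_t)(s[n - 1].tsval - s[0].tsval) /
           (s[n - 1].t_sec - s[0].t_sec);
}

/* Seconds between SYN number i and the one before it (i >= 1). */
double retry_gap_sec(const struct syn_sample *s, size_t i)
{
    return s[i].t_sec - s[i - 1].t_sec;
}
```

tick_rate_hz(trace, trace_len) comes out at about 250 ticks per second
(23250 ticks over 93 seconds), and the gaps are roughly 3, 6, 12, 24 and
48 seconds.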

After troubleshooting with the website owner, tcpdumping at various
points on both sides, it was clear that the packets were reaching the
varnish node, but except the last SYN, they were all dropped. This
turned out to be because the varnish node had the tcp_tw_recycle sysctl
enabled. Switching it off fixed the problem.
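
Whether a given machine has the setting enabled can be checked by reading
the sysctl's file under /proc/sys. A minimal sketch, with helper names
invented for this example; it only reads the value and does not change it.
Later kernels dropped this sysctl altogether (in 4.12, as far as known),
in which case the file is absent and the helper reports -1.

```c
#include <stdio.h>

/*
 * Read an integer sysctl from its /proc/sys file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
int read_sysctl_int(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fscanf(fp, "%d", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}

/*
 * Returns 1 if the setting at path is non-zero, 0 if it is zero,
 * and -1 if it cannot be determined.
 */
int tw_recycle_state(const char *path)
{
    int v;

    if (read_sysctl_int(path, &v) != 0)
        return -1;
    return v != 0;
}
```

Usage: tw_recycle_state("/proc/sys/net/ipv4/tcp_tw_recycle"). A result of
1 on a server that receives connections from clients behind NAT matches
the situation described here.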

The performance page on the varnish wiki has recommended Linux
sysctl settings, including enabling tcp_tw_recycle, since April 2008.
The recycle setting was removed from that page recently, but I would
think there are a lot of installations around the world that have it
enabled.

I tried to figure out exactly how the recycling mechanism works, but the
code is too complex to figure out without time or kernel network
experience. Recycling was introduced by David Miller in 2.3.15, ref
URL:http://lxr.linux.no/#linux-old+v2.3.15/net/ipv4/tcp_ipv4.c#L324
and e.g. URL:http://lxr.linux.no/#linux+v2.6.31/net/ipv4/tcp_ipv4.c#L1255.
Does anyone have a good grasp on how it works, its connection to the RFC
1323 PAWS mechanism, and its claimed incompatibility with NAT (ref
URL:http://lkml.org/lkml/2008/11/15/83)?

When observing the same issue previously (dropped SYNs), I ditched
tw_recycle in favour of tcp_tw_reuse, which doesn't seem to cause any
problems (this was on a normal Apache system). It too is severely
underdocumented, so I was hoping to shed some light on them both, and
the exact circumstances where they are suitable for use.

Sven