Hey folks,
(warning - long post; cross posted to misc but this is probably the better place for it)
I've run into the problem with default TCP window scaling and recent Linux kernels noted here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=267342
and have attempted the resolution Daniel describes here:
http://www.benzedrine.cx/pf/msg05130.html
but continue to experience the problem. For example, telnetting to port 80 on a Solaris 8/Apache webserver behind our OpenBSD 3.4-stable firewall looks like this from the perspective of the firewall:
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 74: 64.81.49.133.56147 > 131.106.3.253.80: S [tcp sum ok] 78526931:78526931(0) win 5840 <mss 1460,sackOK,timestamp 803421 0,nop,wscale 7> (DF) (ttl 52, id 31630)
0:90:27:8e:c3:7b 0:7:85:80:5e:4f 0800 78: 131.106.3.253.80 > 64.81.49.133.56147: S [tcp sum ok] 1464110793:1464110793(0) ack 78526932 win 24616 <nop,nop,timestamp 1186503759 803421,nop,wscale 0,nop,nop,sackOK,mss 1460> (DF) (ttl 63, id 21367)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 66: 64.81.49.133.56147 > 131.106.3.253.80: . [tcp sum ok] ack 1 win 45 <nop,nop,timestamp 803452 1186503759> (DF) (ttl 52, id 31631)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 72: 64.81.49.133.56147 > 131.106.3.253.80: P [tcp sum ok] 1:7(6) ack 1 win 45 <nop,nop,timestamp 807346 1186503759> (DF) (ttl 52, id 31632)...
The telnet session connects fine, but any input is met with silence.
Default window scaling (/proc/sys/net/ipv4/tcp_default_win_scale) is set to 7 on the source machine (Debian 3.1 "testing" as of 9/7/04, with the 2.6.7-1 kernel package). As I understand it from a few of our members who can't access our web site or send e-mail, several Linux distributions now have window scaling set to 7 by default. Note that the source host's SYN has a wscale of 7 as expected, but the SYN-ACK has a wscale of 0, which makes for very poor communication. Oddly, the behavior is a little different when the source host talks to a FreeBSD 5/Apache target:
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 74: 64.81.49.133.56765 > 131.106.3.205.80: S [tcp sum ok] 4249754452:4249754452(0) win 5840 <mss 1460,sackOK,timestamp 671647 0,nop,wscale 7> (DF) (ttl 52, id 9419)
0:90:27:8e:c3:7b 0:7:85:80:5e:4f 0800 74: 131.106.3.205.80 > 64.81.49.133.56765: S [tcp sum ok] 3258274959:3258274959(0) ack 4249754453 win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 485487780 671647> (DF) (ttl 63, id 7437)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 66: 64.81.49.133.56765 > 131.106.3.205.80: . [tcp sum ok] ack 1 win 45 <nop,nop,timestamp 671732 485487780> (DF) (ttl 52, id 9420)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 72: 64.81.49.133.56765 > 131.106.3.205.80: P [tcp sum ok] 1:7(6) ack 1 win 45 <nop,nop,timestamp 717746 485487780> (DF) (ttl 52, id 9421)...
In this case wscale is set to 1 on the SYN-ACK and the index page loads, albeit *somewhat* slowly (I made the request about 9 hours ago and the page is still loading). A Linux 2.4blah/Apache target behaves the same way.
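For reference, here is a rough sketch of the window arithmetic behind these traces, as I read RFC 1323: the window field in a SYN or SYN-ACK is never scaled, and after the handshake each side's advertised window is shifted left by the wscale that same side announced. The function name `effective_window` is only an illustration, not anything from pf or the kernel.

```python
def effective_window(raw_win: int, wscale: int, is_syn: bool = False) -> int:
    """Return the window in bytes that a TCP header advertises.

    raw_win is the 16-bit window field as printed by tcpdump ("win N"),
    wscale is the shift count the *sender* of this packet announced in
    its own SYN. Per RFC 1323 the window in SYN/SYN-ACK is not scaled.
    """
    if is_syn:
        return raw_win
    return raw_win << wscale


# Values taken from the traces above (Linux 2.6.7 client, wscale 7).
syn_window = effective_window(5840, 7, is_syn=True)  # initial SYN
scaled_window = effective_window(45, 7)              # "ack 1 win 45" read with the scale
unscaled_window = effective_window(45, 0)            # same packet read without the scale

print(syn_window, scaled_window, unscaled_window)  # 5840 5760 45
```

So "win 45" from the client means 5760 bytes to anything that remembers the wscale 7 from the SYN, and only 45 bytes to anything that does not.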
Interestingly, if I run httpd on the inside interface of the firewall I'm able to get traffic without a problem:
firewall:
$ sudo tcpdump -n -i fxp0 -eee -t -v port 8080
tcpdump: listening on fxp0
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 74: 64.81.49.133.54401 > 131.106.3.252.8080: S [tcp sum ok] 2566347402:2566347402(0) win 5840 <mss 1460,sackOK,timestamp 37523760 0,nop,wscale 7> (DF) (ttl 52, id 52154)
0:90:27:8e:c3:7b 0:7:85:80:5e:4f 0800 78: 131.106.3.252.8080 > 64.81.49.133.54401: S [tcp sum ok] 1507920726:1507920726(0) ack 2566347403 win 17376 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1949518170 37523760> (DF) (ttl 64, id 18020)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 66: 64.81.49.133.54401 > 131.106.3.252.8080: . [tcp sum ok] ack 1 win 45 <nop,nop,timestamp 37523790 1949518170> (DF) (ttl 52, id 52155)
0:7:85:80:5e:4f 0:90:27:8e:c3:7b 0800 72: 64.81.49.133.54401 > 131.106.3.252.8080: P [tcp sum ok] 1:7(6) ack 1 win 45 <nop,nop,timestamp 37545824 1949518170> (DF) (ttl 52, id 52156)
0:90:27:8e:c3:7b 0:7:85:80:5e:4f 0800 380: 131.106.3.252.8080 > 64.81.49.133.54401: P 1:315(314) ack 7 win 17376 <nop,nop,timestamp 1949518214 37545824> (DF) (ttl 64, id 16598)
source host:
term8:/home/tony# /usr/sbin/tcpdump -n -i eth0 -eee -t -v host 131.106.3.252
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
00:10:dc:43:54:e9 > 00:20:78:b0:34:c1, ethertype IPv4 (0x0800), length 74: IP (tos 0x10, ttl 64, id 52154, offset 0, flags [DF], length: 60) 10.1.2.102.1126 > 131.106.3.252.8080: S [tcp sum ok] 2566347402:2566347402(0) win 5840 <mss 1460,sackOK,timestamp 37523760 0,nop,wscale 7>
00:20:78:b0:34:c1 > 00:10:dc:43:54:e9, ethertype IPv4 (0x0800), length 78: IP (tos 0x20, ttl 49, id 18020, offset 0, flags [DF], length: 64) 131.106.3.252.8080 > 10.1.2.102.1126: S [tcp sum ok] 1507920726:1507920726(0) ack 2566347403 win 17376 <mss 1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 1949518170 37523760>
00:10:dc:43:54:e9 > 00:20:78:b0:34:c1, ethertype IPv4 (0x0800), length 66: IP (tos 0x10, ttl 64, id 52155, offset 0, flags [DF], length: 52) 10.1.2.102.1126 > 131.106.3.252.8080: . [tcp sum ok] ack 1 win 45 <nop,nop,timestamp 37523790 1949518170>
00:10:dc:43:54:e9 > 00:20:78:b0:34:c1, ethertype IPv4 (0x0800), length 72: IP (tos 0x10, ttl 64, id 52156, offset 0, flags [DF], length: 58) 10.1.2.102.1126 > 131.106.3.252.8080: P [tcp sum ok] 1:7(6) ack 1 win 45 <nop,nop,timestamp 37545824 1949518170>
00:20:78:b0:34:c1 > 00:10:dc:43:54:e9, ethertype IPv4 (0x0800), length 380: IP (tos 0x20, ttl 49, id 16598, offset 0, flags [DF], length: 366) 131.106.3.252.8080 > 10.1.2.102.1126: P 1:315(314) ack 7 win 17376 <nop,nop,timestamp 1949518214 37545824>
Note that wscale is set to 0! Also note that the source host is behind NAT in these examples, but I know that folks not behind NAT have the same problem.
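To compare the captures side by side, a small script along these lines can pull the flags, window, and wscale out of each tcpdump line. It is only a sketch: the regular expressions are my own guess at the two output formats shown above (OpenBSD and Debian tcpdump), the helper name `summarize` is made up, and the sample lines are trimmed copies of the first trace with the Ethernet header and trailing IP details removed.

```python
import re

# "a.b.c.d.port > a.b.c.d.port: FLAGS ... win N" as printed by tcpdump -n
LINE_RE = re.compile(
    r"(?P<src>\d+(?:\.\d+){4}) > (?P<dst>\d+(?:\.\d+){4}): "
    r"(?P<flags>[SFPRWE.]+) .*?win (?P<win>\d+)"
)
WSCALE_RE = re.compile(r"wscale (\d+)")


def summarize(line):
    """Return src, dst, flags, win and wscale for one tcpdump line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ws = WSCALE_RE.search(line)
    return {
        "src": m["src"],
        "dst": m["dst"],
        "flags": m["flags"],
        "win": int(m["win"]),
        # wscale only appears as an option on SYN and SYN-ACK packets
        "wscale": int(ws.group(1)) if ws else None,
    }


# Trimmed lines from the Solaris trace above.
trace = [
    "64.81.49.133.56147 > 131.106.3.253.80: S [tcp sum ok] 78526931:78526931(0) "
    "win 5840 <mss 1460,sackOK,timestamp 803421 0,nop,wscale 7>",
    "131.106.3.253.80 > 64.81.49.133.56147: S [tcp sum ok] 1464110793:1464110793(0) "
    "ack 78526932 win 24616 <nop,nop,timestamp 1186503759 803421,nop,wscale 0,"
    "nop,nop,sackOK,mss 1460>",
    "64.81.49.133.56147 > 131.106.3.253.80: . [tcp sum ok] ack 1 win 45 "
    "<nop,nop,timestamp 803452 1186503759>",
]

for line in trace:
    info = summarize(line)
    print(info["src"], info["flags"], info["win"], info["wscale"])
```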
I suppose the pertinent pf rules would be useful (the testing rule is normally commented out but was enabled for the test above; I'd rather not post our entire ruleset):
# Options: tune the behavior of pf, default values are given.
set loginterface $OIF
set fingerprints "/etc/pf.os"
# Normalization: reassemble fragments and resolve or reduce traffic ambiguities.
scrub in all
..
#block all packets
block in log all
..
#pass in http and https for web servers
pass in quick on $OIF proto tcp from any to <WEBSERVERS> port { 80, 443 } flags S/SA keep state
#allow http traffic to firewall for testing
#pass in quick on $OIF proto tcp from any to 131.106.3.252 port 8080 flags S/SA keep state
..
#pass smtp and ident to mail servers
pass in log on $OIF proto tcp from any to <MXs> port { 25, 113 } flags S/SA keep state
..
#pass inside traffic out
pass out on $OIF inet proto { icmp, udp } from { $OIF, <OFFICE_NET> } to any keep state
pass out on $OIF inet proto tcp from { $OIF, <OFFICE_NET> } to any modulate state
Other bits:
Everything works fine with TCP window scaling disabled on the source 2.6.7 kernel host, or with TCP window scaling enabled and the default window scale set to 0, and with all other operating systems.
The current pf rules were set with pfctl -f and the firewall hasn't been restarted since adding S/SA flags to the pass tcp proto rules.
Setting flags S/SA doesn't seem to change anything in the tcpdump output, but maybe I'm not reading it correctly. In other words, the connection sets up and fails the same way with or without flags S/SA set on the rule governing the connection.
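My reading of the pf.conf man page is that "flags a/b" means: of the flags listed in b, exactly those in a must be set. The sketch below models only that test (it is not pf code, and `rule_matches` is a made-up name) and applies it to the packet types seen in the traces. If that reading is right, the initial SYN matches the rule with or without "flags S/SA", which would fit with the handshake looking identical in tcpdump either way; the flags only stop a later packet from creating the state.

```python
# TCP header flag bits
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
LETTERS = {"F": FIN, "S": SYN, "R": RST, "P": PSH, "A": ACK, "U": URG}


def to_bits(letters: str) -> int:
    """Turn a pf-style flag string such as "SA" into a bitmask."""
    bits = 0
    for ch in letters:
        bits |= LETTERS[ch]
    return bits


def rule_matches(spec: str, packet_flags: int) -> bool:
    """Model of pf's "flags a/b": within mask b, exactly the flags in a are set."""
    want, mask = spec.split("/")
    return (packet_flags & to_bits(mask)) == to_bits(want)


# Packet types from the traces: SYN, SYN-ACK, bare ACK, and PSH+ACK with data.
for name, flags in [("SYN", SYN), ("SYN-ACK", SYN | ACK),
                    ("ACK", ACK), ("PSH-ACK", PSH | ACK)]:
    print(name, rule_matches("S/SA", flags))
```

Only the first packet of the handshake can match, so a state created by this rule always starts from the packet that carries the client's wscale option.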
So, clue?
Many Thanks,
Tony Del Porto
SysAdmin, Conference Network Coordinator
USENIX Association
2560 9th Street, Suite 215, Berkeley CA 94710
[EMAIL PROTECTED] | www.usenix.org | www.sage.org
