On Thu, Nov 17, 2005 at 04:52:40PM -0500, Jon Hart wrote:

> Bingo. There are entries in the logs when this condition happens but it
> is not entirely clear what the problem is, aside from the fact that it
> is a "BAD STATE":
>
> Nov 17 21:44:48 fw-1 /bsd: pf: BAD state: TCP 10.7.0.112:12345
> 10.7.0.112:12345 10.8.0.112:59635 [lo=3722728956 high=3722735388
> win=6432 modulator=4006337120 wscale=0] [lo=3737716700
> high=3737723132 win=6432 modulator=3433376110 wscale=0] 9:9
> S seq=3723083242 ack=3737716700 len=0 ackskew=0 pkts=5:5 dir=in,fwd
> Nov 17 21:44:48 fw-1 /bsd: pf: State failure on: 1 | 5
The address/port pairs are probably clear (there are three pairs, two of
which are equal unless the state involves translation). What you see is
the existing state entry pf holds and the packet that was matched to it
(based on the source/destination addresses and ports) but failed the
sequence number checks. The square brackets [] contain the sequence
number windows the state allows. The '1 | 5' in the second log line
indicates which of the window checks were violated: the packet's
seq=3723083242 is higher than the upper limit high=3722735388. That's
why the packet is blocked.

Now, the theory is that the client is reusing the source port 59635
before the time-wait of the previous connection (which the state we see
represents) is over. The 'S' part means the blocked packet was a SYN.
The '9:9' part means the FINs were exchanged and ACKed in both
directions, so the connection was closed normally (and no RSTs were
sent). 'pkts=5:5' means that the prior connection consisted of only 5
packets in each direction.

This all makes sense. Assuming you're fetching a tiny document from the
web server in a fast loop, the client will run out of random source
ports. It's probably honouring 2MSL up to the point where it simply has
no choice but to reuse ports early (the only alternative being to stall
further connect(2) calls until ports free up).

I think the real solution in this case is to re-think the application
protocol. If the application re-connects to the server at this rate
(like 32,000 connections per minute), it's wasting a lot of network
bandwidth on connection establishment and tear-down and accumulating a
lot of latency. It would be much smarter to use one persistent
connection and pass multiple transactions over it. Maybe SOAP supports
that (if not, is it authenticating 32,000 times per minute, too? ;)

If you want to adjust pf so it expires the states earlier, you can
lower the tcp.closed timeout value (from the default 90s down to, say,
1s). Expired states are only removed at intervals (the default interval
is 10s, also adjustable), so if you lower a timeout to less than 10s,
you probably also want to lower the interval accordingly (which may
increase CPU load if you have many states).

The reason we keep a state in FIN_WAIT or TIME_WAIT is that there might
be spurious packets arriving late (like packets that travelled through
slower alternative paths across the network). As long as the state
entry exists, those packets are associated with it and don't cause
pflog logging. Once it's gone, they'd usually not have a SYN flag and
would get blocked and logged according to your policy. So you'll likely
not break anything by lowering the timeout values, but you might see
some more packets logged as ordinary blocks.

Daniel
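
P.S. To put the timeout suggestion into concrete terms, a minimal
pf.conf sketch (the exact values here are examples, not recommendations;
see pf.conf(5) for the defaults):

    # expire fully closed connections after 1s instead of the default 90s
    set timeout tcp.closed 1
    # purge expired states every 5s instead of every 10s
    set timeout interval 5

Reload the ruleset with 'pfctl -f /etc/pf.conf' afterwards.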
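
P.P.S. If you want to confirm the port re-use theory first, watching the
state table should show the closed states piling up. Something like the
following (the grep patterns are just examples, the address is taken
from your log):

    # show states involving the client; verbose output includes the
    # sequence number windows and the remaining expiry time
    pfctl -vss | grep -A 2 10.8.0.112

    # total number of state entries
    pfctl -si | grep "current entries"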
