Hello again!

I wrote some days ago about a problem we have with connections from certain 
networks (usually ADSL) being suddenly terminated after anything from a few 
seconds to several minutes.

As nobody seemed able to help based on the information in my previous message (quoted below), I have done some further experiments and found that the disconnection is state-related. When I set up rules that pass traffic from one of the trouble sources statelessly, no disconnection occurs. It's like the Duracell bunny that just keeps on going and going and going... ;)

I turned adaptive state handling off globally long ago:

        set timeout { adaptive.start 0, adaptive.end 0 }

and I'm nowhere near memory exhaustion as far as I can see:

Status: Enabled for 0 days 01:01:18           Debug: Urgent

State Table                          Total             Rate
  current entries                    25674
  searches                        43533956        12086.1/s
  inserts                           342345           95.0/s
  removals                          335190           93.1/s
Counters
  match                           26772701         7432.7/s
  bad-offset                             0            0.0/s
  fragment                               4            0.0/s
  short                                  0            0.0/s
  normalize                              5            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                         12636            3.5/s
  ip-option                              0            0.0/s
  proto-cksum                         1369            0.4/s
  state-mismatch                     13669            3.8/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

Yes, I have raised the state limit quite a bit (as this is a pretty busy
firewall with just under 200 Mbit/s sustained traffic at peak hours):

        set limit states 40000
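
For anyone who wants to check the numbers quickly, here is a rough sketch (Python 3, nothing pf-specific) that parses the status output quoted above, compares the current state count with the configured limit, and lists the counters that are not zero. The names `PFCTL_STATUS`, `STATE_LIMIT` and `parse_status` are made up for this illustration, and the parser only assumes the "name total [rate/s]" layout shown above.

```python
import re

# Status output as quoted above (the "Status:" line is left out).
PFCTL_STATUS = """\
State Table                          Total             Rate
  current entries                    25674
  searches                        43533956        12086.1/s
  inserts                           342345           95.0/s
  removals                          335190           93.1/s
Counters
  match                           26772701         7432.7/s
  bad-offset                             0            0.0/s
  fragment                               4            0.0/s
  short                                  0            0.0/s
  normalize                              5            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                         12636            3.5/s
  ip-option                              0            0.0/s
  proto-cksum                         1369            0.4/s
  state-mismatch                     13669            3.8/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
"""

STATE_LIMIT = 40000  # from "set limit states 40000"

# Indented line: a name, a total, and an optional "<rate>/s" column.
LINE_RE = re.compile(
    r"^\s+(?P<name>[a-z][a-z -]*?)\s+(?P<total>\d+)(?:\s+[\d.]+/s)?\s*$"
)


def parse_status(text):
    """Return {section: {counter name: total}} from the status text."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line starts a section ("State Table", "Counters").
            current = re.split(r"\s{2,}", line.strip())[0]
            sections[current] = {}
            continue
        m = LINE_RE.match(line)
        if m and current is not None:
            sections[current][m.group("name")] = int(m.group("total"))
    return sections


status = parse_status(PFCTL_STATUS)
entries = status["State Table"]["current entries"]

# "match" counts normally matched packets, so it is not interesting here.
nonzero = {
    name: total
    for name, total in status["Counters"].items()
    if total and name != "match"
}

print(f"state table usage: {entries}/{STATE_LIMIT} ({entries / STATE_LIMIT:.0%})")
for name, total in sorted(nonzero.items(), key=lambda kv: -kv[1]):
    print(f"{name:15} {total}")
```

On the figures above this puts the state table at roughly 64% of the limit, with memory and state-limit at zero. The counters that are not zero are state-mismatch, congestion, proto-cksum, normalize and fragment.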

I have tried different optimizations, but they have no effect on the issue at hand.

Do note that some sources have no problems (and never have), so I doubt there is a general problem (such as the NIC being too busy, or some other global resource running short). It must rather be related to something that differs between the sources that have the problem and those that do not.

I've searched the net for information on exactly what makes pf drop a state that was matched only milliseconds earlier, but found nothing that made sense. Would it drop a state because of some form of 'defective' packet?

Yes, I do scrub all traffic, which should fix some forms of packet errors:

        scrub in all

I hope this additional information helps someone figure out what is happening
on our firewalls.

All help is appreciated! :)

-------- Original Message --------
Subject: Strange disconnection problem
Date: Mon, 15 Jan 2007 14:36:54 +0100
From: Per Gøtterup <[EMAIL PROTECTED]>
To: [email protected]

Help needed! :)

We are running a set of firewalls on OpenBSD 4.0 (GENERIC amd64) using
carp and pf and since upgrading from 3.5 we've begun seeing some strange
disconnections usually during http, ftp or ssh transfers, but only from
or to certain external locations (usually ADSL lines).

Dumping traffic with tcpdump, we see something like this. First come a few
normal transfer packets (usually there is a minute or two of these):

14:18:30.464345 217.145.48.102.2035 > 80.198.225.70.80: . [tcp sum ok]
303:303(0) ack 14845472 win 65535 (DF) (ttl 127, id 22113, len 40)
14:18:30.480462 80.198.225.70.80 > 217.145.48.102.2035: .
14845472:14846878(1406) ack 303 win 65233 (DF) (ttl 118, id 10125, len 1446)
14:18:30.481509 217.145.48.102.2035 > 80.198.225.70.80: . [tcp sum ok]
303:303(0) ack 14846878 win 65535 (DF) (ttl 127, id 22115, len 40)
14:18:30.496333 80.198.225.70.80 > 217.145.48.102.2035: P
14846878:14848162(1284) ack 303 win 65233 (DF) (ttl 118, id 10126, len 1324)

Then this happens:

tcpdump: WARNING: compensating for unaligned libpcap packets
14:18:30.496413 217.145.49.101 > 80.198.225.70: icmp: host
217.145.48.102 unreachable for 80.198.225.70.80 > 217.145.48.102.2035:
2198678154 [|tcp] (DF) (ttl 118, id 10126, len 1324) (ttl 255, id 36421,
len 56)
14:18:30.513493 80.198.225.70.80 > 217.145.48.102.2035: .
14848162:14849568(1406) ack 303 win 65233 (DF) (ttl 118, id 10128, len 1446)
14:18:30.514669 217.145.48.102.2035 > 80.198.225.70.80: . [tcp sum ok]
303:303(0) ack 14846878 win 65535 <nop,nop,sack 1 {14848162:14849568} >
(DF) (ttl 127, id 22117, len 52)
14:18:30.530807 80.198.225.70.80 > 217.145.48.102.2035: .
14849568:14850974(1406) ack 303 win 65233 (DF) (ttl 118, id 10129, len 1446)
14:18:30.531616 217.145.48.102.2035 > 80.198.225.70.80: . [tcp sum ok]
303:303(0) ack 14846878 win 65535 <nop,nop,sack 1 {14848162:14850974} >
(DF) (ttl 127, id 22118, len 52)
14:18:30.546648 80.198.225.70.80 > 217.145.48.102.2035: P
14850974:14852258(1284) ack 303 win 65233 (DF) (ttl 118, id 10130, len 1324)
14:18:30.547465 217.145.48.102.2035 > 80.198.225.70.80: . [tcp sum ok]
303:303(0) ack 14846878 win 65535 <nop,nop,sack 1 {14848162:14852258} >
(DF) (ttl 127, id 22119, len 52)

Now the transfer stalls completely (no more packets) and the client
reports a timeout a short time later.

What is happening and, more importantly, is there anything we can do about it?

Thanks a million in advance for any helpful insights! :)

--
Per Gøtterup <[EMAIL PROTECTED]> · Systems Administrator & Support
WebHotel.net · INFORCE A/S · Sydvestvej 100 · DK-2600 Glostrup · Denmark
Phone: +45 70232490 · Fax: +45 70232480 · Web: www.webhotel.net
