Hello,

I have recently started to experiment with a pf bridge sitting
between two switch routers (Enterasys SSR-8000) in a production
environment. The machine is Intel based with 3 ports (bge0 and nge0
for the bridge, fxp0 for management). The OS is OpenBSD 3.3 (CVS: April 29)
with a modified kernel (no IPv6, no sl, ppp, tun, vlan, gre and
gif interfaces).

The rule set is very simple, only scrub and stateful passing in:

if_win="bge0"   # external
if_tub="nge0"   # internal
set limit { states 500000, frags 10000 }
scrub in all fragment reassemble
pass out quick on $if_win
pass out quick on $if_tub
pass in on $if_win all keep state
pass in on $if_tub all keep state

This setup works fine for 5 to 10 days with a load between 5k and 20k
packets/s in each direction. CPU is at 20-25%. A sample pfctl -s info
looks like:
fw1# pfctl -s info
Status: Enabled for 13 days 23:48:57            Debug: None

Interface Stats for bge0              IPv4             IPv6
  Bytes In                   1870859445780                0
  Bytes Out                  3338578874459                0
  Packets In
    Passed                      4249363651                0
    Blocked                        1029317                0
  Packets Out
    Passed                      4756760424                0
    Blocked                         227899                0

State Table                          Total             Rate
  current entries                    61993               
  searches                     18015350769        14901.8/s
  inserts                        442327690          365.9/s
  removals                       442265697          365.8/s
Counters
  match                         1039100805          859.5/s
  bad-offset                             0            0.0/s
  fragment                           73725            0.1/s
  short                              12511            0.0/s
  normalize                          39398            0.0/s
  memory                                 0            0.0/s
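
As a sanity check on the counters above, the following minimal sketch
(variable names are my own, figures are copied from the output) tests two
assumptions: that the Rate column is the lifetime average (total divided
by uptime), and that inserts minus removals equals the current number of
state entries. Both hold for this sample.

```python
# Figures copied from the pfctl -s info output above.
uptime_s = 13 * 86400 + 23 * 3600 + 48 * 60 + 57   # 13 days 23:48:57

searches = 18015350769
inserts = 442327690
removals = 442265697
current_entries = 61993
packets_in_passed = 4249363651
packets_out_passed = 4756760424

# Assumption under test: Rate column = Total / uptime, rounded to 0.1/s.
search_rate = round(searches / uptime_s, 1)
insert_rate = round(inserts / uptime_s, 1)
removal_rate = round(removals / uptime_s, 1)

# Assumption under test: states still in the table = inserts - removals.
live_states = inserts - removals

# Long-term average packet rates on bge0 (includes quiet periods,
# so they are lower than the peak load).
avg_pps_in = packets_in_passed / uptime_s
avg_pps_out = packets_out_passed / uptime_s

print(search_rate, insert_rate, removal_rate, live_states)
print(round(avg_pps_in), round(avg_pps_out))
```

So the state table itself looks consistent; nothing in these counters
hints at the incidents described next.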

BUT: there have been 3 incidents so far:

Twice there was a layer-2 table overflow in the external router,
which reported that its 128k MAC table was overflowing (on a link with two
MAC addresses!). Each overflow lasted for about 8 minutes, after which
everything was fine again. The first time I suspected a special kind
of attack, but the second incident happened
when I was able to look into things. Interestingly, the ARP table
on the external router (the one with the overflow condition) had
entries like this (for the interface with the bridge):

45:00:00:3a:b6:28
45:00:00:41:b6:52 ...

This looks like the first bytes of the IP header.
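
To illustrate, here is a minimal sketch that reads the six bytes of such a
bogus MAC address as the first six bytes of an IPv4 header (version/IHL,
TOS, total length, identification). The function name and the returned
field names are my own choice for illustration.

```python
import struct

def decode_as_ipv4_prefix(mac: str) -> dict:
    """Interpret a 6-byte MAC address as the start of an IPv4 header."""
    raw = bytes.fromhex(mac.replace(":", ""))
    # Network byte order: 1 byte version/IHL, 1 byte TOS,
    # 2 bytes total length, 2 bytes identification.
    ver_ihl, tos, total_length, ident = struct.unpack("!BBHH", raw)
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
    }

for mac in ("45:00:00:3a:b6:28", "45:00:00:41:b6:52"):
    print(mac, decode_as_ipv4_prefix(mac))
```

Both entries decode to version 4 with a 20-byte header, TOS 0, and total
lengths of 58 and 65 bytes, which is what the start of an ordinary small
IPv4 packet would look like.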

Any ideas? Did something forget to reset the pointer to the start of the
L2 header? But why does it happen only rarely, and then for so many
packets? Or is it not pf at all? Of course, without the pf bridge
between the 2 routers, there was never such an overflow condition in
months of operation.

There was also a kernel crash: 

kernel: page fault trap  code=0
Stopped at  _pf_normalize_ip+0x43c:testb $0x4,0x21(%ecx)

I have copied the registers on paper and made a dump, if anyone is
interested.

Thanks for your attention and - hopefully - your help.

Sincerely, Dieter Kasielke

---
Dieter Kasielke, ZRZ (Zentraleinrichtung Rechenzentrum), Sekr.: EN 50,
Technische Universitaet Berlin, Einsteinufer 17, D-10587 Berlin, GERMANY.
email: [EMAIL PROTECTED], phone: +49 30 314 - 23733, fax: - 21060


