Hello,
I have recently started experimenting with a pf bridge sitting
between two switch routers (Enterasys SSR-8000) in a production
environment. The machine is Intel-based with 3 ports (bge0 and nge0
for the bridge, fxp0 for management). The OS is 3.3 (CVS: April 29)
with a modified kernel (no IPv6, and no sl, ppp, tun, vlan, gre or
gif interfaces).
The rule set is very simple, only scrub and stateful passing in:
if_win="bge0" # external
if_tub="nge0" # internal
set limit { states 500000, frags 10000 }
scrub in all fragment reassemble
pass out quick on $if_win
pass out quick on $if_tub
pass in on $if_win all keep state
pass in on $if_tub all keep state
This setup works fine for 5 to 10 days with a load between 5k and 20k
packets/s in each direction. CPU is at 20-25%. A sample of pfctl -s info
looks like this:
fw1# pfctl -s info
Status: Enabled for 13 days 23:48:57          Debug: None

Interface Stats for bge0              IPv4             IPv6
  Bytes In                   1870859445780                0
  Bytes Out                  3338578874459                0
  Packets In
    Passed                      4249363651                0
    Blocked                        1029317                0
  Packets Out
    Passed                      4756760424                0
    Blocked                         227899                0

State Table                          Total             Rate
  current entries                    61993
  searches                     18015350769        14901.8/s
  inserts                        442327690          365.9/s
  removals                       442265697          365.8/s
Counters
  match                         1039100805          859.5/s
  bad-offset                             0            0.0/s
  fragment                           73725            0.1/s
  short                              12511            0.0/s
  normalize                          39398            0.0/s
  memory                                 0            0.0/s
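
As a quick cross-check of the figures above, here is a minimal Python sketch that recomputes the average rates from the totals and the uptime. All numbers are copied from the pfctl output; the helper names (uptime_seconds, per_second) are made up for illustration and are not part of pf. It also checks that inserts minus removals equals the current number of state entries.

```python
def uptime_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert the 'Enabled for D days HH:MM:SS' uptime into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def per_second(total: int, seconds: int) -> float:
    """Average rate of a cumulative counter over the whole uptime."""
    return total / seconds


# Values copied from the pfctl -s info output above.
UPTIME = uptime_seconds(13, 23, 48, 57)
COUNTERS = {
    "searches": 18015350769,
    "inserts": 442327690,
    "removals": 442265697,
    "packets in (passed)": 4249363651,
    "packets out (passed)": 4756760424,
}
CURRENT_ENTRIES = 61993

if __name__ == "__main__":
    for name, total in COUNTERS.items():
        print(f"{name:22s} {per_second(total, UPTIME):10.1f}/s")
    # States created minus states removed should equal the live state count.
    live = COUNTERS["inserts"] - COUNTERS["removals"]
    print("inserts - removals =", live, "(current entries:", CURRENT_ENTRIES, ")")
```

Note that these are averages over the full 13 days, including quiet periods, so they come out lower than the peak load quoted above.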
BUT: there have been 3 incidents so far.
Twice the external router reported a layer-2 table overflow,
saying that its 128k MAC table was overflowing (on a link with two
MAC addresses!). Each overflow lasted about 8 minutes, after which
everything was fine again. The first time, I suspected some special
kind of attack; the second incident happened when I was able to look
into things. Interestingly, the ARP table on the external router
(the one with the overflow condition) had entries like this (for the
interface with the bridge):
45:00:00:3a:b6:28
45:00:00:41:b6:52 ...
These look like the first bytes of an IP header.
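
To illustrate why, here is a minimal Python sketch that reads the six bytes of each bogus "MAC address" as if they were the start of an IPv4 header (version/IHL, TOS, total length, identification). The function name decode_as_ipv4_prefix is made up for this example. It only shows that the bytes are plausible as header fields; it proves nothing about where they came from.

```python
import struct


def decode_as_ipv4_prefix(mac: str) -> dict:
    """Interpret a 6-byte MAC string as the first 6 bytes of an IPv4 header."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes")
    # Network byte order: 1 byte version/IHL, 1 byte TOS,
    # 2 bytes total length, 2 bytes identification.
    ver_ihl, tos, total_length, ident = struct.unpack("!BBHH", raw)
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
    }


if __name__ == "__main__":
    # The two entries seen in the router's table.
    for mac in ("45:00:00:3a:b6:28", "45:00:00:41:b6:52"):
        print(mac, decode_as_ipv4_prefix(mac))
```

Both entries decode to version 4 with a 20-byte header, TOS 0, and short total lengths (58 and 65 bytes), which is what ordinary small IPv4 packets would look like.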
Any ideas? Did something forget to reset the pointer to the start of
the L2 header? But why only rarely, and then why for so many packets?
Or is it not pf at all? Of course, without the pf bridge between the
2 routers, there was never such an overflow condition in months of
operation.
There was also a kernel crash:
kernel: page fault trap code=0
Stopped at _pf_normalize_ip+0x43c: testb $0x4,0x21(%ecx)
I have copied the registers onto paper and made a dump, if anyone is
interested.
Thanks for your attention and - hopefully - your help.
Sincerely, Dieter Kasielke
---
Dieter Kasielke, ZRZ (Zentraleinrichtung Rechenzentrum), Sekr.: EN 50,
Technische Universitaet Berlin, Einsteinufer 17, D-10587 Berlin, GERMANY.
email: [EMAIL PROTECTED], phone: +49 30 314 - 23733, fax: - 21060