Re: How to analyse excessive PF states?
On Sat, 22 Oct 2016 18:12:37 +0200, Federico Giannici wrote:

> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps
> of traffic.
>
> I noticed that over the last few weeks the number of states has increased
> from around 250,000 to almost 2 million (no change in PF config)!
>
> At the same time the firewall started losing a few packets (around
> 1-2%, with peaks of 4%). Maybe this is due to too many states to
> handle?

Hard to tell for the number of states, but you do have some PF congestion,
which is bad. Did you try increasing the sysctl net.inet.ip.ifq.maxlen?
In my previous setup that helped a bit against congestion
(net.inet.ip.ifq.maxlen=2048).

Regards,
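As a minimal sketch of how that could be tried and kept across reboots (the
2048 value is just the one from my setup, not a recommendation for yours):

    # raise the IP input queue length on the running system
    sysctl net.inet.ip.ifq.maxlen=2048

    # make it persistent
    echo 'net.inet.ip.ifq.maxlen=2048' >> /etc/sysctl.conf

    # watch whether the queue drop counter keeps climbing afterwards
    sysctl net.inet.ip.ifq.drops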
Re: How to analyse excessive PF states?
On 2016-10-22, Federico Giannici wrote:

> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of
> traffic.
>
> I noticed that over the last few weeks the number of states has increased
> from around 250,000 to almost 2 million (no change in PF config)!
>
> At the same time the firewall started losing a few packets (around
> 1-2%, with peaks of 4%). Maybe this is due to too many states to handle?
>
> How can we find what's happening and what creates all these states?
> How can we analyse almost 2 million states to find the culprit?
>
> Here is the current output of "pfctl -s info":

I think I would start by monitoring "tcpdump -ni pfsync0 -s 9000" (maybe
writing to a file and reading it on another machine). My first guess would be
some UDP DDoS-related traffic (DNS, SNMP, SIP, NTP) or possibly a SYN flood.
Depending on what it is, reducing the state timeouts on that traffic might be
reasonable.
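For illustration only (the capture path, port number and rule are assumptions,
not something taken from the original post), that could look roughly like:

    # record pf state inserts/removals seen on the pfsync interface to a file
    tcpdump -ni pfsync0 -s 9000 -w /tmp/pfsync.pcap

    # read the capture here or copy it to another machine with OpenBSD tcpdump
    tcpdump -n -r /tmp/pfsync.pcap | head

    # pf.conf sketch: if, say, DNS floods turn out to be the state hogs,
    # give those states much shorter lifetimes (numbers are illustrative)
    pass in quick proto udp to port 53 keep state (udp.first 10, udp.single 5, udp.multiple 30)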
Re: How to analyse excessive PF states?
On 10/22/16 18:12, Federico Giannici wrote:

> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of
> traffic.
>
> I noticed that over the last few weeks the number of states has increased
> from around 250,000 to almost 2 million (no change in PF config)!
>
> At the same time the firewall started losing a few packets (around
> 1-2%, with peaks of 4%). Maybe this is due to too many states to handle?
>
> How can we find what's happening and what creates all these states?
> How can we analyse almost 2 million states to find the culprit?

The exact answers depend a great deal on what monitoring you already have in
place. At the very least, studying the output of "pfctl -ss" (massaged via
some scriptery if needed) will give some clues. Better still if you have
something that keeps track of connections and states over time (netflow
export via pflow comes to mind).

The packet loss could conceivably be a side effect of the number of states
going into the territory where timeouts are scaled down (exceeding 60% of the
state table limit, IIRC).

- P

--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
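A rough sketch of the kind of scriptery and pflow setup meant here (field
positions assume the usual "pfctl -ss" layout, the port-stripping is IPv4
only, and the flow source/collector addresses are made up):

    # top source hosts by number of states
    pfctl -ss | awk '{print $3}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20

    # states per protocol
    pfctl -ss | awk '{print $2}' | sort | uniq -c | sort -rn

    # netflow export via pflow to a collector
    ifconfig pflow0 create
    ifconfig pflow0 flowsrc 192.0.2.1 flowdst 192.0.2.10:9996
    # and in pf.conf, have all states exported:
    set state-defaults pflow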
How to analyse excessive PF states?
We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of
traffic.

I noticed that over the last few weeks the number of states has increased
from around 250,000 to almost 2 million (no change in PF config)!

At the same time the firewall started losing a few packets (around 1-2%,
with peaks of 4%). Maybe this is due to too many states to handle?

How can we find what's happening and what creates all these states?
How can we analyse almost 2 million states to find the culprit?

Here is the current output of "pfctl -s info":

Status: Enabled for 13 days 23:14:21             Debug: err

State Table                          Total             Rate
  current entries                  1706364
  searches                    354572035074       293796.9/s
  inserts                      13973210023        11578.1/s
  removals                     13971503659        11576.7/s
Counters
  match                        14218985893        11781.8/s
  bad-offset                             0            0.0/s
  fragment                           60057            0.0/s
  short                              64825            0.1/s
  normalize                         574469            0.5/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                       4711605            3.9/s
  ip-option                            534            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                       455            0.0/s
  state-insert                        6598            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  translate                              0            0.0/s
  no-route                               0            0.0/s

Thank you for any suggestion.
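For reference, a quick way to see how close those 1.7 million entries are to
the configured state limit and to the adaptive-timeout thresholds (the
pf.conf values below are purely illustrative, not a recommendation):

    # configured hard limits (states, src-nodes, fragments, ...)
    pfctl -sm

    # current timeout values, including adaptive.start / adaptive.end
    pfctl -st

    # pf.conf sketch: raise the state limit and move the adaptive scaling
    # thresholds to match (numbers are examples only)
    set limit states 4000000
    set timeout { adaptive.start 2400000, adaptive.end 4800000 }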