Re: How to analyse excessive PF states?

2016-10-24 Thread Patrick Lamaiziere
On Sat, 22 Oct 2016 18:12:37 +0200,
Federico Giannici  wrote:

> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps
> of traffic.
> 
> I noticed that over the last few weeks the number of states has
> increased from around 250,000 to almost 2 million (no change in PF
> config)!
> 
> At the same time the firewall started losing a few packets (around
> 1-2%, with peaks of 4%). Maybe this is due to too many states to
> handle?

Hard to tell for the number of states, but you are seeing some PF
congestion, which is bad.

Did you try increasing the sysctl net.inet.ip.ifq.maxlen?
In my previous setup that helped a bit against congestion
(net.inet.ip.ifq.maxlen=2048).
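Roughly what that looks like in practice (a sketch; the value is what
worked for my load, not a tuned recommendation):

```shell
# Check the current IP input queue length and whether it is dropping
sysctl net.inet.ip.ifq.maxlen
sysctl net.inet.ip.ifq.drops

# Raise it; takes effect immediately
sysctl net.inet.ip.ifq.maxlen=2048

# Persist across reboots
echo 'net.inet.ip.ifq.maxlen=2048' >> /etc/sysctl.conf
```

If net.inet.ip.ifq.drops keeps climbing after the change, the queue is
still too short for your traffic bursts.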

Regards,



Re: How to analyse excessive PF states?

2016-10-23 Thread Stuart Henderson
On 2016-10-22, Federico Giannici  wrote:
> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of 
> traffic.
>
> I noticed that over the last few weeks the number of states has increased 
> from around 250,000 to almost 2 million (no change in PF config)!
>
> At the same time the firewall started losing a few packets (around 
> 1-2%, with peaks of 4%). Maybe this is due to too many states to handle?
>
> How can we find what's happening and creating all these states?
> How can we analyse almost 2 million states to find the culprit?
>
> Here is the current output of "pfctl -s info":

I think I would start by monitoring "tcpdump -ni pfsync0 -s 9000" (maybe writing
to a file and reading on another machine).

My first guess would be some udp ddos-related traffic (dns, snmp, sip, ntp)
or possibly synflood. Depending on what it is, reducing state timeouts on that
traffic might be reasonable.
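If it does turn out to be, say, a DNS flood, something along these lines
in pf.conf would shorten the lifetime of those states (a sketch only;
the port and the timeout values are illustrative, not tuned):

```
# Shorter timeouts for inbound DNS states (values are illustrative)
pass in proto udp to port 53 keep state \
        (udp.first 10, udp.single 5, udp.multiple 30)
```

The same idea applies to snmp/sip/ntp; match on whichever port the
capture points at.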



Re: How to analyse excessive PF states?

2016-10-22 Thread Peter N. M. Hansteen
On 10/22/16 18:12, Federico Giannici wrote:
> We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of
> traffic.
> 
> I noticed that over the last few weeks the number of states has increased
> from around 250,000 to almost 2 million (no change in PF config)!
> 
> At the same time the firewall started losing a few packets (around
> 1-2%, with peaks of 4%). Maybe this is due to too many states to handle?
> 
> How can we find what's happening and creating all these states?
> How can we analyse almost 2 million states to find the culprit?

The exact answers depend a great deal on what monitoring you have in
place already. At the very least studying the output of pfctl -ss
(massaged via some scriptery if needed) will give some clues. Better if
you have something that keeps track of connections and states over time
(netflow export via pflow comes to mind).
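A minimal sketch of that scriptery (the here-doc lines here stand in
for live output; on the firewall you would capture "pfctl -ss" instead,
and the field positions assume the usual "iface proto src <- dst"
layout of inbound state lines):

```shell
# Sample state lines standing in for live output; on the firewall,
# replace the here-doc with:  pfctl -ss > /tmp/states.txt
cat <<'EOF' > /tmp/states.txt
all udp 203.0.113.5:53 <- 198.51.100.7:40000       MULTIPLE:SINGLE
all udp 203.0.113.5:53 <- 198.51.100.8:40001       MULTIPLE:SINGLE
all tcp 203.0.113.9:443 <- 198.51.100.7:55000      ESTABLISHED:ESTABLISHED
EOF

# States per protocol -- a sudden skew towards udp is a strong hint
awk '{print $2}' /tmp/states.txt | sort | uniq -c | sort -rn

# States per peer address, inbound ('<-') states only.
# Note: the split on ':' assumes IPv4; IPv6 addresses need more care.
awk '$4 == "<-" {split($5, a, ":"); print a[1]}' /tmp/states.txt |
    sort | uniq -c | sort -rn
```

A single address or a single port dominating the counts usually
identifies the culprit straight away.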

The packet loss could conceivably be a side effect of the number of
states going into the territory where timeouts are scaled down
(exceeding 60% of the state table limit, IIRC).
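You can check where you stand with "pfctl -sm" (limits) and "pfctl -st"
(effective timeouts). If the high state count turns out to be
legitimate traffic rather than an attack, raising the limit and pinning
the adaptive scaling points in pf.conf is one way out, e.g. (a sketch;
the numbers are illustrative, and adaptive.start/adaptive.end default
to 60%/120% of the states limit when left unset):

```
# pf.conf sketch: raise the state limit and pin the adaptive
# timeout-scaling thresholds explicitly (illustrative values)
set limit states 4000000
set timeout { adaptive.start 2400000, adaptive.end 4000000 }
```

Mind the memory cost: each state entry consumes kernel memory, so check
"pfctl -sm" output after raising the limit.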

- P

-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.



How to analyse excessive PF states?

2016-10-22 Thread Federico Giannici
We have a firewall with OpenBSD 6.0 amd64 that handles about 1.5 Gbps of 
traffic.


I noticed that over the last few weeks the number of states has increased 
from around 250,000 to almost 2 million (no change in PF config)!


At the same time the firewall started losing a few packets (around 
1-2%, with peaks of 4%). Maybe this is due to too many states to handle?


How can we find what's happening and creating all these states?
How can we analyse almost 2 million states to find the culprit?

Here is the current output of "pfctl -s info":
Status: Enabled for 13 days 23:14:21              Debug: err

State Table                          Total             Rate
  current entries                  1706364
  searches                    354572035074       293796.9/s
  inserts                      13973210023        11578.1/s
  removals                     13971503659        11576.7/s
Counters
  match                        14218985893        11781.8/s
  bad-offset                             0            0.0/s
  fragment                           60057            0.0/s
  short                              64825            0.1/s
  normalize                         574469            0.5/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                       4711605            3.9/s
  ip-option                            534            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                       455            0.0/s
  state-insert                        6598            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  translate                              0            0.0/s
  no-route                               0            0.0/s

Thank you for any suggestion.