Greg Steuck <gne...@openbsd.org> writes:

> My router has become unstable since upgrading from 7.1-stable to
> 7.2. After several days of uptime the machine gets into a state where
> some applications (unbound & dhcpd) report ENOBUFS (No buffer space
> available). At that time the machine is pingable over all the
> interfaces, but only the upstream interface seems functional (igc0).
> The networks downstream of the router can't get much data across. I
> don't have a good characterization of this.
>
> At first I suspected this had something to do with the igc checksum
> offloading commit, so I am now running 7.2 with this reverted:
> "Implement and enable IPv4, TCP, and UDP checksum offloading for igc."

So far it appears that reverting that commit improved stability. I had
2 crashes last week and none in the last 8 days.

> I also started monitoring some counters that appeared relevant with
> this trivial loop:
>
> $ while : ; do date; netstat -s | grep err; netstat -m; netstat -ni | grep '^[Ni]'; sleep 300; done | tee err-log
>
> I have some 38 hours' worth of counters as of now. I observe an upward
> trend in "mbuf 2112" and "mbufs in use". I extracted the values with
>
> $ perl -ne 'print "$x,$1\n" if m/^(\d+).*mbuf 2112/; $x=$1 if /^(\d+)\smbufs in use/;' err-log
>
> It starts out 610,410-ish and ends at 717,513. I have a picture for
> those visually inclined: https://photos.app.goo.gl/DZGCrJnJDohPrVyZ8

The growth is very slow, so I'm not sure it matters much. The 8-day
graph still shows a gradual ramp, but at that rate it will take a long
time to become a problem: https://photos.app.goo.gl/H64FRMkrfrY3hi6f7
(8 days' worth of 5-minute-spaced samples)
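
To put a number on "very slow", something like the following could turn
the log into a growth rate. This is only a rough sketch, not what
produced the graphs: it reuses the two regexes from the perl one-liner,
assumes the 300-second spacing from the monitoring loop, and fits a
plain least-squares line. The function names and the per-day unit are
my own choices for illustration.

```python
import re

SAMPLE_INTERVAL_S = 300  # assumption: matches "sleep 300" in the loop

# Same patterns as the perl one-liner.
MBUFS_RE = re.compile(r"^(\d+)\smbufs in use")
CLUSTER_RE = re.compile(r"^(\d+).*mbuf 2112")


def parse_samples(lines):
    """Return (mbufs_in_use, mbuf_2112) pairs found in the log lines."""
    samples = []
    mbufs = None
    for line in lines:
        m = CLUSTER_RE.match(line)
        if m and mbufs is not None:
            # Pair the cluster count with the most recent "mbufs in use".
            samples.append((mbufs, int(m.group(1))))
        m = MBUFS_RE.match(line)
        if m:
            mbufs = int(m.group(1))
    return samples


def slope_per_day(values, interval_s=SAMPLE_INTERVAL_S):
    """Least-squares slope of evenly spaced samples, scaled to one day."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den * (86400 / interval_s)


if __name__ == "__main__":
    with open("err-log") as f:
        samples = parse_samples(f)
    print("mbufs in use / day:", slope_per_day([s[0] for s in samples]))
    print("mbuf 2112 / day:", slope_per_day([s[1] for s in samples]))
```

Comparing the per-day slope with the patch applied against the slope
with it reverted would show whether the ramp changes at all.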

I'm reapplying the patch and keeping the same monitoring running.
Hopefully something will be visible in those stats. If not, at least
we'll learn whether the diff correlates with the failure.

Thanks
Greg
