Okay, so I've run an strace on the stats collector process during a buffer drop event. I can see evidence of a recvfrom loop pulling in a *maximum* of 142 kB.
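The per-drain figure can be pulled out of a saved trace with a small script like the one below. It is a minimal sketch, not part of the original investigation, and it rests on two assumptions. First, the trace was captured with something like `strace -p <collector pid> -e trace=recvfrom -o collector.strace` (the file name is only an example). Second, the collector's socket is non-blocking, so each drain loop ends with a recvfrom() that returns -1 EAGAIN. The script sums the bytes returned by each uninterrupted run of successful calls.

```python
import re
from typing import Iterable, List

# Matches the return value of an strace line such as
#   recvfrom(9, "..."..., 1000, 0, NULL, NULL) = 1000
#   recvfrom(9, 0x7ffd..., 1000, 0, NULL, NULL) = -1 EAGAIN (Resource ...)
# The greedy ".*" makes the match land on the LAST ") = N" in the line, so
# parentheses or "=" inside the printed buffer contents do not confuse it.
_RET = re.compile(r".*\)\s+=\s+(-?\d+)")


def drain_totals(lines: Iterable[str]) -> List[int]:
    """Sum the bytes returned by consecutive successful recvfrom() calls.

    A run ends at the first failing call (on a non-blocking socket this is
    -1 EAGAIN, i.e. the receive queue is empty) or at the end of the input.
    """
    totals: List[int] = []
    current = 0
    for line in lines:
        if "recvfrom" not in line:
            continue  # some other syscall
        m = _RET.match(line)
        if m is None:
            continue  # e.g. "recvfrom(9,  <unfinished ...>"
        ret = int(m.group(1))
        if ret >= 0:
            current += ret
        else:
            if current:
                totals.append(current)
            current = 0
    if current:
        totals.append(current)
    return totals


if __name__ == "__main__":
    # Synthetic example lines, not taken from a real trace.
    sample = [
        'recvfrom(9, "\\1\\0\\0\\0"..., 1000, 0, NULL, NULL) = 1000',
        'recvfrom(9, "\\2\\0\\0\\0"..., 1000, 0, NULL, NULL) = 480',
        "recvfrom(9, 0x7ffd5c2e0a10, 1000, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)",
        'recvfrom(9, "\\1\\0\\0\\0"..., 1000, 0, NULL, NULL) = 1000',
    ]
    print(drain_totals(sample))  # [1480, 1000]
```

Against a real trace, `max(drain_totals(open("collector.strace")), default=0)` gives the largest single drain in bytes. The script treats a failing call as the end of a run rather than an error, because that is how a non-blocking reader normally discovers that the queue is empty.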
While I had already increased rmem_max, it would appear this is not being observed by the kernel. rmem_default is set to 124 kB, which would explain the above read maxing out just slightly beyond this (presuming the socket's receive buffer fills up behind the read). I'm going to try increasing rmem_default to see if it has any positive effect, and then investigate why the kernel doesn't seem to take rmem_max into account.

On Tue, Apr 18, 2017 at 8:05 AM Tim Kane <tim.k...@gmail.com> wrote:

> Hi all,
>
> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
> I've not been able to resolve.
>
> The drops are originating from postgres processes, and from what I know -
> the only UDP traffic generated by postgres should be consumed by the
> statistics collector - but for whatever reason, it's failing to read the
> packets quickly enough.
>
> Interestingly, I'm seeing these drops occur even when the system is idle..
> but every 15 minutes or so (not consistently enough to isolate any
> particular activity) we'll see in the order of ~90 packets dropped at a
> time.
>
> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on the
> same hardware)
>
> If it's relevant.. there are two instances of postgres running (and
> consequently, 2 instances of the stats collector process) though 1 of those
> instances is most definitely idle for most of the day.
>
> In an effort to try to resolve the problem, I've increased (x2) the UDP
> recv buffer sizes on the host - but it seems to have had no effect.
>
> cat /proc/sys/net/core/rmem_max
> 1677216
>
> The following parameters are configured
>
> track_activities on
> track_counts on
> track_functions none
> track_io_timing off
>
> There are approximately 80-100 connections at any given time.
>
> It seems that the issue started a few weeks ago, around the time of a
> reboot on the given host... but it's difficult to know what (if anything)
> has changed, or why :-/
>
> Incidentally... the documentation doesn't seem to have any mention of UDP
> whatsoever. I'm going to use this as an opportunity to dive into the
> source - but perhaps it's worth improving the documentation around this?
>
> My next step is to try disabling track_activities and track_counts to see
> if they improve matters any, but I wouldn't expect these to generate enough
> data to flood the UDP buffers :-/
>
> Any ideas?
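On the rmem_default versus rmem_max question: on Linux, socket(7) documents that /proc/sys/net/core/rmem_default sets the default SO_RCVBUF of a socket. rmem_max, by contrast, only caps what a process may request with setsockopt(SO_RCVBUF). When a value is set that way, the kernel doubles it for bookkeeping overhead and reports the doubled value through getsockopt(). So raising rmem_max alone would not change a socket that never asks for a larger buffer. Whether the collector's socket falls into that category is an assumption here, not something verified. The sketch below is a quick way to see the distinction on a given host. The helper names are illustrative and not part of PostgreSQL or any tool.

```python
import socket
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl such as net.core.rmem_max (Linux only)."""
    path = Path(root, *name.split("."))
    return int(path.read_text().split()[0])


def udp_rcvbuf(requested: Optional[int] = None) -> int:
    """Return SO_RCVBUF of a fresh UDP socket, optionally after asking
    for `requested` bytes with setsockopt()."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def report() -> None:
    """Print the sysctls next to what sockets actually receive."""
    default = read_sysctl("net.core.rmem_default")
    maximum = read_sysctl("net.core.rmem_max")
    print(f"rmem_default = {default}, rmem_max = {maximum}")
    # A socket that never calls setsockopt(SO_RCVBUF) starts at
    # rmem_default; rmem_max plays no part here.
    print(f"untouched UDP socket: SO_RCVBUF = {udp_rcvbuf()}")
    # An explicit request is clamped to rmem_max, and Linux then doubles
    # the stored value (see socket(7)). Keep the request within a C int.
    request = min(2 * maximum, 2**31 - 1)
    print(f"after asking for {request}: SO_RCVBUF = {udp_rcvbuf(request)}")
```

On the affected host, calling `report()` should show whether an untouched socket gets rmem_default (124 kB here) rather than anything related to rmem_max. Whether the 9.6 stats collector ever calls setsockopt(SO_RCVBUF) on its own socket is worth confirming in pgstat.c rather than assuming.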