Okay, so I've run an strace on the collector process during a buffer drop
event.
I can see evidence of a recvfrom loop pulling in a *maximum* of 142kb.

While I've had already increased rmem_max, it would appear this is not
being observed by the kernel.
rmem_default is set to 124kb, which would explain the above read maxing out
just slightly beyond this (presuming a ring buffer filling up behind the
read).

I'm going to try increasing rmem_default and see if it has any positive
effect.. (and then investigate why the kernel doesn't want to consider
rmem_max)..





On Tue, Apr 18, 2017 at 8:05 AM Tim Kane <tim.k...@gmail.com> wrote:

> Hi all,
>
> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
> I've not been able to resolve.
>
> The drops are originating from postgres processes, and from what I know -
> the only UDP traffic generated by postgres should be consumed by the
> statistics collector - but for whatever reason, it's failing to read the
> packets quickly enough.
>
> Interestingly, I'm seeing these drops occur even when the system is idle..
>  but every 15 minutes or so (not consistently enough to isolate any
> particular activity) we'll see in the order of ~90 packets dropped at a
> time.
>
> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on the
> same hardware)
>
>
> If it's relevant..  there are two instances of postgres running (and
> consequently, 2 instances of the stats collector process) though 1 of those
> instances is most definitely idle for most of the day.
>
> In an effort to try to resolve the problem, I've increased (x2) the UDP
> recv buffer sizes on the host - but it seems to have had no effect.
>
> cat /proc/sys/net/core/rmem_max
> 1677216
>
> The following parameters are configured
>
> track_activities on
> track_counts     on
> track_functions  none
> track_io_timing  off
>
>
> There are approximately 80-100 connections at any given time.
>
> It seems that the issue started a few weeks ago, around the time of a
> reboot on the given host... but it's difficult to know what (if anything)
> has changed, or why :-/
>
>
> Incidentally... the documentation doesn't seem to have any mention of UDP
> whatsoever.  I'm going to use this as an opportunity to dive into the
> source - but perhaps it's worth improving the documentation around this?
>
> My next step is to try disabling track_activities and track_counts to see
> if they improve matters any, but I wouldn't expect these to generate enough
> data to flood the UDP buffers :-/
>
> Any ideas?
>
>
>
>

Reply via email to