On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable.  But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time.  I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).

In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90
days:

          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2

Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x higher or
something.  In a quick test I can see that make installcheck is
capable of sending a *lot* of 936 byte messages in the same
millisecond.

-- 
Thomas Munro
https://enterprisedb.com
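
For anyone wanting to check the settings mentioned above on a Linux
host, here is a rough sketch in Python.  It assumes the limits are
readable under /proc/sys/net/core (Linux only), and the helper names
read_sysctl and max_queued_messages are made up for illustration.  The
message count it prints is only an upper bound: the kernel also charges
per-packet overhead against the socket buffer, so the real capacity is
lower.

```python
from pathlib import Path

# Assumption: Linux exposes the socket buffer limits under /proc/sys.
SYSCTL_DIR = Path("/proc/sys/net/core")
# Message size reported from the quick "make installcheck" test.
STATS_MESSAGE_BYTES = 936


def read_sysctl(name, base=SYSCTL_DIR):
    """Return the integer value of a net.core sysctl, or None if unavailable."""
    try:
        return int((base / name).read_text().strip())
    except (OSError, ValueError):
        return None


def max_queued_messages(buffer_bytes, message_bytes=STATS_MESSAGE_BYTES):
    """Upper bound on datagrams of message_bytes that fit in buffer_bytes.

    Ignores the kernel's per-packet bookkeeping overhead, so the real
    number of messages that can be queued is smaller.
    """
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")
    return buffer_bytes // message_bytes


if __name__ == "__main__":
    for name in ("rmem_default", "rmem_max", "wmem_default", "wmem_max"):
        value = read_sysctl(name)
        if value is None:
            print(f"net.core.{name}: unavailable")
        else:
            print(
                f"net.core.{name} = {value} bytes, at most "
                f"{max_queued_messages(value)} messages of "
                f"{STATS_MESSAGE_BYTES} bytes"
            )
```

Running this before and after raising the limits (for example with
sysctl -w net.core.rmem_max=...) would show how much headroom a burst
of stats messages has, though only the buildfarm results can say
whether that makes the failures go away.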