Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

Tom Lane Tue, 09 Feb 2016 19:27:43 -0800

Noah Misch <n...@leadboat.com> writes:
> On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote:
>> I wonder if it's worth sticking some instrumentation into stats
>> collector shutdown?


> I wouldn't be surprised if the collector got backlogged during the main phase
> of testing and took awhile to chew through its message queue before even
> starting the write of the final stats.

But why would the ecpg tests show such an effect when the main regression
tests don't?  AFAIK the ecpg tests don't exactly stress the server ---
note the trivial amount of data written by the shutdown checkpoint,
for instance.

The other weird thing is that it's only sometimes slow.  If you look at
the last buildfarm result from axolotl, for instance, the tail end of
the ecpg log is

LOG:  ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
LOG:  database system is shut down at 2016-02-09 16:31:14.784 EST
LOG:  lock files all released at 2016-02-09 16:31:14.817 EST

so we only spent ~50ms on stats write that time.

The idea I was toying with is that previous filesystem activity (making
the temp install, the server's never-fsync'd writes, etc) has built up a
bunch of dirty kernel buffers, and at some point the kernel goes nuts
writing all that data.  So the issues we're seeing would come and go
depending on the timing of that I/O spike.  I'm not sure how to prove
such a theory from here.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

Reply via email to