On 02/08/2016 02:15 PM, Tom Lane wrote:
Of late, by far the majority of the random-noise failures we see in the
buildfarm have come from failure to shut down the postmaster in a
reasonable timeframe. An example is this current failure on hornet:
waiting for server to shut
pg_ctl: server does not shut down
=========== db log file ==========
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:2] LOG: received fast shutdown
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:3] LOG: aborting any active
2016-02-08 15:14:35.783 UTC [56b8b00d.7e0028:2] LOG: autovacuum launcher
2016-02-08 15:14:35.865 UTC [56b8b00d.2e5000a:1] LOG: shutting down
The buildfarm script runs pg_ctl stop with -t 120, and counting the dots
shows that pg_ctl was honoring that, so apparently the postmaster took
more than 2 minutes to shut down.
Looking at other recent successful runs, stopdb-C-1 usually takes 30 to
40 or so seconds on this machine. So it's possible that it was just so
overloaded that it took three times that much time on this run, but I'm
starting to think there may be more to it than that. We've seen variants
on this theme on half a dozen machines just in the past week --- and it
seems to mostly happen in 9.5 and HEAD, which is fishy.
The fact that we got "shutting down" but not "database system is shut
down" indicates that the server was in ShutdownXLOG(). Normally the
only component of that that takes much time is buffer flushing, but
could something else be happening? Or perhaps the checkpoint code
has somehow not gotten the word to do its flushing at full speed?
What I'd like to do to investigate this is put in a temporary HEAD-only
patch that makes ShutdownXLOG() and its subroutines much chattier about
how far they've gotten and what time it is, and also makes pg_ctl print
out the current time if it gives up waiting. A few failed runs with
that in place will at least allow us to confirm or deny whether it's
just that the shutdown checkpoint is sometimes really slow, or whether
there's a bug lurking.
Any objections? Anybody have another idea for data to collect?
I think that's an excellent idea. This has been a minor running sore for
ages on slow machines.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: