Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

Alvaro Herrera Tue, 22 Nov 2016 10:38:36 -0800

Robert Haas wrote:
> On Tue, Nov 22, 2016 at 12:54 PM, Tom Lane <[email protected]> wrote:
> > Robert Haas <[email protected]> writes:
> >> But that's not what is at issue here.  The issue is whether, when
> >> asked to exit immediately, all processes should exit immediately, or
> >> whether it would be better for all processes except one to exit
> >> immediately and the last one exit non-immediately.  In other words,
> >> when the user asks for an immediate shutdown, do they really mean it,
> >> or are they OK taking time to do some other stuff first?
> >
> > Peter already gave the response to that, which is that users do not
> > expect an immediate shutdown to have permanent harmful effects.
> > It doesn't have such effects so far as the SQL data goes; why is it
> > okay to blow away statistical data?
> 
> You're doing that anyway.  The backends aren't going to send any
> accumulated but unsent statistics to the stats collector before
> exiting; they're just going to exit.


Sure.  That loses a few counts, but as you argue below, it's just "a few
parts per million".

> > Yes, I am, and I disagree with you.  The current decision on this point
> > was made ages ago, before autovacuum even existed let alone relied on
> > the stats for proper functioning.  The tradeoff you're saying you're
> > okay with is "we'll shut down a few seconds faster, but you're going
> > to have table bloat problems later because autovacuum won't know it
> > needs to do anything".  I wonder how many of the complaints we get
> > about table bloat are a consequence of people not realizing that
> > "pg_ctl stop -m immediate" is going to cost them.
> 
> That would be useful information to have, but I bet the answer is "not
> that many".  Most people don't shut down their database very often;
> they're looking for continuous uptime.  It looks to me like autovacuum
> activity causes the statistics files to get refreshed at least once
> per autovacuum_naptime, which defaults to once a minute, so on the
> average we're talking about the loss of perhaps 30 seconds worth of
> statistics.

I think you're misunderstanding how this works.  Losing that file
doesn't lose just the final 30 seconds worth of data -- it loses
*everything*, and every counter goes back to zero.  So it's not a few
parts per million; it loses however many millions there were.
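
To make the difference concrete, here is a rough sketch (Python, not
PostgreSQL source) of the documented autovacuum trigger rule: a table is
vacuumed once its dead-tuple count exceeds autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples.  The 50 and 0.2 are the
documented defaults; the table size and dead-tuple counts are made-up
numbers, and the function names are mine.

```python
# Sketch of the autovacuum trigger condition as described in the docs.
# Not PostgreSQL code; all table numbers below are hypothetical.

AUTOVACUUM_VACUUM_THRESHOLD = 50       # documented default
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2   # documented default


def vacuum_threshold(reltuples):
    """Dead tuples a table must exceed before autovacuum picks it up."""
    return AUTOVACUUM_VACUUM_THRESHOLD + AUTOVACUUM_VACUUM_SCALE_FACTOR * reltuples


def needs_vacuum(n_dead_tup, reltuples):
    """True if the recorded dead-tuple count exceeds the threshold."""
    return n_dead_tup > vacuum_threshold(reltuples)


reltuples = 10_000_000     # hypothetical table size
dead_before = 1_900_000    # hypothetical dead tuples recorded before shutdown
new_dead = 200_000         # hypothetical dead tuples created after restart

# Case 1: only a few unsent backend counts are lost.
dead_after_small_loss = dead_before - 100
# Case 2: the stats file is lost, so the counter restarts at zero.
dead_after_reset = 0

print(needs_vacuum(dead_after_small_loss + new_dead, reltuples))  # True
print(needs_vacuum(dead_after_reset + new_dead, reltuples))       # False
```

With these numbers the table really carries about 2.1 million dead
tuples in both cases, but after a reset autovacuum only sees 200,000 and
leaves it alone.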

> I also think that you're wildly overestimating the likelihood that
> writing the stats file will be fast, because (1) anything that
> involves writing to the disk can be very slow, either because there's
> a lot of other write activity or because the disk is going bad, the
> latter being actually a pretty common cause of emergency database
> shutdowns, (2) the stats files can be quite large if the database
> system contains hundreds of thousands or even millions of objects,
> which is not all that infrequent, and (3) pgstat wait timeouts are
> pretty common, which would not be the case if writing the file was
> invariably fast (cf. 75b48e1fff8a4dedd3ddd7b76f6360b5cc9bb741).

Those writes are slow because of the concurrent activity.  If all
backends just throw their hands in the air, no more writes come from
them, so the OS is going to finish the writes pretty quickly (or at
least empty enough of the caches so that the pgstat data fits); so
neither (1) nor (3) should be terribly serious.  I agree that (2) is a
problem, but it's not a problem for everyone.

> >> ...  Yeah, it's not good, but neither are the things that prompt
> >> people to perform an immediate shutdown in the first place.
> >
> > Really?  I think many users think an immediate shutdown is just fine.
> 
> Why would anybody ever perform an immediate shutdown rather than a
> fast shutdown if a fast shutdown were fast?

A fast shutdown is not all that fast -- it needs to write the whole
contents of shared buffers down to disk, which may be enormous:
millions of times bigger than pgstat data.  So a fast shutdown is
actually very slow on a large machine.  An immediate shutdown, even if
it writes pgstat data, still has far less to write.
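
As a back-of-the-envelope illustration of that ratio (a sketch only; the
sizes are assumptions I picked, not measurements, and the helper name is
hypothetical), take the worst case where all of shared buffers has to be
written:

```python
# Rough size comparison between a fast shutdown's buffer writes and the
# pgstat file.  All sizes are hypothetical, chosen only for illustration.

KiB = 1024
GiB = 1024 ** 3


def write_ratio(shared_buffers_bytes, dirty_fraction, pgstat_bytes):
    """How many times more data the buffer flush writes than the stats file."""
    return shared_buffers_bytes * dirty_fraction / pgstat_bytes


shared_buffers = 64 * GiB   # assumed large machine
pgstat_size = 64 * KiB      # assumed small stats file
dirty_fraction = 1.0        # worst case: every buffer must be written

ratio = write_ratio(shared_buffers, dirty_fraction, pgstat_size)
print(f"buffer flush writes about {ratio:,.0f}x the pgstat data")
```

With these assumed sizes the ratio comes out at roughly a million; a
larger stats file or a smaller dirty fraction narrows it accordingly.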

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

