Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

Stephen Frost Tue, 17 Jan 2017 14:33:31 -0800

* Robert Haas ([email protected]) wrote:
> On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost <[email protected]> wrote:
> >> But what if we're restarting after, say, rebooting?  Then there's
> >> nobody to see the progress messages, perhaps.  The system just seems
> >> to take an eternity to return to the usual runlevel.
> >
> > Not unlike an fsck.
> 
> Right.  That's why people developed journaled filesystems like ext3
> and ext4 - because waiting for increasingly-large disks to be checked
> for errors sucked.  And that made fsck times vastly lower and everyone
> said "huzzah".  Because waiting for things to happen stinks, and
> people want to do as little of it as is reasonably possible.


Sure, but they still have a recovery process that they go through when
recovering from a crash, just as we do, and those things which are
waiting for the filesystem have to wait until it is.  If a PG user has
an issue with waiting for recovery to finish then they should make
checkpoints happen more often (typically by reducing
checkpoint_timeout...), so that we don't have as much to replay through
since the last one.

Just as a user could reduce the journal size of ext4 if they're worried
that it'll take too long for the system to replay the last set of
journaled entires during recovery after a crash.

> >> I saw the discussion on this thread, but I didn't realize that it
> >> meant that pg_ctl was going to wait for crash recovery, let alone
> >> archive recovery.  That seems not good.
> >
> > I disagree.  The database isn't done starting up until it's gone through
> > recovery.  If there are other bits of the system which are depending on
> > the database being online, shouldn't they wait until it's actually
> > online to be started?
> 
> They aren't necessarily depending on the database; they could be
> entirely unrelated.

Not in modern boot systems today...  If they aren't depending on the
database then they can get started as soon as everything they *do*
depend on is up and running.  Those daemons or what-have-you which
depend on the database say so through the init dependency system.

> > Admittedly, such processes should probably be prepared to try
> > reconnecting to the database on a failure, but I don't see this as
> > really all that different from how a journaling filesystem operates.
> 
> A journaling filesystem doesn't have a mode where it enters archive
> recovery mode and stays there permanently leaving the system in an
> unusable state.

Now there I agree with you, whatever we're doing with pg_ctl here
shouldn't mean that it never returns, but is that actually what happens
with pg_ctl --wait?  If so, then that's what is wrong, not this
particular patch which is just making --wait the default.

If I'm understanding your concern correctly, you're worried about the
case of a cold standby where the database is only replaying WAL but not
configured to come up as a hot standby and therefore PQping() won't ever
succeed?

Except, that isn't what would ever happen because the timeout for the
--wait option is 60s, according to the pg_ctl docs anyway, after which
it'll throw an error and say the server didn't start up, even if it
would have after a few minutes.  One could wonder why we have the
default set to a value lower than checkpoint_timeout, making it entirely
likely that the database recovery would take longer than the timeout on
a busy/high-volume server that's actually checkpointing on-time, but
just barely.

Perhaps we need a way for pg_ctl to realize a cold-standby case and
throw an error or warning if --wait is specified then, but that hardly
seems like the common use-case.  It also wouldn't make any sense to have
anything in the init system which depended on PG being up in such a case
because, well, PG isn't ever going to be 'up'.

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

Reply via email to