On Fri, Feb 3, 2012 at 1:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> I'm currently working with Duncan Rance's test case for bug #6425, and
>> I am observing a very nasty behavior in HEAD: once one of the
>> hot-standby query backends crashes, the standby postmaster SIGQUIT's
>> all its children and then just quits itself, with no log message and
>> apparently no effort to restart.  Surely this is not intended?
>
> I looked through postmaster.c and found that the cause of this is pretty
> obvious: if the startup process exits with any non-zero status, we
> assume that represents an unrecoverable error condition, and set
> RecoveryError which causes the postmaster to exit silently as soon as
> its last child is gone.  But we do this even if the reason the startup
> process did exit(1) is that we sent it SIGQUIT as a result of a crash of
> some other process.  Of course this logic dates from a time where the
> startup process could not have any siblings, so when it was written,
> such a thing was impossible.
>
> I think saner behavior might only require this change:
>
>         /*
>          * Any unexpected exit (including FATAL exit) of the startup
>          * process is treated as a crash, except that we don't want to
>          * reinitialize.
>          */
>         if (!EXIT_STATUS_0(exitstatus))
>         {
> -           RecoveryError = true;
> +           if (!FatalError)
> +               RecoveryError = true;
>             HandleChildCrash(pid, exitstatus,
>                              _("startup process"));
>             continue;
>         }
>
> plus suitable comment adjustments of course.  Haven't tested this yet
> though.

Looks good to me.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
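
For anyone following along, the decision the quoted patch changes can be modeled in isolation. The sketch below is a toy model only, not code from postmaster.c: the function name and its `patched` parameter are hypothetical, and `fatal_error` merely stands in for the postmaster's FatalError flag as described in the quoted message (a sibling has already crashed and the children were sent SIGQUIT).

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical helper, not part of PostgreSQL): decide whether a
 * startup-process exit should set RecoveryError.
 *
 * exitstatus  - exit status of the startup process (0 means clean exit)
 * fatal_error - stands in for FatalError: a crash is already being handled
 * patched     - false models the old logic, true models the proposed change
 */
static bool
should_set_recovery_error(int exitstatus, bool fatal_error, bool patched)
{
    /* A clean exit never counts as a recovery failure. */
    if (exitstatus == 0)
        return false;

    /*
     * Proposed change: if a crash is already in progress, the non-zero exit
     * is most likely the result of our own SIGQUIT, so do not treat it as an
     * unrecoverable recovery error.
     */
    if (patched && fatal_error)
        return false;

    /* Otherwise a non-zero exit is treated as a recovery error. */
    return true;
}
```

Under this model the only case that differs between old and new logic is a non-zero exit while a crash is already being handled: the old logic sets RecoveryError (and the postmaster later exits silently), the patched logic leaves it unset.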