Fujii Masao wrote:
On Fri, Jan 30, 2009 at 11:55 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
The startup process now catches SIGTERM, and calls proc_exit() at the next
WAL record. That's what will happen in a fast shutdown. Unexpected death of
the startup process is treated the same as a backend/auxiliary process
crash.

If unexpected death of the startup process happens during automatic recovery
after a crash, the postmaster and bgwriter may get stuck, because
HandleChildCrash() can be called before the FatalError flag is reset. While
FatalError is still set, HandleChildCrash() doesn't kill any auxiliary
processes. So bgwriter survives the crash, and the postmaster waits forever for
bgwriter to die while remaining in recovery status (which means that new
connections cannot be started). Is this a bug?

Yes, and in fact I ran into it myself yesterday while testing. It seems that we should reset FatalError earlier, i.e., when recovery starts and bgwriter is launched. I'm not sure why in CVS HEAD we don't reset FatalError until after the startup process has finished. Resetting it as soon as all the processes have been terminated and the startup process is launched again would seem like a more obvious place to do it. The only difference I can see is in the message a client gets if it tries to connect while the startup process is running: when we're reinitializing after a crash, it currently gets "the database system is in recovery mode" rather than "the database system is starting up". We can keep that behavior; we just need to add another flag, meaning "reinitializing after crash", that isn't reset until recovery is over.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)