Re: [HACKERS] Hot standby, recovery infra

Heikki Linnakangas Wed, 04 Feb 2009 03:36:01 -0800

Fujii Masao wrote:

On Fri, Jan 30, 2009 at 11:55 PM, Heikki Linnakangas
<[email protected]> wrote:

The startup process now catches SIGTERM, and calls proc_exit() at the next
WAL record. That's what will happen in a fast shutdown. Unexpected death of
the startup process is treated the same as a backend/auxiliary process
crash.


If unexpected death of the startup process happens in automatic recovery
after a crash, postmaster and bgwriter may get stuck, because HandleChildCrash()
can be called before the FatalError flag is reset. While FatalError is still
true, HandleChildCrash() doesn't kill any auxiliary processes. So bgwriter
survives the crash, and postmaster waits forever for the death of bgwriter while
in recovery status (which means that new connections cannot be started). Is this
a bug?

Yes, and in fact I ran into it myself yesterday while testing. It seems
that we should reset FatalError earlier, i.e. when the recovery starts
and bgwriter is launched. I'm not sure why in CVS HEAD we don't reset
FatalError until after the startup process is finished. Resetting it as
soon as all the processes have been terminated and the startup process is
launched again would seem like a more obvious place to do it. The only
difference that I can see is that if someone tries to connect while the
startup process is running, you now get a "the database system is in
recovery mode" message instead of "the database system is starting up"
if we're reinitializing after crash. We can keep that behavior, we just
need to add another flag to mean "reinitializing after crash" that isn't
reset until the recovery is over.
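
To make the two-flag idea concrete, here is a rough, self-contained sketch.
It is a toy model and not the actual postmaster.c code: the struct, the
function names and the reinit_after_crash flag are all made up for
illustration, and handle_child_crash() only mimics the behavior described
in this thread (survivors are signalled only if FatalError is not already
set).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the postmaster state. */
typedef struct
{
    bool fatal_error;        /* models FatalError */
    bool reinit_after_crash; /* assumed new flag, kept until recovery ends */
    bool bgwriter_running;
} PostmasterModel;

/*
 * All children have exited after a crash; the startup process and
 * bgwriter are launched again. With reset_early, FatalError is cleared
 * here instead of after the startup process has finished.
 */
static void
start_crash_recovery(PostmasterModel *pm, bool reset_early)
{
    if (reset_early)
        pm->fatal_error = false;
    pm->reinit_after_crash = true;
    pm->bgwriter_running = true;
}

/*
 * Models the relevant part of HandleChildCrash(): other processes are
 * only killed if we are not already in the FatalError state.
 */
static void
handle_child_crash(PostmasterModel *pm)
{
    if (!pm->fatal_error)
        pm->bgwriter_running = false;   /* stands in for sending SIGQUIT */
    pm->fatal_error = true;
}

/* Message given to a client that connects while recovery is running. */
static const char *
connection_refusal_message(const PostmasterModel *pm)
{
    if (pm->fatal_error || pm->reinit_after_crash)
        return "the database system is in recovery mode";
    return "the database system is starting up";
}

/*
 * Scenario from this thread: a crash has already happened (FatalError is
 * set), recovery starts, and then the startup process dies unexpectedly.
 * Returns true if bgwriter is left running, i.e. postmaster would wait
 * for it forever.
 */
static bool
bgwriter_survives_startup_crash(bool reset_early)
{
    PostmasterModel pm = { true, false, false };

    start_crash_recovery(&pm, reset_early);
    handle_child_crash(&pm);
    return pm.bgwriter_running;
}
```

In this model, bgwriter_survives_startup_crash(false) returns true (the
stuck case), while bgwriter_survives_startup_crash(true) returns false.
The extra flag keeps the "in recovery mode" message even though
FatalError has been reset early.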


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
