Hari Babu <[email protected]> writes:
>> We're going to need more details about how to reproduce this.
> The problem occurs only when an active server is restarted after just
> adding a recovery.conf file to the data directory.
Well, you can't just put an empty file there, but I eventually managed
to reproduce this with the suggested hack in xlog.c.
I think the key problem is that postmaster.c's sigusr1_handler() is
willing to start new children even after shutdown has been initiated.
I don't see any good reason for it to do that, so I think the
appropriate patch is as attached.
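
To illustrate the idea outside the postmaster, here is a minimal standalone sketch of the kind of guard the patch adds. It is not the patch itself: the enums are simplified stand-ins for the postmaster's state variables, and may_start_walreceiver() is a hypothetical helper invented for this example. It only mirrors the shape of the walreceiver condition in sigusr1_handler().

```c
#include <stdbool.h>

/* Simplified stand-ins for postmaster state; not the real definitions. */
typedef enum
{
	NoShutdown,
	SmartShutdown,
	FastShutdown
} ShutdownMode;

typedef enum
{
	PM_STARTUP,
	PM_RECOVERY,
	PM_HOT_STANDBY,
	PM_RUN,
	PM_WAIT_READONLY,
	PM_WAIT_BACKENDS
} PMState;

/*
 * Hypothetical helper: decide whether a "start walreceiver" request may be
 * honored.  The last test is the new part: once any shutdown has been
 * requested, no new child is started, whatever the state is.
 */
bool
may_start_walreceiver(PMState state, ShutdownMode shutdown, int walreceiver_pid)
{
	return walreceiver_pid == 0 &&
		(state == PM_STARTUP || state == PM_RECOVERY ||
		 state == PM_HOT_STANDBY || state == PM_WAIT_READONLY) &&
		shutdown == NoShutdown;
}
```

With this guard, a request arriving in PM_RECOVERY is honored when no shutdown is pending and refused as soon as Shutdown is anything other than NoShutdown.
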
Changing that still leaves us with the postmaster thinking that the
eventual exit(1) of the startup process is a "crash". This is mostly
cosmetic, since it still shuts down okay, but we can fix it by reversing
the order of the first two checks in reaper()'s handling of startup
process exit: if Shutdown is set, we should prefer that code path even
if we're in PM_STARTUP state.
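
The effect of the reordering can be seen in a small standalone model. This is a sketch under assumptions, not the reaper() code: the enums are simplified, StartupExitAction and classify_startup_exit() are hypothetical names made up for the example, and the shutdown_check_first flag exists only so the old and new orderings can be compared side by side.

```c
#include <stdbool.h>

/* Simplified stand-ins for postmaster state; not the real definitions. */
typedef enum { NoShutdown, SmartShutdown, FastShutdown } ShutdownMode;
typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } PMState;

/* Hypothetical labels for what the postmaster does next. */
typedef enum
{
	ACTION_CONTINUE_SHUTDOWN,	/* expected exit; go on shutting down */
	ACTION_ABORT_STARTUP,		/* failure during PM_STARTUP; just exit */
	ACTION_HANDLE_CRASH,		/* unexpected exit later; kill children */
	ACTION_STARTUP_DONE			/* startup process finished normally */
} StartupExitAction;

/*
 * Classify an exit of the startup process.  shutdown_check_first selects
 * the patched ordering (true) or the previous ordering (false).
 */
StartupExitAction
classify_startup_exit(PMState state, ShutdownMode shutdown,
					  int exit_code, bool shutdown_check_first)
{
	bool	shutdown_exit = shutdown > NoShutdown &&
		(exit_code == 0 || exit_code == 1);
	bool	startup_failure = state == PM_STARTUP && exit_code != 0;

	if (shutdown_check_first)
	{
		if (shutdown_exit)
			return ACTION_CONTINUE_SHUTDOWN;
		if (startup_failure)
			return ACTION_ABORT_STARTUP;
	}
	else
	{
		if (startup_failure)
			return ACTION_ABORT_STARTUP;
		if (shutdown_exit)
			return ACTION_CONTINUE_SHUTDOWN;
	}

	/* Past the first two checks, any nonzero exit counts as a crash. */
	return exit_code != 0 ? ACTION_HANDLE_CRASH : ACTION_STARTUP_DONE;
}
```

For the case at issue (PM_STARTUP, a shutdown already requested, startup process exits with status 1), the old ordering yields ACTION_ABORT_STARTUP and the new ordering yields ACTION_CONTINUE_SHUTDOWN. Without a pending shutdown, both orderings behave the same.
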
I concluded that it probably wasn't a good idea to have the additional
state transition in SIGINT handling. Generally PM_STARTUP means "we're
running the startup process and nothing else", and that's useful state
info that we shouldn't throw away lightly.
regards, tom lane
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b223feefbab0645667449f643c6c8adee3747ef0..6f93d93fa3f7577fb9157f0bea805c427e3605dd 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** pmdie(SIGNAL_ARGS)
*** 2261,2269 ****
if (pmState == PM_RECOVERY)
{
/*
! * Only startup, bgwriter, and checkpointer should be active
! * in this state; we just signaled the first two, and we don't
! * want to kill checkpointer yet.
*/
pmState = PM_WAIT_BACKENDS;
}
--- 2261,2269 ----
if (pmState == PM_RECOVERY)
{
/*
! * Only startup, bgwriter, walreceiver, and/or checkpointer
! * should be active in this state; we just signaled the first
! * three, and we don't want to kill checkpointer yet.
*/
pmState = PM_WAIT_BACKENDS;
}
*************** reaper(SIGNAL_ARGS)
*** 2355,2360 ****
--- 2355,2372 ----
StartupPID = 0;
/*
+ * Startup process exited in response to a shutdown request (or it
+ * completed normally regardless of the shutdown request).
+ */
+ if (Shutdown > NoShutdown &&
+ (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
+ {
+ pmState = PM_WAIT_BACKENDS;
+ /* PostmasterStateMachine logic does the rest */
+ continue;
+ }
+
+ /*
* Unexpected exit of startup process (including FATAL exit)
* during PM_STARTUP is treated as catastrophic. There are no
* other processes running yet, so we can just exit.
*************** reaper(SIGNAL_ARGS)
*** 2369,2386 ****
}
/*
- * Startup process exited in response to a shutdown request (or it
- * completed normally regardless of the shutdown request).
- */
- if (Shutdown > NoShutdown &&
- (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
- {
- pmState = PM_WAIT_BACKENDS;
- /* PostmasterStateMachine logic does the rest */
- continue;
- }
-
- /*
* After PM_STARTUP, any unexpected exit (including FATAL exit) of
* the startup process is catastrophic, so kill other children,
* and set RecoveryError so we don't try to reinitialize after
--- 2381,2386 ----
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4283,4289 ****
* first. We don't want to go back to recovery in that case.
*/
if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! pmState == PM_STARTUP)
{
/* WAL redo has started. We're out of reinitialization. */
FatalError = false;
--- 4283,4289 ----
* first. We don't want to go back to recovery in that case.
*/
if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! pmState == PM_STARTUP && Shutdown == NoShutdown)
{
/* WAL redo has started. We're out of reinitialization. */
FatalError = false;
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4300,4306 ****
pmState = PM_RECOVERY;
}
if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! pmState == PM_RECOVERY)
{
/*
* Likewise, start other special children as needed.
--- 4300,4306 ----
pmState = PM_RECOVERY;
}
if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! pmState == PM_RECOVERY && Shutdown == NoShutdown)
{
/*
* Likewise, start other special children as needed.
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4331,4337 ****
signal_child(SysLoggerPID, SIGUSR1);
}
! if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER))
{
/*
* Start one iteration of the autovacuum daemon, even if autovacuuming
--- 4331,4338 ----
signal_child(SysLoggerPID, SIGUSR1);
}
! if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
! Shutdown == NoShutdown)
{
/*
* Start one iteration of the autovacuum daemon, even if autovacuuming
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4345,4351 ****
start_autovac_launcher = true;
}
! if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
{
/* The autovacuum launcher wants us to start a worker process. */
StartAutovacuumWorker();
--- 4346,4353 ----
start_autovac_launcher = true;
}
! if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
! Shutdown == NoShutdown)
{
/* The autovacuum launcher wants us to start a worker process. */
StartAutovacuumWorker();
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4354,4360 ****
if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
WalReceiverPID == 0 &&
(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
{
/* Startup Process wants us to start the walreceiver process. */
WalReceiverPID = StartWalReceiver();
--- 4356,4363 ----
if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
WalReceiverPID == 0 &&
(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
! Shutdown == NoShutdown)
{
/* Startup Process wants us to start the walreceiver process. */
WalReceiverPID = StartWalReceiver();
--
Sent via pgsql-bugs mailing list ([email protected])