Re: [HACKERS] Hot standby doesn't come up on some situation.

Heikki Linnakangas Fri, 28 Feb 2014 04:53:44 -0800

On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:

Hello,


2014/02/28 18:07 "Andres Freund" :


On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:

The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.

# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser

application_name=pm01 keepalives_idle=60 keepalives_interval=5
keepalives_count=5'

#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start


Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?


Yes, but the same stuation could be made by restarting crashed secondary.


Yeah.

I have no idea about the scenario on whitch this behavior was regarded as
undesirable but anyway I think that the secondry should start accepting
client just after crash recovery is completed.


Agreed, this is a bug.

I don't think your patch is the right fix for this though. SettingminRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr pointsto the end of the last read and replayed record. What's wrong in thiscase is lastReplayedEndRecptr. At the beginning of recovery, it'sinitialized to the REDO point, but with a shutdown checkpoint, that'snot quite right. When starting from a shutdown checkpoint, REDO pointsto the beginning of the shutdown record, but we've already effectivelyreplayed it. The next record we replay is the one after the checkpoint.


To see that, I added some elog(LOG) calls:

~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:18 EET
LOG:  ReadCheckpointRecord: 0/16479C98
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down

LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393;oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldestmulti 1 in DB 1; oldest running xid 0; shutdown

LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG:  database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:23 EET
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Killed

At this point, the last record is the shutdown checkpoint, beginning at16479D00, and the server has been killed (immediate shutdown).


~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data

LOG: database system was interrupted; last known up at 2014-02-2814:06:29 EET

LOG:  entering standby mode
LOG:  ReadCheckpointRecord: 0/16479D00

LOG: database system was not properly shut down; automatic recovery inprogress

LOG:  record with zero length at 0/16479D68
LOG:  reached end of WAL in pg_xlog, entering archive recovery
LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00

FATAL: could not connect to the primary server: could not connect toserver: Connection refused

...

Recovery starts from the checkpoint record, but lastReplayedEndRecPtr isset to the *beginning* of the checkpoint record, even though thecheckpoint record has already been effectively replayed, by the feat ofstarting recovery from it. EndRecPtr correctly points to the end of thecheckpoint record. Because of the incorrect lastReplayedEndRecPtr value,the CheckRecoveryConsistency() call concludes that it's not consistent.


I believe the attached fix is the right way to fix this.

- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7399fd4..a36354f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5457,13 +5457,17 @@ StartupXLOG(void)
 
 		/*
 		 * Initialize shared variables for tracking progress of WAL replay,
-		 * as if we had just replayed the record before the REDO location.
+		 * as if we had just replayed the record before the REDO location
+		 * (or the checkpoint record itself, if it's a shutdown checkpoint).
 		 */
 		SpinLockAcquire(&xlogctl->info_lck);
-		xlogctl->replayEndRecPtr = checkPoint.redo;
+		if (checkPoint.redo < RecPtr)
+			xlogctl->replayEndRecPtr = checkPoint.redo;
+		else
+			xlogctl->replayEndRecPtr = EndRecPtr;
 		xlogctl->replayEndTLI = ThisTimeLineID;
-		xlogctl->lastReplayedEndRecPtr = checkPoint.redo;
-		xlogctl->lastReplayedTLI = ThisTimeLineID;
+		xlogctl->lastReplayedEndRecPtr = xlogctl->replayEndRecPtr;
+		xlogctl->lastReplayedTLI = xlogctl->replayEndTLI;
 		xlogctl->recoveryLastXTime = 0;
 		xlogctl->currentChunkStartTime = 0;
 		xlogctl->recoveryPause = false;

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby doesn't come up on some situation.

Reply via email to