Hello, we found that hot standby doesn't come up under a certain
condition. This occurs on 9.3 and 9.4dev.

The recovery process stays in the 'inconsistent' state forever when
the server has crashed before any WAL record was inserted after
the last checkpoint.

This seems to be because minRecoveryPoint is set to EndRecPtr at
the end of crash recovery in ReadRecord. EndRecPtr here points to
the beginning of the record following the one already read, that
is, just after the last checkpoint record, and there is no record
there in this case. Subsequent calls to CheckRecoveryConsistency
then never conclude that the consistent state has been reached,
even though the server is in fact already consistent.

I tentatively think that lastReplayedEndRecPtr is suitable there.

The attached script reproduces the situation. Run it, then,
after the server complains that it can't connect to the primary,
connecting to it with psql results in,

| psql: FATAL:  the database system is starting up

The attached patch fixes the problem on 9.4dev.

What do you think about this?


Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh

# killall postgres
# rm -rf $PGDATA/*
pg_ctl start -w
sleep 1
pg_ctl stop -m i
# Write the standby settings; each here-document ends at its EOF line.
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
 				if (ControlFile->minRecoveryPoint < EndRecPtr)
-					ControlFile->minRecoveryPoint = EndRecPtr;
+					/*
+					 * Although EndRecPtr is the right value for
+					 * minRecoveryPoint in archive recovery, it is a bit too
+					 * far when the last checkpoint record is the last WAL
+					 * record here. Use lastReplayedEndRecPtr as
+					 * minRecoveryPoint to start hot standby just after.
+					 */
+					ControlFile->minRecoveryPoint =
+						XLogCtl->lastReplayedEndRecPtr;
 					ControlFile->minRecoveryPointTLI = ThisTimeLineID;
 				/* update local copy */