Hello, we found that hot standby doesn't come up under a certain condition. This occurs on 9.3 and 9.4dev.
The recovery process stays in the 'inconsistent' state forever when the server has crashed before any WAL record was inserted after the last checkpoint.

This seems to be because minRecoveryPoint is set to EndRecPtr at the end of crash recovery in ReadRecord. EndRecPtr here points to the beginning of the record following the one already read, that is, just after the last checkpoint record, and in this case there is no record there. Subsequent calls to CheckRecoveryConsistency then never consider the consistent state to have been reached, even though the server is in fact already consistent. I tentatively think that lastReplayedEndRecPtr is the suitable value there.

The first attachment is a script that reproduces the situation. Run it; then, after the server complains that it can't connect to the primary, connecting to it with psql results in:

| psql: FATAL:  the database system is starting up

The attached patch fixes the problem on 9.4dev.

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh

# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
 				if (ControlFile->minRecoveryPoint < EndRecPtr)
 				{
-					ControlFile->minRecoveryPoint = EndRecPtr;
+					/*
+					 * Although EndRecPtr is the right value for
+					 * minRecoveryPoint in archive recovery, it is a bit too
+					 * far when the last checkpoint record is the last WAL
+					 * record here. Use lastReplayedEndRecPtr as
+					 * minRecoveryPoint to start hot standby just after.
+					 */
+					ControlFile->minRecoveryPoint =
+						XLogCtl->lastReplayedEndRecPtr;
 					ControlFile->minRecoveryPointTLI = ThisTimeLineID;
 				}
 				/* update local copy */
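
For illustration only, here is a small standalone model of the reasoning above. It is not code from xlog.c: WalPtr, reached_consistency and choose_min_recovery_point are hypothetical names, and the model assumes that the consistency test boils down to comparing minRecoveryPoint with the last replayed position, and that EndRecPtr lies beyond lastReplayedEndRecPtr in the failing case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a WAL position (assumption, not the real type). */
typedef uint64_t WalPtr;

/*
 * Hypothetical model of the consistency test: recovery counts as
 * consistent once replay has reached min_recovery_point.
 */
static bool
reached_consistency(WalPtr min_recovery_point, WalPtr last_replayed_end)
{
	return min_recovery_point <= last_replayed_end;
}

/*
 * Model of how minRecoveryPoint is chosen at the end of crash recovery.
 * use_end_rec_ptr = true mimics the current behavior, false mimics the
 * behavior proposed in the patch.
 */
static WalPtr
choose_min_recovery_point(WalPtr current_min, WalPtr end_rec_ptr,
						  WalPtr last_replayed_end, bool use_end_rec_ptr)
{
	if (current_min < end_rec_ptr)
		return use_end_rec_ptr ? end_rec_ptr : last_replayed_end;
	return current_min;
}
```

With made-up positions where end_rec_ptr is greater than last_replayed_end and no further record is ever replayed, the first variant never satisfies reached_consistency, while the second satisfies it immediately.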
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers